Commit Graph

108 Commits

Author SHA1 Message Date
Cameron
89f0cef807 Fix O(2^n) query parser regression for deeply-nested queries (#2905)
* Fix O(2^n) query parser regression for deeply-nested queries

The top-level `ast()` parser used `alt((boolean_expr, single_leaf))` at
every group level. When the group contained a single leaf with no
trailing operand, `boolean_expr` would parse `occur_leaf` (recursing
into the inner group), fail at `multispace1`, backtrack, and then
`single_leaf` would re-parse `occur_leaf` from scratch. Every nesting
level doubled the work, giving O(2^n) time for queries like
`(((((title:test)))))`.

Parse `occur_leaf` once and peek ahead for a trailing operand instead
of backtracking. This keeps parsing O(n) and also avoids the duplicate
parse for simple single-leaf queries.

Fixes #2498.

Measured on the issue reproducer (release build):

    depth   before     after
       20   0.87 s   <1 us
       25  28.23 s   <1 us
       60  (years)   ~5 us

Non-pathological queries are unaffected or slightly faster:

    query                     before     after
    hello                     650 ns     308 ns
    a AND b AND c            1380 ns    1364 ns
    title:rust AND (...)     3426 ns    3460 ns

All 53 existing grammar tests and 56 query_parser tests pass. Adds a
regression test at depth 60 that would not complete under the old
parser.

* Add ignored benchmark for nested query parsing at depth 20/21

Matches the depths from issue #2498 which reported 0.87 s / 1.72 s
under the regression. With the fix these parse in single-digit
microseconds. Runs via:

  cargo test -p tantivy-query-grammar --release bench_deeply_nested \
      -- --ignored --nocapture

* Propagate Err::Failure and Err::Incomplete from operand parser

`alt((boolean_expr, single_leaf))` only retried on `Err::Error` and
propagated `Err::Failure` and `Err::Incomplete`. The replacement was
catching all three with `Err(_)`, which would silently fall back to
a single leaf if any cut point were ever added to `operand_leaf` or
its descendants. Match specifically on `Err::Error` to preserve the
original `alt` semantics.

* Replace inline bench with binggan bench in benches/

Move the nested-query benchmark out of the query-grammar test module
and into a proper binggan benchmark at benches/query_parser_nested.rs,
registered as a harnessless bench in Cargo.toml. Keeps the correctness
regression test (depth 60) in place.

Run with: cargo bench --bench query_parser_nested

* Fix rustfmt import ordering in query_parser_nested bench
2026-04-24 03:54:00 -04:00
Pascal Seitz
5c344db1bf chore: Release 2026-03-31 17:15:34 +08:00
Darkheir
1fd30c62be fix(query-grammar): Fix regexes between parentheses
Signed-off-by: Darkheir <raphael.cohen@sekoia.io>
2026-01-28 10:37:51 +01:00
Evance Soumaoro
765c448945 uncomment commented code when testing 2026-01-27 13:19:41 +00:00
Evance Soumaoro
943594ebaa uncomment commented code when testing 2026-01-27 13:08:38 +00:00
Evance Soumaoro
df17daae0d fix closing parenthesis error on elastic range queries for lenient parser 2026-01-27 13:01:14 +00:00
Raphaël Cohen
f7f4b354d6 fix: Handle phrase prefixed with star (#2751)
Signed-off-by: Darkheir <raphael.cohen@sekoia.io>
2025-12-01 11:43:25 +01:00
PSeitz-dd
70da310b2d perf: deduplicate queries (#2698)
* deduplicate queries

Deduplicate queries in the UserInputAst after parsing queries

* add return type
2025-09-22 12:16:58 +02:00
PSeitz
85010b589a clippy (#2700)
* clippy

* clippy

* clippy

* clippy + fmt

---------

Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>
2025-09-19 18:04:25 +02:00
PSeitz-dd
2340dca628 fix compiler warnings (#2699)
* fix compiler warnings

* fix import
2025-09-19 15:55:04 +02:00
Raphaël Cohen
f4b374110f feat: Regex query grammar (#2677)
* feat: Regex query grammar

* feat: Disable regexes by default

* chore: Apply formatting
2025-09-03 10:07:04 +02:00
PSeitz
33794a114c chore: Release (#2686)
Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>
2025-08-20 18:29:37 +08:00
Darkheir
610091e2c4 feat: Applies PR review suggestion 2025-08-04 10:12:51 +02:00
Darkheir
d4b090124c feat: Support spaces between field name and value 2025-07-23 11:12:13 +02:00
PSeitz
4a6123d3ff release tantivy: bump versions (#2625)
* chore: Release

* chore: Release

---------

Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>
2025-06-10 15:34:39 +02:00
PSeitz
5379c99ea2 update edition to 2024 (#2620)
* update common to edition 2024

* update bitpacker to edition 2024

* update stacker to edition 2024

* update query-grammar to edition 2024

* update sstable to edition 2024 + fmt

* fmt

* update columnar to edition 2024

* cargo fmt

* use None instead of _
2025-04-18 04:56:31 +02:00
Pascal Seitz
6ab4102253 fix tantivy-query-grammar version 2025-04-09 14:35:23 +08:00
Kat Lim Ruiz
ffa7cdf397 agreed with Remi, about the final json structure, having "type" tag and using "clauses" is more accurate 2025-04-03 08:35:16 -05:00
Remi Dettai
fb12b7be28 Tag UserInputAst 2025-04-03 10:07:34 +02:00
Kat Lim Ruiz
6f77083493 create more complex unit test 2025-04-02 18:06:20 -05:00
Kat Lim Ruiz
cd7745da7a set Leaf untagged, leave clause and boost the same (with own property) 2025-04-02 17:52:18 -05:00
Kat Lim Ruiz
eb8304dee9 remove untitled file 2025-04-02 08:47:58 -05:00
Kat Lim Ruiz
e5638112a9 all json should be snake_case 2025-04-02 08:45:33 -05:00
Kat Lim Ruiz
81110152fb add unit test for unbounded 2025-04-01 18:08:04 -05:00
Kat Lim Ruiz
ae88a7ece5 add tag type and content value to UserInputBound 2025-04-01 18:06:40 -05:00
Kat Lim Ruiz
bdd5f80fd9 add clause unit test 2025-04-01 18:04:19 -05:00
Kat Lim Ruiz
3f62ef22e5 set tag=type only for Leaf 2025-04-01 17:52:36 -05:00
Kat Lim Ruiz
8102e19e48 set Error as serializable because is part of the possible outcomes (however, I think using this empty Error struct is not a good pattern) 2025-04-01 17:43:24 -05:00
Kat Lim Ruiz
175c853ea7 add serialization test for LenientError 2025-04-01 17:38:23 -05:00
Kat Lim Ruiz
c992cf3f37 Revert "set all enum to be snake_case when serializing"
This reverts commit 83f6c2f265.
2025-04-01 17:27:28 -05:00
Kat Lim Ruiz
83f6c2f265 set all enum to be snake_case when serializing 2025-04-01 17:13:04 -05:00
Kat Lim Ruiz
da2ff5712a fix fmt nightly 2025-03-31 08:21:54 -05:00
Kat Lim Ruiz
18da402e27 cargo fmt 2025-03-30 22:10:38 -05:00
Kat Lim Ruiz
0a37b7acaa update to latest serde and serde_json (and follow the pattern to use patch versions) 2025-03-30 11:35:58 -05:00
Kat Lim Ruiz
1a9fd885dd allow LenientError to be serializable too 2025-03-30 11:26:20 -05:00
Kat Lim Ruiz
3e660905a7 unit test parse_query_lenient 2025-03-30 11:22:22 -05:00
Kat Lim Ruiz
0c2b984cb4 add tests 2025-03-30 11:12:15 -05:00
Kat Lim Ruiz
a69b1c609c add error to be debuggable 2025-03-30 11:12:12 -05:00
Kat Lim Ruiz
8d4a6fcaba deserialize is not needed 2025-03-30 11:11:55 -05:00
Kat Lim Ruiz
0149317c5a set 0.23 2025-03-30 10:55:48 -05:00
Kat Lim Ruiz
3fcb6f9597 add unit tests 2025-03-30 10:41:43 -05:00
Kat Lim Ruiz
388fcd763b add serde, and allow UserInputAst to be json serialized/deserialized 2025-03-30 10:36:43 -05:00
trinity Pointard
5cea16ef9f improve handling of spcial char after exist query 2025-01-22 16:04:31 +01:00
trinity Pointard
4d4ee1b0ac allow term starting with wildcard in query parser 2025-01-15 10:27:48 +01:00
PSeitz
876a579e5d queryparser: add field respecification test (#2550) 2024-12-02 14:17:12 +01:00
PSeitz
21d057059e clippy (#2527)
* clippy

* clippy

* clippy

* clippy

* convert allow to expect and remove unused

* cargo fmt

* cleanup

* export sample

* clippy
2024-10-22 09:26:54 +08:00
Bruce Mitchener
c17e513377 Reduce typo count. (#2510) 2024-10-10 09:55:37 +08:00
trinity-1686a
85395d942a fix clippy lints from 1.80-1.81 (#2488)
* fix some clippy lints

* fix clippy::doc_lazy_continuation

* fix some lints for 1.82
2024-09-05 14:33:05 +02:00
PSeitz
27be6aed91 lift clauses in LogicalAst (#2449)
(a OR b) OR (c OR d) can be simplified to (a OR b OR c OR d)
(a AND b) AND (c AND d) can be simplified to (a AND b AND c AND d)

This directly affects how queries are executed

remove unused SumWithCoordsCombiner
the number of fields is unused and private
2024-08-14 19:21:26 +02:00
trinity-1686a
08b9fc0b31 fix de-escaping too much in query parser (#2427)
* fix de-escaping too much in query parser
2024-06-10 11:19:01 +02:00