tantivy

mirror of https://github.com/quickwit-oss/tantivy.git synced 2026-05-14 15:20:43 +00:00

Author	SHA1	Message	Date
Cameron	89f0cef807	Fix O(2^n) query parser regression for deeply-nested queries (#2905) * Fix O(2^n) query parser regression for deeply-nested queries The top-level `ast()` parser used `alt((boolean_expr, single_leaf))` at every group level. When the group contained a single leaf with no trailing operand, `boolean_expr` would parse `occur_leaf` (recursing into the inner group), fail at `multispace1`, backtrack, and then `single_leaf` would re-parse `occur_leaf` from scratch. Every nesting level doubled the work, giving O(2^n) time for queries like `(((((title:test)))))`. Parse `occur_leaf` once and peek ahead for a trailing operand instead of backtracking. This keeps parsing O(n) and also avoids the duplicate parse for simple single-leaf queries. Fixes #2498. Measured on the issue reproducer (release build), before vs. after: depth 20: 0.87 s vs. <1 us; depth 25: 28.23 s vs. <1 us; depth 60: (years) vs. ~5 us. Non-pathological queries are unaffected or slightly faster, before vs. after: `hello`: 650 ns vs. 308 ns; `a AND b AND c`: 1380 ns vs. 1364 ns; `title:rust AND (...)`: 3426 ns vs. 3460 ns. All 53 existing grammar tests and 56 query_parser tests pass. Adds a regression test at depth 60 that would not complete under the old parser. * Add ignored benchmark for nested query parsing at depth 20/21 Matches the depths from issue #2498 which reported 0.87 s / 1.72 s under the regression. With the fix these parse in single-digit microseconds. Runs via: cargo test -p tantivy-query-grammar --release bench_deeply_nested -- --ignored --nocapture * Propagate Err::Failure and Err::Incomplete from operand parser `alt((boolean_expr, single_leaf))` only retried on `Err::Error` and propagated `Err::Failure` and `Err::Incomplete`. The replacement was catching all three with `Err(_)`, which would silently fall back to a single leaf if any cut point were ever added to `operand_leaf` or its descendants. Match specifically on `Err::Error` to preserve the original `alt` semantics. * Replace inline bench with binggan bench in benches/ Move the nested-query benchmark out of the query-grammar test module and into a proper binggan benchmark at benches/query_parser_nested.rs, registered as a harnessless bench in Cargo.toml. Keeps the correctness regression test (depth 60) in place. Run with: cargo bench --bench query_parser_nested * Fix rustfmt import ordering in query_parser_nested bench	2026-04-24 03:54:00 -04:00
Darkheir	1fd30c62be	fix(query-grammar): Fix regexes between parentheses Signed-off-by: Darkheir <raphael.cohen@sekoia.io>	2026-01-28 10:37:51 +01:00
Evance Soumaoro	765c448945	uncomment commented code when testing	2026-01-27 13:19:41 +00:00
Evance Soumaoro	943594ebaa	uncomment commented code when testing	2026-01-27 13:08:38 +00:00
Evance Soumaoro	df17daae0d	fix closing parenthesis error on elastic range queries for lenient parser	2026-01-27 13:01:14 +00:00
Raphaël Cohen	f7f4b354d6	fix: Handle phrase prefixed with star (#2751) Signed-off-by: Darkheir <raphael.cohen@sekoia.io>	2025-12-01 11:43:25 +01:00
PSeitz-dd	70da310b2d	perf: deduplicate queries (#2698) * deduplicate queries Deduplicate queries in the UserInputAst after parsing queries * add return type	2025-09-22 12:16:58 +02:00
PSeitz	85010b589a	clippy (#2700) * clippy * clippy * clippy * clippy + fmt --------- Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>	2025-09-19 18:04:25 +02:00
PSeitz-dd	2340dca628	fix compiler warnings (#2699) * fix compiler warnings * fix import	2025-09-19 15:55:04 +02:00
Raphaël Cohen	f4b374110f	feat: Regex query grammar (#2677) * feat: Regex query grammar * feat: Disable regexes by default * chore: Apply formatting	2025-09-03 10:07:04 +02:00
Darkheir	610091e2c4	feat: Applies PR review suggestion	2025-08-04 10:12:51 +02:00
Darkheir	d4b090124c	feat: Support spaces between field name and value	2025-07-23 11:12:13 +02:00
PSeitz	5379c99ea2	update edition to 2024 (#2620) * update common to edition 2024 * update bitpacker to edition 2024 * update stacker to edition 2024 * update query-grammar to edition 2024 * update sstable to edition 2024 + fmt * fmt * update columnar to edition 2024 * cargo fmt * use None instead of _	2025-04-18 04:56:31 +02:00
trinity Pointard	5cea16ef9f	improve handling of special char after exist query	2025-01-22 16:04:31 +01:00
trinity Pointard	4d4ee1b0ac	allow term starting with wildcard in query parser	2025-01-15 10:27:48 +01:00
PSeitz	876a579e5d	queryparser: add field respecification test (#2550)	2024-12-02 14:17:12 +01:00
PSeitz	21d057059e	clippy (#2527) * clippy * clippy * clippy * clippy * convert allow to expect and remove unused * cargo fmt * cleanup * export sample * clippy	2024-10-22 09:26:54 +08:00
Bruce Mitchener	c17e513377	Reduce typo count. (#2510)	2024-10-10 09:55:37 +08:00
trinity-1686a	08b9fc0b31	fix de-escaping too much in query parser (#2427) * fix de-escaping too much in query parser	2024-06-10 11:19:01 +02:00
trinity-1686a	455156f51c	improve query parser (#2416) * support escape sequence in more places and fix bug with single-quoted strings * add query parser test for range query on default field	2024-05-30 17:29:27 +02:00
trinity-1686a	d2955a3fd2	extend field grouping (#2333) * extend field grouping	2024-04-15 10:36:32 +02:00
trinity-1686a	f6b0cc1aab	allow some mixing of occur and bool in strict query parser (#2323) * allow some mixing of occur and bool in strict query parser * allow all mixing of binary and occur in strict parser	2024-03-07 15:17:48 +01:00
trinity-1686a	108f30ba23	allow newline where we allow space in query parser (#2302) fix regression from the new parser	2024-01-17 14:38:35 +01:00
trinity-1686a	1dda2bb537	handle * inside term in query parser (#2228)	2023-10-27 08:57:02 +02:00
trinity-1686a	0241a05b90	add support for exists query syntax in query parser (#2170) * add support for exists query syntax in query parser * rustfmt * make Exists require a field	2023-09-19 11:10:39 +02:00
Adam Reichold	22c35b1e00	Fix explanation of boost queries seeking beyond query result. (#2142) * Make current nightly Clippy happy. * Fix explanation of boost queries seeking beyond query result.	2023-08-14 11:59:11 +09:00
trinity-1686a	b92082b748	implement lenient parser (#2129) * move query parser to nom * add support for term grouping * initial work on infallible parser * fmt * add tests and fix minor parsing bugs * address review comments * add support for lenient queries in tantivy * make lenient parser report errors * allow mixing occur and bool in query	2023-08-08 15:41:29 +02:00
Adam Reichold	b325d569ad	Expose phrase-prefix queries via the built-in query parser (#2044) * Expose phrase-prefix queries via the built-in query parser This proposes the less-than-imaginative syntax `field:"phrase ter"` to perform a phrase prefix query against `field` using `phrase` and `ter` as the terms. The aim of this is to make this type of query more discoverable and simplify manual testing. I did consider exposing the `max_expansions` parameter similar to how slop is handled, but I think that this is rather something that should be configured via the query parser (similar to `set_field_boost` and `set_field_fuzzy`) as choosing it requires rather intimate knowledge of the backing index. Prevent construction of zero or one term phrase-prefix queries via the query parser. * Add example using phrase-prefix search via surface API to improve feature discoverability.	2023-06-01 13:03:16 +02:00
Paul Masurel	62709b8094	Change in the query grammar. (#2050) * Change in the query grammar. Quotation mark can now be used for phrase queries. The delimiter is part of the `UserInputLeaf`. That information is meant to be used in Quickwit to solve #3364. This PR also adds support for quotation marks escaping in phrase queries. * Apply suggestions from code review	2023-05-19 12:07:10 +09:00
Denis Bazhenov	e248a4959f	Enforcing "NOT" and "-" queries consistency in UserInputAst (#1609) * Enforcing "NOT" and "-" queries consistency in UserInputAst * Mutable implementation of rewrite_ast_clause()	2023-05-13 00:27:48 +09:00
Yuri Astrakhan	74275b76a6	Inline format arguments where makes sense (#2038) Applied this command to the code, making it a bit shorter and slightly more readable. ``` cargo +nightly clippy --all-features --benches --tests --workspace --fix -- -A clippy::all -W clippy::uninlined_format_args cargo +nightly fmt --all ```	2023-05-10 18:03:59 +09:00
trinity-1686a	e758080465	add support for TermSetQuery in query parser (#1683)	2022-11-17 16:49:49 +01:00
Pascal Seitz	f2e5135870	allow more characters in range query closes #1642	2022-10-21 18:05:15 +08:00
Bruce Mitchener	6a88ac3fe3	Documentation improvements. Fix some linking, some grammar, some typos, etc.	2022-09-18 18:05:37 +07:00
Kian-Meng Ang	625bcb4877	Fix typos and markdowns Found via these commands: codespell -L crate,ser,panting,beauti,hart,ue,atleast,childs,ond,pris,hel,mot markdownlint .md doc/src/.md --disable MD013 MD025 MD033 MD001 MD024 MD036 MD041 MD003	2022-08-13 18:25:47 +08:00
Kanji Yomoda	af84e74284	Replace deprecated std package's constants on floats and integers (#1420)	2022-07-22 08:05:08 +09:00
Evance Soumaoro	a4be239d38	Updated DateTime to hold timestamp in microseconds, while making date field precision configurable (#1396)	2022-07-12 10:04:28 +09:00
Antoine G	437cd350a2	Add support for phrase slop in query language (#1393) Closes #1390	2022-06-28 13:55:47 +09:00
Paul Masurel	ed26552296	Minor changes in query parsing for quickwit#1334. (#1356) Quickwit still relies heavily on generating field names containing a '.' for nested objects, yet allows user-defined field names to contain a dot. To reuse the tantivy query parser, we will end up using Quickwit field names directly in tantivy. Only '.' will be escaped. This PR makes minor changes in how the tantivy query parser parses a field name and resolves it to a field. Some of the new edge case behavior is hacky. Closes #1355	2022-05-06 13:20:10 +09:00
Uwe Klotz	125707dbe0	Replace `chrono` with `time` (#1307) For date values `chrono` has been replaced with `time` - The `time` crate is re-exported as `tantivy::time` instead of `tantivy::chrono`. - The type alias `tantivy::DateTime` has been removed. - `Value::Date` wraps `time::PrimitiveDateTime` without time zone information. - Internally date/time values are stored as seconds since UNIX epoch in UTC. - Converting a `time::OffsetDateTime` to `Value::Date` implicitly converts the value into UTC. If this is not desired do the time zone conversion yourself and use `time::PrimitiveDateTime` directly instead. Closes #1304	2022-03-21 10:50:19 +09:00
Paul Masurel	eca6628b3c	Minor refactoring (#1266)	2022-01-28 15:55:55 +09:00
François Massot	f4b2e71800	Handle field names with any characters with a known set of special (#1109) * Handle field names with any characters with a known set of special characters and an escape one * Update field name validation rule to check only if it has at least one character and does not start with `-` Closes #1087.	2021-07-05 22:31:36 +09:00
Paul Masurel	39dd8cfe24	Cargo clippy. Acronym should not be full uppercase apparently.	2021-04-26 11:49:18 +09:00
Rihards Krišlauks	f518012656	Test flexible bounds in date range queries	2021-04-17 19:30:09 +03:00
Rihards Krišlauks	12fb9a95cb	Clean up leftover debug comments	2021-04-17 18:52:44 +03:00
Rihards Krišlauks	1649f31258	Make time zone parsing more strict to match rfc3339	2021-04-17 17:57:46 +03:00
Rihards Krišlauks	7849736d80	Move all of the datetime parsing code into a single function For readability	2021-04-17 17:23:47 +03:00
Rihards Krišlauks	e58401be78	Implement date range support in the query parser Tests pass but needs cleanup	2021-04-13 23:32:22 +03:00
Paul Masurel	3a72b1cb98	Accept dash within field names. (#874) Accept dash in field names and enforce field names constraint at the creation of the schema. Closes #796	2020-09-01 13:38:52 +09:00
Paul Masurel	2481c87be8	Block wand (#856)	2020-08-19 22:36:36 +09:00


55 Commits
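
Note on commit 89f0cef807: the message describes replacing a backtracking `alt((boolean_expr, single_leaf))` with a parse-once-then-peek strategy. The sketch below illustrates that idea on a toy grammar so the O(2^n) versus O(n) difference can be counted directly. It is not tantivy's nom-based grammar: `Parser`, `leaf`, `operands`, `ast_backtracking`, `ast_parse_once`, `nested`, and `count_calls` are all hypothetical names, and the toy grammar only knows bare words, parenthesised groups, and single-space separators.

```rust
/// A parser consumes a prefix of the input and returns the remainder,
/// or `None` on failure. The counter records how many leaves were parsed.
/// (Hypothetical toy types, not tantivy's nom-based API.)
type Parser = for<'a, 'b> fn(&'a str, &'b mut u64) -> Option<&'a str>;

/// Parse one leaf: a parenthesised group (delegating to `group`) or a bare word.
fn leaf<'a>(input: &'a str, calls: &mut u64, group: Parser) -> Option<&'a str> {
    *calls += 1;
    if let Some(inner) = input.strip_prefix('(') {
        return group(inner, calls)?.strip_prefix(')');
    }
    let end = input
        .find(|c: char| !(c.is_ascii_alphanumeric() || c == ':'))
        .unwrap_or(input.len());
    if end == 0 { None } else { Some(&input[end..]) }
}

/// Parse zero or more " leaf" continuations; returns the remainder and the count.
fn operands<'a>(mut input: &'a str, calls: &mut u64, group: Parser) -> (&'a str, usize) {
    let mut n = 0;
    while let Some(rest) = input.strip_prefix(' ') {
        match leaf(rest, calls, group) {
            Some(rest) => {
                input = rest;
                n += 1;
            }
            None => break,
        }
    }
    (input, n)
}

/// Old shape: try "leaf followed by at least one operand"; if that fails,
/// backtrack and re-parse the same leaf from scratch.
fn ast_backtracking<'a>(input: &'a str, calls: &mut u64) -> Option<&'a str> {
    if let Some(rest) = leaf(input, calls, ast_backtracking) {
        let (rest, n) = operands(rest, calls, ast_backtracking);
        if n > 0 {
            return Some(rest);
        }
    }
    leaf(input, calls, ast_backtracking)
}

/// New shape: parse the leaf once, then look for trailing operands.
fn ast_parse_once<'a>(input: &'a str, calls: &mut u64) -> Option<&'a str> {
    let rest = leaf(input, calls, ast_parse_once)?;
    let (rest, _n) = operands(rest, calls, ast_parse_once);
    Some(rest)
}

/// Build a query such as `((title:test))` for depth 2.
fn nested(depth: usize) -> String {
    format!("{}title:test{}", "(".repeat(depth), ")".repeat(depth))
}

/// Number of leaf parses needed to consume the whole query, or `None` if it
/// does not parse completely.
fn count_calls(parser: Parser, query: &str) -> Option<u64> {
    let mut calls = 0;
    let rest = parser(query, &mut calls)?;
    rest.is_empty().then_some(calls)
}
```

In this toy model, `count_calls(ast_backtracking, &nested(5))` yields `Some(126)` while `count_calls(ast_parse_once, &nested(5))` yields `Some(6)`: the backtracking variant roughly doubles per nesting level, and the parse-once variant grows by one. The real fix additionally matches only on `Err::Error` so that `Err::Failure` and `Err::Incomplete` still propagate; the `Option`-based sketch has no equivalent of that distinction.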