mirror of
https://github.com/quickwit-oss/tantivy.git
synced 2026-05-14 15:20:43 +00:00
* Fix O(2^n) query parser regression for deeply-nested queries The top-level `ast()` parser used `alt((boolean_expr, single_leaf))` at every group level. When the group contained a single leaf with no trailing operand, `boolean_expr` would parse `occur_leaf` (recursing into the inner group), fail at `multispace1`, backtrack, and then `single_leaf` would re-parse `occur_leaf` from scratch. Every nesting level doubled the work, giving O(2^n) time for queries like `(((((title:test)))))`. Parse `occur_leaf` once and peek ahead for a trailing operand instead of backtracking. This keeps parsing O(n) and also avoids the duplicate parse for simple single-leaf queries. Fixes #2498. Measured on the issue reproducer (release build): depth before after 20 0.87 s <1 us 25 28.23 s <1 us 60 (years) ~5 us Non-pathological queries are unaffected or slightly faster: query before after hello 650 ns 308 ns a AND b AND c 1380 ns 1364 ns title:rust AND (...) 3426 ns 3460 ns All 53 existing grammar tests and 56 query_parser tests pass. Adds a regression test at depth 60 that would not complete under the old parser. * Add ignored benchmark for nested query parsing at depth 20/21 Matches the depths from issue #2498 which reported 0.87 s / 1.72 s under the regression. With the fix these parse in single-digit microseconds. Runs via: cargo test -p tantivy-query-grammar --release bench_deeply_nested \ -- --ignored --nocapture * Propagate Err::Failure and Err::Incomplete from operand parser `alt((boolean_expr, single_leaf))` only retried on `Err::Error` and propagated `Err::Failure` and `Err::Incomplete`. The replacement was catching all three with `Err(_)`, which would silently fall back to a single leaf if any cut point were ever added to `operand_leaf` or its descendants. Match specifically on `Err::Error` to preserve the original `alt` semantics. * Replace inline bench with binggan bench in benches/ Move the nested-query benchmark out of the query-grammar test module and into a proper binggan benchmark at benches/query_parser_nested.rs, registered as a harnessless bench in Cargo.toml. Keeps the correctness regression test (depth 60) in place. Run with: cargo bench --bench query_parser_nested * Fix rustfmt import ordering in query_parser_nested bench
36 lines
1.2 KiB
Rust
36 lines
1.2 KiB
Rust
// Benchmark for the query grammar parsing deeply nested queries.
|
|
//
|
|
// Regression guard for https://github.com/quickwit-oss/tantivy/issues/2498:
|
|
// at depth 20/21 the old parser took 0.87 s / 1.72 s respectively because
|
|
// `ast()` retried `occur_leaf` on backtrack, giving O(2^n) time. With the
|
|
// fix parsing is linear and completes in microseconds.
|
|
//
|
|
// Run with: `cargo bench --bench query_parser_nested`.
|
|
|
|
use binggan::{black_box, BenchRunner};
|
|
use tantivy::query_grammar::parse_query;
|
|
|
|
fn nested_query(depth: usize, leading_plus: bool) -> String {
|
|
let leading = "(".repeat(depth);
|
|
let trailing = ")".repeat(depth);
|
|
let prefix = if leading_plus { "+" } else { "" };
|
|
format!("{prefix}{leading}title:test{trailing}")
|
|
}
|
|
|
|
fn main() {
|
|
let mut runner = BenchRunner::new();
|
|
|
|
for depth in [20, 21] {
|
|
for leading_plus in [false, true] {
|
|
let query = nested_query(depth, leading_plus);
|
|
let label = format!(
|
|
"parse_nested_depth_{depth}_{}",
|
|
if leading_plus { "plus" } else { "plain" },
|
|
);
|
|
runner.bench_function(&label, move |_| {
|
|
black_box(parse_query(black_box(&query)).unwrap());
|
|
});
|
|
}
|
|
}
|
|
}
|