mirror of
https://github.com/quickwit-oss/tantivy.git
synced 2026-05-27 13:40:49 +00:00
seek_exact + cost based intersection (#2538)
* seek_exact + cost based intersection Adds `seek_exact` and `cost` to `DocSet` for a more efficient intersection. Unlike `seek`, `seek_exact` does not require the DocSet to advance to the next hit, if the target does not exist. `cost` allows to address the different DocSet types and their cost model and is used to determine the DocSet that drives the intersection. E.g. fast field range queries may do a full scan. Phrase queries load the positions to check if a we have a hit. They both have a higher cost than their size_hint would suggest. Improves `size_hint` estimation for intersection and union, by having a estimation based on random distribution with a co-location factor. Refactor range query benchmark. Closes #2531 *Future Work* Implement `seek_exact` for BufferedUnionScorer and RangeDocSet (fast field range queries) Evaluate replacing `seek` with `seek_exact` to reduce code complexity * Apply suggestions from code review Co-authored-by: Paul Masurel <paul@quickwit.io> * add API contract verfication * impl seek_exact on union * rename seek_exact * add mixed AND OR test, fix buffered_union * Add a proptest of BooleanQuery. (#2690) * fix build * Increase the document count. * fix merge conflict * fix debug assert * Fix compilation errors after rebase - Remove duplicate proptest_boolean_query module - Remove duplicate cost() method implementations - Fix TopDocs API usage (add .order_by_score()) - Remove duplicate imports - Remove unused variable assignments --------- Co-authored-by: Paul Masurel <paul@quickwit.io> Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com> Co-authored-by: Stu Hood <stuhood@gmail.com>
This commit is contained in:
@@ -483,7 +483,7 @@ mod tests {
|
||||
let checkpoints_for_each_pruning =
|
||||
compute_checkpoints_for_each_pruning(term_scorers.clone(), top_k);
|
||||
let checkpoints_manual =
|
||||
compute_checkpoints_manual(term_scorers.clone(), top_k, 100_000);
|
||||
compute_checkpoints_manual(term_scorers.clone(), top_k, max_doc as u32);
|
||||
assert_eq!(checkpoints_for_each_pruning.len(), checkpoints_manual.len());
|
||||
for (&(left_doc, left_score), &(right_doc, right_score)) in checkpoints_for_each_pruning
|
||||
.iter()
|
||||
|
||||
@@ -817,7 +817,6 @@ mod proptest_boolean_query {
|
||||
fn proptest_boolean_query() {
|
||||
// In the presence of optimizations around buffering, it can take large numbers of
|
||||
// documents to uncover some issues.
|
||||
let num_docs = 10000;
|
||||
let num_fields = 8;
|
||||
let num_docs = 1 << num_fields;
|
||||
let (index, fields, range_field) =
|
||||
|
||||
Reference in New Issue
Block a user