Commit Graph

30 Commits

Author SHA1 Message Date
Paul Masurel
3f448ecf79 Bugfix on intersection. (#2812)
The intersection algorithm made it possible for .seek(..) with values
lower than the current doc id, breaking the DocSet contract.

The fix removes the optimization that caused left.seek(..) to be replaced
by a simpler left.advance(..).

Simply doing so lead to a performance regression.
I therefore integrated that idea within SegmentPostings.seek.

We now attempt to check the next doc systematically on seek,
PROVIDED the block is already loaded.

Closes #2811

Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>
2026-01-27 09:21:09 +01:00
Paul Masurel
b86caeefe2 Major bugfix in intersection
A bug was added with the `seek_into_the_danger_zone()` optimization

(Spotted and fixed by Stu)

The contract says seek_into_the_danger_zone returns true if do is part of the docset.

The blanket implementation goes like this.

```
let current_doc = self.doc();
if current_doc < target {
     self.seek(target);
}
self.doc() == target
```

So it will return true if target is TERMINATED, where really TERMINATED does not belong to the docset.


The fix tries to clarify the contracts and fixes the intersection algorithm.
We observe a small but all over the board improvement in intersection performance.

---------

Co-authored-by: Stu Hood <stuhood@gmail.com>
Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>
2026-01-23 18:44:10 +01:00
PSeitz
735c588f4f fix union performance regression (#2790)
* add inlines

* fix union performance regression

Remove unwrap from hotpath generates better assembly.

closes #2788
2026-01-02 12:06:51 +01:00
PSeitz
923f0508f2 seek_exact + cost based intersection (#2538)
* seek_exact + cost based intersection

Adds `seek_exact` and `cost` to `DocSet` for a more efficient intersection.
Unlike `seek`, `seek_exact` does not require the DocSet to advance to the next hit, if the target does not exist.

`cost` allows to address the different DocSet types and their cost
model and is used to determine the DocSet that drives the intersection.
E.g. fast field range queries may do a full scan. Phrase queries load the positions to check if a we have a hit.
They both have a higher cost than their size_hint would suggest.

Improves `size_hint` estimation for intersection and union, by having a
estimation based on random distribution with a co-location factor.

Refactor range query benchmark.

Closes #2531

*Future Work*

Implement `seek_exact` for BufferedUnionScorer and RangeDocSet (fast field range queries)
Evaluate replacing `seek` with `seek_exact` to reduce code complexity

* Apply suggestions from code review

Co-authored-by: Paul Masurel <paul@quickwit.io>

* add API contract verfication

* impl seek_exact on union

* rename seek_exact

* add mixed AND OR test, fix buffered_union

* Add a proptest of BooleanQuery. (#2690)

* fix build

* Increase the document count.

* fix merge conflict

* fix debug assert

* Fix compilation errors after rebase

- Remove duplicate proptest_boolean_query module
- Remove duplicate cost() method implementations
- Fix TopDocs API usage (add .order_by_score())
- Remove duplicate imports
- Remove unused variable assignments

---------

Co-authored-by: Paul Masurel <paul@quickwit.io>
Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>
Co-authored-by: Stu Hood <stuhood@gmail.com>
2025-12-30 14:43:25 +01:00
PSeitz
33835b6a01 Add DocSet::cost() (#2707)
* query: add DocSet cost hint and use it for intersection ordering

- Add DocSet::cost()
- Use cost() instead of size_hint() to order scorers in intersect_scorers

This isolates cost-related changes without the new seek APIs from
PR #2538

* add comments

---------

Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>
2025-10-13 16:25:49 +02:00
Bruce Mitchener
cb252a42af docs: "associated to" -> "associated with" (#1557)
This reads better this way.
2022-09-26 20:23:37 +09:00
Paul Masurel
eca6628b3c Minor refactoring (#1266) 2022-01-28 15:55:55 +09:00
Paul Masurel
2481c87be8 Block wand (#856) 2020-08-19 22:36:36 +09:00
Paul Masurel
68fe406924 Removed asserts (#850) 2020-07-16 12:24:55 +09:00
Paul Masurel
f71b04acb0 Bugfix. (#849)
go_to_first_doc was typically calling seek with a target smaller than
doc.

Since SegmentPostings typically do a linear search on the full block,
regardless of the current position, it could have our segment postings
go backward.
2020-07-16 10:57:51 +09:00
Paul Masurel
baf015fc57 Simplification of the segment postings seek implementation. (#834) 2020-05-27 08:49:47 +09:00
Paul Masurel
e25284bafe Major change in the DocSet/Scorer API (#824)
- Change in the DocSet and Scorer API. (@fulmicoton). 
A freshly created DocSet point directly to their first doc. A sentinel value called TERMINATED marks the end of a DocSet.
`.advance()` returns the new DocId. `Scorer::skip(target)` has been replaced by `Scorer::seek(target)` and returns the resulting DocId.
As a result, iterating through DocSet now looks as follows
```rust
let mut doc = docset.doc();
while doc != TERMINATED {
   // ...
   doc = docset.advance();
}
```
The change made it possible to greatly simplify a lot of the docset's code.
- Misc internal optimization and introduction of the `Scorer::for_each_pruning` function. (@fulmicoton)
2020-05-16 16:33:36 +09:00
fdb-hiroshima
1a817f117f fix documentation error (#654)
Union missdocumented as doing an intersection
Union and Intersection can hold more than 2 DocSets
2019-09-11 17:12:08 +09:00
Paul Masurel
498057c5b7 Refactor deletes (#597)
* Refactor deletes

* Removing generation from SegmentUpdater. These have been obsolete for a long time

* Number literal clippy

* Removed clippy useless allow statement
2019-07-17 13:06:44 +09:00
Paul Masurel
462774b15c Tiqb feature/2018 (#583)
* rust 2018

* Added CHANGELOG comment
2019-07-01 10:01:46 +09:00
Paul Masurel
a8cc5208f1 Linear simd (#519)
* linear simd search within block
2019-03-20 22:10:05 +09:00
Paul Masurel
4c93b096eb Rustfmt 2019-01-29 11:45:30 +01:00
Paul Masurel
6a547b0b5f Issue/483 (#484)
* Downcast_ref

* fixing unit test
2019-01-28 11:43:42 +01:00
Paul Masurel
1fd46c1e9b Clippy 2019-01-28 03:46:23 +01:00
Paul Masurel
06e7bd18e7 Clippy (#421)
* Cargo Format

* Clippy

* bugfix

* still clippy stuff

* clippy step 2
2018-09-15 14:56:14 +09:00
Paul Masurel
8ccbfdea5d Preparing for release 2018-06-22 14:27:46 +09:00
Paul Masurel
78673172d0 Cargo fmt 2018-04-21 20:05:36 +09:00
Paul Masurel
e44782bf14 No more 2018-04-12 13:01:11 +09:00
Paul Masurel
8006f1df11 Added comments 2018-03-28 08:28:49 +09:00
Paul Masurel
ffa03bad71 TermScorer does not handle deletes 2018-03-27 17:35:20 +09:00
Paul Masurel
4d65771e04 field norm reader is not an option anymore. 2018-03-26 13:25:29 +09:00
Paul Masurel
2c20759829 removed unsafecell for position computer 2018-02-24 12:07:55 +09:00
Paul Masurel
be830b03c5 Bugfix in intersection.advance and impl skip_next 2018-02-23 11:55:23 +09:00
Paul Masurel
1b94a3e382 Phrase query optimisation 2018-02-23 00:00:22 +09:00
Paul Masurel
2f242d5f52 Moving docset around 2018-02-19 12:07:05 +09:00