Major bugfix in intersection

A bug was added with the `seek_into_the_danger_zone()` optimization

(Spotted and fixed by Stu)

The contract says seek_into_the_danger_zone returns true if do is part of the docset.

The blanket implementation goes like this.

```
let current_doc = self.doc();
if current_doc < target {
     self.seek(target);
}
self.doc() == target
```

So it will return true if target is TERMINATED, where really TERMINATED does not belong to the docset.


The fix tries to clarify the contracts and fixes the intersection algorithm.
We observe a small but all over the board improvement in intersection performance.

---------

Co-authored-by: Stu Hood <stuhood@gmail.com>
Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>
This commit is contained in:
Paul Masurel
2026-01-23 18:44:10 +01:00
committed by GitHub
parent abf1e64f4d
commit b86caeefe2
10 changed files with 273 additions and 104 deletions

View File

@@ -17,6 +17,9 @@ pub struct VecDocSet {
impl From<Vec<DocId>> for VecDocSet {
fn from(doc_ids: Vec<DocId>) -> VecDocSet {
// We do not use `slice::is_sorted`, as we want to check for doc ids to be strictly
// sorted.
assert!(doc_ids.windows(2).all(|w| w[0] < w[1]));
VecDocSet { doc_ids, cursor: 0 }
}
}