Extract trait interfaces from tantivy's core reader types so that
alternative storage backends (e.g. Quickwit) can provide their own
implementations while tantivy's query engine works through dynamic
dispatch.
Reader trait extraction:
- SegmentReader is now a trait; the concrete implementation is renamed
to TantivySegmentReader.
- DynInvertedIndexReader trait for object-safe dynamic dispatch, plus
a typed InvertedIndexReader trait with associated Postings/DocSet
types for static dispatch. The concrete reader becomes
TantivyInvertedIndexReader.
- StoreReader is now a trait; the concrete implementation is renamed
to TantivyStoreReader. get() returns TantivyDocument directly
instead of requiring a generic DocumentDeserialize bound.
Typed downcast for performance-critical paths:
- try_downcast_and_call() + TypedInvertedIndexReaderCb allow query
weights (TermWeight, PhraseWeight) to attempt a downcast to the
concrete TantivyInvertedIndexReader, obtaining typed postings for
zero-cost scoring, and falling back to the dynamic path otherwise.
- TermScorer<TPostings> is now generic over its postings type.
- PostingsWithBlockMax trait enables block-max WAND acceleration
through the trait boundary.
- block_wand() and block_wand_single_scorer() are generic over
PostingsWithBlockMax, and for_each_pruning is dispatched through
the SegmentReader trait so custom backends can provide their own
block-max implementations.
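The downcast-then-fallback pattern described above can be sketched roughly as follows. This is a simplified illustration, not tantivy's actual code: DynReader, ConcreteReader, CustomReader and sum_doc_ids are hypothetical stand-ins, and the real try_downcast_and_call() signature may differ.

```rust
use std::any::Any;

// Hypothetical stand-in for an object-safe reader trait.
trait DynReader {
    fn as_any(&self) -> &dyn Any;
    // Slow path: doc ids produced through dynamic dispatch.
    fn doc_ids_dyn(&self) -> Box<dyn Iterator<Item = u32> + '_>;
}

// Hypothetical stand-in for the built-in concrete reader.
struct ConcreteReader {
    docs: Vec<u32>,
}

impl ConcreteReader {
    // Fast path: a concrete slice that can be iterated without virtual calls.
    fn doc_ids_typed(&self) -> &[u32] {
        &self.docs
    }
}

impl DynReader for ConcreteReader {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn doc_ids_dyn(&self) -> Box<dyn Iterator<Item = u32> + '_> {
        Box::new(self.docs.iter().copied())
    }
}

// Hypothetical custom backend that only offers the dynamic path.
struct CustomReader {
    upto: u32,
}

impl DynReader for CustomReader {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn doc_ids_dyn(&self) -> Box<dyn Iterator<Item = u32> + '_> {
        Box::new(0..self.upto)
    }
}

#[derive(Debug, PartialEq)]
enum Path {
    Typed,
    Dynamic,
}

// Try the concrete type first; fall back to dynamic dispatch otherwise.
fn sum_doc_ids(reader: &dyn DynReader) -> (Path, u64) {
    if let Some(concrete) = reader.as_any().downcast_ref::<ConcreteReader>() {
        let sum: u64 = concrete.doc_ids_typed().iter().map(|&d| u64::from(d)).sum();
        (Path::Typed, sum)
    } else {
        let sum: u64 = reader.doc_ids_dyn().map(u64::from).sum();
        (Path::Dynamic, sum)
    }
}
```

Both paths produce the same result; only the dispatch cost differs, which is why the downcast is attempted only on performance-critical paths.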
Searcher decoupled from Index:
- New SearcherContext holds schema, executor, and tokenizers.
- Searcher can be constructed from Vec<Arc<dyn SegmentReader>>
via Searcher::from_segment_readers(), without needing an Index.
- Searcher::index() is deprecated in favor of Searcher::context().
Postings and DocSet changes:
- Postings trait gains doc_freq() -> DocFreq (Exact/Approximate)
and has_freq().
- RawPostingsData struct carries raw postings bytes across the trait
boundary for custom reader implementations.
- BlockSegmentPostings::open() takes OwnedBytes instead of FileSlice.
- DocSet gains fill_bitset() method.
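A rough sketch of how an Exact/Approximate document frequency might be consumed. The enum shape (a u32 payload on each variant) and the helper total_doc_freq are assumptions made for illustration; the actual DocFreq definition may differ.

```rust
// Assumed shape: both variants carry a count, only the guarantee differs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DocFreq {
    Exact(u32),
    Approximate(u32),
}

impl DocFreq {
    // The numeric value, regardless of how precise it is.
    fn value(self) -> u32 {
        match self {
            DocFreq::Exact(n) | DocFreq::Approximate(n) => n,
        }
    }

    fn is_exact(self) -> bool {
        matches!(self, DocFreq::Exact(_))
    }
}

// Hypothetical helper: sum per-segment frequencies. The total is exact
// only when every input was exact.
fn total_doc_freq(per_segment: &[DocFreq]) -> DocFreq {
    let total: u32 = per_segment.iter().map(|f| f.value()).sum();
    if per_segment.iter().all(|f| f.is_exact()) {
        DocFreq::Exact(total)
    } else {
        DocFreq::Approximate(total)
    }
}
```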
Scorer improvements:
- Scorer trait absorbs for_each, for_each_pruning, and explain
(previously free functions or on Weight).
- box_scorer() helper avoids double-boxing Box<dyn Scorer>.
- BoxedTermScorer wraps a type-erased term scorer.
- BufferedUnionScorer initialization fixed to avoid an extra
advance() on construction.
Other changes:
- Document::to_json() now returns serde_json::Value; the old
string serialization is renamed to to_serialized_json().
- DocumentDeserialize removed from the store reader public API.
* add method to fetch block of first vals in columnar
add method to fetch block of first vals in columnar (this is way faster
than single calls for full columns)
add benchmark
fix import warnings
```
test bench_get_block_first_on_full_column ... bench: 56 ns/iter (+/- 26)
test bench_get_block_first_on_full_column_single_calls ... bench: 311 ns/iter (+/- 6)
test bench_get_block_first_on_multi_column ... bench: 378 ns/iter (+/- 15)
test bench_get_block_first_on_multi_column_single_calls ... bench: 546 ns/iter (+/- 13)
test bench_get_block_first_on_optional_column ... bench: 291 ns/iter (+/- 6)
test bench_get_block_first_on_optional_column_single_calls ... bench: 362 ns/iter (+/- 8)
```
* use remainder
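For illustration, the difference between single calls and a block fetch can be sketched as below. FullColumn and its methods are hypothetical stand-ins for a column where every document has exactly one value; they only show the shape of the two access patterns, not the columnar crate's implementation.

```rust
// Hypothetical "full" column: one value per document, stored densely.
struct FullColumn {
    values: Vec<u64>,
}

impl FullColumn {
    // Single-call access: one lookup per invocation.
    fn first(&self, doc: u32) -> Option<u64> {
        self.values.get(doc as usize).copied()
    }

    // Block access: fill the whole output buffer in one call, so the
    // per-call overhead is paid once per block instead of once per doc.
    fn first_vals(&self, docs: &[u32], output: &mut [Option<u64>]) {
        assert_eq!(docs.len(), output.len());
        for (out, &doc) in output.iter_mut().zip(docs) {
            *out = self.values.get(doc as usize).copied();
        }
    }
}
```

Both access patterns return the same values; the block variant only amortizes the call overhead.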
With tantivy 0.20, the minimum memory consumption per SegmentWriter increased to
12MB. 7MB of that is for the different fast field collector types (they could be
created lazily). Increase the minimum memory budget from 3MB to 15MB.
Change memory variable naming from arena to budget.
closes #2156
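A minimal sketch of the kind of check a 15MB minimum implies. The constant name, the decimal interpretation of "MB", and budget_per_thread are assumptions for illustration, not the names used in the code base.

```rust
// Assumption: "15MB" is read as 15 * 1_000_000 bytes.
const MIN_BUDGET_PER_THREAD_BYTES: usize = 15_000_000;

// Hypothetical helper: split a total budget across indexing threads and
// reject configurations that leave a thread below the minimum.
fn budget_per_thread(total_budget_bytes: usize, num_threads: usize) -> Result<usize, String> {
    if num_threads == 0 {
        return Err("at least one indexing thread is required".to_string());
    }
    let per_thread = total_budget_bytes / num_threads;
    if per_thread < MIN_BUDGET_PER_THREAD_BYTES {
        return Err(format!(
            "memory budget per thread is {} bytes, minimum is {} bytes",
            per_thread, MIN_BUDGET_PER_THREAD_BYTES
        ));
    }
    Ok(per_thread)
}
```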
Tantivy used to assume that all files could somehow be memory mapped. After this change, Directory returns a `FileSlice` that can be reduced and eventually read into an `OwnedBytes` object. Long, blocking I/O operations are still required, but they do not span the entire file.
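The idea of narrowing a slice lazily and only reading the final range can be sketched as follows. FileHandle, CountingFile and LazySlice are hypothetical types written for this illustration; they are not the real `FileSlice` / `OwnedBytes` API.

```rust
use std::ops::Range;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

// Hypothetical storage abstraction: only explicit range reads hit the backend.
trait FileHandle {
    fn len(&self) -> usize;
    fn read_bytes(&self, range: Range<usize>) -> Vec<u8>;
}

// In-memory backend that counts how many bytes were actually read.
struct CountingFile {
    data: Vec<u8>,
    bytes_read: AtomicUsize,
}

impl FileHandle for CountingFile {
    fn len(&self) -> usize {
        self.data.len()
    }
    fn read_bytes(&self, range: Range<usize>) -> Vec<u8> {
        self.bytes_read.fetch_add(range.len(), Ordering::Relaxed);
        self.data[range].to_vec()
    }
}

// A view over part of a file. Narrowing it performs no I/O.
#[derive(Clone)]
struct LazySlice {
    handle: Arc<dyn FileHandle>,
    range: Range<usize>,
}

impl LazySlice {
    fn new(handle: Arc<dyn FileHandle>) -> Self {
        let range = 0..handle.len();
        LazySlice { handle, range }
    }

    // Narrow to a sub-range, expressed relative to the current slice.
    fn slice(&self, sub: Range<usize>) -> LazySlice {
        assert!(sub.start <= sub.end && sub.end <= self.range.len());
        let start = self.range.start + sub.start;
        LazySlice {
            handle: Arc::clone(&self.handle),
            range: start..start + sub.len(),
        }
    }

    // The only blocking operation: read exactly the narrowed range.
    fn read(&self) -> Vec<u8> {
        self.handle.read_bytes(self.range.clone())
    }
}
```

In this sketch, slicing twice and then reading touches only the bytes of the final range, never the whole file.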
* Alternative take on boosted queries
* Fixing unit test
* Added boosting to the query grammar.
* Made BoostQuery public.
* Added support for boosting field in QueryParser
Closes #547
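As a rough illustration of boosting, a wrapper scorer can scale the score of the scorer it wraps. This sketch assumes a boost is a plain multiplicative factor; the Scorer trait, ConstScorer and BoostScorer below are simplified hypothetical types, not the crate's definitions.

```rust
// Simplified stand-in for a scorer: just exposes the current score.
trait Scorer {
    fn score(&self) -> f32;
}

// Hypothetical scorer that always returns a fixed score.
struct ConstScorer(f32);

impl Scorer for ConstScorer {
    fn score(&self) -> f32 {
        self.0
    }
}

// Wraps any scorer and multiplies its score by a boost factor.
struct BoostScorer<S: Scorer> {
    underlying: S,
    boost: f32,
}

impl<S: Scorer> Scorer for BoostScorer<S> {
    fn score(&self) -> f32 {
        self.underlying.score() * self.boost
    }
}
```

Because the wrapper is itself a scorer, boosts compose: wrapping a boosted scorer in another boost multiplies the factors.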