* Use a more general but still object-safe signature for Query::query_terms.
* Further constraint the generalized Query::query_terms signature to allow extracting references to terms.
* Removes all usage of block_on, and use a oneshot channel instead.
Calling `block_on` panics in certain context.
For instance, it panics when it is called in a the context of another
call to block.
Using it in tantivy is unnecessary. We replace it by a thin wrapper
around a oneshot channel that supports both async/sync.
* Removing needless uses of async in the API.
Co-authored-by: PSeitz <PSeitz@users.noreply.github.com>
* Added sstable and enabling it by default, and parallel boolean query.
* Added async API for FileSlice.
* Added async get_doc
* Reduce blocksize to 32_000
* Added debug logs
Quickwit specific feature a hidden behind the quickwit feature flag.
* Term are now typed.
This change is backward compatible:
While the Term has a byte representation that is modified, a Term itself
is a transient object that is not serialized as is in the index.
Its .field() and .value_bytes() on the other hand are unchanged.
This change offers better Debug information for terms.
While not necessary it also will help in the support for JSON types.
* Renamed Hierarchical Facet -> Facet
* Add a NORMED options on field
Make fieldnorm indexation optional:
* for all types except text => added a NORMED options
* for text field
** if STRING, field has not fieldnorm retained
** if TEXT, field has fieldnorm computed
* Finalize making fieldnorm optional for all field types.
- Using Option for fieldnorm readers.
* Added optimisation using block wand for single TermScorer.
A proptest was also added.
* Fix block wand algorithm by taking the last doc id of scores until the pivot scorer (included).
* In block wand, when block max score is lower than the threshold, advance the scorer with best score.
* Fix wrong condition in block_wand_single_scorer and add debug_assert to have an equality check on doc to break the loop.
Tantivy used to assume that all files could be somehow memory mapped. After this change, Directory return a `FileSlice` that can be reduced and eventually read into an `OwnedBytes` object. Long and blocking io operation are still required by they do not span over the entire file.
go_to_first_doc was typically calling seek with a target smaller than
doc.
Since SegmentPostings typically do a linear search on the full block,
regardless of the current position, it could have our segment postings
go backward.
- Change in the DocSet and Scorer API. (@fulmicoton).
A freshly created DocSet point directly to their first doc. A sentinel value called TERMINATED marks the end of a DocSet.
`.advance()` returns the new DocId. `Scorer::skip(target)` has been replaced by `Scorer::seek(target)` and returns the resulting DocId.
As a result, iterating through DocSet now looks as follows
```rust
let mut doc = docset.doc();
while doc != TERMINATED {
// ...
doc = docset.advance();
}
```
The change made it possible to greatly simplify a lot of the docset's code.
- Misc internal optimization and introduction of the `Scorer::for_each_pruning` function. (@fulmicoton)
* Alternative take on boosted queries
* Fixing unit test
* Added boosting to the query grammar.
* Made BoostQuery public.
* Added support for boosting field in QueryParser
Closes#547
* Clippy comments
Clippy complaints that about the cast of &[u32] to a *const __m128i,
because of the lack of alignment constraints.
This commit passes the OutputBuffer object (which enforces proper
alignment) instead of `&[u32]`.
* Clippy. Block alignment
* Code simplification
* Added comment. Code simplification
* Removed the extraneous freq block len hack.