* Refactoring of the score tweaker into `SortKeyComputer`s to unlock two features.
- Allow lazy evaluation of score. As soon as we identified that a doc won't
reach the topK threshold, we can stop the evaluation.
- Allow for a different segment level score, segment level score and their conversion.
This PR breaks public API, but fixing code is straightforward.
* Bumping tantivy version
---------
Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>
* fix windows build (#1)
* Fix windows build
* Add doc traits
* Add field value iter
* Add value and serialization
* Adjust order
* Fix bug
* Correct type
* Fix generic bugs
* Reformat code
* Add generic to index writer which I forgot about
* Fix missing generics on single segment writer
* Add missing type export
* Add default methods for convenience
* Cleanup
* Fix more-like-this query to use standard types
* Update API and fix tests
* Add doc traits
* Add field value iter
* Add value and serialization
* Adjust order
* Fix bug
* Correct type
* Rebase main and fix conflicts
* Reformat code
* Merge upstream
* Fix missing generics on single segment writer
* Add missing type export
* Add default methods for convenience
* Cleanup
* Fix more-like-this query to use standard types
* Update API and fix tests
* Add tokenizer improvements from previous commits
* Add tokenizer improvements from previous commits
* Reformat
* Fix unit tests
* Fix unit tests
* Use enum in changes
* Stage changes
* Add new deserializer logic
* Add serializer integration
* Add document deserializer
* Implement new (de)serialization api for existing types
* Fix bugs and type errors
* Add helper implementations
* Fix errors
* Reformat code
* Add unit tests and some code organisation for serialization
* Add unit tests to deserializer
* Add some small docs
* Add support for deserializing serde values
* Reformat
* Fix typo
* Fix typo
* Change repr of facet
* Remove unused trait methods
* Add child value type
* Resolve comments
* Fix build
* Fix more build errors
* Fix more build errors
* Fix the tests I missed
* Fix examples
* fix numerical order, serialize PreTok Str
* fix coverage
* rename Document to TantivyDocument, rename DocumentAccess to Document
add Binary prefix to binary de/serialization
* fix coverage
---------
Co-authored-by: Pascal Seitz <pascal.seitz@gmail.com>
* tokenizer-api: reduce Tokenizer overhead
Previously a new `Token` for each text encountered was created, which
contains `String::with_capacity(200)`
In the new API the token_stream gets mutable access to the tokenizer,
this allows state to be shared (in this PR Token is shared).
Ideally the allocation for the BoxTokenStream would also be removed, but
this may require some lifetime tricks.
* simplify api
* move lowercase and ascii folding buffer to global
* empty Token text as default
Applied this command to the code, making it a bit shorter and slightly
more readable.
```
cargo +nightly clippy --all-features --benches --tests --workspace --fix -- -A clippy::all -W clippy::uninlined_format_args
cargo +nightly fmt --all
```
* Split Collector into an overall Collector and a per-segment SegmentCollector. Precursor to cross-segment parallelism, and as a side benefit cleans up any per-segment fields from being Option<T> to just T.
* Attempt to add MultiCollector back
* working. Chained collector is broken though
* Fix chained collector
* Fix test
* Make Weight Send+Sync for parallelization purposes
* Expose parameters of RangeQuery for external usage
* Removed &mut self
* fixing tests
* Restored TestCollectors
* blop
* multicollector working
* chained collector working
* test broken
* fixing unit test
* blop
* blop
* Blop
* simplifying APi
* blop
* better syntax
* Simplifying top_collector
* refactoring
* blop
* Sync with master
* Added multithread search
* Collector refactoring
* Schema::builder
* CR and rustdoc
* CR comments
* blop
* Added an executor
* Sorted the segment readers in the searcher
* Update searcher.rs
* Fixed unit testst
* changed the place where we have the sort-segment-by-count heuristic
* using crossbeam::channel
* inlining
* Comments about panics propagating
* Added unit test for executor panicking
* Readded default
* Removed Default impl
* Added unit test for executor