- improve performance of vint
vint serialization shows up in performance profiles during indexing.
It would also make sense to limit the value space to u29 and operate on 4 bytes only.
- remove unused code
- add missing inlines
- fix regex test
* Drop additional Arc-layer as the automaton itself is now cheap-to-clone.
* Drop state ID type parameter as it is not exposed by the library any more.
closes#1766
Finding tantivy tokenizers is a frustrating experience currently, since
they need be updated for each tantivy version. That's unnecessary since
the API is rather stable anyway.
* Make nightly Clippy mostly happy.
* Document how to produce TermSetQuery queries using QueryParser.
* Enable construction of queries using FuzzyTermQuery via the QueryParser
* Use FxHashMap instead of HashMap in the QueryParser as these hash tables are not exposed to DoS attacks.
* Use a struct instead of a tuple to improve readability.
This fixes the TODO in two ways: If the stemmer already yields an owned string,
it is used directly as the new text of the token. Otherwise, a temporary buffer
is used to copy the stemmed text (just as before) and then swapping it into the
token to reuse its existing buffer.
* Removes all usage of block_on, and use a oneshot channel instead.
Calling `block_on` panics in certain context.
For instance, it panics when it is called in a the context of another
call to block.
Using it in tantivy is unnecessary. We replace it by a thin wrapper
around a oneshot channel that supports both async/sync.
* Removing needless uses of async in the API.
Co-authored-by: PSeitz <PSeitz@users.noreply.github.com>
* Term are now typed.
This change is backward compatible:
While the Term has a byte representation that is modified, a Term itself
is a transient object that is not serialized as is in the index.
Its .field() and .value_bytes() on the other hand are unchanged.
This change offers better Debug information for terms.
While not necessary it also will help in the support for JSON types.
* Renamed Hierarchical Facet -> Facet
* Added handling of pre-tokenized text fields (#642).
* * Updated changelog and examples concerning #642.
* Added tokenized_text method to Value implementation.
* Implemented From<TokenizedString> for TokenizedStream.
* * Removed tokenized flag from TextOptions and code reliance on the flag.
* Changed naming to use word "pre-tokenized" instead of "tokenized".
* Updated example code.
* Fixed comments.
* Minor code refactoring. Test improvements.
* Refactor deletes
* Removing generation from SegmentUpdater. These have been obsolete for a long time
* Number literal clippy
* Removed clippy useless allow statement