Closes #1766.
Finding tantivy tokenizers is currently a frustrating experience, since
they need to be updated for each tantivy version. That is unnecessary,
since the API is rather stable anyway.
* Terms are now typed.
This change is backward compatible: while a Term's byte representation
has been modified, a Term itself is a transient object that is never
serialized as-is in the index. Its .field() and .value_bytes() accessors,
on the other hand, are unchanged.
This change offers better Debug output for terms. While not strictly
necessary, it will also help with support for JSON types.
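For illustration, a minimal sketch of the unchanged accessors (the field name and value are made up; hedge this against the tantivy version in use):

```rust
use tantivy::schema::{Schema, TEXT};
use tantivy::Term;

fn main() {
    let mut schema_builder = Schema::builder();
    let title = schema_builder.add_text_field("title", TEXT);
    let _schema = schema_builder.build();

    // The typed Term carries its value type in its (transient) byte
    // representation, which is what enables the richer Debug output.
    let term = Term::from_field_text(title, "diary");
    assert_eq!(term.field(), title);
    assert_eq!(term.value_bytes(), b"diary");
    println!("{:?}", term);
}
```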
* Renamed Hierarchical Facet -> Facet
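A one-line illustration of the renamed type (the facet path is made up):

```rust
use tantivy::schema::Facet;

fn main() {
    // Facets remain hierarchical; only the type's name lost the prefix.
    let facet = Facet::from("/category/electronics");
    assert_eq!(facet.to_string(), "/category/electronics");
}
```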
* Added handling of pre-tokenized text fields (#642).
* Updated changelog and examples concerning #642.
* Added tokenized_text method to Value implementation.
* Implemented From<TokenizedString> for TokenizedStream.
* Removed the tokenized flag from TextOptions and the code relying on it.
* Changed naming to use word "pre-tokenized" instead of "tokenized".
* Updated example code.
* Fixed comments.
* Minor code refactoring. Test improvements.
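A minimal sketch of indexing a pre-tokenized value, assuming PreTokenizedString and Document::add_pre_tokenized_text as the entry points (the exact signature has shifted across versions, so treat this as approximate):

```rust
use tantivy::schema::{Schema, TEXT};
use tantivy::tokenizer::{PreTokenizedString, Token};
use tantivy::Document;

fn main() {
    let mut schema_builder = Schema::builder();
    let title = schema_builder.add_text_field("title", TEXT);
    let _schema = schema_builder.build();

    // The caller supplies the tokens; tantivy skips its own tokenizer.
    let pre_tokenized = PreTokenizedString {
        text: "hello world".to_string(),
        tokens: vec![
            Token {
                offset_from: 0,
                offset_to: 5,
                position: 0,
                text: "hello".to_string(),
                position_length: 1,
            },
            Token {
                offset_from: 6,
                offset_to: 11,
                position: 1,
                text: "world".to_string(),
                position_length: 1,
            },
        ],
    };

    let mut doc = Document::default();
    doc.add_pre_tokenized_text(title, pre_tokenized);
}
```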
* Add ASCII folding support (a sketch follows this group of entries)
* Minor changes; added a changelog entry
* Add additional tests
* Add tests for ASCII folding (#533)
* First tests for ASCII folding
* Use a `RawTokenizer` for tokens that contain punctuation
* Add a test for all (?) foldings, inspired by Lucene
* Simplification of the unit test code
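A sketch of wiring the filter into an analyzer, assuming the TextAnalyzer/AsciiFoldingFilter API of tantivy around this change (the sample string and expected tokens are illustrative):

```rust
use tantivy::tokenizer::{AsciiFoldingFilter, SimpleTokenizer, TextAnalyzer};

fn main() {
    // Folds accented characters down to their ASCII equivalents,
    // in the spirit of Lucene's ASCIIFoldingFilter.
    let analyzer = TextAnalyzer::from(SimpleTokenizer).filter(AsciiFoldingFilter);
    let mut stream = analyzer.token_stream("Déjà vu");
    let mut tokens = Vec::new();
    while stream.advance() {
        tokens.push(stream.token().text.clone());
    }
    assert_eq!(tokens, vec!["Deja", "vu"]);
}
```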
* Split Collector into an overall Collector and a per-segment SegmentCollector (a simplified sketch follows a few entries below). This is a precursor to cross-segment parallelism and, as a side benefit, cleans up per-segment fields from being Option<T> to just T.
* Attempt to add MultiCollector back
* Working; the chained collector is broken, though
* Fix chained collector
* Fix test
* Make Weight Send+Sync for parallelization purposes
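A simplified model of the split (these are not tantivy's exact signatures, just the shape of the design):

```rust
type DocId = u32;
type Score = f32;

// The per-segment half: built for exactly one segment, so its fields
// can be plain T rather than Option<T>.
trait SegmentCollector {
    type Fruit;
    fn collect(&mut self, doc: DocId, score: Score);
    fn harvest(self) -> Self::Fruit;
}

// The overall half: spawns one SegmentCollector per segment and merges
// the per-segment results ("fruits") at the end, which is what opens
// the door to cross-segment parallelism.
trait Collector {
    type Child: SegmentCollector;
    type Fruit;
    fn for_segment(&self, segment_ord: u32) -> Self::Child;
    fn merge_fruits(
        &self,
        fruits: Vec<<Self::Child as SegmentCollector>::Fruit>,
    ) -> Self::Fruit;
}

// A document-counting collector under this model.
struct Count;
struct SegmentCount(usize);

impl SegmentCollector for SegmentCount {
    type Fruit = usize;
    fn collect(&mut self, _doc: DocId, _score: Score) {
        self.0 += 1;
    }
    fn harvest(self) -> usize {
        self.0
    }
}

impl Collector for Count {
    type Child = SegmentCount;
    type Fruit = usize;
    fn for_segment(&self, _segment_ord: u32) -> SegmentCount {
        SegmentCount(0)
    }
    fn merge_fruits(&self, fruits: Vec<usize>) -> usize {
        fruits.into_iter().sum()
    }
}

fn main() {
    let collector = Count;
    let mut segment_collector = collector.for_segment(0);
    segment_collector.collect(0, 1.0);
    segment_collector.collect(1, 1.0);
    assert_eq!(collector.merge_fruits(vec![segment_collector.harvest()]), 2);
}
```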
* Expose parameters of RangeQuery for external usage
* Removed &mut self
* fixing tests
* Restored TestCollectors
* multicollector working
* chained collector working
* test broken
* fixing unit test
* Simplifying API
* better syntax
* Simplifying top_collector
* refactoring
* Sync with master
* Added multithreaded search
* Collector refactoring
* Schema::builder
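For reference, the new entry point (the field name is illustrative):

```rust
use tantivy::schema::{Schema, STORED, TEXT};

fn main() {
    // Schema::builder() replaces constructing a SchemaBuilder by hand.
    let mut schema_builder = Schema::builder();
    schema_builder.add_text_field("title", TEXT | STORED);
    let _schema = schema_builder.build();
}
```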
* Code review and rustdoc
* Addressed code review comments
* Added an executor (a simplified sketch follows this group of entries)
* Sorted the segment readers in the searcher
* Update searcher.rs
* Fixed unit tests
* Changed where the sort-segments-by-count heuristic is applied
* using crossbeam::channel
* inlining
* Comments about panics propagating
* Added unit test for executor panicking
* Readded default
* Removed Default impl
* Added unit test for executor
* A working version
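A simplified sketch of the executor idea described above: fan one task per segment out over scoped threads, collect results through a crossbeam channel, and let panics propagate to the caller. This is not tantivy's actual Executor API; map_segments and its signature are made up for illustration.

```rust
use crossbeam::channel::unbounded;
use std::thread;

/// Run `f` once per segment in parallel and return the results in
/// segment order (matching the sorted segment readers).
fn map_segments<F, T>(num_segments: usize, f: F) -> Vec<T>
where
    F: Fn(usize) -> T + Sync,
    T: Send,
{
    let (tx, rx) = unbounded();
    thread::scope(|scope| {
        for segment_ord in 0..num_segments {
            let tx = tx.clone();
            let f = &f;
            scope.spawn(move || {
                // If `f` panics, the scope re-raises the panic in the
                // calling thread when the workers are joined.
                tx.send((segment_ord, f(segment_ord))).unwrap();
            });
        }
        drop(tx); // close the channel once every worker holds a clone
    });
    let mut results: Vec<(usize, T)> = rx.iter().collect();
    results.sort_by_key(|(ord, _)| *ord);
    results.into_iter().map(|(_, value)| value).collect()
}

fn main() {
    // Toy stand-in for a per-segment search.
    let counts = map_segments(4, |segment_ord| segment_ord * 10);
    assert_eq!(counts, vec![0, 10, 20, 30]);
}
```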
* Optimize the ngram parsing
* Decode each codepoint only once.
* Closes #429
* Using leading_zeros to make the code less cryptic (sketched below)
* lookup in a table
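The leading_zeros variant, sketched (utf8_codepoint_len is a made-up name; the table-lookup entry above is a later alternative): the count of leading one-bits in the first byte of a UTF-8 sequence encodes its length, and leading_zeros on the complement reads that count off directly.

```rust
/// Number of bytes in the UTF-8 sequence that starts with byte `b`.
fn utf8_codepoint_len(b: u8) -> usize {
    match (!b).leading_zeros() {
        0 => 1, // 0xxxxxxx: ASCII
        2 => 2, // 110xxxxx
        3 => 3, // 1110xxxx
        4 => 4, // 11110xxx
        _ => 1, // continuation byte or invalid: fall back to one byte
    }
}

fn main() {
    assert_eq!(utf8_codepoint_len("a".as_bytes()[0]), 1);
    assert_eq!(utf8_codepoint_len("é".as_bytes()[0]), 2);
    assert_eq!(utf8_codepoint_len("€".as_bytes()[0]), 3);
    assert_eq!(utf8_codepoint_len("🚀".as_bytes()[0]), 4);
}
```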
* Implement StopWords filter (a sketch follows this group of entries)
- added an example doctest for alphanum_only.rs so that I could drive
my own test of the stopword filter
* Style Cop
* Switch HashSet Hasher to FNV for speed
* Update Change Log
* Fix a renaming that was missed in one location
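A sketch of the filter in use, assuming StopWordFilter::remove as the constructor in tantivy around this change (the word list and sample text are made up):

```rust
use tantivy::tokenizer::{SimpleTokenizer, StopWordFilter, TextAnalyzer};

fn main() {
    let stop_words: Vec<String> = ["the", "and", "a"]
        .iter()
        .map(|word| word.to_string())
        .collect();
    // Internally the words sit in an FNV-hashed set, per the entries above.
    let analyzer = TextAnalyzer::from(SimpleTokenizer)
        .filter(StopWordFilter::remove(stop_words));
    let mut stream = analyzer.token_stream("the quick fox and the hound");
    let mut tokens = Vec::new();
    while stream.advance() {
        tokens.push(stream.token().text.clone());
    }
    assert_eq!(tokens, vec!["quick", "fox", "hound"]);
}
```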
* Simple implementation of NGram Tokenizer
It does not yet support edges, and it could probably be better in many
"rusty" ways, but the test is passing, so I'll call this a good stopping
point for the day. (A sketch of the finished tokenizer follows at the
end of this section.)
* Remove Ngram from the tokenizer manager; too many variations
* Basic configuration model
Should the extensive tests exist here?
* Add a sample to provide end-to-end testing
* Basic edge n-gram support
* Cleanup
* Addressed code feedback
* Processed more code review feedback
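A sketch of the tokenizer as it ended up, assuming the NgramTokenizer::new(min_gram, max_gram, prefix_only) constructor, where prefix_only = true yields the edge n-grams added above (the sample text is illustrative):

```rust
use tantivy::tokenizer::{NgramTokenizer, Tokenizer};

fn main() {
    // All 2- and 3-grams; passing `true` instead would keep only the
    // grams anchored at the start of the word (edge n-grams).
    let tokenizer = NgramTokenizer::new(2, 3, false);
    let mut stream = tokenizer.token_stream("fox");
    let mut tokens = Vec::new();
    while stream.advance() {
        tokens.push(stream.token().text.clone());
    }
    assert_eq!(tokens, vec!["fo", "fox", "ox"]);
}
```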