* Added handling of pre-tokenized text fields (#642).
* Updated changelog and examples concerning #642.
* Added a `tokenized_text` method to the `Value` implementation.
* Implemented `From<TokenizedString>` for `TokenizedStream`.
* Removed the `tokenized` flag from `TextOptions` and the code that relied on it.
* Changed the naming to use "pre-tokenized" instead of "tokenized".
* Updated example code.
* Fixed comments.
* Minor code refactoring. Test improvements.
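For context, this is roughly what a pre-tokenized value looks like from the caller's side: the text and its tokens are supplied up front instead of being produced by the index's analyzer. A minimal sketch, assuming the `PreTokenizedString` and `Token` shapes from tantivy's tokenizer module; treat the field names as illustrative rather than the exact API of #642.

```rust
use tantivy::tokenizer::{PreTokenizedString, Token};

// Build a pre-tokenized value: the caller provides the original text together
// with the tokens (text, byte offsets, positions), bypassing the analyzer.
fn pre_tokenized_value() -> PreTokenizedString {
    PreTokenizedString {
        text: "Hello, world!".to_string(),
        tokens: vec![
            Token {
                offset_from: 0,
                offset_to: 5,
                position: 0,
                text: "hello".to_string(),
                position_length: 1,
            },
            Token {
                offset_from: 7,
                offset_to: 12,
                position: 1,
                text: "world".to_string(),
                position_length: 1,
            },
        ],
    }
}
```

The resulting value is then added to the document for the corresponding text field, so the index stores exactly the tokens supplied here.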
* Refactored deletes.
* Removed the generation from `SegmentUpdater`; it has been obsolete for a long time.
* Addressed a clippy lint about number literals.
* Removed a useless clippy `allow` statement.
* Added ASCII folding support (a usage sketch follows below).
* Minor change and updated the changelog.
* Added additional tests.
* Added tests for ASCII folding (#533).
* First tests for ASCII folding.
* Used a `RawTokenizer` for tokens containing punctuation.
* Added a test covering (hopefully) all foldings, inspired by Lucene.
* Simplified the unit test code.
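As a usage sketch, the folding filter slots into a tokenizer pipeline like any other filter. The `TextAnalyzer`/`AsciiFoldingFilter` names follow tantivy's tokenizer module around the time of this change; exact constructors may differ between versions.

```rust
use tantivy::tokenizer::{AsciiFoldingFilter, LowerCaser, SimpleTokenizer, TextAnalyzer};

// A pipeline that lower-cases tokens and then folds accented characters to
// their ASCII equivalents, e.g. "Déjà" -> "deja".
fn folding_analyzer() -> TextAnalyzer {
    TextAnalyzer::from(SimpleTokenizer)
        .filter(LowerCaser)
        .filter(AsciiFoldingFilter)
}
```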
* Closes #471.
Removed `writing_segments` from the segment manager, as it is now unused.
Removed the target merged segment id, which was also unused.
* Used RAII to track which segments are being merged (see the sketch below).
Closes #471.
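The RAII idea can be pictured with a hypothetical guard type: segments are marked as "in merge" when the guard is created and unmarked when it is dropped, so the bookkeeping cannot be forgotten on any exit path, including panics. The types below are illustrative, not tantivy's internals.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};

// Hypothetical id type standing in for tantivy's SegmentId.
type SegmentId = u64;

/// Marks a set of segments as "in merge" for as long as the guard is alive.
struct MergeGuard {
    in_merge: Arc<Mutex<HashSet<SegmentId>>>,
    segments: Vec<SegmentId>,
}

impl MergeGuard {
    fn new(in_merge: Arc<Mutex<HashSet<SegmentId>>>, segments: Vec<SegmentId>) -> MergeGuard {
        in_merge.lock().unwrap().extend(segments.iter().copied());
        MergeGuard { in_merge, segments }
    }
}

impl Drop for MergeGuard {
    fn drop(&mut self) {
        // Runs on success, error, or panic unwind alike.
        let mut set = self.in_merge.lock().unwrap();
        for segment in &self.segments {
            set.remove(segment);
        }
    }
}
```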
* Ran rustfmt.
* Used `Inventory::default()`.
* Split `Collector` into an overall `Collector` and a per-segment `SegmentCollector`. This is a precursor to cross-segment parallelism and, as a side benefit, cleans up per-segment fields from `Option<T>` to plain `T`. A sketch of the resulting shape follows.
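An illustrative reduction, not tantivy's exact trait definitions: the top-level collector builds one child per segment, each child owns its per-segment state, and the per-segment results are merged at the end.

```rust
// Stand-in for tantivy's per-segment reader.
struct SegmentReader;

/// Collects hits within a single segment; owns all per-segment state.
trait SegmentCollector {
    type Fruit;
    fn collect(&mut self, doc: u32, score: f32);
    fn harvest(self) -> Self::Fruit;
}

/// Top-level collector: builds one child per segment, then merges the results.
trait Collector {
    type Child: SegmentCollector;
    // Per-segment fields live in the child, so they no longer need to be
    // `Option<T>` on the collector itself.
    fn for_segment(&self, segment_ord: u32, reader: &SegmentReader) -> Self::Child;
    fn merge_fruits(
        &self,
        fruits: Vec<<Self::Child as SegmentCollector>::Fruit>,
    ) -> <Self::Child as SegmentCollector>::Fruit;
}
```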
* Attempted to add `MultiCollector` back.
* Working, although the chained collector is still broken.
* Fixed the chained collector.
* Fixed a test.
* Made `Weight` `Send + Sync` for parallelization purposes.
* Exposed the parameters of `RangeQuery` for external use.
* Removed `&mut self`.
* Fixed tests.
* Restored the test collectors.
* `MultiCollector` working.
* Chained collector working.
* Fixed a broken unit test.
* Simplified the API.
* Better syntax.
* Simplified `top_collector`.
* Refactoring.
* Synced with master.
* Added multithreaded search.
* Collector refactoring.
* Introduced `Schema::builder()`.
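A minimal usage sketch of the builder entry point; the field options here are just examples.

```rust
use tantivy::schema::{Schema, STORED, TEXT};

fn build_schema() -> Schema {
    // Schema construction goes through `Schema::builder()`.
    let mut schema_builder = Schema::builder();
    schema_builder.add_text_field("title", TEXT | STORED);
    schema_builder.add_text_field("body", TEXT);
    schema_builder.build()
}
```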
* Addressed code review comments and added rustdoc.
* Addressed more code review comments.
* Added an executor (a simplified sketch follows below).
* Sorted the segment readers in the searcher
* Updated searcher.rs.
* Fixed unit tests.
* Changed where the sort-segments-by-count heuristic is applied.
* Used `crossbeam::channel`.
* Inlining.
* Added comments about panics propagating.
* Added a unit test for executor panicking.
* Re-added default.
* Removed the `Default` impl.
* Added a unit test for the executor.
* A working version.
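A simplified, standalone sketch of the executor idea, not tantivy's actual `Executor` type: run one job per item (think: one search per segment) on its own thread, gather the results over a crossbeam channel in the original order, and let a panic in any worker surface to the caller. Assumes the `crossbeam` crate.

```rust
use crossbeam::channel;
use std::thread;

// Run `job` once per item on its own thread and return the results in the
// original order. If any worker panics, the panic is re-raised on join.
fn map_in_parallel<T, R, F>(items: Vec<T>, job: F) -> Vec<R>
where
    T: Send + 'static,
    R: Send + 'static,
    F: Fn(T) -> R + Copy + Send + 'static,
{
    let (sender, receiver) = channel::unbounded();
    let num_items = items.len();
    let handles: Vec<_> = items
        .into_iter()
        .enumerate()
        .map(|(ord, item)| {
            let sender = sender.clone();
            thread::spawn(move || {
                // Each result is tagged with its original position.
                sender.send((ord, job(item))).unwrap();
            })
        })
        .collect();
    drop(sender); // only the worker-held senders remain
    let mut results: Vec<(usize, R)> = receiver.iter().take(num_items).collect();
    for handle in handles {
        handle.join().expect("a worker thread panicked");
    }
    results.sort_by_key(|&(ord, _)| ord);
    results.into_iter().map(|(_, result)| result).collect()
}
```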
* Optimized the ngram parsing.
* Decoded each codepoint only once.
* Closes #429.
* Used `leading_zeros` to make the code less cryptic.
* Used a lookup table.
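The `leading_zeros` trick mentioned above is the standard way to read a UTF-8 sequence length off the first byte: the number of leading one bits is the byte count, with zero leading ones meaning ASCII and therefore one byte. A table indexed by the high nibble computes the same thing without a branch. A small sketch:

```rust
/// Number of bytes in the UTF-8 sequence starting with `first_byte`.
/// 0xxxxxxx -> 1, 110xxxxx -> 2, 1110xxxx -> 3, 11110xxx -> 4.
fn utf8_codepoint_len(first_byte: u8) -> usize {
    // Counting the leading ones of the byte is the same as counting the
    // leading zeros of its bitwise complement.
    match (!first_byte).leading_zeros() {
        0 => 1, // ASCII
        n => n as usize,
    }
}

/// Same computation as a lookup on the high nibble (continuation bytes,
/// 0b10xx, are mapped to 1 as a harmless fallback).
const CODEPOINT_LEN: [u8; 16] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 3, 4];

fn utf8_codepoint_len_table(first_byte: u8) -> usize {
    CODEPOINT_LEN[(first_byte >> 4) as usize] as usize
}
```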
* Added `position_length` to `Token`.
Refs #291.
* Added term offsets to `PhraseQuery`.
Refs #291.
* Added a new constructor for `PhraseQuery` that allows custom offsets.
* Fixed the method name as per a PR comment.
* Closes #291.
Added a unit test.
Used the offsets from the analyzer in `QueryParser`.
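A usage sketch of the offset-aware constructor: when the analyzer drops a token (a stop word, say), the query parser can keep the surviving terms at their original positions so the phrase still requires the gap. The `new_with_offset` name follows the tantivy API this change introduces; treat the exact signature as illustrative.

```rust
use tantivy::query::PhraseQuery;
use tantivy::schema::Field;
use tantivy::Term;

// "big bad wolf" with "bad" removed by the analyzer: the explicit offsets
// (0 and 2) keep "wolf" two positions after "big", so the phrase still only
// matches documents with exactly one token in between.
fn sparse_phrase_query(field: Field) -> PhraseQuery {
    PhraseQuery::new_with_offset(vec![
        (0, Term::from_field_text(field, "big")),
        (2, Term::from_field_text(field, "wolf")),
    ])
}
```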
* Changed the heap to a paged memory arena.
* Tried to simplify the indexing term hashmap.
* Split up the data structure.
* Removed some complexity in the bitpacker.
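The paged arena mentioned above replaces one growable buffer, whose reallocation would invalidate previously handed-out addresses, with fixed-size pages addressed by (page, offset). A stripped-down sketch of the concept, not the actual arena code:

```rust
const PAGE_SIZE: usize = 1 << 20; // 1 MiB pages

/// Stable address into the arena: page index plus offset within the page.
#[derive(Clone, Copy)]
struct Addr {
    page: usize,
    offset: usize,
}

/// Bump allocator over fixed-size pages. Growing never moves existing data,
/// so previously returned `Addr` values stay valid.
struct PagedArena {
    pages: Vec<Vec<u8>>,
    used_in_last_page: usize,
}

impl PagedArena {
    fn new() -> PagedArena {
        PagedArena {
            pages: vec![vec![0u8; PAGE_SIZE]],
            used_in_last_page: 0,
        }
    }

    /// Reserve `len` bytes (at most one page) and return their address.
    fn allocate(&mut self, len: usize) -> Addr {
        assert!(len <= PAGE_SIZE);
        if self.used_in_last_page + len > PAGE_SIZE {
            self.pages.push(vec![0u8; PAGE_SIZE]);
            self.used_in_last_page = 0;
        }
        let addr = Addr {
            page: self.pages.len() - 1,
            offset: self.used_in_last_page,
        };
        self.used_in_last_page += len;
        addr
    }

    fn slice_mut(&mut self, addr: Addr, len: usize) -> &mut [u8] {
        &mut self.pages[addr.page][addr.offset..addr.offset + len]
    }
}
```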
* Implemented the stop word filter.
- Added an example doctest for alphanum_only.rs so that I could drive my own test of the stop word filter.
* Style cleanup.
* Switched the `HashSet` hasher to FNV for speed.
* Updated the changelog.
* Fixed a renaming missed in one location.
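The FNV switch is about the membership test in the filter's hot loop: stop words are short strings, and FNV hashes them faster than the default SipHash. A small sketch of the lookup, assuming the `fnv` crate:

```rust
use fnv::FnvHashSet;

/// Drop every token that appears in the stop word set.
fn remove_stop_words(tokens: Vec<String>, stop_words: &FnvHashSet<String>) -> Vec<String> {
    tokens
        .into_iter()
        .filter(|token| !stop_words.contains(token))
        .collect()
}

/// A tiny stop word set for illustration.
fn english_stop_words() -> FnvHashSet<String> {
    ["a", "an", "the", "of"].iter().map(|s| s.to_string()).collect()
}
```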
* Simple implementation of the NGram tokenizer.
It does not yet support edge ngrams, and it could probably be more idiomatic Rust in places, but the test is passing, so I'll call this a good stopping point for the day.
* Removed Ngram from the manager; too many variations.
* Basic configuration model.
Should the extensive tests live here?
* Added a sample to provide end-to-end testing.
* Basic edge-ngram support.
* Cleanup.
* Addressed code feedback.
* Processed more code review feedback.
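For reference, the ngram and edge-ngram behaviour the tokenizer implements can be described with a few lines of plain Rust. This is an illustration of the output, not the tokenizer's code.

```rust
/// All character ngrams of `text` with lengths in `min_gram..=max_gram`.
/// With `edges_only`, only ngrams anchored at the start of the text are kept,
/// which is the "edge ngram" variant used for prefix matching.
fn ngrams(text: &str, min_gram: usize, max_gram: usize, edges_only: bool) -> Vec<String> {
    let chars: Vec<char> = text.chars().collect();
    let mut out = Vec::new();
    for start in 0..chars.len() {
        for len in min_gram..=max_gram {
            if start + len > chars.len() {
                break;
            }
            out.push(chars[start..start + len].iter().collect());
        }
        if edges_only {
            break; // keep only the ngrams starting at position 0
        }
    }
    out
}

// ngrams("wolf", 2, 3, false) -> ["wo", "wol", "ol", "olf", "lf"]
// ngrams("wolf", 2, 3, true)  -> ["wo", "wol"]
```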