* WIP implemented is_compatible
hide Footer::from_bytes from public consumption - only found Footer::extract
used outside the module
Add a new error type for IncompatibleIndex
add a prototypical call to footer.is_compatible() in ManagedDirectory::open_read
to make sure we error before reading it further
* Make error handling more ergonomic
Add an error subtype for OpenReadError and converters to TantivyError
* Remove an unnecessary assert
it's followed by the same check, which returns an error instead of panicking
* Correct the compatibility check logic
Leave a defensive versioned footer check to make sure we add new logic handling
when we add possible footer versions
Restricted VersionedFooter::from_bytes to be used inside the crate only
remove a half-baked test
* WIP.
* Return an error if index incompatible - closes #662
Enrich the error type with incompatibility
Change return type to Result<bool, TantivyError>, instead of bool
Add an Incompatibility enum that enriches the IncompatibleIndex error variant
with information, which then allows us to generate a developer-friendly hint how
to upgrade library version or switch feature flags for a different compression
algorithm
Updated changelog
Change the signature of is_compatible
Added documentation to the Incompatibility
Added a conditional test on a Footer with lz4 erroring
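A minimal sketch of how an enriched incompatibility error could carry a developer-friendly hint. All variant names, field names, the `Footer` fields, and the error type in the `Result` are assumptions for illustration; the commits above return `Result<bool, TantivyError>` and do not spell out these details.

```rust
use std::fmt;

/// Hypothetical description of why an index cannot be read.
/// Variant and field names are assumptions, not the crate's definitions.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Incompatibility {
    CompressionMismatch { library: String, index: String },
    IndexMismatch { library_version: u32, index_version: u32 },
}

impl fmt::Display for Incompatibility {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Incompatibility::CompressionMismatch { library, index } => write!(
                f,
                "index uses {} compression but the library was built with {}; \
                 rebuild with the matching compression feature flag",
                index, library
            ),
            Incompatibility::IndexMismatch { library_version, index_version } => write!(
                f,
                "index format version is {} but the library expects {}; \
                 switch to a library version that supports this format",
                index_version, library_version
            ),
        }
    }
}

/// Hypothetical footer holding only the fields needed for the check.
pub struct Footer {
    pub index_version: u32,
    pub compression: String,
}

impl Footer {
    /// Returns `Ok(true)` when the footer matches the library, and an
    /// `Err` describing the mismatch otherwise (simplified error type).
    pub fn is_compatible(
        &self,
        library_version: u32,
        library_compression: &str,
    ) -> Result<bool, Incompatibility> {
        if self.index_version != library_version {
            return Err(Incompatibility::IndexMismatch {
                library_version,
                index_version: self.index_version,
            });
        }
        if self.compression.as_str() != library_compression {
            return Err(Incompatibility::CompressionMismatch {
                library: library_compression.to_string(),
                index: self.compression.clone(),
            });
        }
        Ok(true)
    }
}
```

Checking before any further read means the caller sees a descriptive error up front, not a decoding failure deeper in the file.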
* Added handling of pre-tokenized text fields (#642).
* Updated changelog and examples concerning #642.
* Added tokenized_text method to Value implementation.
* Implemented From<TokenizedString> for TokenizedStream.
* Removed tokenized flag from TextOptions and code reliance on the flag.
* Changed naming to use word "pre-tokenized" instead of "tokenized".
* Updated example code.
* Fixed comments.
* Minor code refactoring. Test improvements.
* TopDocs: ensure stable sorting on equal score
When selecting the top K documents by score, we need to ensure stable
sorting. Until now, for documents with the same score, we were relying
on the (arbitrary) order returned by the BinaryHeap used to implement
the collectors.
This patch fixes the problem by explicitly using the doc address when
harvesting the `TopSegmentCollector` and when merging the results in
`TopCollector::merge_fruits()`.
This is important (for example) to implement pagination correctly using
the TopDocs collector. If sorting isn't stable, documents that have the
same score might be ranked in different positions depending on the
specific K that was used, thus appearing in two different pages, or in
none at all.
Fixes gh-671
* TMP: alternative solution (see previous commit)
If we add the constraint that D is also PartialOrd in ComparableDoc<T,
D>, then we can move the comparison by doc address directly in the cmp
implementation of ComparableDoc.
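A rough sketch of the tie-breaking idea, assuming a simplified `ComparableDoc` and a hypothetical `top_k` helper built on `std::collections::BinaryHeap`. The field names and the helper are illustrative only and are not the collector code itself.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

/// Simplified (score, doc address) pair. `T` is the score type and
/// `D` the doc address type; both names are assumptions.
#[derive(Debug, Clone, Copy)]
pub struct ComparableDoc<T, D> {
    pub feature: T,
    pub doc: D,
}

impl<T: PartialOrd, D: PartialOrd> Ord for ComparableDoc<T, D> {
    fn cmp(&self, other: &Self) -> Ordering {
        // Reversed on score: the std max-heap then pops the lowest score first.
        let by_feature = other
            .feature
            .partial_cmp(&self.feature)
            .unwrap_or(Ordering::Equal);
        // On equal scores, the higher doc address counts as "greater",
        // so it is evicted first and lower doc addresses are kept.
        by_feature.then_with(|| self.doc.partial_cmp(&other.doc).unwrap_or(Ordering::Equal))
    }
}

impl<T: PartialOrd, D: PartialOrd> PartialOrd for ComparableDoc<T, D> {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl<T: PartialOrd, D: PartialOrd> PartialEq for ComparableDoc<T, D> {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}

impl<T: PartialOrd, D: PartialOrd> Eq for ComparableDoc<T, D> {}

/// Hypothetical helper: keeps the `k` best (score, doc) pairs.
pub fn top_k(docs: &[(f32, u32)], k: usize) -> Vec<(f32, u32)> {
    let mut heap = BinaryHeap::with_capacity(k + 1);
    for &(feature, doc) in docs {
        heap.push(ComparableDoc { feature, doc });
        if heap.len() > k {
            // Evicts the worst entry: lowest score, then highest doc address.
            heap.pop();
        }
    }
    // Ascending in our ordering means best first:
    // highest score, then lowest doc address.
    heap.into_sorted_vec()
        .into_iter()
        .map(|d| (d.feature, d.doc))
        .collect()
}
```

With the doc address as a tie-breaker, the first `K` entries of a larger result are identical to the result for `K`, which is the property pagination relies on.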
* TMP rebase as first commit: add benchmarks for TopSegmentCollector
* fixup! TMP: alternative solution (see previous commit)
* TMP add changelog entry
* TMP run cargo fmt
* add checksum check in ManagedDirectory
fix #400
* flush after writing checksum
* don't checksum atomic file access and clone managed_paths
* implement a footer storing metadata about a file
this is more of a PoC; it requires some refactoring into multiple files
`terminate(self)` is implemented, but not used anywhere yet
* address comments and simplify things with new contract
use ByteOrder for integer to raw byte conversion
assume that an atomic write implies an atomic read, which might not actually be true
use some indirection to have a boxable terminating writer
* implement TerminatingWrite and make terminate() be called where it should
add a dependency on drop_bomb to help find where terminate() should be called
implement TerminatingWrite for wrapper writers
make tests pass
/!\ some tests seem to pass where they shouldn't
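A minimal sketch of the footer-on-terminate idea, under loud assumptions: the `FooterWriter` wrapper, the `write_with_footer` and `verify` helpers, and the 4-byte little-endian footer layout are all hypothetical, and a toy FNV-1a hash stands in for whatever checksum the actual footer uses.

```rust
use std::io::{self, Write};

/// A writer that must be finished explicitly so the footer gets written.
pub trait TerminatingWrite: Write {
    fn terminate(self) -> io::Result<()>;
}

// FNV-1a constants; a stand-in for a real checksum such as CRC32.
const FNV_OFFSET: u32 = 2_166_136_261;
const FNV_PRIME: u32 = 16_777_619;

fn fnv1a_update(mut hash: u32, bytes: &[u8]) -> u32 {
    for &b in bytes {
        hash ^= u32::from(b);
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

/// Hypothetical wrapper that hashes everything written through it.
pub struct FooterWriter<W: Write> {
    inner: W,
    hash: u32,
}

impl<W: Write> FooterWriter<W> {
    pub fn new(inner: W) -> Self {
        FooterWriter { inner, hash: FNV_OFFSET }
    }
}

impl<W: Write> Write for FooterWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let written = self.inner.write(buf)?;
        // Only hash the bytes the inner writer actually accepted.
        self.hash = fnv1a_update(self.hash, &buf[..written]);
        Ok(written)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}

impl<W: Write> TerminatingWrite for FooterWriter<W> {
    fn terminate(mut self) -> io::Result<()> {
        // Append the checksum as a 4-byte little-endian footer, then flush.
        let footer = self.hash.to_le_bytes();
        self.inner.write_all(&footer)?;
        self.inner.flush()
    }
}

/// Writes `payload` followed by its footer into a fresh buffer.
pub fn write_with_footer(payload: &[u8]) -> io::Result<Vec<u8>> {
    let mut out = Vec::new();
    let mut writer = FooterWriter::new(&mut out);
    writer.write_all(payload)?;
    writer.terminate()?;
    Ok(out)
}

/// Returns the payload if the trailing checksum matches, `None` otherwise.
pub fn verify(data: &[u8]) -> Option<&[u8]> {
    if data.len() < 4 {
        return None;
    }
    let (body, footer) = data.split_at(data.len() - 4);
    let mut raw = [0u8; 4];
    raw.copy_from_slice(footer);
    if fnv1a_update(FNV_OFFSET, body) == u32::from_le_bytes(raw) {
        Some(body)
    } else {
        None
    }
}
```

Taking `self` by value in `terminate` makes it impossible to keep writing after the footer has been emitted, which is one reason a consuming method is attractive here.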
* remove usage of drop_bomb
* fmt
* add test for checksum
* address some review comments
* update changelog
* fmt
* small docs cleanup
* only compile a regex once per RegexQuery
Building a `Regex` is an expensive operation. Users of `RegexQuery`
need to cache and reuse regexes when searching across multiple fields.
This is the first step towards allowing that: we can store the `Regex`
directly in the `RegexQuery`, instead of the string pattern.
* RegexQuery: account for possible failure in the constructor
When building a regex from a str pattern, we have to account for the
possibility that the pattern is invalid. Before the previous commit, the
failure would happen in the `specialized_weight` method. Now that we
store a compiled `Regex` in `RegexQuery`, `specialized_weight` doesn't
fail anymore, and we can fail early while constructing `RegexQuery` if
the pattern is invalid.
This is a breaking change for users of `RegexQuery::new`.
* add RegexQuery::from_regex method
This builds a `RegexQuery` from an already compiled `Regex`. The use of
`Into<Arc<Regex>>` is to allow the caller to either simply pass a
`Regex`, or an `Arc<Regex>`, in case it needs to be cached and shared on
the caller's side.
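A small sketch of why `Into<Arc<Regex>>` accepts both an owned value and a shared one. The `Regex` type below is a toy stand-in (the real one comes from a regex crate), and `from_pattern`, the `field` parameter, and the accessors are hypothetical names used only for illustration.

```rust
use std::sync::Arc;

/// Toy stand-in for a compiled regex; it only remembers its pattern.
#[derive(Debug)]
pub struct Regex {
    pattern: String,
}

impl Regex {
    /// Toy "compilation": rejects unbalanced parentheses so that the
    /// constructor has a failure case to propagate.
    pub fn new(pattern: &str) -> Result<Regex, String> {
        let mut depth: i32 = 0;
        for c in pattern.chars() {
            match c {
                '(' => depth += 1,
                ')' => depth -= 1,
                _ => {}
            }
            if depth < 0 {
                return Err(format!("unbalanced ')' in {:?}", pattern));
            }
        }
        if depth != 0 {
            return Err(format!("unclosed '(' in {:?}", pattern));
        }
        Ok(Regex { pattern: pattern.to_string() })
    }

    pub fn pattern(&self) -> &str {
        &self.pattern
    }
}

/// Simplified query holding an already compiled, shareable regex.
pub struct RegexQuery {
    regex: Arc<Regex>,
    field: u32,
}

impl RegexQuery {
    /// Fails early, at construction time, if the pattern is invalid.
    pub fn from_pattern(pattern: &str, field: u32) -> Result<Self, String> {
        let regex = Regex::new(pattern)?;
        Ok(Self::from_regex(regex, field))
    }

    /// Accepts a `Regex` (wrapped into a new `Arc`) or an existing
    /// `Arc<Regex>` (shared without recompiling).
    pub fn from_regex<T: Into<Arc<Regex>>>(regex: T, field: u32) -> Self {
        RegexQuery { regex: regex.into(), field }
    }

    pub fn regex(&self) -> &Arc<Regex> {
        &self.regex
    }

    pub fn field(&self) -> u32 {
        self.field
    }
}
```

This works because the standard library provides `From<T> for Arc<T>`, and every type converts into itself, so both call styles satisfy the same bound.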
* Using an Arc in AutomatonWeight
Closes #639
* Tidy up
fmt
remove unnecessary -> Result<()> followed by run.unwrap() in a test
* Adding support for elasticsearch-style unbounded queries
Extend the UserInputBound to include Unbounded, so we can reuse formatting and
internal query format
* Still working on elastic-style range queries
Fixes #498
Merge the elastic_range into range
Reformat to make code easier to follow, use optional() macro to return Some
* Fixed bugs
Made the range parser insensitive to whitespace between the ":" and the range.
Removed optional parsing of field.
Added a unit test for the range parser.
Derived PartialEq to compare the results of parsing as structs, instead of
strings. Found a bug with that unit test - "*}" was parsed as a
UserInputBound::Exclusive, instead of UserInputBound::Unbounded. Added an early
detection-and-return for * in the original range parser
* Correct failing test
Assume that we will use "{*" for Unbounded ranges
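A simplified sketch of the early detection of `*`, assuming a hypothetical `parse_range` function and a ` TO ` separator. The actual parser is built differently; only the `UserInputBound` variant names come from the notes above.

```rust
/// Bound of a user-supplied range. `Unbounded` is the new variant.
#[derive(Debug, Clone, PartialEq)]
pub enum UserInputBound {
    Inclusive(String),
    Exclusive(String),
    Unbounded,
}

/// Builds one bound. `*` is detected first, so the bracket style
/// next to it never turns it into an Inclusive/Exclusive bound.
fn make_bound(term: &str, inclusive: bool) -> Option<UserInputBound> {
    let term = term.trim();
    if term.is_empty() {
        return None;
    }
    if term == "*" {
        return Some(UserInputBound::Unbounded);
    }
    Some(if inclusive {
        UserInputBound::Inclusive(term.to_string())
    } else {
        UserInputBound::Exclusive(term.to_string())
    })
}

/// Hypothetical parser for ranges such as `[a TO c}` or `{* TO 10]`.
/// Square brackets are inclusive, curly braces exclusive.
pub fn parse_range(input: &str) -> Option<(UserInputBound, UserInputBound)> {
    let s = input.trim();
    let (lower_inclusive, rest) = if let Some(r) = s.strip_prefix('[') {
        (true, r)
    } else if let Some(r) = s.strip_prefix('{') {
        (false, r)
    } else {
        return None;
    };
    let (upper_inclusive, inner) = if let Some(r) = rest.strip_suffix(']') {
        (true, r)
    } else if let Some(r) = rest.strip_suffix('}') {
        (false, r)
    } else {
        return None;
    };
    let (lower, upper) = inner.split_once(" TO ")?;
    Some((
        make_bound(lower, lower_inclusive)?,
        make_bound(upper, upper_inclusive)?,
    ))
}
```

Comparing parsed structs, as the unit test mentioned above does, is what makes a mix-up between `Exclusive("*")` and `Unbounded` visible.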
* Add a note in the changelog
cargo-fmt
* Moved parenthesis to a newline to make nested if-else more visible
* add basic support for float
as for i64, they are mapped to u64 for indexing
query parser doesn't work yet
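One common way to map signed integers and floats to `u64` while preserving their order is sketched below. The function names are hypothetical, and this is the usual bit-flipping technique, not necessarily the exact mapping used in this change. NaN handling is out of scope for the sketch.

```rust
const SIGN_BIT: u64 = 1u64 << 63;

/// Maps an i64 to a u64 so that the integer order is preserved:
/// flipping the sign bit moves negatives below positives.
pub fn i64_to_u64(val: i64) -> u64 {
    (val as u64) ^ SIGN_BIT
}

pub fn u64_to_i64(val: u64) -> i64 {
    (val ^ SIGN_BIT) as i64
}

/// Maps an f64 to a u64 so that the numeric order is preserved
/// (NaN is not handled in this sketch).
pub fn f64_to_u64(val: f64) -> u64 {
    let bits = val.to_bits();
    if val.is_sign_positive() {
        // Positive floats already sort by their bits; set the top bit
        // so that they land above all negative floats.
        bits ^ SIGN_BIT
    } else {
        // Negative floats sort in reverse by their bits; inverting all
        // bits reverses that order and clears the top bit.
        !bits
    }
}

pub fn u64_to_f64(val: u64) -> f64 {
    if val & SIGN_BIT != 0 {
        f64::from_bits(val ^ SIGN_BIT)
    } else {
        f64::from_bits(!val)
    }
}
```

Because the mapped values compare like the originals, range queries on the `u64` representation select the same documents as a range on the float values would.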
* Update value.rs
* implement support for float in query parser
* Update README.md
* Enables clearing the index
Closes #510
* Adds an example to clear and rebuild the index
* Addressing code review
Moved the example from examples/ to docstring above `clear`
* Corrected minor typos and missed/duplicate words
* Added stamper.revert method to be used for rollback
Added type alias for Opstamp
Moved to AtomicU64 on stable rust (since 1.34)
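A minimal sketch of a stamper backed by `AtomicU64` with a `revert` method for rollback. The struct layout, the `new` and `stamp` names, and the memory ordering are assumptions; only `Opstamp`, `revert`, and the use of `AtomicU64` come from the notes above.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Type alias for operation stamps.
pub type Opstamp = u64;

/// Hands out increasing opstamps; clones share the same counter.
#[derive(Clone, Default)]
pub struct Stamper(Arc<AtomicU64>);

impl Stamper {
    pub fn new(first_opstamp: Opstamp) -> Self {
        Stamper(Arc::new(AtomicU64::new(first_opstamp)))
    }

    /// Returns the current opstamp and advances the counter by one.
    pub fn stamp(&self) -> Opstamp {
        self.0.fetch_add(1, Ordering::SeqCst)
    }

    /// Resets the counter so the next stamp is `to_opstamp`,
    /// e.g. when rolling back to the last commit.
    pub fn revert(&self, to_opstamp: Opstamp) -> Opstamp {
        self.0.store(to_opstamp, Ordering::SeqCst);
        to_opstamp
    }
}
```

Since every clone points at the same atomic counter, a revert issued through one handle is observed by all the others.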
* Change the method name and doc-string
* Remove rollback from delete_all_documents
test_add_then_delete_all_documents fails with --test-threads 2
* Passes all the tests with any number of test-threads
(ran locally 5 times)
* Addressed code review
Deleted comments with debug info
changed ReloadPolicy to Manual
* Removing useless garbage_collect call and updated CHANGELOG
* add ascii folding support
* Minor change and added Changelog.
* add additional tests
* Add tests for ascii folding (#533)
* first tests for ascii folding
* use a `RawTokenizer` for tokens using punctuation
* add test for all (?) folding, inspired by Lucene
* Simplification of the unit test code
* Clippy comments
Clippy complains about the cast of &[u32] to a *const __m128i,
because of the lack of alignment constraints.
This commit passes the OutputBuffer object (which enforces proper
alignment) instead of `&[u32]`.
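A small sketch of how a wrapper type can enforce the alignment that a plain `&[u32]` does not guarantee. The block length, the struct layout, and the helper names are assumptions; only the `OutputBuffer` name comes from the note above.

```rust
/// Hypothetical number of u32 values per block.
pub const BLOCK_LEN: usize = 128;

/// Buffer whose storage is always 16-byte aligned, which is what a
/// cast to a 128-bit SIMD pointer requires. A bare `[u32]` only
/// guarantees 4-byte alignment.
#[repr(align(16))]
pub struct OutputBuffer(pub [u32; BLOCK_LEN]);

impl OutputBuffer {
    pub fn new() -> Self {
        OutputBuffer([0u32; BLOCK_LEN])
    }

    /// Start of the storage as a raw byte pointer.
    pub fn as_ptr(&self) -> *const u8 {
        self.0.as_ptr() as *const u8
    }
}

impl Default for OutputBuffer {
    fn default() -> Self {
        Self::new()
    }
}

/// True if `ptr` satisfies the 16-byte alignment of a 128-bit lane.
pub fn is_sse_aligned(ptr: *const u8) -> bool {
    (ptr as usize) % 16 == 0
}
```

Passing the wrapper, not a slice, moves the alignment guarantee into the type, so the cast no longer depends on where the caller happened to allocate the data.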
* Clippy. Block alignment
* Code simplification
* Added comment. Code simplification
* Removed the extraneous freq block len hack.
* initial version, still a work in progress
* remove redundant or
* add chrono::DateTime and index i64
* add more tests
* fix tests
* pass DateTime by ptr
* remove println!
* document query_parser rfc 3339 date support
* added some more docs about implementation to schema.rs
* enforce DateTime is UTC, and re-export chrono
* added DateField to changelog
* fixed conflict
* use INDEXED instead of INT_INDEXED for date fields
* Split Collector into an overall Collector and a per-segment SegmentCollector. Precursor to cross-segment parallelism, and as a side benefit cleans up any per-segment fields from being Option<T> to just T.
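A stripped-down sketch of the Collector / SegmentCollector split, using a counting collector. The trait shapes, method names, and the `search` driver are simplified assumptions for illustration, not the crate's actual signatures.

```rust
/// Per-segment half: created once per segment, so its state can be
/// plain fields and does not need to be wrapped in Option.
pub trait SegmentCollector {
    type Fruit;
    fn collect(&mut self, doc: u32, score: f32);
    fn harvest(self) -> Self::Fruit;
}

/// Overall half: builds one child per segment and merges their results.
pub trait Collector {
    type Fruit;
    type Child: SegmentCollector;
    fn for_segment(&self, segment_ord: u32) -> Self::Child;
    fn merge_fruits(
        &self,
        fruits: Vec<<Self::Child as SegmentCollector>::Fruit>,
    ) -> Self::Fruit;
}

/// Counts matching documents across all segments.
pub struct Count;

pub struct SegmentCount {
    count: usize,
}

impl SegmentCollector for SegmentCount {
    type Fruit = usize;

    fn collect(&mut self, _doc: u32, _score: f32) {
        self.count += 1;
    }

    fn harvest(self) -> usize {
        self.count
    }
}

impl Collector for Count {
    type Fruit = usize;
    type Child = SegmentCount;

    fn for_segment(&self, _segment_ord: u32) -> SegmentCount {
        SegmentCount { count: 0 }
    }

    fn merge_fruits(&self, fruits: Vec<usize>) -> usize {
        fruits.into_iter().sum()
    }
}

/// Hypothetical driver: each inner Vec stands for the (doc, score)
/// hits of one segment. The per-segment loop is independent, which is
/// what makes it a candidate for cross-segment parallelism.
pub fn search<C: Collector>(collector: &C, segments: &[Vec<(u32, f32)>]) -> C::Fruit {
    let fruits: Vec<<C::Child as SegmentCollector>::Fruit> = segments
        .iter()
        .enumerate()
        .map(|(ord, hits)| {
            let mut child = collector.for_segment(ord as u32);
            for &(doc, score) in hits {
                child.collect(doc, score);
            }
            child.harvest()
        })
        .collect();
    collector.merge_fruits(fruits)
}
```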
* Attempt to add MultiCollector back
* working. Chained collector is broken though
* Fix chained collector
* Fix test
* Make Weight Send+Sync for parallelization purposes
* Expose parameters of RangeQuery for external usage
* Removed &mut self
* fixing tests
* Restored TestCollectors
* blop
* multicollector working
* chained collector working
* test broken
* fixing unit test
* blop
* blop
* Blop
* simplifying API
* blop
* better syntax
* Simplifying top_collector
* refactoring
* blop
* Sync with master
* Added multithread search
* Collector refactoring
* Schema::builder
* CR and rustdoc
* CR comments
* blop
* Added an executor
* Sorted the segment readers in the searcher
* Update searcher.rs
* Fixed unit tests
* changed the place where we have the sort-segment-by-count heuristic
* using crossbeam::channel
* inlining
* Comments about panics propagating
* Added unit test for executor panicking
* Readded default
* Removed Default impl
* Added unit test for executor
* Moving Range and All to Leaves
* Parsing OR/AND
* Simplify user input ast
* AND and OR supported. Returning an error when mixing syntax
Closes #246
* Added support for NOT
* Updated changelog