In order to make `IndexWriter::run` callable from outside of the create,
the `UserOperation` type needs to be publicly available.
Since the `indexer` module is private, we just export the `UserOperation`
type directly.
* WIP implemented is_compatible
hide Footer::from_bytes from public consumption - only found Footer::extract
used outside the module
Add a new error type for IncompatibleIndex
add a prototypical call to footer.is_compatible() in ManagedDirectory::open_read
to make sure we error before reading it further
* Make error handling more ergonomic
Add an error subtype for OpenReadError and converters to TantivyError
* Remove an unnecessary assert
it's follower by the same check that Errors instead of panicking
* Correct the compatibility check logic
Leave a defensive versioned footer check to make sure we add new logic handling
when we add possible footer versions
Restricted VersionedFooter::from_bytes to be used inside the crate only
remove a half-baked test
* WIP.
* Return an error if index incompatible - closes#662
Enrich the error type with incompatibility
Change return type to Result<bool, TantivyError>, instead of bool
Add an Incompatibility enum that enriches the IncompatibleIndex error variant
with information, which then allows us to generate a developer-friendly hint how
to upgrade library version or switch feature flags for a different compression
algorithm
Updated changelog
Change the signature of is_compatible
Added documentation to the Incompatibility
Added a conditional test on a Footer with lz4 erroring
* Make u64_lenient() handle f64 fast fields too
Without this, we get a panic during merge since the merger will
get a `None` where it expects something.
Prior to this patch, you can reproduce the panic with:
use tantivy::{
self,
schema::{SchemaBuilder, FAST},
Document, Index, Result,
};
#[test]
fn pass() -> Result<()> {
let mut builder = SchemaBuilder::new();
let field = builder.add_f64_field("f64", FAST);
let index = Index::create_in_ram(builder.build());
let mut writer = index.writer_with_num_threads(1, 50_000_000)?;
for i in 0..1000 {
let mut doc = Document::new();
doc.add_f64(field, 0.42);
writer.add_document(doc);
if i % 5 == 0 {
writer.commit()?;
}
}
writer.commit()?;
Ok(())
}
* Add test to verify that f64 fields are merged
* Ensure multi-valued fast fields can be merged too
* Prevent tokens from being stored in the document store.
Commit adds prepare_for_store method to Document, which changes all
PreTokenizedString values into String values. The method is called
before adding document to the document store to prevent tokens from
being saved there. Commit also adds small changes to comments in
pre_tokenized_text example.
* Avoid storing the pretokenized text.
* code tidy-up
Replace `20` magic constant with COMMON_FOOTER_SIZE
Add a docstring showing how footer is serialised
Add a test for footer length checking
* Add more tests for VersionedFooter
successful and panicking .to_bytes() calls
* Minor changes in footer.rs
* Added handling of pre-tokenized text fields (#642).
* * Updated changelog and examples concerning #642.
* Added tokenized_text method to Value implementation.
* Implemented From<TokenizedString> for TokenizedStream.
* * Removed tokenized flag from TextOptions and code reliance on the flag.
* Changed naming to use word "pre-tokenized" instead of "tokenized".
* Updated example code.
* Fixed comments.
* Minor code refactoring. Test improvements.
* Use `slice::iter` instead of `into_iter` to avoid future breakage
`an_array.into_iter()` currently just works because of the autoref
feature, which then calls `<[T] as IntoIterator>::into_iter`. But
in the future, arrays will implement `IntoIterator`, too. In order
to avoid problems in the future, the call is replaced by `iter()`
which is shorter and more explicit.
* cargo fmt
* TopDocs: ensure stable sorting on equal score
When selecting the top K documents by score, we need to ensure stable
sorting. Until now, for documents with the same score, we were relying
on the (arbitrary) order returned by the BinaryHeap used to implement
the collectors.
This patch fixes the problem by explicitly using the doc address when
harvesting the `TopSegmentCollector` and when merging the results in
`TopCollector::merge_fruits()`.
This is important (for example) to implement pagination correctly using
the TopDocs collector. If sorting isn't stable, documents that have the
same score might be ranked in different positions depending on the
specific K that was used, thus appearing in two different pages, or in
none at all.
Fixes gh-671
* TMP: alternative solution (see previous commit)
If we add the constrait that D is also PartialOrd in ComparableDoc<T,
D>, then we can move the comparison by doc address directly in the cmp
implementation of ComparableDoc.
* TMP rebase as first commit: add benchmarks for TopSegmentCollector
* fixup! TMP: alternative solution (see previous commit)
* TMP add changelog entry
* TMP run cargo fmt