* Prevent tokens from being stored in the document store.
This commit adds a prepare_for_store method to Document, which converts
all PreTokenizedString values into String values. The method is called
before a document is added to the document store, so that tokens are
never saved there. The commit also makes small changes to comments in
the pre_tokenized_text example.
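A minimal sketch of the conversion described above, assuming simplified stand-in types. The `Value` variants, the field names and the `values` vector are hypothetical and are not the crate's actual definitions; only the `prepare_for_store`, `Document` and `PreTokenizedString` names come from the commit message.

```rust
// Hypothetical, simplified stand-ins; not the real type definitions.
#[derive(Debug, Clone, PartialEq)]
pub struct PreTokenizedString {
    pub text: String,
    pub tokens: Vec<String>,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Str(String),
    PreTokStr(PreTokenizedString),
    U64(u64),
}

#[derive(Debug, Clone, PartialEq, Default)]
pub struct Document {
    pub values: Vec<Value>,
}

impl Document {
    /// Replaces every pre-tokenized value with its plain text,
    /// so that the tokens are dropped before the document is stored.
    pub fn prepare_for_store(&mut self) {
        for value in self.values.iter_mut() {
            // Extract the text first, then overwrite the whole value.
            let replacement = match value {
                Value::PreTokStr(pre) => Some(std::mem::take(&mut pre.text)),
                _ => None,
            };
            if let Some(text) = replacement {
                *value = Value::Str(text);
            }
        }
    }
}
```

Values that are not pre-tokenized are left untouched, so calling the method on a document without pre-tokenized fields is a no-op.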
* Avoid storing the pretokenized text.
* code tidy-up
Replace `20` magic constant with COMMON_FOOTER_SIZE
Add a docstring showing how footer is serialised
Add a test for footer length checking
* Add more tests for VersionedFooter
successful and panicking .to_bytes() calls
* Minor changes in footer.rs
* Added handling of pre-tokenized text fields (#642).
* Updated changelog and examples concerning #642.
* Added tokenized_text method to Value implementation.
* Implemented From<TokenizedString> for TokenizedStream.
* Removed tokenized flag from TextOptions and code reliance on the flag.
* Changed naming to use word "pre-tokenized" instead of "tokenized".
* Updated example code.
* Fixed comments.
* Minor code refactoring. Test improvements.
* Use `slice::iter` instead of `into_iter` to avoid future breakage
`an_array.into_iter()` currently just works because of the autoref
feature, which then calls `<[T] as IntoIterator>::into_iter`. But
in the future, arrays will implement `IntoIterator`, too. In order
to avoid problems in the future, the call is replaced by `iter()`
which is shorter and more explicit.
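A small illustration of the replacement, as a sketch only. The `doubled` function is a hypothetical example and is not code from this change.

```rust
/// Doubles every element of a fixed-size array.
pub fn doubled(values: [u32; 3]) -> Vec<u32> {
    // `iter()` always yields `&u32`, independent of whether arrays
    // themselves implement `IntoIterator`, so the closure's argument
    // type cannot change underneath us.
    values.iter().map(|v: &u32| v * 2).collect()
}
```

Spelling out the `&u32` annotation is only for illustration; the point is that `iter()` makes borrowing explicit instead of relying on autoref.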
* cargo fmt
* TopDocs: ensure stable sorting on equal score
When selecting the top K documents by score, we need to ensure stable
sorting. Until now, for documents with the same score, we were relying
on the (arbitrary) order returned by the BinaryHeap used to implement
the collectors.
This patch fixes the problem by explicitly using the doc address when
harvesting the `TopSegmentCollector` and when merging the results in
`TopCollector::merge_fruits()`.
This is important (for example) to implement pagination correctly using
the TopDocs collector. If sorting isn't stable, documents that have the
same score might be ranked in different positions depending on the
specific K that was used, thus appearing in two different pages, or in
none at all.
Fixes gh-671
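The tie-breaking idea can be sketched as follows. This is not the collector code itself: `DocAddr` is a hypothetical (segment, doc id) pair standing in for the doc address, and a full sort replaces the heap for brevity.

```rust
use std::cmp::Ordering;

/// Hypothetical (segment, doc id) pair standing in for a doc address.
pub type DocAddr = (u32, u32);

/// Returns the top `k` hits: highest score first,
/// ties broken by ascending doc address.
pub fn top_k(mut hits: Vec<(f32, DocAddr)>, k: usize) -> Vec<(f32, DocAddr)> {
    hits.sort_by(|a, b| {
        b.0.partial_cmp(&a.0)
            .unwrap_or(Ordering::Equal)
            // Equal scores: fall back to the doc address so that the
            // order does not depend on the input order or on `k`.
            .then_with(|| a.1.cmp(&b.1))
    });
    hits.truncate(k);
    hits
}
```

With this ordering, the first K results are always a prefix of the first K+1 results, which is the property pagination relies on.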
* TMP: alternative solution (see previous commit)
If we add the constraint that D is also PartialOrd in ComparableDoc<T,
D>, then we can move the comparison by doc address directly into the cmp
implementation of ComparableDoc.
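A sketch of what such a `cmp` could look like, assuming a simplified `ComparableDoc` whose field names (`feature`, `doc`) are guesses, together with a hypothetical `top_k` helper that drives it through a `BinaryHeap`.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

/// Simplified stand-in for `ComparableDoc<T, D>`; field names are assumptions.
pub struct ComparableDoc<T, D> {
    pub feature: T,
    pub doc: D,
}

impl<T: PartialOrd, D: PartialOrd> Ord for ComparableDoc<T, D> {
    fn cmp(&self, other: &Self) -> Ordering {
        // Reversed on the feature: a higher score compares as "smaller",
        // so the max-heap keeps the worst retained entry on top.
        other
            .feature
            .partial_cmp(&self.feature)
            .unwrap_or(Ordering::Equal)
            // Ties are resolved by the doc address, lowest first.
            .then_with(|| self.doc.partial_cmp(&other.doc).unwrap_or(Ordering::Equal))
    }
}

impl<T: PartialOrd, D: PartialOrd> PartialOrd for ComparableDoc<T, D> {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl<T: PartialOrd, D: PartialOrd> PartialEq for ComparableDoc<T, D> {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}

impl<T: PartialOrd, D: PartialOrd> Eq for ComparableDoc<T, D> {}

/// Hypothetical helper: keeps the best `k` entries using a bounded heap.
pub fn top_k<T: PartialOrd, D: PartialOrd>(
    hits: impl IntoIterator<Item = (T, D)>,
    k: usize,
) -> Vec<(T, D)> {
    let mut heap = BinaryHeap::new();
    for (feature, doc) in hits {
        heap.push(ComparableDoc { feature, doc });
        if heap.len() > k {
            heap.pop(); // evicts the current worst entry
        }
    }
    // Ascending order under `cmp` means best score first.
    heap.into_sorted_vec()
        .into_iter()
        .map(|c| (c.feature, c.doc))
        .collect()
}
```

Because the tie-break lives in `cmp`, neither the harvesting step nor the merge step needs its own comparison logic.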
* TMP rebase as first commit: add benchmarks for TopSegmentCollector
* fixup! TMP: alternative solution (see previous commit)
* TMP add changelog entry
* TMP run cargo fmt
* Add a doctest to BooleanQuery
Closes #446
Mark a function that is only used in tests to be compiled for tests only
Fix doc-comments in a couple of related files
* Minor corrections
remove whitespace, fix typos, add explicit dyn marker
* WIP: BooleanQuery doc test
Trying to nest several BooleanQueries together
* Addressed old review
rust 2018 edition + make function available to everyone
* Box the previous query to resolve the type error
* Rework wording in DocAddress document strings
* Reworded and restructured the docstring
* add checksum check in ManagedDirectory
fix #400
* flush after writing checksum
* don't checksum atomic file access and clone managed_paths
* implement a footer storing metadata about a file
this is more of a PoC; it requires some refactoring into multiple files
`terminate(self)` is implemented, but not used anywhere yet
* address comments and simplify things with new contract
use ByteOrder for integer to raw byte conversion
assume an atomic write implies an atomic read, which might not actually be true
use some indirection to have a boxable terminating writer
* implement TerminatingWrite and make terminate() be called where it should
add dependency on drop_bomb to help find where terminate() should be called
implement TerminatingWrite for wrapper writers
make tests pass
/!\ some tests seem to pass where they shouldn't
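A rough sketch of the terminate-then-footer idea, under loud assumptions: the trait signature, the `FooterWriter` wrapper, the 4-byte little-endian footer and the toy checksum are all hypothetical and chosen only to keep the example self-contained. They are not the footer layout or checksum used by this change.

```rust
use std::io::{self, Write};

/// A writer that must be finalized explicitly (assumed signature).
pub trait TerminatingWrite: Write {
    fn terminate(self) -> io::Result<()>;
}

/// Toy rolling checksum; a placeholder, not a real CRC.
fn toy_checksum(state: u32, bytes: &[u8]) -> u32 {
    bytes
        .iter()
        .fold(state, |acc, &b| acc.wrapping_mul(31).wrapping_add(u32::from(b)))
}

/// Hypothetical wrapper that appends a checksum footer on `terminate`.
pub struct FooterWriter<W: Write> {
    inner: W,
    checksum: u32,
}

impl<W: Write> FooterWriter<W> {
    pub fn new(inner: W) -> Self {
        FooterWriter { inner, checksum: 0 }
    }
}

impl<W: Write> Write for FooterWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let written = self.inner.write(buf)?;
        // Only account for the bytes the inner writer accepted.
        self.checksum = toy_checksum(self.checksum, &buf[..written]);
        Ok(written)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}

impl<W: Write> TerminatingWrite for FooterWriter<W> {
    fn terminate(mut self) -> io::Result<()> {
        // Footer: checksum as 4 little-endian bytes, then flush.
        self.inner.write_all(&self.checksum.to_le_bytes())?;
        self.inner.flush()
    }
}

/// Writes `payload` followed by its footer into a fresh buffer.
pub fn write_with_footer(payload: &[u8]) -> io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    let mut writer = FooterWriter::new(&mut buf);
    writer.write_all(payload)?;
    writer.terminate()?;
    Ok(buf)
}

/// Recomputes the checksum over the payload and compares it to the footer.
pub fn verify(data: &[u8]) -> bool {
    if data.len() < 4 {
        return false;
    }
    let (payload, footer) = data.split_at(data.len() - 4);
    let stored = u32::from_le_bytes([footer[0], footer[1], footer[2], footer[3]]);
    toy_checksum(0, payload) == stored
}
```

Taking `self` by value in `terminate` means a terminated writer cannot be written to again, which is one way to let the compiler enforce the "terminate exactly once" contract.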
* remove usage of drop_bomb
* fmt
* add test for checksum
* address some review comments
* update changelog
* fmt