* store DateTime as nanoseconds in doc store
The doc store DateTime was truncated to microseconds previously. This
removes this truncation, while still keeping backwards compatibility.
This is done by adding the trait `ConfigurableBinarySerializable`, which
works like `BinarySerializable`, but with a config that allows de/serialize
as different date time precision currently.
bump version format to 7.
add compat test to check the date time truncation.
* remove configurable binary serialize, add enum for doc store version
* test doc store version ord
* compact doc
* add any value type
* pass references when building CompactDoc
* remove OwnedValue from API
* clippy
* clippy
* fail on large documents
* fmt
* cleanup
* cleanup
* implement Value for different types
fix serde_json date Value implementation
* fmt
* cleanup
* fmt
* cleanup
* store positions instead of pos+len
* remove nodes array
* remove mediumvec
* cleanup
* infallible serialize into vec
* remove positions indirection
* remove 24MB limitation in document
use u32 for Addr
Remove the 3 byte addressing limitation and use VInt instead
* cleanup
* extend test
* cleanup, add comments
* rename, remove pub
* Fix trait bound of StoreReader::iter
Similar to `StoreReader::get`, `StoreReader::iter` should only require
`DocumentDeserialize` and not `Document`.
* Mark the iterator returned by SegmentReader::doc_ids_alive as Send so it can be used in impls of Stream/AsyncIterator.
* remove Document: DocumentDeserialize dependency
The dependency requires users to implement an API they may not use.
* remove unnecessary Document bounds
* fix windows build (#1)
* Fix windows build
* Add doc traits
* Add field value iter
* Add value and serialization
* Adjust order
* Fix bug
* Correct type
* Fix generic bugs
* Reformat code
* Add generic to index writer which I forgot about
* Fix missing generics on single segment writer
* Add missing type export
* Add default methods for convenience
* Cleanup
* Fix more-like-this query to use standard types
* Update API and fix tests
* Add doc traits
* Add field value iter
* Add value and serialization
* Adjust order
* Fix bug
* Correct type
* Rebase main and fix conflicts
* Reformat code
* Merge upstream
* Fix missing generics on single segment writer
* Add missing type export
* Add default methods for convenience
* Cleanup
* Fix more-like-this query to use standard types
* Update API and fix tests
* Add tokenizer improvements from previous commits
* Add tokenizer improvements from previous commits
* Reformat
* Fix unit tests
* Fix unit tests
* Use enum in changes
* Stage changes
* Add new deserializer logic
* Add serializer integration
* Add document deserializer
* Implement new (de)serialization api for existing types
* Fix bugs and type errors
* Add helper implementations
* Fix errors
* Reformat code
* Add unit tests and some code organisation for serialization
* Add unit tests to deserializer
* Add some small docs
* Add support for deserializing serde values
* Reformat
* Fix typo
* Fix typo
* Change repr of facet
* Remove unused trait methods
* Add child value type
* Resolve comments
* Fix build
* Fix more build errors
* Fix more build errors
* Fix the tests I missed
* Fix examples
* fix numerical order, serialize PreTok Str
* fix coverage
* rename Document to TantivyDocument, rename DocumentAccess to Document
add Binary prefix to binary de/serialization
* fix coverage
---------
Co-authored-by: Pascal Seitz <pascal.seitz@gmail.com>
LZ4 provides fast and simple compression whereas Zstd is exceptionally flexible
so that the additional support for Brotli and Snappy does not really add
any distinct functionality on top of those two algorithms.
Removing them reduces our maintenance burden and reduces the number of choices
users have to make when setting up their project based on Tantivy.
* Include only built-in compression algorithms as enum variants
This enables compile-time errors when a compression algorithm is requested which
is not actually enabled for the current Cargo project. The cost is that indexes
using other compression algorithms cannot even be loaded (even though they
are not fully accessible in any case).
As a drive-by, this also fixes `--no-default-features` on `cfg(unix)`.
* Provide more instructive error messages for unsupported, but not unknown compression variants.
Applied this command to the code, making it a bit shorter and slightly
more readable.
```
cargo +nightly clippy --all-features --benches --tests --workspace --fix -- -A clippy::all -W clippy::uninlined_format_args
cargo +nightly fmt --all
```
* add memory limit for aggregations
introduce AggregationLimits to set memory consumption limit and bucket limits
memory limit is checked during aggregation, bucket limit is checked before returning the aggregation request.
* Apply suggestions from code review
Co-authored-by: Paul Masurel <paul@quickwit.io>
* add ByteCount with human readable format
---------
Co-authored-by: Paul Masurel <paul@quickwit.io>
Fixes an issue in the skip list deserialization, which deserialized the byte start offset incorrectly as u32.
`get_doc` will fail for any docs that live in a block with start offset larger than u32::MAX (~4GB).
Causes index corruption, if a segment with a doc store larger 4GB is merged.
tantivy version 0.19 is affected
In addition, it isolates the doc compressor logic,
better reports io::Result.
In the case of the same-thread doc compressor,
the blocks are also not copied.