* compact doc
* add any value type
* pass references when building CompactDoc
* remove OwnedValue from API
* clippy
* clippy
* fail on large documents
* fmt
* cleanup
* cleanup
* implement Value for different types
fix serde_json date Value implementation
* fmt
* cleanup
* fmt
* cleanup
* store positions instead of pos+len
* remove nodes array
* remove mediumvec
* cleanup
* infallible serialize into vec
* remove positions indirection
* remove 24MB limitation in document
use u32 for Addr
Remove the 3 byte addressing limitation and use VInt instead
* cleanup
* extend test
* cleanup, add comments
* rename, remove pub
Achieved by moving the boxes out of the temporary reference wrappers which are
cloneable themselves, i.e. if required the caller can clone them already or
consume them to reuse existing allocations.
* fix ReferenceValue API flaw
Remove `Facet` and `TokenizedString` values from the `ReferenceValue` API,
as this requires the trait value to have them stored somewhere.
Since `TokenizedString` is quite niche, I just copy it into a Box,
instead of designing a reference API around it.
* fix comment link
* add fields_metadata to SegmentReader, add columnar docs
* use schema to resolve field, add test
* normalize paths
* merge for FieldsMetadata, add fields_metadata on Index
* Update src/core/segment_reader.rs
Co-authored-by: Paul Masurel <paul@quickwit.io>
* merge code paths
* add Hash
* move function oustide
---------
Co-authored-by: Paul Masurel <paul@quickwit.io>
* remove Document: DocumentDeserialize dependency
The dependency requires users to implement an API they may not use.
* remove unnecessary Document bounds
* fix windows build (#1)
* Fix windows build
* Add doc traits
* Add field value iter
* Add value and serialization
* Adjust order
* Fix bug
* Correct type
* Fix generic bugs
* Reformat code
* Add generic to index writer which I forgot about
* Fix missing generics on single segment writer
* Add missing type export
* Add default methods for convenience
* Cleanup
* Fix more-like-this query to use standard types
* Update API and fix tests
* Add doc traits
* Add field value iter
* Add value and serialization
* Adjust order
* Fix bug
* Correct type
* Rebase main and fix conflicts
* Reformat code
* Merge upstream
* Fix missing generics on single segment writer
* Add missing type export
* Add default methods for convenience
* Cleanup
* Fix more-like-this query to use standard types
* Update API and fix tests
* Add tokenizer improvements from previous commits
* Add tokenizer improvements from previous commits
* Reformat
* Fix unit tests
* Fix unit tests
* Use enum in changes
* Stage changes
* Add new deserializer logic
* Add serializer integration
* Add document deserializer
* Implement new (de)serialization api for existing types
* Fix bugs and type errors
* Add helper implementations
* Fix errors
* Reformat code
* Add unit tests and some code organisation for serialization
* Add unit tests to deserializer
* Add some small docs
* Add support for deserializing serde values
* Reformat
* Fix typo
* Fix typo
* Change repr of facet
* Remove unused trait methods
* Add child value type
* Resolve comments
* Fix build
* Fix more build errors
* Fix more build errors
* Fix the tests I missed
* Fix examples
* fix numerical order, serialize PreTok Str
* fix coverage
* rename Document to TantivyDocument, rename DocumentAccess to Document
add Binary prefix to binary de/serialization
* fix coverage
---------
Co-authored-by: Pascal Seitz <pascal.seitz@gmail.com>
* move query parser to nom
* add suupport for term grouping
* initial work on infallible parser
* fmt
* add tests and fix minor parsing bugs
* address review comments
* add support for lenient queries in tantivy
* make lenient parser report errors
* allow mixing occur and bool in query
* enable tokenizer on json fields
enable tokenizer on json fields for type text
* Avoid making the tokenizer within the TextAnalyzer pub(crate)
* Moving BoxableTokenizer to tantivy.
---------
Co-authored-by: Paul Masurel <paul@quickwit.io>
Applied this command to the code, making it a bit shorter and slightly
more readable.
```
cargo +nightly clippy --all-features --benches --tests --workspace --fix -- -A clippy::all -W clippy::uninlined_format_args
cargo +nightly fmt --all
```
- improve performance of vint
vint serialization shows up in performance profiles during indexing.
It would also make sense to limit the value space to u29 and operate on 4 bytes only.
- remove unused code
- add missing inlines
- fix regex test