68 Commits

Author SHA1 Message Date
PSeitz
2f2db16ec1 store DateTime as nanoseconds in doc store (#2486)
* store DateTime as nanoseconds in doc store

The doc store DateTime was previously truncated to microseconds. This
removes the truncation while still keeping backwards compatibility.

This is done by adding the trait `ConfigurableBinarySerializable`, which
works like `BinarySerializable`, but takes a config that currently allows
de/serializing with a different date time precision.

bump format version to 7.
add a compat test to check the date time truncation.

* remove configurable binary serialize, add enum for doc store version

* test doc store version ord
2024-10-18 10:50:20 +08:00
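The compatibility scheme described above (full nanosecond precision for new stores, microsecond precision kept when reading old ones) can be sketched as follows. This is a minimal sketch, not tantivy's code: `DocStoreVersion`, its variants, `encode_timestamp`, and `decode_timestamp` are hypothetical names, and timestamps are assumed to be `i64` nanoseconds since the Unix epoch. Deriving `Ord` on the enum is what allows a check such as `version >= DocStoreVersion::V2`.

```rust
/// Hypothetical doc store versions; names and variants are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum DocStoreVersion {
    /// Older stores: DateTime values truncated to microseconds.
    V1 = 1,
    /// Newer stores: DateTime values keep full nanosecond precision.
    V2 = 2,
}

/// Encodes a timestamp (assumed: i64 nanoseconds since the Unix epoch)
/// in the precision used by the given store version.
pub fn encode_timestamp(nanos: i64, version: DocStoreVersion) -> i64 {
    if version >= DocStoreVersion::V2 {
        nanos
    } else {
        // Old format stores microseconds; sub-microsecond digits are dropped.
        nanos / 1_000
    }
}

/// Decodes a stored value back to nanoseconds, honoring the version
/// the store was written with.
pub fn decode_timestamp(stored: i64, version: DocStoreVersion) -> i64 {
    if version >= DocStoreVersion::V2 {
        stored
    } else {
        stored * 1_000
    }
}
```

Round-tripping through `V2` is lossless, while `V1` reproduces the old microsecond truncation, so readers of old stores see exactly what was written.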
PSeitz
a206c3ccd3 add compat tests (#2485) 2024-09-04 18:26:57 +08:00
Harrison Burt
1c7c6fd591 POC: Tantivy documents as a trait (#2071)
* fix windows build (#1)

* Fix windows build

* Add doc traits

* Add field value iter

* Add value and serialization

* Adjust order

* Fix bug

* Correct type

* Fix generic bugs

* Reformat code

* Add generic to index writer which I forgot about

* Fix missing generics on single segment writer

* Add missing type export

* Add default methods for convenience

* Cleanup

* Fix more-like-this query to use standard types

* Update API and fix tests

* Add doc traits

* Add field value iter

* Add value and serialization

* Adjust order

* Fix bug

* Correct type

* Rebase main and fix conflicts

* Reformat code

* Merge upstream

* Fix missing generics on single segment writer

* Add missing type export

* Add default methods for convenience

* Cleanup

* Fix more-like-this query to use standard types

* Update API and fix tests

* Add tokenizer improvements from previous commits

* Add tokenizer improvements from previous commits

* Reformat

* Fix unit tests

* Fix unit tests

* Use enum in changes

* Stage changes

* Add new deserializer logic

* Add serializer integration

* Add document deserializer

* Implement new (de)serialization api for existing types

* Fix bugs and type errors

* Add helper implementations

* Fix errors

* Reformat code

* Add unit tests and some code organisation for serialization

* Add unit tests to deserializer

* Add some small docs

* Add support for deserializing serde values

* Reformat

* Fix typo

* Fix typo

* Change repr of facet

* Remove unused trait methods

* Add child value type

* Resolve comments

* Fix build

* Fix more build errors

* Fix more build errors

* Fix the tests I missed

* Fix examples

* fix numerical order, serialize PreTok Str

* fix coverage

* rename Document to TantivyDocument, rename DocumentAccess to Document

add Binary prefix to binary de/serialization

* fix coverage

---------

Co-authored-by: Pascal Seitz <pascal.seitz@gmail.com>
2023-10-02 10:01:16 +02:00
PSeitz
2d7390341c increase min memory to 15MB for indexing (#2176)
With tantivy 0.20 the minimum memory consumption per SegmentWriter increased to
12MB. 7MB of that is for the different fast field collector types (they could be
created lazily). Increase the minimum memory from 3MB to 15MB.

Rename memory variables from arena to budget.

closes #2156
2023-09-13 07:38:34 +02:00
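A minimal sketch of enforcing such a floor, assuming the overall budget is split evenly across indexing threads and that 1MB means 1,000,000 bytes. `MIN_BUDGET_PER_WRITER_BYTES`, `BudgetError`, and `per_thread_budget` are hypothetical names, not tantivy's API.

```rust
/// Hypothetical floor for a single writer's memory budget: 15MB,
/// assuming 1MB = 1_000_000 bytes.
pub const MIN_BUDGET_PER_WRITER_BYTES: usize = 15_000_000;

/// Hypothetical error returned when the per-thread share is too small.
#[derive(Debug, PartialEq, Eq)]
pub enum BudgetError {
    TooSmall { per_thread: usize, min: usize },
}

/// Splits `total_budget` evenly across `num_threads` writers (at least one)
/// and rejects the configuration if any writer would get less than the floor.
pub fn per_thread_budget(total_budget: usize, num_threads: usize) -> Result<usize, BudgetError> {
    let per_thread = total_budget / num_threads.max(1);
    if per_thread < MIN_BUDGET_PER_WRITER_BYTES {
        return Err(BudgetError::TooSmall {
            per_thread,
            min: MIN_BUDGET_PER_WRITER_BYTES,
        });
    }
    Ok(per_thread)
}
```

Under this assumption, a 20MB budget is fine for one thread but rejected for two, since each writer would only get 10MB.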
Paul Masurel
bd5eea9852 Integrated columnar work. 2023-02-09 13:14:31 +01:00
Paul Masurel
eca6628b3c Minor refactoring (#1266) 2022-01-28 15:55:55 +09:00
Paul Masurel
9679c5f306 Rename quickwit-inc -> quickwit-oss 2022-01-27 15:37:09 +09:00
Paul Masurel
7234bef0eb Issue/1198 (#1201)
* Unit test reproducing #1198
* Fixing unit test to handle the error from add_document.
* Bump project version
2021-11-11 16:42:19 +09:00
Pascal Seitz
99cd25beae use <T: Into<Box<dyn Directory>>> as parameter to open/create an Index
This is done in order to support Box<dyn Directory> in addition to generic implementations of the Directory trait.
Remove boxing in ManagedDirectory.
2021-10-25 12:34:40 +08:00
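The `Into<Box<dyn Directory>>` bound lets a single entry point accept both a concrete directory and an already boxed trait object. The sketch below uses a stripped-down, hypothetical `Directory` trait, `RamDirectory`, and `Index`; only the shape of the generic parameter mirrors the commit message.

```rust
/// Hypothetical, stripped-down directory abstraction.
pub trait Directory {
    fn name(&self) -> &str;
}

/// Hypothetical concrete implementation.
pub struct RamDirectory;

impl Directory for RamDirectory {
    fn name(&self) -> &str {
        "ram"
    }
}

/// Lets a concrete directory satisfy `Into<Box<dyn Directory>>`.
impl From<RamDirectory> for Box<dyn Directory> {
    fn from(dir: RamDirectory) -> Self {
        Box::new(dir)
    }
}

/// Hypothetical index handle that owns its directory as a trait object.
pub struct Index {
    directory: Box<dyn Directory>,
}

impl Index {
    /// Accepts concrete directories and `Box<dyn Directory>` alike.
    pub fn open<T: Into<Box<dyn Directory>>>(directory: T) -> Index {
        Index { directory: directory.into() }
    }

    pub fn directory_name(&self) -> &str {
        self.directory.name()
    }
}
```

Passing an existing `Box<dyn Directory>` works through the standard library's reflexive `impl<T> From<T> for T`, so the boxed case needs no extra conversion code, and the index stores a box either way, with no additional boxing layer.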
Pascal Seitz
0062fe705d cargo fmt 2021-07-01 18:17:08 +02:00
Pascal Seitz
1e4df54ab3 fix clippy 2021-07-01 17:41:53 +02:00
Paul Masurel
39dd8cfe24 Cargo clippy. Acronyms should apparently not be fully uppercase. 2021-04-26 11:49:18 +09:00
Adrien Guillo
7fd6054145 Modified Directory::exists API to return Result<bool, OpenReadError> 2020-11-09 18:00:14 -08:00
Paul Masurel
c23a03ad81 Large API Change in the Directory API. (#901)
Tantivy used to assume that all files could be memory mapped. After this change, a Directory returns a `FileSlice` that can be narrowed and eventually read into an `OwnedBytes` object. Long, blocking I/O operations are still required, but they no longer span the entire file.
2020-10-08 16:36:51 +09:00
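The idea of a slice that can be narrowed cheaply and only read at the end can be sketched with an in-memory buffer. This is a hypothetical simplification: the real `FileSlice` sits on top of a file handle and reads into `OwnedBytes`, whereas this `FileSlice` shares an `Arc<Vec<u8>>` and `read_bytes` returns a plain `Vec<u8>`.

```rust
use std::ops::Range;
use std::sync::Arc;

/// Hypothetical, simplified view over a file's bytes.
#[derive(Clone)]
pub struct FileSlice {
    data: Arc<Vec<u8>>,
    range: Range<usize>,
}

impl FileSlice {
    pub fn new(data: Vec<u8>) -> FileSlice {
        let len = data.len();
        FileSlice { data: Arc::new(data), range: 0..len }
    }

    pub fn len(&self) -> usize {
        self.range.end - self.range.start
    }

    /// Narrows the view without copying; `sub` is relative to this slice.
    /// Panics if `sub` is out of bounds.
    pub fn slice(&self, sub: Range<usize>) -> FileSlice {
        assert!(sub.start <= sub.end && sub.end <= self.len(), "range out of bounds");
        FileSlice {
            data: Arc::clone(&self.data),
            range: (self.range.start + sub.start)..(self.range.start + sub.end),
        }
    }

    /// The actual "read": only the bytes in the narrowed range are copied.
    pub fn read_bytes(&self) -> Vec<u8> {
        self.data[self.range.clone()].to_vec()
    }
}
```

Narrowing only adjusts offsets, so a caller can descend to the few bytes it needs before paying for a read, which is what keeps blocking I/O from spanning the whole file.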
Paul Masurel
42756c7474 Removing futures-cpupool and upgrading to futures-0.3 2019-11-15 18:35:31 +09:00
Paul Masurel
7b21b3f25a Refactoring around Field (#673)
* Refactoring around Field

Removing the contract about the order of the fields and about
field id allocation.

* Update delete_queue.rs

* Update field.rs
2019-10-25 09:06:44 +09:00
fdb-hiroshima
d8894f0bd2 add checksum check in ManagedDirectory (#605)
* add checksum check in ManagedDirectory

fix #400

* flush after writing checksum

* don't checksum atomic file access and clone managed_paths

* implement a footer storing metadata about a file

this is more of a POC; it requires some refactoring into multiple files
`terminate(self)` is implemented, but not used anywhere yet

* address comments and simplify things with new contract

use ByteOrder for integer to raw byte conversion
consider that atomic write implies atomic read, which might not actually be true
use some indirection to have a boxable terminating writer

* implement TerminatingWrite and make terminate() be called where it should

add dependency on drop_bomb to help find where terminate() should be called
implement TerminatingWrite for wrapper writers
make tests pass
/!\ some tests seem to pass where they shouldn't

* remove usage of drop_bomb

* fmt

* add test for checksum

* address some review comments

* update changelog

* fmt
2019-09-18 18:26:25 +09:00
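A minimal sketch of a checksum footer, assuming a 4-byte little-endian CRC-32 appended after the payload. The footer in this commit also stores metadata about the file, and this sketch swaps in a std-only bitwise CRC-32 rather than whatever checksum crate tantivy uses; `crc32`, `write_with_footer`, and `read_with_footer` are hypothetical names.

```rust
/// Bitwise CRC-32 (IEEE, reflected polynomial 0xEDB88320). Slow but std-only.
pub fn crc32(data: &[u8]) -> u32 {
    let mut crc: u32 = 0xFFFF_FFFF;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // All ones if the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc
}

/// Hypothetical writer side: payload followed by a 4-byte LE checksum footer.
pub fn write_with_footer(payload: &[u8]) -> Vec<u8> {
    let mut out = payload.to_vec();
    out.extend_from_slice(&crc32(payload).to_le_bytes());
    out
}

/// Hypothetical reader side: returns the payload if the footer matches,
/// or None if the input is truncated or corrupted.
pub fn read_with_footer(bytes: &[u8]) -> Option<&[u8]> {
    if bytes.len() < 4 {
        return None;
    }
    let (payload, footer) = bytes.split_at(bytes.len() - 4);
    let stored = u32::from_le_bytes([footer[0], footer[1], footer[2], footer[3]]);
    if crc32(payload) == stored {
        Some(payload)
    } else {
        None
    }
}
```

Putting the checksum at the end means the writer can compute it incrementally and emit it when the file is terminated, which fits the `terminate()` step described in the commit.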
Kornel
754b55eee5 Bump deps (#613)
* Bump crossbeam

* Warnings--

* Remove outdated tempdir
2019-08-05 22:21:22 +09:00
Paul Masurel
7211df6719 Failrs (#600)
* Single thread tests

* Isolating fail tests into a different binary
2019-07-22 13:17:21 +09:00
Paul Masurel
ca55b7ef58 blop 2016-03-03 09:57:04 +09:00
Paul Masurel
1008747fc3 blop 2016-03-02 09:21:26 +09:00
Paul Masurel
20fa812218 blop 2016-03-01 23:45:04 +09:00
Paul Masurel
345db8e62d renamed directory to index 2016-03-01 18:59:13 +09:00
Paul Masurel
914a79372b blop 2016-02-26 10:08:58 +09:00
Paul Masurel
1ddee0eccf added benchmark 2016-02-25 09:55:47 +09:00
Paul Masurel
4152de6c0d bugfix 2016-02-24 08:57:39 +09:00
Paul Masurel
5d4c3ba065 removed error, all ioError 2016-02-23 23:02:40 +09:00
Paul Masurel
34ba3e8d06 fixed args for simd stuff. 2016-02-22 09:40:00 +09:00
Paul Masurel
8677d7dd96 cleaning up imports 2016-02-22 00:19:26 +09:00
Paul Masurel
107d3c0244 werwer 2016-02-20 20:13:14 +09:00
Paul Masurel
e13262e70b refactoring toward adding stored values. 2016-02-19 12:38:58 +09:00
Paul Masurel
b78e5320c3 reader working with compressed data. 2016-02-18 20:31:19 +09:00
Paul Masurel
c0c0a2c579 added decode simd compressed postings 2016-02-18 19:03:06 +09:00
Paul Masurel
bc0ea4cbcb trying to add schema 2016-02-14 15:31:57 +09:00
Paul Masurel
71ed7d3b52 werwer 2016-02-14 00:21:09 +09:00
Paul Masurel
b8e4a63d73 toto 2016-02-13 23:46:19 +09:00
Paul Masurel
114945a828 blopwer 2016-02-13 22:07:01 +09:00
Paul Masurel
cdbd772204 advancing on skip 2016-02-13 19:33:28 +09:00
Paul Masurel
c1da85d1f9 blop 2016-02-13 17:39:19 +09:00
Paul Masurel
83c5ce8a28 skip 2016-02-13 15:25:06 +09:00
Paul Masurel
e7f77d9866 blip 2016-02-04 10:46:45 +09:00
Paul Masurel
f790425679 beeeee 2016-02-03 22:33:16 +09:00
Paul Masurel
52f601b1b9 intersection kinda working 2016-02-01 21:26:03 +09:00
Paul Masurel
484bafd144 *werwer* 2016-01-31 21:11:07 +09:00
Paul Masurel
f9dd119489 aaa 2016-01-31 16:16:11 +09:00
Paul Masurel
a5950ed5a7 vup 2016-01-30 19:07:12 +09:00
Paul Masurel
0f83ecbf31 blop 2016-01-29 17:18:19 +09:00
Paul Masurel
a515294b8d Added Directory from temp_dir 2016-01-29 10:12:22 +09:00
Paul Masurel
d7da57a605 smallish stuff 2016-01-22 18:09:55 +09:00
Paul Masurel
1e61cefc99 using tempdir. unit test working. 2016-01-22 13:44:07 +09:00