Commit Graph

1943 Commits

Author SHA1 Message Date
Pascal Seitz
abb5624af2 add contributing guidelines, add codec comparer binary
add contributing guidelines
add codec comparer binary to test codec compressions with different test data sets
2021-06-14 13:56:40 +02:00
Pascal Seitz
1d41b96d32 rename, add codec_tester 2021-06-14 13:56:40 +02:00
Pascal Seitz
ef4665945f rename file 2021-06-14 13:56:40 +02:00
Pascal Seitz
294cd5fd0b streamline traits and tests 2021-06-14 13:56:40 +02:00
Pascal Seitz
f4d271177c add inline, add readme 2021-06-14 13:56:40 +02:00
Pascal Seitz
451538fecf add serialize for bool 2021-06-14 13:56:40 +02:00
Pascal Seitz
e78e0fec59 add multilinearinterpolation
add multilinearinterpolation, which compresses blocks of size 512
add checks for linear interpolation
2021-06-14 13:56:40 +02:00
Pascal Seitz
2e639cebf8 fix bitpacker bug, reset internal value 2021-06-14 13:56:40 +02:00
Pascal Seitz
e296da7ade add debug and failsafes 2021-06-14 13:56:40 +02:00
Pascal Seitz
3b3e26c4b8 use f64 precision for slope calculation 2021-06-14 13:56:40 +02:00
Pascal Seitz
6a4883ac69 use uniform distribution sampling 2021-06-14 13:56:40 +02:00
Pascal Seitz
0ba05df545 add f32::MAX to disable a compressor 2021-06-14 13:56:40 +02:00
Pascal Seitz
aa3c4d4029 use f32 precision, add inline 2021-06-14 13:56:40 +02:00
Pascal Seitz
60df629725 cargo.toml license desc and author 2021-06-14 13:56:40 +02:00
Pascal Seitz
2570b005ac fix estimation test 2021-06-14 13:56:40 +02:00
Pascal Seitz
d5212cd19d fix clippy 2021-06-14 13:56:40 +02:00
Pascal Seitz
2193d85622 fix clippy and common crate tests 2021-06-14 13:56:40 +02:00
Pascal Seitz
dfdbfe9eff add benchmark for fast field codecs
test tests::bench_fastfield_bitpack_create        ... bench:      57,628 ns/iter (+/- 23,486)
test tests::bench_fastfield_bitpack_get           ... bench:      43,323 ns/iter (+/- 4,286)
test tests::bench_fastfield_linearinterpol_create ... bench:     223,625 ns/iter (+/- 33,563)
test tests::bench_fastfield_linearinterpol_get    ... bench:      82,839 ns/iter (+/- 9,575)
2021-06-14 13:56:40 +02:00
Pascal Seitz
b999e836b2 replace BitpackedFastFieldReader, delete FastFieldSerializer trait 2021-06-14 13:56:40 +02:00
Pascal Seitz
be2dd41e69 add interface to create and read codecs
add CodecReader as common interface in fastfield codec crate
add LinearInterpolation to DynamicFastFieldReader
calc estimation and choose best codec
cleanup
2021-06-14 13:56:40 +02:00
Pascal Seitz
483fdb79cc add linear interpolation estimation
add estimation tests
add codec test data in tests
2021-06-14 13:56:40 +02:00
Pascal Seitz
aefd0fc907 refactor, add fastfield metadata to footer
change api to fastfield reader in codec crate
add fastfield metadata to footer
remove old code
merge codec files
2021-06-14 13:56:40 +02:00
Pascal Seitz
3298d6cb71 move common to common crate, create fastfield_codecs crate
move common to common crate
create fastfield_codecs crate
add bitpacker to fast field codecs
add linear interpolation to fast field codecs
add tests
2021-06-14 13:56:40 +02:00
Pascal Seitz
c02c78ea73 implement linear interpol serializer 2021-06-14 13:56:40 +02:00
Pascal Seitz
6bf4fee1ba support multiple codecs
support multiple codes
prepend codec id to all fast fields
add new api to create fastfields with access to all data
use new fastfield creation api in initial creation and merge
remove unused collect of data in doc_id_mapping
2021-06-14 13:56:40 +02:00
PSeitz
5209238c1b use github actions for tests 0.15.1 2021-06-14 12:51:46 +02:00
Paul Masurel
7ef25ec400 Bump to 0.15.1 to publish bugfix 2021-06-14 18:45:38 +09:00
PSeitz
221e7cbb55 Merge pull request #1076 from appaquet/fix/store-reader-iterator
Fix panic in store reader raw document iterator during segment merge
2021-06-14 11:22:58 +02:00
Pascal Seitz
873ac1a3ac cleanup import 2021-06-14 10:31:45 +02:00
Pascal Seitz
ebe55a7ae1 refactor test, fixes #1077
replace test with smaller test in doc_store
2021-06-14 10:10:05 +02:00
Bernard Swart
9f32d40b27 Misspelling of misspelled was fixed (#1078) 2021-06-14 16:29:12 +09:00
Andre-Philippe Paquet
8ae10a930a fix formatting 2021-06-13 17:23:40 -04:00
Andre-Philippe Paquet
473a346814 remove debugging 2021-06-13 16:49:44 -04:00
Andre-Philippe Paquet
3a8a0fe79a add fuzzy merge test 2021-06-13 16:42:24 -04:00
Andre-Philippe Paquet
511dc8f87f fix store reader iterator 2021-06-13 16:00:13 -04:00
Paul Masurel
3901295329 Bumped query-grammar version 2021-06-07 10:00:14 +09:00
Paul Masurel
f5918c6c74 Completed bitpacker README 2021-06-07 09:57:17 +09:00
Paul Masurel
abe6b4baec Bumped tantivy version to 0.15 2021-06-07 09:52:48 +09:00
Paul Masurel
6e4b61154f Issue/1070 (#1071)
Add a boolean flag in the Query::query_terms informing on whether
position information is required.

Closes #1070
2021-06-03 22:33:20 +09:00
PSeitz
2aad0ced77 add inline to bitpacker (#1064) 2021-05-31 23:15:41 +09:00
Stéphane Campinas
41ea14840d add benchmark of term streams merge (#1024)
* add benchmark of term streams merge
* use union based on FST for merging the term dictionaries
* Rename TermMerger benchmark
2021-05-31 23:15:01 +09:00
PSeitz
dff0ffd38a prepare for multiple fastfield codecs (#1063)
* prepare for multiple fastfield codecs

 prepare for multiple fastfield codecs by wrapping the codecs in an enum #1042

* add FastFieldSerializer trait, add DynamicFastFieldSerializer

add FastFieldSerializer trait
add DynamicFastFieldSerializer enum to wrap all implementors of the FastFieldSerializer trait

* add estimation for fastfield bitpacker
2021-05-31 23:14:14 +09:00
PSeitz
8d32c3ba3a Change Footer version handling, Make compression dynamic (#1060)
Change Footer version handling, Make compression dynamic

Change Footer version handling
Simplify version handling by switching to JSON instead of binary serialization.
fixes #1058

Make compression dynamic
Instead of choosing the compression during compile time via a feature flag, you can now have multiple compression algorithms enabled and decide during runtime which one to choose via IndexSettings. Changing the compression algorithm on an index is also supported. The information which algorithm was used in the doc store is stored in the DocStoreFooter. The default is the lz4 block format.
fixes #904

Handle merging of different compressors
Fix feature flag names
Add doc store test for all compressors
2021-05-28 14:57:20 +09:00
Moriyoshi Koizumi
4afba005f9 Provide a means to deal with malformed facet text representation for the query parser (#1056)
* Provide a means to deal with malformed facet text representation for the query parser.
* Specific error enum for the facet parse error.
2021-05-27 12:16:49 +09:00
PSeitz
85fb0cc20a cache field norm reader in merge (#1061) 2021-05-25 21:48:02 +09:00
PSeitz
5ef2d56ec2 Avoid docstore stacking for small segments, fixes #1053 (#1055) 2021-05-24 15:38:49 +09:00
Paul Masurel
fd8e5bdf57 Rename more like this 2021-05-21 16:32:39 +09:00
PSeitz
4f8481a1e4 Detect if segments are stackackable with sorting, fixes #1038 (#1054)
* Detect if segments are stackackable with sorting, fixes #1038

Detect if segments are stackable when their data ranges on the sort property are disjunct.
Presort segments by thei min value on merge, to enable easier stacking.

* move code to function
2021-05-21 15:23:17 +09:00
PSeitz
bcd72e5c14 fix and refactor log merge policy, fixes #1035 (#1043)
* fix and refactor log merge policy, fixes #1035

fixes a bug in log merge policy where an index was wrongly referenced by its index

* cleanup

* fix sort order, improve method names

* use itertools groupby, fix serialization test

* minor improvments

* update names
2021-05-19 10:48:46 +09:00
PSeitz
249bc6cf72 upgrade lz4_flex to 0.8 (#1049)
* upgrade lz4_flex to 0.8

* fix set_len
2021-05-19 10:46:01 +09:00