PSeitz
080fa4d1f4
add docs/example and Vec<u32> values to sstable ( #2660 )
2025-07-01 15:40:02 +02:00
Parth
5a2fe42c24
make zstd optional in sstable ( #2633 )
...
* make zstd truly optional
* changelog notes
* make sure we write
* resolve comments
* make this a default feature
* remove changelog notes
2025-05-14 17:16:41 +02:00
PSeitz
5379c99ea2
update edition to 2024 ( #2620 )
...
* update common to edition 2024
* update bitpacker to edition 2024
* update stacker to edition 2024
* update query-grammar to edition 2024
* update sstable to edition 2024 + fmt
* fmt
* update columnar to edition 2024
* cargo fmt
* use None instead of _
2025-04-18 04:56:31 +02:00
Paul Masurel
519e5d2ed1
clippy warnings
2025-03-05 11:15:06 +01:00
trinity-1686a
d281ca3e65
Merge pull request #2559 from quickwit-oss/trinity/sstable-partial-automaton
...
allow partially warming an sstable for an automaton
2025-01-08 16:35:35 +01:00
Remi Dettai
71cf19870b
Exist queries match subpath fields ( #2558 )
...
* Exist queries match subpath fields
* Make subpath check optional
* Add async subpath listing
2025-01-06 10:17:39 +01:00
trinity Pointard
fe0c7c5408
change rangebound style
2025-01-02 11:56:05 +01:00
trinity Pointard
dfff5f3bcb
rename merge_holes_under => merge_holes_under_bytes
2024-12-23 16:17:44 +01:00
trinity-1686a
ebf4d84553
add comment about cpu-intensive operation in async context
2024-12-20 12:23:49 +01:00
trinity-1686a
42efc7f7c8
clippy
2024-12-20 11:00:11 +01:00
trinity-1686a
192395c311
attempt at simplifying can_block_match_automaton
2024-12-20 10:25:38 +01:00
trinity-1686a
a1447cc9c2
remove breaking change in sstable public api
2024-12-19 17:30:05 +01:00
trinity-1686a
24c5dc2398
allow warming up automaton
2024-12-10 13:32:12 +01:00
trinity-1686a
9e2ddec4b3
merge adjacent block when building delta for automaton
2024-12-10 13:32:12 +01:00
trinity-1686a
1f6a8e74bb
support iterating over partially loaded sstable
2024-12-10 13:32:12 +01:00
trinity-1686a
7e901f523b
get iter for blocks of sstable matching automaton
2024-12-10 13:32:12 +01:00
trinity-1686a
3c30a41c14
add helper to figure out if a block can match automaton
2024-12-10 13:32:12 +01:00
PSeitz
4c52499622
clippy ( #2549 )
2024-11-29 16:08:21 +08:00
PSeitz
21d057059e
clippy ( #2527 )
...
* clippy
* clippy
* clippy
* clippy
* convert allow to expect and remove unused
* cargo fmt
* cleanup
* export sample
* clippy
2024-10-22 09:26:54 +08:00
Bruce Mitchener
c17e513377
Reduce typo count. ( #2510 )
2024-10-10 09:55:37 +08:00
PSeitz
3d1c4b313a
support ff range queries on json fields ( #2456 )
...
* support ff range queries on json fields
* fix term date truncation
* use inverted index range query for phrase prefix queries
* rename to InvertedIndexRangeQuery
* fix column filter, add mixed column test
2024-08-02 00:06:50 +08:00
PSeitz
7ebcc15b17
add support for str fast field range query ( #2453 )
...
* add support for str fast field range query
Add support for range queries on fast fields by converting term bounds to
term ordinal bounds.
closes https://github.com/quickwit-oss/tantivy/issues/2023
* extend tests, rename
* update comment
* update comment
2024-07-17 09:31:42 +08:00
PSeitz
13e9885dfd
faster term aggregation fetch terms ( #2447 )
...
big impact for term aggregations with large `size` parameter (e.g. 1000)
add top 1000 term agg bench
full
terms_few Memory: 27.3 KB (+79.09%) Avg: 3.8058ms (+2.40%) Median: 3.7192ms (+3.47%) [3.6224ms .. 4.3721ms]
terms_many Memory: 6.9 MB Avg: 12.6102ms (-4.70%) Median: 12.1389ms (-6.58%) [10.2847ms .. 15.4857ms]
terms_many_top_1000 Memory: 6.9 MB Avg: 15.8216ms (-83.19%) Median: 15.4899ms (-83.46%) [13.4250ms .. 20.6897ms]
terms_many_order_by_term Memory: 6.9 MB Avg: 14.7820ms (-3.95%) Median: 14.2236ms (-4.28%) [12.6669ms .. 21.0968ms]
terms_many_with_top_hits Memory: 58.2 MB Avg: 551.6218ms (+7.18%) Median: 549.8826ms (+11.01%) [496.7371ms .. 592.1299ms]
terms_many_with_avg_sub_agg Memory: 27.8 MB Avg: 197.7029ms (+2.66%) Median: 190.1564ms (+0.64%) [167.9226ms .. 245.6651ms]
terms_many_json_mixed_type_with_avg_sub_agg Memory: 42.0 MB (+0.00%) Avg: 242.0121ms (+0.92%) Median: 237.7084ms (-2.85%) [201.9959ms .. 302.2136ms]
terms_few_with_cardinality_agg Memory: 10.6 MB Avg: 122.6036ms (+1.21%) Median: 119.0033ms (+2.60%) [109.2859ms .. 161.5858ms]
range_agg_with_term_agg_few Memory: 45.4 KB (+39.75%) Avg: 24.5454ms (+2.14%) Median: 24.2861ms (+2.44%) [23.5109ms .. 27.8406ms]
range_agg_with_term_agg_many Memory: 6.9 MB Avg: 56.8049ms (+3.01%) Median: 50.9706ms (+1.52%) [41.4517ms .. 90.3934ms]
dense
terms_few Memory: 28.8 KB (+81.74%) Avg: 8.9092ms (-2.24%) Median: 8.7143ms (-1.31%) [8.6148ms .. 10.3868ms]
terms_many Memory: 6.9 MB (-0.00%) Avg: 17.9604ms (-10.18%) Median: 17.1552ms (-11.93%) [14.8979ms .. 26.2779ms]
terms_many_top_1000 Memory: 6.9 MB Avg: 21.4963ms (-78.90%) Median: 21.2924ms (-78.98%) [18.2033ms .. 28.0087ms]
terms_many_order_by_term Memory: 6.9 MB Avg: 20.4167ms (-9.13%) Median: 19.5596ms (-11.37%) [17.5153ms .. 29.5987ms]
terms_many_with_top_hits Memory: 58.2 MB Avg: 518.4474ms (-6.41%) Median: 514.9180ms (-9.44%) [471.5550ms .. 579.0220ms]
terms_many_with_avg_sub_agg Memory: 27.8 MB Avg: 263.6702ms (-2.78%) Median: 260.8775ms (-2.55%) [239.5754ms .. 304.6669ms]
terms_many_json_mixed_type_with_avg_sub_agg Memory: 42.0 MB Avg: 299.9791ms (-2.01%) Median: 302.2180ms (-3.08%) [239.2080ms .. 346.3649ms]
terms_few_with_cardinality_agg Memory: 10.6 MB Avg: 136.3303ms (-3.12%) Median: 132.3831ms (-2.88%) [123.7564ms .. 164.7914ms]
range_agg_with_term_agg_few Memory: 47.1 KB (+37.81%) Avg: 35.4538ms (+0.66%) Median: 34.8754ms (-0.56%) [34.2287ms .. 40.0884ms]
range_agg_with_term_agg_many Memory: 6.9 MB Avg: 72.2269ms (-4.38%) Median: 66.1174ms (-4.98%) [55.5125ms .. 124.1622ms]
sparse
terms_few Memory: 27.3 KB (+69.68%) Avg: 19.6053ms (-1.15%) Median: 19.4543ms (-0.38%) [19.3056ms .. 24.0547ms]
terms_many Memory: 1.8 MB Avg: 21.2886ms (-6.28%) Median: 21.1287ms (-6.65%) [20.6640ms .. 24.6144ms]
terms_many_top_1000 Memory: 2.6 MB Avg: 23.4869ms (-85.53%) Median: 23.3393ms (-85.61%) [22.7789ms .. 25.0896ms]
terms_many_order_by_term Memory: 1.8 MB Avg: 21.7437ms (-7.78%) Median: 21.6272ms (-7.66%) [21.0409ms .. 23.6517ms]
terms_many_with_top_hits Memory: 13.1 MB Avg: 43.7926ms (-2.76%) Median: 44.3602ms (+0.01%) [37.8039ms .. 51.0451ms]
terms_many_with_avg_sub_agg Memory: 7.5 MB Avg: 34.6307ms (+3.72%) Median: 33.4522ms (+1.16%) [32.4418ms .. 41.4196ms]
terms_many_json_mixed_type_with_avg_sub_agg Memory: 7.4 MB Avg: 46.4318ms (+1.16%) Median: 46.4050ms (+2.03%) [44.5986ms .. 48.5142ms]
terms_few_with_cardinality_agg Memory: 680.0 KB (-0.04%) Avg: 35.4410ms (+2.05%) Median: 35.1384ms (+1.19%) [34.4402ms .. 39.1082ms]
range_agg_with_term_agg_few Memory: 45.7 KB (+39.44%) Avg: 22.7760ms (+0.44%) Median: 22.5152ms (-0.35%) [22.3078ms .. 26.1567ms]
range_agg_with_term_agg_many Memory: 1.8 MB Avg: 25.7696ms (-4.45%) Median: 25.4009ms (-5.61%) [24.7874ms .. 29.6434ms]
multivalue
terms_few Memory: 244.4 KB Avg: 15.1253ms (-2.85%) Median: 15.0988ms (-0.54%) [14.8790ms .. 15.8193ms]
terms_many Memory: 6.9 MB (-0.00%) Avg: 26.3019ms (-6.24%) Median: 26.3662ms (-4.94%) [21.3553ms .. 31.0564ms]
terms_many_top_1000 Memory: 6.9 MB Avg: 29.5212ms (-72.90%) Median: 29.4257ms (-72.84%) [24.2645ms .. 35.1607ms]
terms_many_order_by_term Memory: 6.9 MB Avg: 28.6076ms (-4.93%) Median: 28.1059ms (-6.64%) [24.0845ms .. 34.1493ms]
terms_many_with_top_hits Memory: 58.3 MB Avg: 570.1548ms (+1.52%) Median: 572.7759ms (+0.53%) [525.9567ms .. 617.0862ms]
terms_many_with_avg_sub_agg Memory: 27.8 MB Avg: 305.5207ms (+0.24%) Median: 296.0101ms (-0.22%) [277.8579ms .. 373.5914ms]
terms_many_json_mixed_type_with_avg_sub_agg Memory: 42.0 MB (-0.00%) Avg: 324.7342ms (-2.51%) Median: 319.0025ms (-2.58%) [298.7122ms .. 368.6144ms]
terms_few_with_cardinality_agg Memory: 10.8 MB Avg: 151.6126ms (-2.54%) Median: 149.0616ms (-0.32%) [136.5592ms .. 181.8942ms]
range_agg_with_term_agg_few Memory: 248.2 KB Avg: 49.5225ms (+3.11%) Median: 48.3994ms (+3.18%) [46.4134ms .. 60.5989ms]
range_agg_with_term_agg_many Memory: 6.9 MB Avg: 85.9824ms (-3.66%) Median: 78.4266ms (-3.85%) [64.1231ms .. 128.5279ms]
2024-07-03 12:42:59 +08:00
PSeitz
56d79cb203
fix cardinality aggregation performance ( #2446 )
...
* fix cardinality aggregation performance
fix cardinality performance by fetching multiple terms at once. This
avoids decompressing the same block and keeps the buffer state between
terms.
add cardinality aggregation benchmark
bump rust version to 1.66
Performance comparison to before (AllQuery)
```
full
cardinality_agg Memory: 3.5 MB (-0.00%) Avg: 21.2256ms (-97.78%) Median: 21.0042ms (-97.82%) [20.4717ms .. 23.6206ms]
terms_few_with_cardinality_agg Memory: 10.6 MB Avg: 81.9293ms (-97.37%) Median: 81.5526ms (-97.38%) [79.7564ms .. 88.0374ms]
dense
cardinality_agg Memory: 3.6 MB (-0.00%) Avg: 25.9372ms (-97.24%) Median: 25.7744ms (-97.25%) [24.7241ms .. 27.8793ms]
terms_few_with_cardinality_agg Memory: 10.6 MB Avg: 93.9897ms (-96.91%) Median: 92.7821ms (-96.94%) [90.3312ms .. 117.4076ms]
sparse
cardinality_agg Memory: 895.4 KB (-0.00%) Avg: 22.5113ms (-95.01%) Median: 22.5629ms (-94.99%) [22.1628ms .. 22.9436ms]
terms_few_with_cardinality_agg Memory: 680.2 KB Avg: 26.4250ms (-94.85%) Median: 26.4135ms (-94.86%) [26.3210ms .. 26.6774ms]
```
* clippy
* assert for sorted ordinals
2024-07-02 15:29:00 +08:00
Adam Reichold
4708171a32
Fix some of the things current Clippy complains about ( #2363 )
2024-04-16 04:27:06 +02:00
trinity-1686a
9ebc5ed053
use fst for sstable index ( #2268 )
...
* read path for new fst based index
* implement BlockAddrStoreWriter
* extract slop/derivation computation
* use better linear approximator and allow negative correction to approximator
* document format and reorder some fields
* optimize single block sstable size
* plug backward compat
2023-12-04 15:13:15 +01:00
Adam Reichold
42acd334f4
Fixes the new deny-by-default incorrect_partial_ord_impl_on_ord_type Clippy lint ( #2131 )
2023-07-21 11:36:17 +09:00
Yuri Astrakhan
74275b76a6
Inline format arguments where it makes sense ( #2038 )
...
Applied this command to the code, making it a bit shorter and slightly
more readable.
```
cargo +nightly clippy --all-features --benches --tests --workspace --fix -- -A clippy::all -W clippy::uninlined_format_args
cargo +nightly fmt --all
```
2023-05-10 18:03:59 +09:00
trinity-1686a
9c93bfeb51
optimise warmup code path ( #2007 )
...
* optimise warmup code path
* better function naming
2023-04-21 11:23:09 +02:00
trinity-1686a
780e26331d
sstable compression ( #1946 )
...
* compress sstable with zstd
* add some details to sstable readme
* compress only blocks which benefit from it
* multiple changes to sstable
make compression optional
use OwnedBytes instead of impl Read in sstable, required for next point
use zstd bulk api, which is much faster on small records
* cleanup and use bulk api for compression
* use dedicated byte for compression
* switch block len and compression flag
* change default zstd level in sstable
2023-04-14 16:25:50 +02:00
trinity-1686a
205e8a0a92
encode dictionary type in fst footer ( #1968 )
...
* encode additional footer for dictionary kind in fst
2023-04-12 09:43:01 +02:00
Paul Masurel
5eb12173d6
Proptest merge columnar ( #1976 )
...
* Added proptest on columnar merge with a shuffle
Made column serialization more explicit.
Bugfix when a bytes column is missing, and with a shuffle.
Improved the cardinality detection logic / column detection.
* Code review
* CR comments
* Following CR
2023-04-04 11:28:42 +09:00
trinity-1686a
482b4155e8
fix bug with new sstable index format ( #1953 )
2023-03-22 10:22:36 +01:00
trinity-1686a
e5e50603a8
new sstable format ( #1943 )
...
* document a new sstable format
* add support for changing target block size
* use new format for sstable index
* handle sstable version error
* use very small blocks for proptests
* add a footer structure
2023-03-21 15:03:52 +01:00
trinity-1686a
fcf5a25d93
use DeltaReader directly to implement Dictionary::ord_to_term ( #1928 )
2023-03-08 11:15:56 +09:00
trinity-1686a
a4f7ca8309
use DeltaReader directly to implement Dictionary::term_ord ( #1925 )
...
* use DeltaReader directly to implement Dictionary::term_ord
* add some additional test cases for Dictionary::term_ord
2023-03-06 09:45:22 +01:00
Paul Masurel
d25fc155b2
Making some of the column/termdict operations async-friendly ( #1902 )
2023-02-27 15:34:47 +09:00
PSeitz
111f25a8f7
clippy ( #1879 )
...
* fix clippy
* fix clippy
* fmt
2023-02-17 11:34:21 +01:00
Paul Masurel
097fd6138d
Fix clippy comments ( #1872 )
2023-02-14 23:12:45 +09:00
Paul Masurel
bd5eea9852
Integrated columnar work.
2023-02-09 13:14:31 +01:00
trinity-1686a
d72ea7d353
modify getters for sstable metadata ( #1793 )
...
* add way to get up to `limit` terms from sstable
* make some functions of sstable load less data
* add some tests to sstable
* add tests on sstable dictionary
* fix some bugs with sstable
2023-01-18 14:42:55 +01:00
trinity-1686a
16b704e190
make file_slice_for_range on sstable public ( #1784 )
2023-01-16 13:59:57 +09:00
Paul Masurel
4f9efe654c
Support for columnar ( #1734 )
...
* Added support for dynamic fast field.
See README for more information.
* Apply suggestions from code review
Co-authored-by: PSeitz <PSeitz@users.noreply.github.com>
2023-01-07 17:37:00 +09:00
Paul Masurel
3f915925af
Fixing unit tests
2022-12-27 12:02:16 +09:00
Paul Masurel
9c5fef5af7
Fixing sstable proptest ( #1743 )
2022-12-26 16:29:33 +09:00
Paul Masurel
bb48c3e488
Refactoring to prepare for the addition of dynamic fast field ( #1730 )
...
* Refactoring to prepare for the addition of dynamic fast field
- Exposing insert_key / insert_value
- Renamed SSTable::{Reader/Writer} -> SSTable::{ValueReader/ValueWriter}
- Added a generic Dictionary object in the sstable crate
- Removing the TermDictionary wrapper from tantivy, relying directly on
an alias of the generic Dictionary object.
- dropped the use of byteorder in sstable.
- Stopped scanning / reading the entire dictionary when streaming a range.
* Added a benchmark for streaming sstable ranges.
* CR comments.
Rename deserialize_u64 -> deserialize_vint_u64
* Removed needless allocation, split serialize into serialize and clear.
2022-12-22 12:25:46 +09:00
PSeitz
f9171a3981
fix clippy ( #1725 )
...
* fix clippy
* fix clippy fastfield codecs
* fix clippy bitpacker
* fix clippy common
* fix clippy stacker
* fix clippy sstable
* fmt
2022-12-20 07:30:06 +01:00
Paul Masurel
136a8f4124
Isolating sstable and stacker in independent crates. ( #1718 )
...
Both crates will be used in the new (optional + dynamic) fastfield work.
2022-12-13 11:44:17 +09:00