PSeitz-dd
c6912ce89a
Handle JSON fields and columnar in space_usage (#2761)
...
return field names in space_usage instead of `Field`
more detailed info for columns
2025-12-10 20:33:33 +08:00
Ang
08a92675dc
Fix typos again (#2753)
...
Found via `codespell -S benches,stopwords.rs -L womens,parth,abd,childs,ond,ser,ue,mot,hel,atleast,pris,claus,allo`
2025-12-01 12:15:41 +01:00
PSeitz
43a784671a
clippy (#2741)
...
Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>
2025-11-21 18:07:03 +01:00
PSeitz
85010b589a
clippy (#2700)
...
* clippy
* clippy
* clippy
* clippy + fmt
---------
Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>
2025-09-19 18:04:25 +02:00
PSeitz-dd
2340dca628
fix compiler warnings (#2699)
...
* fix compiler warnings
* fix import
2025-09-19 15:55:04 +02:00
PSeitz
33794a114c
chore: Release (#2686)
...
Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>
2025-08-20 18:29:37 +08:00
Stu Hood
d9eb093368
Attempt to clarify sorted_ords_to_term_cb.
2025-07-29 21:56:31 -07:00
Eric Ridge
2f01152a3c
adjust Dictionary::sorted_ords_to_term_cb() to allow duplicates
2025-07-16 13:38:43 -07:00
PSeitz
080fa4d1f4
add docs/example and Vec<u32> values to sstable (#2660)
2025-07-01 15:40:02 +02:00
PSeitz
4a6123d3ff
release tantivy: bump versions (#2625)
...
* chore: Release
* chore: Release
---------
Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>
2025-06-10 15:34:39 +02:00
Parth
5a2fe42c24
make zstd optional in sstable (#2633)
...
* make zstd truly optional
* changelog notes
* make sure we write
* resolve comments
* make this a default feature
* remove changelog notes
2025-05-14 17:16:41 +02:00
PSeitz
5379c99ea2
update edition to 2024 (#2620)
...
* update common to edition 2024
* update bitpacker to edition 2024
* update stacker to edition 2024
* update query-grammar to edition 2024
* update sstable to edition 2024 + fmt
* fmt
* update columnar to edition 2024
* cargo fmt
* use None instead of _
2025-04-18 04:56:31 +02:00
Paul Masurel
519e5d2ed1
clippy warnings
2025-03-05 11:15:06 +01:00
dependabot[bot]
43c89b4360
Update itertools requirement from 0.13.0 to 0.14.0 (#2563)
...
Updates the requirements on [itertools](https://github.com/rust-itertools/itertools) to permit the latest version.
- [Changelog](https://github.com/rust-itertools/itertools/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rust-itertools/itertools/compare/v0.13.0...v0.14.0)
---
updated-dependencies:
- dependency-name: itertools
  dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-01-08 17:11:46 +01:00
trinity-1686a
d281ca3e65
Merge pull request #2559 from quickwit-oss/trinity/sstable-partial-automaton
...
allow partially warming an sstable for an automaton
2025-01-08 16:35:35 +01:00
Remi Dettai
71cf19870b
Exist queries match subpath fields (#2558)
...
* Exist queries match subpath fields
* Make subpath check optional
* Add async subpath listing
2025-01-06 10:17:39 +01:00
trinity Pointard
fe0c7c5408
change rangebound style
2025-01-02 11:56:05 +01:00
trinity Pointard
dfff5f3bcb
rename merge_holes_under => merge_holes_under_bytes
2024-12-23 16:17:44 +01:00
trinity-1686a
ebf4d84553
add comment about cpu-intensive operation in async context
2024-12-20 12:23:49 +01:00
trinity-1686a
42efc7f7c8
clippy
2024-12-20 11:00:11 +01:00
trinity-1686a
192395c311
attempt at simplifying can_block_match_automaton
2024-12-20 10:25:38 +01:00
trinity-1686a
a1447cc9c2
remove breaking change in sstable public api
2024-12-19 17:30:05 +01:00
trinity-1686a
24c5dc2398
allow warming up automaton
2024-12-10 13:32:12 +01:00
trinity-1686a
9e2ddec4b3
merge adjacent block when building delta for automaton
2024-12-10 13:32:12 +01:00
trinity-1686a
1f6a8e74bb
support iterating over partially loaded sstable
2024-12-10 13:32:12 +01:00
trinity-1686a
7e901f523b
get iter for blocks of sstable matching automaton
2024-12-10 13:32:12 +01:00
trinity-1686a
3c30a41c14
add helper to figure if block can match automaton
2024-12-10 13:32:12 +01:00
PSeitz
4c52499622
clippy (#2549)
2024-11-29 16:08:21 +08:00
PSeitz
21d057059e
clippy (#2527)
...
* clippy
* clippy
* clippy
* clippy
* convert allow to expect and remove unused
* cargo fmt
* cleanup
* export sample
* clippy
2024-10-22 09:26:54 +08:00
Bruce Mitchener
c17e513377
Reduce typo count. (#2510)
2024-10-10 09:55:37 +08:00
PSeitz
3d1c4b313a
support ff range queries on json fields (#2456)
...
* support ff range queries on json fields
* fix term date truncation
* use inverted index range query for phrase prefix queries
* rename to InvertedIndexRangeQuery
* fix column filter, add mixed column test
2024-08-02 00:06:50 +08:00
PSeitz
7ebcc15b17
add support for str fast field range query (#2453)
...
* add support for str fast field range query
Add support for range queries on fast fields by converting term bounds to
term ordinal bounds.
closes https://github.com/quickwit-oss/tantivy/issues/2023
* extend tests, rename
* update comment
* update comment
2024-07-17 09:31:42 +08:00
PSeitz
13e9885dfd
faster term aggregation fetch terms (#2447)
...
big impact for term aggregations with large `size` parameter (e.g. 1000)
add top 1000 term agg bench
full
terms_few Memory: 27.3 KB (+79.09%) Avg: 3.8058ms (+2.40%) Median: 3.7192ms (+3.47%) [3.6224ms .. 4.3721ms]
terms_many Memory: 6.9 MB Avg: 12.6102ms (-4.70%) Median: 12.1389ms (-6.58%) [10.2847ms .. 15.4857ms]
terms_many_top_1000 Memory: 6.9 MB Avg: 15.8216ms (-83.19%) Median: 15.4899ms (-83.46%) [13.4250ms .. 20.6897ms]
terms_many_order_by_term Memory: 6.9 MB Avg: 14.7820ms (-3.95%) Median: 14.2236ms (-4.28%) [12.6669ms .. 21.0968ms]
terms_many_with_top_hits Memory: 58.2 MB Avg: 551.6218ms (+7.18%) Median: 549.8826ms (+11.01%) [496.7371ms .. 592.1299ms]
terms_many_with_avg_sub_agg Memory: 27.8 MB Avg: 197.7029ms (+2.66%) Median: 190.1564ms (+0.64%) [167.9226ms .. 245.6651ms]
terms_many_json_mixed_type_with_avg_sub_agg Memory: 42.0 MB (+0.00%) Avg: 242.0121ms (+0.92%) Median: 237.7084ms (-2.85%) [201.9959ms .. 302.2136ms]
terms_few_with_cardinality_agg Memory: 10.6 MB Avg: 122.6036ms (+1.21%) Median: 119.0033ms (+2.60%) [109.2859ms .. 161.5858ms]
range_agg_with_term_agg_few Memory: 45.4 KB (+39.75%) Avg: 24.5454ms (+2.14%) Median: 24.2861ms (+2.44%) [23.5109ms .. 27.8406ms]
range_agg_with_term_agg_many Memory: 6.9 MB Avg: 56.8049ms (+3.01%) Median: 50.9706ms (+1.52%) [41.4517ms .. 90.3934ms]
dense
terms_few Memory: 28.8 KB (+81.74%) Avg: 8.9092ms (-2.24%) Median: 8.7143ms (-1.31%) [8.6148ms .. 10.3868ms]
terms_many Memory: 6.9 MB (-0.00%) Avg: 17.9604ms (-10.18%) Median: 17.1552ms (-11.93%) [14.8979ms .. 26.2779ms]
terms_many_top_1000 Memory: 6.9 MB Avg: 21.4963ms (-78.90%) Median: 21.2924ms (-78.98%) [18.2033ms .. 28.0087ms]
terms_many_order_by_term Memory: 6.9 MB Avg: 20.4167ms (-9.13%) Median: 19.5596ms (-11.37%) [17.5153ms .. 29.5987ms]
terms_many_with_top_hits Memory: 58.2 MB Avg: 518.4474ms (-6.41%) Median: 514.9180ms (-9.44%) [471.5550ms .. 579.0220ms]
terms_many_with_avg_sub_agg Memory: 27.8 MB Avg: 263.6702ms (-2.78%) Median: 260.8775ms (-2.55%) [239.5754ms .. 304.6669ms]
terms_many_json_mixed_type_with_avg_sub_agg Memory: 42.0 MB Avg: 299.9791ms (-2.01%) Median: 302.2180ms (-3.08%) [239.2080ms .. 346.3649ms]
terms_few_with_cardinality_agg Memory: 10.6 MB Avg: 136.3303ms (-3.12%) Median: 132.3831ms (-2.88%) [123.7564ms .. 164.7914ms]
range_agg_with_term_agg_few Memory: 47.1 KB (+37.81%) Avg: 35.4538ms (+0.66%) Median: 34.8754ms (-0.56%) [34.2287ms .. 40.0884ms]
range_agg_with_term_agg_many Memory: 6.9 MB Avg: 72.2269ms (-4.38%) Median: 66.1174ms (-4.98%) [55.5125ms .. 124.1622ms]
sparse
terms_few Memory: 27.3 KB (+69.68%) Avg: 19.6053ms (-1.15%) Median: 19.4543ms (-0.38%) [19.3056ms .. 24.0547ms]
terms_many Memory: 1.8 MB Avg: 21.2886ms (-6.28%) Median: 21.1287ms (-6.65%) [20.6640ms .. 24.6144ms]
terms_many_top_1000 Memory: 2.6 MB Avg: 23.4869ms (-85.53%) Median: 23.3393ms (-85.61%) [22.7789ms .. 25.0896ms]
terms_many_order_by_term Memory: 1.8 MB Avg: 21.7437ms (-7.78%) Median: 21.6272ms (-7.66%) [21.0409ms .. 23.6517ms]
terms_many_with_top_hits Memory: 13.1 MB Avg: 43.7926ms (-2.76%) Median: 44.3602ms (+0.01%) [37.8039ms .. 51.0451ms]
terms_many_with_avg_sub_agg Memory: 7.5 MB Avg: 34.6307ms (+3.72%) Median: 33.4522ms (+1.16%) [32.4418ms .. 41.4196ms]
terms_many_json_mixed_type_with_avg_sub_agg Memory: 7.4 MB Avg: 46.4318ms (+1.16%) Median: 46.4050ms (+2.03%) [44.5986ms .. 48.5142ms]
terms_few_with_cardinality_agg Memory: 680.0 KB (-0.04%) Avg: 35.4410ms (+2.05%) Median: 35.1384ms (+1.19%) [34.4402ms .. 39.1082ms]
range_agg_with_term_agg_few Memory: 45.7 KB (+39.44%) Avg: 22.7760ms (+0.44%) Median: 22.5152ms (-0.35%) [22.3078ms .. 26.1567ms]
range_agg_with_term_agg_many Memory: 1.8 MB Avg: 25.7696ms (-4.45%) Median: 25.4009ms (-5.61%) [24.7874ms .. 29.6434ms]
multivalue
terms_few Memory: 244.4 KB Avg: 15.1253ms (-2.85%) Median: 15.0988ms (-0.54%) [14.8790ms .. 15.8193ms]
terms_many Memory: 6.9 MB (-0.00%) Avg: 26.3019ms (-6.24%) Median: 26.3662ms (-4.94%) [21.3553ms .. 31.0564ms]
terms_many_top_1000 Memory: 6.9 MB Avg: 29.5212ms (-72.90%) Median: 29.4257ms (-72.84%) [24.2645ms .. 35.1607ms]
terms_many_order_by_term Memory: 6.9 MB Avg: 28.6076ms (-4.93%) Median: 28.1059ms (-6.64%) [24.0845ms .. 34.1493ms]
terms_many_with_top_hits Memory: 58.3 MB Avg: 570.1548ms (+1.52%) Median: 572.7759ms (+0.53%) [525.9567ms .. 617.0862ms]
terms_many_with_avg_sub_agg Memory: 27.8 MB Avg: 305.5207ms (+0.24%) Median: 296.0101ms (-0.22%) [277.8579ms .. 373.5914ms]
terms_many_json_mixed_type_with_avg_sub_agg Memory: 42.0 MB (-0.00%) Avg: 324.7342ms (-2.51%) Median: 319.0025ms (-2.58%) [298.7122ms .. 368.6144ms]
terms_few_with_cardinality_agg Memory: 10.8 MB Avg: 151.6126ms (-2.54%) Median: 149.0616ms (-0.32%) [136.5592ms .. 181.8942ms]
range_agg_with_term_agg_few Memory: 248.2 KB Avg: 49.5225ms (+3.11%) Median: 48.3994ms (+3.18%) [46.4134ms .. 60.5989ms]
range_agg_with_term_agg_many Memory: 6.9 MB Avg: 85.9824ms (-3.66%) Median: 78.4266ms (-3.85%) [64.1231ms .. 128.5279ms]
2024-07-03 12:42:59 +08:00
PSeitz
56d79cb203
fix cardinality aggregation performance (#2446)
...
* fix cardinality aggregation performance
fix cardinality performance by fetching multiple terms at once. This
avoids decompressing the same block repeatedly and keeps the buffer state
between terms.
add cardinality aggregation benchmark
bump rust version to 1.66
Performance comparison to before (AllQuery)
```
full
cardinality_agg Memory: 3.5 MB (-0.00%) Avg: 21.2256ms (-97.78%) Median: 21.0042ms (-97.82%) [20.4717ms .. 23.6206ms]
terms_few_with_cardinality_agg Memory: 10.6 MB Avg: 81.9293ms (-97.37%) Median: 81.5526ms (-97.38%) [79.7564ms .. 88.0374ms]
dense
cardinality_agg Memory: 3.6 MB (-0.00%) Avg: 25.9372ms (-97.24%) Median: 25.7744ms (-97.25%) [24.7241ms .. 27.8793ms]
terms_few_with_cardinality_agg Memory: 10.6 MB Avg: 93.9897ms (-96.91%) Median: 92.7821ms (-96.94%) [90.3312ms .. 117.4076ms]
sparse
cardinality_agg Memory: 895.4 KB (-0.00%) Avg: 22.5113ms (-95.01%) Median: 22.5629ms (-94.99%) [22.1628ms .. 22.9436ms]
terms_few_with_cardinality_agg Memory: 680.2 KB Avg: 26.4250ms (-94.85%) Median: 26.4135ms (-94.86%) [26.3210ms .. 26.6774ms]
```
* clippy
* assert for sorted ordinals
2024-07-02 15:29:00 +08:00
Adam Reichold
4708171a32
Fix some of the things current Clippy complains about (#2363)
2024-04-16 04:27:06 +02:00
PSeitz
17d5869ad6
update CHANGELOG, use github API in cliff (#2354)
...
* update CHANGELOG, use github API in cliff
* reset version to 0.21.1, before release
* chore: Release
* remove unreleased from CHANGELOG
2024-04-15 10:07:20 +02:00
PSeitz
74940e9345
clippy (#2349)
...
* fix clippy
* fix clippy
* fix duplicate imports
2024-04-09 07:54:44 +02:00
trinity-1686a
9ebc5ed053
use fst for sstable index (#2268)
...
* read path for new fst based index
* implement BlockAddrStoreWriter
* extract slop/derivation computation
* use better linear approximator and allow negative correction to approximator
* document format and reorder some fields
* optimize single block sstable size
* plug backward compat
2023-12-04 15:13:15 +01:00
PSeitz
07573a7f19
update fst (#2267)
...
update fst to 0.5 (deduplicates regex-syntax in the dep tree)
deps cleanup
2023-11-21 16:06:57 +01:00
PSeitz
5a2397d57e
add sstable ord_to_term benchmark (#2242)
2023-11-10 07:27:48 +01:00
dependabot[bot]
22aa4daf19
Update zstd requirement from 0.12 to 0.13 (#2214)
...
Updates the requirements on [zstd](https://github.com/gyscos/zstd-rs) to permit the latest version.
- [Release notes](https://github.com/gyscos/zstd-rs/releases)
- [Commits](https://github.com/gyscos/zstd-rs/compare/v0.12.0...v0.13.0)
---
updated-dependencies:
- dependency-name: zstd
  dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-10-12 04:24:44 +02:00
PSeitz
49448b31c6
chore: Release (#2168)
...
* chore: Release
* update CHANGELOG
2023-09-01 13:58:58 +02:00
Adam Reichold
42acd334f4
Fixes the new deny-by-default incorrect_partial_ord_impl_on_ord_type Clippy lint (#2131)
2023-07-21 11:36:17 +09:00
Adam Reichold
ebc78127f3
Add BytesFilterCollector to support filtering based on a bytes fast field (#2075)
...
* Do some Clippy- and Cargo-related boy-scouting.
* Add BytesFilterCollector to support filtering based on a bytes fast field
This is basically a copy of the existing FilterCollector but modified and
specialised to work on a bytes fast field.
* Changed semantics of filter collectors to consider multi-valued fields
2023-06-13 14:19:58 +09:00
PSeitz
e3eacb4388
release tantivy (#2083)
...
* prerelease
* chore: Release
2023-06-09 10:47:46 +02:00
dependabot[bot]
4be6f83b0a
Update criterion requirement from 0.4 to 0.5 (#2056)
...
Updates the requirements on [criterion](https://github.com/bheisler/criterion.rs) to permit the latest version.
- [Changelog](https://github.com/bheisler/criterion.rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/bheisler/criterion.rs/compare/0.4.0...0.5.0)
---
updated-dependencies:
- dependency-name: criterion
  dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-05-24 15:59:51 +09:00
Yuri Astrakhan
74275b76a6
Inline format arguments where makes sense (#2038)
...
Applied this command to the code, making it a bit shorter and slightly
more readable.
```
cargo +nightly clippy --all-features --benches --tests --workspace --fix -- -A clippy::all -W clippy::uninlined_format_args
cargo +nightly fmt --all
```
2023-05-10 18:03:59 +09:00
trinity-1686a
9c93bfeb51
optimise warmup code path (#2007)
...
* optimise warmup code path
* better function naming
2023-04-21 11:23:09 +02:00
trinity-1686a
780e26331d
sstable compression (#1946)
...
* compress sstable with zstd
* add some details to sstable readme
* compress only blocks which benefit from it
* multiple changes to sstable
make compression optional
use OwnedBytes instead of impl Read in sstable, required for next point
use zstd bulk api, which is much faster on small records
* cleanup and use bulk api for compression
* use dedicated byte for compression
* switch block len and compression flag
* change default zstd level in sstable
2023-04-14 16:25:50 +02:00
trinity-1686a
205e8a0a92
encode dictionary type in fst footer (#1968)
...
* encode additional footer for dictionary kind in fst
2023-04-12 09:43:01 +02:00