PSeitz
4c52499622
clippy ( #2549 )
2024-11-29 16:08:21 +08:00
dependabot[bot]
c66af2c0a9
Update binggan requirement from 0.12.0 to 0.14.0 ( #2530 )
...
* Update binggan requirement from 0.12.0 to 0.14.0
---
updated-dependencies:
- dependency-name: binggan
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
* fix build
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Pascal Seitz <pascal.seitz@gmail.com>
2024-10-24 09:41:35 +08:00
PSeitz
21d057059e
clippy ( #2527 )
...
* clippy
* clippy
* clippy
* clippy
* convert allow to expect and remove unused
* cargo fmt
* cleanup
* export sample
* clippy
2024-10-22 09:26:54 +08:00
PSeitz
7b65ad922d
use binggan for stacker bench ( #2492 )
...
* use binggan for stacker bench
```
alice (num terms: 174693)
hashmap Memory: 1.3 MB Avg: 367.19 MiB/s (-1.34%) Median: 368.10 MiB/s (-1.34%) [378.75 MiB/s .. 352.81 MiB/s]
hasmap with postings Memory: 2.4 MB Avg: 237.29 MiB/s (-2.19%) Median: 240.22 MiB/s (-1.61%) [248.26 MiB/s .. 210.66 MiB/s]
fxhashmap ref postings Memory: 2.9 MB Avg: 171.94 MiB/s (-3.22%) Median: 174.13 MiB/s (-2.69%) [185.94 MiB/s .. 152.43 MiB/s]
fxhasmap owned postings Memory: 3.5 MB Avg: 96.993 MiB/s (-4.20%) Median: 97.410 MiB/s (-4.48%) [102.78 MiB/s .. 82.745 MiB/s]
numbers unique 100k
hashmap Memory: 5.2 MB Avg: 334.17 MiB/s (-3.06%) Median: 352.61 MiB/s (+0.77%) [362.60 MiB/s .. 213.03 MiB/s]
hasmap with postings Memory: 6.3 MB Avg: 316.96 MiB/s (-0.02%) Median: 325.16 MiB/s (-0.04%) [338.36 MiB/s .. 218.60 MiB/s]
zipfs numbers 100k
hashmap Memory: 1.3 MB Avg: 1.2342 GiB/s (+2.87%) Median: 1.2677 GiB/s (+4.66%) [1.3130 GiB/s .. 915.93 MiB/s]
hasmap with postings Memory: 2.4 MB Avg: 485.16 MiB/s (+2.68%) Median: 494.70 MiB/s (+4.42%) [505.31 MiB/s .. 413.14 MiB/s]
numbers unique 1mio
hashmap Memory: 35.7 MB Avg: 169.68 MiB/s (-1.08%) Median: 166.80 MiB/s (-3.87%) [201.33 MiB/s .. 154.26 MiB/s]
hasmap with postings Memory: 39.8 MB Avg: 149.49 MiB/s (-3.07%) Median: 150.85 MiB/s (-1.45%) [160.76 MiB/s .. 130.94 MiB/s]
zipfs numbers 1mio
hashmap Memory: 1.3 MB Avg: 1.2185 GiB/s (-2.33%) Median: 1.2291 GiB/s (-2.33%) [1.2905 GiB/s .. 1.0742 GiB/s]
hasmap with postings Memory: 5.5 MB Avg: 358.43 MiB/s (-11.63%) Median: 356.95 MiB/s (-12.85%) [444.94 MiB/s .. 302.46 MiB/s]
numbers unique 2mio
hashmap Memory: 70.3 MB Avg: 163.65 MiB/s (+8.37%) Median: 162.83 MiB/s (+8.80%) [190.20 MiB/s .. 144.70 MiB/s]
hasmap with postings Memory: 78.6 MB Avg: 148.00 MiB/s (+7.75%) Median: 151.53 MiB/s (+9.11%) [166.92 MiB/s .. 120.09 MiB/s]
zipfs numbers 2mio
hashmap Memory: 1.3 MB Avg: 1.2535 GiB/s (+2.59%) Median: 1.2654 GiB/s (+0.36%) [1.2938 GiB/s .. 1.0592 GiB/s]
hasmap with postings Memory: 9.7 MB Avg: 377.96 MiB/s (-4.94%) Median: 381.82 MiB/s (-3.67%) [426.14 MiB/s .. 335.66 MiB/s]
numbers unique 5mio
hashmap Memory: 277.9 MB Avg: 121.30 MiB/s (+2.00%) Median: 121.99 MiB/s (+2.99%) [132.51 MiB/s .. 110.32 MiB/s]
hasmap with postings Memory: 295.7 MB Avg: 114.23 MiB/s (+2.13%) Median: 115.26 MiB/s (+2.94%) [124.08 MiB/s .. 103.38 MiB/s]
zipfs numbers 5mio
hashmap Memory: 1.3 MB Avg: 1.2326 GiB/s (+0.63%) Median: 1.2400 GiB/s (+0.71%) [1.2755 GiB/s .. 1.0923 GiB/s]
hasmap with postings Memory: 25.4 MB Avg: 360.49 MiB/s (+1.07%) Median: 363.44 MiB/s (+1.27%) [404.88 MiB/s .. 300.38 MiB/s]
```
* rename bench
* update binggan
* rename to HASHMAP_CAPACITY
2024-10-16 11:41:33 +08:00
Bruce Mitchener
c17e513377
Reduce typo count. ( #2510 )
2024-10-10 09:55:37 +08:00
trinity-1686a
85395d942a
fix clippy lints from 1.80-1.81 ( #2488 )
...
* fix some clippy lints
* fix clippy::doc_lazy_continuation
* fix some lints for 1.82
2024-09-05 14:33:05 +02:00
PSeitz
56d79cb203
fix cardinality aggregation performance ( #2446 )
...
* fix cardinality aggregation performance
fix cardinality performance by fetching multiple terms at once. This
avoids decompressing the same block and keeps the buffer state between
terms.
add cardinality aggregation benchmark
bump rust version to 1.66
Performance comparison to before (AllQuery)
```
full
cardinality_agg Memory: 3.5 MB (-0.00%) Avg: 21.2256ms (-97.78%) Median: 21.0042ms (-97.82%) [20.4717ms .. 23.6206ms]
terms_few_with_cardinality_agg Memory: 10.6 MB Avg: 81.9293ms (-97.37%) Median: 81.5526ms (-97.38%) [79.7564ms .. 88.0374ms]
dense
cardinality_agg Memory: 3.6 MB (-0.00%) Avg: 25.9372ms (-97.24%) Median: 25.7744ms (-97.25%) [24.7241ms .. 27.8793ms]
terms_few_with_cardinality_agg Memory: 10.6 MB Avg: 93.9897ms (-96.91%) Median: 92.7821ms (-96.94%) [90.3312ms .. 117.4076ms]
sparse
cardinality_agg Memory: 895.4 KB (-0.00%) Avg: 22.5113ms (-95.01%) Median: 22.5629ms (-94.99%) [22.1628ms .. 22.9436ms]
terms_few_with_cardinality_agg Memory: 680.2 KB Avg: 26.4250ms (-94.85%) Median: 26.4135ms (-94.86%) [26.3210ms .. 26.6774ms]
```
* clippy
* assert for sorted ordinals
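A minimal sketch of the batching idea described above, under stated assumptions: `BlockDict`, its fields, and `decode_block` are hypothetical stand-ins for a block-compressed term dictionary and are not tantivy's API. It shows why resolving sorted ordinals in one pass lets a decoded block be reused between terms, while single lookups decode a block on every call.

```rust
/// Hypothetical block-compressed term dictionary: terms are grouped into
/// fixed-size blocks, and a block must be decoded before a term is read.
struct BlockDict {
    blocks: Vec<Vec<String>>, // stand-in for compressed blocks
    block_len: usize,
    decodes: usize, // counts how often a block is "decompressed"
}

impl BlockDict {
    fn new(terms: &[&str], block_len: usize) -> Self {
        assert!(block_len > 0);
        let blocks = terms
            .chunks(block_len)
            .map(|chunk| chunk.iter().map(|t| t.to_string()).collect())
            .collect();
        BlockDict { blocks, block_len, decodes: 0 }
    }

    /// Stand-in for decompression: clones the block and counts the call.
    fn decode_block(&mut self, block_id: usize) -> Vec<String> {
        self.decodes += 1;
        self.blocks[block_id].clone()
    }

    /// One term per call: every call decodes a block again.
    fn term_single(&mut self, ord: usize) -> String {
        let block = self.decode_block(ord / self.block_len);
        block[ord % self.block_len].clone()
    }

    /// Batched lookup over sorted ordinals: the decoded block is kept
    /// between terms, so each block is decoded at most once.
    fn terms_sorted(&mut self, ords: &[usize]) -> Vec<String> {
        assert!(ords.windows(2).all(|w| w[0] <= w[1]), "ordinals must be sorted");
        let mut current: Option<(usize, Vec<String>)> = None;
        let mut out = Vec::with_capacity(ords.len());
        for &ord in ords {
            let block_id = ord / self.block_len;
            let cached = matches!(&current, Some((id, _)) if *id == block_id);
            if !cached {
                let block = self.decode_block(block_id);
                current = Some((block_id, block));
            }
            let (_, block) = current.as_ref().unwrap();
            out.push(block[ord % self.block_len].clone());
        }
        out
    }
}
```

With six terms in blocks of two, `terms_sorted(&[0, 1, 2, 5])` decodes three blocks, whereas four `term_single` calls decode four. The sortedness assertion is what makes keeping a single cached block sufficient.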
2024-07-02 15:29:00 +08:00
PSeitz
17d5869ad6
update CHANGELOG, use github API in cliff ( #2354 )
...
* update CHANGELOG, use github API in cliff
* reset version to 0.21.1, before release
* chore: Release
* remove unreleased from CHANGELOG
2024-04-15 10:07:20 +02:00
PSeitz
74940e9345
clippy ( #2349 )
...
* fix clippy
* fix clippy
* fix duplicate imports
2024-04-09 07:54:44 +02:00
PSeitz
1e9fc51535
update ahash ( #2344 )
2024-04-09 06:35:39 +02:00
PSeitz
1223a87eb2
add fuzz test for hashmap ( #2310 )
2024-01-31 10:30:21 +01:00
PSeitz
5943ee46bd
Truncate keys to u16::MAX in term hashmap ( #2299 )
...
Truncate keys to u16::MAX instead of, e.g., storing 0 bytes for keys with length u16::MAX + 1.
The term hashmap has a hidden API contract to only accept terms with a length of up to u16::MAX.
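The failure mode can be illustrated with a small sketch. The helper names `naive_len` and `encode_key` and the length-prefixed layout are assumptions for illustration, not the crate's actual code. Casting a length of u16::MAX + 1 to `u16` wraps to 0, while clamping first keeps the stored length and the stored bytes consistent.

```rust
/// What a plain cast does: a length of u16::MAX + 1 wraps around to 0.
fn naive_len(key: &[u8]) -> u16 {
    key.len() as u16
}

/// Hypothetical encoder: a little-endian u16 length prefix followed by at
/// most u16::MAX bytes of the key. Longer keys are truncated.
fn encode_key(key: &[u8]) -> Vec<u8> {
    // Clamp first, so the stored length always matches the stored bytes.
    let len = key.len().min(u16::MAX as usize);
    let mut out = Vec::with_capacity(2 + len);
    out.extend_from_slice(&(len as u16).to_le_bytes());
    out.extend_from_slice(&key[..len]);
    out
}
```

A key of 65536 bytes would be recorded with length 0 by `naive_len`, whereas `encode_key` stores its first 65535 bytes with a matching length prefix.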
2024-01-11 10:19:12 +01:00
PSeitz
f95a76293f
add memory arena test ( #2298 )
...
* add memory arena test
* add assert
* Update stacker/src/memory_arena.rs
Co-authored-by: Paul Masurel <paul@quickwit.io>
---------
Co-authored-by: Paul Masurel <paul@quickwit.io>
2024-01-11 07:18:48 +01:00
PSeitz
927b4432c9
Perf: use term hashmap in fastfield ( #2243 )
...
* add shared arena hashmap
* bench fastfield indexing
* use shared arena hashmap in columnar
lower minimum resize in hashtable
* clippy
* add comments
2023-11-09 13:44:02 +01:00
PSeitz
28dd6b6546
collect json paths in indexing ( #2231 )
...
* collect json paths in indexing
* remove unsafe iter_mut_keys
2023-11-01 11:25:17 +01:00
PSeitz
19a859d6fd
term hashmap remove copy in is_empty, unused unordered_id ( #2229 )
2023-10-27 05:01:32 +02:00
PSeitz
49448b31c6
chore: Release ( #2168 )
...
* chore: Release
* update CHANGELOG
2023-09-01 13:58:58 +02:00
PSeitz
480763db0d
track memory arena memory usage ( #2148 )
2023-08-16 18:19:42 +02:00
Adam Reichold
ebc78127f3
Add BytesFilterCollector to support filtering based on a bytes fast field ( #2075 )
...
* Do some Clippy- and Cargo-related boy-scouting.
* Add BytesFilterCollector to support filtering based on a bytes fast field
This is basically a copy of the existing FilterCollector but modified and
specialised to work on a bytes fast field.
* Changed semantics of filter collectors to consider multi-valued fields
2023-06-13 14:19:58 +09:00
PSeitz
e3eacb4388
release tantivy ( #2083 )
...
* prerelease
* chore: Release
2023-06-09 10:47:46 +02:00
PSeitz
27f202083c
Improve Termmap Indexing Performance +~30% ( #2058 )
...
* update benchmark
* Improve Termmap Indexing Performance +~30%
This contains many small changes to improve Termmap performance.
Most notably:
* Specialized byte compare and equality versions, instead of glibc calls.
* ExpUnrolledLinkedList to not contain inline items.
Allow comparing by hash only via the feature flag compare_hash_only:
with a good hash function, 64 bits should be enough to compare strings by
their hashes instead of comparing the strings. Disabled by default.
CreateHashMap/alice/174693
time: [642.23 µs 643.80 µs 645.24 µs]
thrpt: [258.20 MiB/s 258.78 MiB/s 259.41 MiB/s]
change:
time: [-14.429% -13.303% -12.348%] (p = 0.00 < 0.05)
thrpt: [+14.088% +15.344% +16.862%]
Performance has improved.
CreateHashMap/alice_expull/174693
time: [877.03 µs 880.44 µs 884.67 µs]
thrpt: [188.32 MiB/s 189.22 MiB/s 189.96 MiB/s]
change:
time: [-26.460% -26.274% -26.091%] (p = 0.00 < 0.05)
thrpt: [+35.301% +35.637% +35.981%]
Performance has improved.
CreateHashMap/numbers_zipf/8000000
time: [9.1198 ms 9.1573 ms 9.1961 ms]
thrpt: [829.64 MiB/s 833.15 MiB/s 836.57 MiB/s]
change:
time: [-35.229% -34.828% -34.384%] (p = 0.00 < 0.05)
thrpt: [+52.403% +53.440% +54.390%]
Performance has improved.
* clippy
* add bench for ids
* inline(always) to inline whole block with bounds checks
* cleanup
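A minimal sketch of the hash-only comparison mentioned above, under stated assumptions: `Entry`, `hash_key`, and `entry_matches` are hypothetical names, the `hash_only` parameter stands in for the `compare_hash_only` feature flag, and the standard library's `DefaultHasher` is used only for illustration rather than the hash function the crate uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Hypothetical stored entry: the full 64-bit hash is kept next to the key.
struct Entry {
    hash: u64,
    key: Vec<u8>,
}

/// Hashes raw key bytes to 64 bits (hash function chosen for illustration).
fn hash_key(key: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write(key);
    hasher.finish()
}

/// Returns true if `entry` is considered to hold `key`.
/// With `hash_only` set, equal 64-bit hashes are trusted and the bytes are
/// never compared, which accepts a small probability of false matches.
fn entry_matches(entry: &Entry, key: &[u8], key_hash: u64, hash_only: bool) -> bool {
    if entry.hash != key_hash {
        return false; // cheap rejection without touching the key bytes
    }
    hash_only || entry.key.as_slice() == key
}
```

The trade-off is visible when two different keys are given the same hash: the default mode rejects the mismatch by comparing bytes, while the hash-only mode reports a match.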
2023-06-08 11:13:52 +02:00
Paul Masurel
47e01b345b
Simplified linear probing code ( #2066 )
2023-06-01 04:58:42 +02:00
dependabot[bot]
4be6f83b0a
Update criterion requirement from 0.4 to 0.5 ( #2056 )
...
Updates the requirements on [criterion](https://github.com/bheisler/criterion.rs) to permit the latest version.
- [Changelog](https://github.com/bheisler/criterion.rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/bheisler/criterion.rs/compare/0.4.0...0.5.0)
---
updated-dependencies:
- dependency-name: criterion
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-05-24 15:59:51 +09:00
PSeitz
00c5df610c
update termmap benchmark ( #2040 )
2023-05-12 07:35:06 +02:00
PSeitz
d1988be8e9
fix and extend benchmark ( #2030 )
...
* add benchmark, add missing inlines
* fix stacker bench
* add wiki benchmark
* move line split out of bench
2023-05-10 13:01:56 +02:00
PSeitz
d3357a8426
fix ArenaHashMap default ( #2034 )
...
an empty ArenaHashMap is invalid and causes a panic when combined with `get`
2023-05-10 11:39:47 +02:00
Yuri Astrakhan
74275b76a6
Inline format arguments where makes sense ( #2038 )
...
Applied this command to the code, making it a bit shorter and slightly
more readable.
```
cargo +nightly clippy --all-features --benches --tests --workspace --fix -- -A clippy::all -W clippy::uninlined_format_args
cargo +nightly fmt --all
```
2023-05-10 18:03:59 +09:00
tottoto
73452284ae
Remove unused crates from dependencies ( #2018 )
...
* Remove unused crates from dependencies
* Revert rand to columnar
* Revert criterion to stacker
2023-05-02 12:34:20 +02:00
PSeitz
7b31100208
refactor vint ( #2010 )
...
- improve performance of vint
vint serialization shows up in performance profiles during indexing.
It would also make sense to limit the value space to u29 and operate on 4 bytes only.
- remove unused code
- add missing inlines
- fix regex test
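For readers unfamiliar with variable-length integers, here is a minimal sketch of one common layout. It assumes a LEB128-style encoding (7 payload bits per byte, with the high bit meaning "more bytes follow"), which may differ from the byte layout the crate actually uses. The function names are hypothetical.

```rust
/// Appends `value` to `out` using 7 payload bits per byte. The high bit
/// marks that more bytes follow (LEB128-style, an assumption of this sketch).
fn write_vint(mut value: u32, out: &mut Vec<u8>) {
    while value >= 0x80 {
        out.push((value as u8 & 0x7F) | 0x80);
        value >>= 7;
    }
    out.push(value as u8);
}

/// Reads one value from the front of `data` and returns it together with
/// the number of bytes consumed. Returns None if the input ends early or
/// no terminating byte appears within 5 bytes.
fn read_vint(data: &[u8]) -> Option<(u32, usize)> {
    let mut result: u32 = 0;
    for (i, &byte) in data.iter().enumerate().take(5) {
        result |= u32::from(byte & 0x7F) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((result, i + 1));
        }
    }
    None
}
```

Small values take a single byte and a full `u32` takes five, which is why this code path is hot when many small integers are serialized during indexing.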
2023-04-25 08:49:36 +02:00
PSeitz
e83abbfe4a
perf: faster term hash map ( #1940 )
...
* add term hashmap benchmark
* refactor arena hashmap
add inlines
remove occupied array and use table_entry.is_empty instead (saves 4 bytes per entry)
reduce saturation threshold from 1/3 to 1/2 to reduce memory
use u32 for UnorderedId (we have the 4billion limit anyways on the Columnar stuff)
fix naming LinearProbing
remove byteorder dependency
memory consumption went down from 2 GB to 1.8 GB when indexing the Wikipedia dataset in tantivy
* Update stacker/src/arena_hashmap.rs
Co-authored-by: Paul Masurel <paul@quickwit.io>
---------
Co-authored-by: Paul Masurel <paul@quickwit.io>
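The ideas listed in this commit (no separate occupied array, a 1/2 saturation threshold, `u32` ids, linear probing) can be sketched as follows. This is an illustrative toy, not the code in stacker/src/arena_hashmap.rs: `Slot`, `ProbeTable`, and the choice of `u32::MAX` as the empty marker are assumptions, and keys are identified by their hash alone to keep the example short.

```rust
/// Hypothetical table entry. `u32::MAX` in `id` marks an empty slot, so no
/// separate "occupied" array is needed.
#[derive(Clone, Copy)]
struct Slot {
    hash: u32,
    id: u32,
}

impl Slot {
    const EMPTY: Slot = Slot { hash: 0, id: u32::MAX };

    fn is_empty(&self) -> bool {
        self.id == u32::MAX
    }
}

/// Open-addressing table with linear probing over a power-of-two array.
struct ProbeTable {
    slots: Vec<Slot>,
    len: usize,
}

impl ProbeTable {
    fn new(capacity: usize) -> Self {
        let cap = capacity.next_power_of_two().max(4);
        ProbeTable { slots: vec![Slot::EMPTY; cap], len: 0 }
    }

    /// Walks forward from the hash's home slot to the first empty slot.
    fn find_empty(&self, hash: u32) -> usize {
        let mask = self.slots.len() - 1;
        let mut pos = hash as usize & mask;
        while !self.slots[pos].is_empty() {
            pos = (pos + 1) & mask;
        }
        pos
    }

    /// Doubles the table and re-inserts every occupied slot.
    fn grow(&mut self) {
        let new_cap = self.slots.len() * 2;
        let old = std::mem::replace(&mut self.slots, vec![Slot::EMPTY; new_cap]);
        for slot in old.into_iter().filter(|s| !s.is_empty()) {
            let pos = self.find_empty(slot.hash);
            self.slots[pos] = slot;
        }
    }

    /// Returns the id stored for `hash`, assigning the next id if absent.
    fn get_or_insert(&mut self, hash: u32) -> u32 {
        let mask = self.slots.len() - 1;
        let mut pos = hash as usize & mask;
        loop {
            let slot = self.slots[pos];
            if slot.is_empty() {
                break;
            }
            if slot.hash == hash {
                return slot.id;
            }
            pos = (pos + 1) & mask; // linear probing: try the next slot
        }
        let id = self.len as u32;
        self.len += 1;
        // Saturation threshold of 1/2: grow once more than half is used.
        if self.len * 2 > self.slots.len() {
            self.grow();
            pos = self.find_empty(hash);
        }
        self.slots[pos] = Slot { hash, id };
        id
    }
}
```

Because the table never exceeds half occupancy, every probe sequence is guaranteed to reach an empty slot, and a higher threshold means fewer unused slots per stored entry at the cost of longer probe chains.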
2023-04-17 09:07:33 +02:00
Paul Masurel
ed5a3b3172
Bumped murmurhash version
2023-03-03 21:24:32 +09:00
Paul Masurel
097fd6138d
Fix clippy comments ( #1872 )
2023-02-14 23:12:45 +09:00
Paul Masurel
60cc2644d6
Fixing test_fail_on_flush_segment_but_one_worker_remains ( #1869 )
...
The new fast field code, based on columnar, had a larger minimum memory
footprint, causing the first document to trigger a flush of the segment
in this unit test.
This PR prevents the allocation of a large capacity for the different hashmap tables
used in the columnar writer.
Closes #1859
2023-02-14 16:09:42 +09:00
Paul Masurel
bd5eea9852
Integrated columnar work.
2023-02-09 13:14:31 +01:00
Adrien Guillo
14222a47a3
Fix typo ( #1776 )
2023-01-11 00:49:13 +09:00
Paul Masurel
2a6d1eaf78
Added missing license.
2022-12-22 12:47:43 +09:00
Paul Masurel
f39165e1e7
Moving FileSlice to tantivy-common ( #1729 )
2022-12-21 16:35:11 +09:00
PSeitz
f9171a3981
fix clippy ( #1725 )
...
* fix clippy
* fix clippy fastfield codecs
* fix clippy bitpacker
* fix clippy common
* fix clippy stacker
* fix clippy sstable
* fmt
2022-12-20 07:30:06 +01:00
Paul Masurel
136a8f4124
Isolating sstable and stacker in independent crates. ( #1718 )
...
Both crates will be used in the new (optional + dynamic) fastfield work.
2022-12-13 11:44:17 +09:00