tantivy

rust/tantivy

Fork 0

mirror of https://github.com/quickwit-oss/tantivy.git synced 2026-01-04 08:12:54 +00:00

Commit Graph

Author	SHA1	Message	Date
PSeitz	27f202083c	Improve Termmap Indexing Performance +~30% (#2058 ) * update benchmark * Improve Termmap Indexing Performance +~30% This contains many small changes to improve Termmap performance. Most notably: * Specialized byte compare and equality versions, instead of glibc calls. * ExpUnrolledLinkedList to not contain inline items. Allow compare hash only via a feature flag compare_hash_only: 64bits should be enough with a good hash function to compare strings by their hashes instead of comparing the strings. Disabled by default CreateHashMap/alice/174693 time: [642.23 µs 643.80 µs 645.24 µs] thrpt: [258.20 MiB/s 258.78 MiB/s 259.41 MiB/s] change: time: [-14.429% -13.303% -12.348%] (p = 0.00 < 0.05) thrpt: [+14.088% +15.344% +16.862%] Performance has improved. CreateHashMap/alice_expull/174693 time: [877.03 µs 880.44 µs 884.67 µs] thrpt: [188.32 MiB/s 189.22 MiB/s 189.96 MiB/s] change: time: [-26.460% -26.274% -26.091%] (p = 0.00 < 0.05) thrpt: [+35.301% +35.637% +35.981%] Performance has improved. CreateHashMap/numbers_zipf/8000000 time: [9.1198 ms 9.1573 ms 9.1961 ms] thrpt: [829.64 MiB/s 833.15 MiB/s 836.57 MiB/s] change: time: [-35.229% -34.828% -34.384%] (p = 0.00 < 0.05) thrpt: [+52.403% +53.440% +54.390%] Performance has improved. * clippy * add bench for ids * inline(always) to inline whole block with bounds checks * cleanup	2023-06-08 11:13:52 +02:00
PSeitz	00c5df610c	update termmap benchmark (#2040 )	2023-05-12 07:35:06 +02:00
PSeitz	e83abbfe4a	perf: faster term hash map (#1940 ) * add term hashmap benchmark * refactor arena hashmap add inlines remove occupied array and use table_entry.is_empty instead (saves 4 bytes per entry) reduce saturation threshold from 1/3 to 1/2 to reduce memory use u32 for UnorderedId (we have the 4billion limit anyways on the Columnar stuff) fix naming LinearProbing remove byteorder dependency memory consumption went down from 2Gb to 1.8GB on indexing wikipedia dataset in tantivy * Update stacker/src/arena_hashmap.rs Co-authored-by: Paul Masurel <paul@quickwit.io> --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2023-04-17 09:07:33 +02:00

Author

SHA1

Message

Date

PSeitz

27f202083c

Improve Termmap Indexing Performance +~30% (#2058 )

* update benchmark

* Improve Termmap Indexing Performance +~30%

This contains many small changes to improve Termmap performance.
Most notably:
* Specialized byte compare and equality versions, instead of glibc calls.
* ExpUnrolledLinkedList to not contain inline items.

Allow compare hash only via a feature flag compare_hash_only:
64bits should be enough with a good hash function to compare strings by
their hashes instead of comparing the strings. Disabled by default

CreateHashMap/alice/174693
                        time:   [642.23 µs 643.80 µs 645.24 µs]
                        thrpt:  [258.20 MiB/s 258.78 MiB/s 259.41 MiB/s]
                 change:
                        time:   [-14.429% -13.303% -12.348%] (p = 0.00 < 0.05)
                        thrpt:  [+14.088% +15.344% +16.862%]
                        Performance has improved.
CreateHashMap/alice_expull/174693
                        time:   [877.03 µs 880.44 µs 884.67 µs]
                        thrpt:  [188.32 MiB/s 189.22 MiB/s 189.96 MiB/s]
                 change:
                        time:   [-26.460% -26.274% -26.091%] (p = 0.00 < 0.05)
                        thrpt:  [+35.301% +35.637% +35.981%]
                        Performance has improved.
CreateHashMap/numbers_zipf/8000000
                        time:   [9.1198 ms 9.1573 ms 9.1961 ms]
                        thrpt:  [829.64 MiB/s 833.15 MiB/s 836.57 MiB/s]
                 change:
                        time:   [-35.229% -34.828% -34.384%] (p = 0.00 < 0.05)
                        thrpt:  [+52.403% +53.440% +54.390%]
                        Performance has improved.

* clippy

* add bench for ids

* inline(always) to inline whole block with bounds checks

* cleanup

2023-06-08 11:13:52 +02:00

PSeitz

00c5df610c

update termmap benchmark (#2040 )

2023-05-12 07:35:06 +02:00

PSeitz

e83abbfe4a

perf: faster term hash map (#1940 )

* add term hashmap benchmark

* refactor arena hashmap

add inlines
remove occupied array and use table_entry.is_empty instead (saves 4 bytes per entry)
reduce saturation threshold from 1/3 to 1/2 to reduce memory
use u32 for UnorderedId (we have the 4billion limit anyways on the Columnar stuff)
fix naming LinearProbing
remove byteorder dependency

memory consumption went down from 2Gb to 1.8GB on indexing wikipedia dataset in tantivy

* Update stacker/src/arena_hashmap.rs

Co-authored-by: Paul Masurel <paul@quickwit.io>

---------

Co-authored-by: Paul Masurel <paul@quickwit.io>

2023-04-17 09:07:33 +02:00

3 Commits