Files
tantivy/stacker/Performance.md
PSeitz 27f202083c Improve Termmap Indexing Performance +~30% (#2058)
* update benchmark

* Improve Termmap Indexing Performance +~30%

This contains many small changes to improve Termmap performance.
Most notably:
* Specialized byte compare and equality versions, instead of glibc calls.
* ExpUnrolledLinkedList to not contain inline items.

Allow compare hash only via a feature flag compare_hash_only:
64bits should be enough with a good hash function to compare strings by
their hashes instead of comparing the strings. Disabled by default

CreateHashMap/alice/174693
                        time:   [642.23 µs 643.80 µs 645.24 µs]
                        thrpt:  [258.20 MiB/s 258.78 MiB/s 259.41 MiB/s]
                 change:
                        time:   [-14.429% -13.303% -12.348%] (p = 0.00 < 0.05)
                        thrpt:  [+14.088% +15.344% +16.862%]
                        Performance has improved.
CreateHashMap/alice_expull/174693
                        time:   [877.03 µs 880.44 µs 884.67 µs]
                        thrpt:  [188.32 MiB/s 189.22 MiB/s 189.96 MiB/s]
                 change:
                        time:   [-26.460% -26.274% -26.091%] (p = 0.00 < 0.05)
                        thrpt:  [+35.301% +35.637% +35.981%]
                        Performance has improved.
CreateHashMap/numbers_zipf/8000000
                        time:   [9.1198 ms 9.1573 ms 9.1961 ms]
                        thrpt:  [829.64 MiB/s 833.15 MiB/s 836.57 MiB/s]
                 change:
                        time:   [-35.229% -34.828% -34.384%] (p = 0.00 < 0.05)
                        thrpt:  [+52.403% +53.440% +54.390%]
                        Performance has improved.

* clippy

* add bench for ids

* inline(always) to inline whole block with bounds checks

* cleanup
2023-06-08 11:13:52 +02:00

1.0 KiB

Notes

  • extend_from_slice(&key) calls memcpy, which is relatively slow, since most keys are relatively short. For now there's a specialized version toavoid memcpy calls. Wild copy 16 bytes in a loop is faster, but would require a guard against overflow from the caller side. (We probably can do that).
  • Comparing two slices of unknown length calls memcmp. Same as above, we can do a specialized version.

fastcmp and fastcpy both employ the same trick, to compare slices of odd length, e.g. 2 operations unconditional on 4 bytes, instead 3 operations with conditionals (1 4byte, 1 2byte, 1 1byte). [1, 2, 3, 4, 5, 6, 7] [1, 2, 3, 4] [4, 5, 6, 7]

  • Since the hashmap writes the values on every key insert/update, the values like expull should be small. Therefore inlining of the values has been removed.
  • Currently the first call to Expull will get a capacity of 0. It would be beneficial if it could be initialized with some memory, so that the first call doesn't have to allocate. But that would mean we don't have Default impls.