Commit Graph

10 Commits

Author SHA1 Message Date
PSeitz
e83abbfe4a perf: faster term hash map (#1940)
* add term hashmap benchmark

* refactor arena hashmap

add inlines
remove occupied array and use table_entry.is_empty instead (saves 4 bytes per entry)
reduce saturation threshold from 1/3 to 1/2 to reduce memory
use u32 for UnorderedId (we have the 4billion limit anyways on the Columnar stuff)
fix naming LinearProbing
remove byteorder dependency

memory consumption went down from 2Gb to 1.8GB on indexing wikipedia dataset in tantivy

* Update stacker/src/arena_hashmap.rs

Co-authored-by: Paul Masurel <paul@quickwit.io>

---------

Co-authored-by: Paul Masurel <paul@quickwit.io>
2023-04-17 09:07:33 +02:00
Paul Masurel
ed5a3b3172 Bumped murmurhash version 2023-03-03 21:24:32 +09:00
Paul Masurel
097fd6138d Fix clippy comments (#1872) 2023-02-14 23:12:45 +09:00
Paul Masurel
60cc2644d6 Fixing test_fail_on_flush_segment_but_one_worker_remains (#1869)
The new fast field code, based on columnar, had a larger minimum memory
footprint, causing the first docuemnt to trigger a flush of the asegment
in this unit test.

This PR prevents the allocation of a large capacity for the different hashmap tables
using in the columnar writer.

Closes #1859
2023-02-14 16:09:42 +09:00
Paul Masurel
bd5eea9852 Integrated columnar work. 2023-02-09 13:14:31 +01:00
Adrien Guillo
14222a47a3 Fix typo (#1776) 2023-01-11 00:49:13 +09:00
Paul Masurel
2a6d1eaf78 Added missing license. 2022-12-22 12:47:43 +09:00
Paul Masurel
f39165e1e7 Moving FileSlice to tantivy-common (#1729) 2022-12-21 16:35:11 +09:00
PSeitz
f9171a3981 fix clippy (#1725)
* fix clippy

* fix clippy fastfield codecs

* fix clippy bitpacker

* fix clippy common

* fix clippy stacker

* fix clippy sstable

* fmt
2022-12-20 07:30:06 +01:00
Paul Masurel
136a8f4124 Isolating sstable and stacker in independant crates. (#1718)
Both crate will be used in the new (optional + dynamic) fastfield work.
2022-12-13 11:44:17 +09:00