tantivy

mirror of https://github.com/quickwit-oss/tantivy.git synced 2026-01-04 00:02:55 +00:00

Author	SHA1	Message	Date
PSeitz	1223a87eb2	add fuzz test for hashmap (#2310 )	2024-01-31 10:30:21 +01:00
PSeitz	5943ee46bd	Truncate keys to u16::MAX in term hashmap (#2299 ) Truncate keys to u16::MAX, instead of e.g. storing 0 bytes for keys with length u16::MAX + 1. The term hashmap has a hidden API contract to only accept terms with length up to u16::MAX.	2024-01-11 10:19:12 +01:00
PSeitz	f95a76293f	add memory arena test (#2298 ) * add memory arena test * add assert * Update stacker/src/memory_arena.rs Co-authored-by: Paul Masurel <paul@quickwit.io> --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2024-01-11 07:18:48 +01:00
PSeitz	927b4432c9	Perf: use term hashmap in fastfield (#2243 ) * add shared arena hashmap * bench fastfield indexing * use shared arena hashmap in columnar lower minimum resize in hashtable * clippy * add comments	2023-11-09 13:44:02 +01:00
PSeitz	28dd6b6546	collect json paths in indexing (#2231 ) * collect json paths in indexing * remove unsafe iter_mut_keys	2023-11-01 11:25:17 +01:00
PSeitz	19a859d6fd	term hashmap remove copy in is_empty, unused unordered_id (#2229 )	2023-10-27 05:01:32 +02:00
PSeitz	49448b31c6	chore: Release (#2168 ) * chore: Release * update CHANGELOG	2023-09-01 13:58:58 +02:00
PSeitz	480763db0d	track memory arena memory usage (#2148 )	2023-08-16 18:19:42 +02:00
Adam Reichold	ebc78127f3	Add BytesFilterCollector to support filtering based on a bytes fast field (#2075 ) * Do some Clippy- and Cargo-related boy-scouting. * Add BytesFilterCollector to support filtering based on a bytes fast field This is basically a copy of the existing FilterCollector but modified and specialised to work on a bytes fast field. * Changed semantics of filter collectors to consider multi-valued fields	2023-06-13 14:19:58 +09:00
PSeitz	e3eacb4388	release tantivy (#2083 ) * prerelease * chore: Release	2023-06-09 10:47:46 +02:00
PSeitz	27f202083c	Improve Termmap Indexing Performance +~30% (#2058 ) * update benchmark * Improve Termmap Indexing Performance +~30% This contains many small changes to improve Termmap performance. Most notably: * Specialized byte compare and equality versions, instead of glibc calls. * ExpUnrolledLinkedList to not contain inline items. Allow compare hash only via a feature flag compare_hash_only: 64bits should be enough with a good hash function to compare strings by their hashes instead of comparing the strings. Disabled by default CreateHashMap/alice/174693 time: [642.23 µs 643.80 µs 645.24 µs] thrpt: [258.20 MiB/s 258.78 MiB/s 259.41 MiB/s] change: time: [-14.429% -13.303% -12.348%] (p = 0.00 < 0.05) thrpt: [+14.088% +15.344% +16.862%] Performance has improved. CreateHashMap/alice_expull/174693 time: [877.03 µs 880.44 µs 884.67 µs] thrpt: [188.32 MiB/s 189.22 MiB/s 189.96 MiB/s] change: time: [-26.460% -26.274% -26.091%] (p = 0.00 < 0.05) thrpt: [+35.301% +35.637% +35.981%] Performance has improved. CreateHashMap/numbers_zipf/8000000 time: [9.1198 ms 9.1573 ms 9.1961 ms] thrpt: [829.64 MiB/s 833.15 MiB/s 836.57 MiB/s] change: time: [-35.229% -34.828% -34.384%] (p = 0.00 < 0.05) thrpt: [+52.403% +53.440% +54.390%] Performance has improved. * clippy * add bench for ids * inline(always) to inline whole block with bounds checks * cleanup	2023-06-08 11:13:52 +02:00
Paul Masurel	47e01b345b	Simplified linear probing code (#2066 )	2023-06-01 04:58:42 +02:00
dependabot[bot]	4be6f83b0a	Update criterion requirement from 0.4 to 0.5 (#2056 ) Updates the requirements on [criterion](https://github.com/bheisler/criterion.rs) to permit the latest version. - [Changelog](https://github.com/bheisler/criterion.rs/blob/master/CHANGELOG.md) - [Commits](https://github.com/bheisler/criterion.rs/compare/0.4.0...0.5.0) --- updated-dependencies: - dependency-name: criterion dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-05-24 15:59:51 +09:00
PSeitz	00c5df610c	update termmap benchmark (#2040 )	2023-05-12 07:35:06 +02:00
PSeitz	d1988be8e9	fix and extend benchmark (#2030 ) * add benchmark, add missing inlines * fix stacker bench * add wiki benchmark * move line split out of bench	2023-05-10 13:01:56 +02:00
PSeitz	d3357a8426	fix ArenaHashMap default (#2034 ) an empty ArenaHashMap is invalid and causes a panic when combined with `get`	2023-05-10 11:39:47 +02:00
Yuri Astrakhan	74275b76a6	Inline format arguments where makes sense (#2038 ) Applied this command to the code, making it a bit shorter and slightly more readable. ``` cargo +nightly clippy --all-features --benches --tests --workspace --fix -- -A clippy::all -W clippy::uninlined_format_args cargo +nightly fmt --all ```	2023-05-10 18:03:59 +09:00
tottoto	73452284ae	Remove unused crates from dependencies (#2018 ) * Remove unused crates from dependencies * Revert rand to columnar * Revert criterion to stacker	2023-05-02 12:34:20 +02:00
PSeitz	7b31100208	refactor vint (#2010 ) - improve performance of vint vint serialization shows up in performance profiles during indexing. It would also make sense to limit the value space to u29 and operate on 4 bytes only. - remove unused code - add missing inlines - fix regex test	2023-04-25 08:49:36 +02:00
PSeitz	e83abbfe4a	perf: faster term hash map (#1940 ) * add term hashmap benchmark * refactor arena hashmap add inlines remove occupied array and use table_entry.is_empty instead (saves 4 bytes per entry) reduce saturation threshold from 1/3 to 1/2 to reduce memory use u32 for UnorderedId (we have the 4billion limit anyways on the Columnar stuff) fix naming LinearProbing remove byteorder dependency memory consumption went down from 2GB to 1.8GB on indexing wikipedia dataset in tantivy * Update stacker/src/arena_hashmap.rs Co-authored-by: Paul Masurel <paul@quickwit.io> --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2023-04-17 09:07:33 +02:00
Paul Masurel	ed5a3b3172	Bumped murmurhash version	2023-03-03 21:24:32 +09:00
Paul Masurel	097fd6138d	Fix clippy comments (#1872 )	2023-02-14 23:12:45 +09:00
Paul Masurel	60cc2644d6	Fixing test_fail_on_flush_segment_but_one_worker_remains (#1869 ) The new fast field code, based on columnar, had a larger minimum memory footprint, causing the first document to trigger a flush of the segment in this unit test. This PR prevents the allocation of a large capacity for the different hashmap tables used in the columnar writer. Closes #1859	2023-02-14 16:09:42 +09:00
Paul Masurel	bd5eea9852	Integrated columnar work.	2023-02-09 13:14:31 +01:00
Adrien Guillo	14222a47a3	Fix typo (#1776 )	2023-01-11 00:49:13 +09:00
Paul Masurel	2a6d1eaf78	Added missing license.	2022-12-22 12:47:43 +09:00
Paul Masurel	f39165e1e7	Moving FileSlice to tantivy-common (#1729 )	2022-12-21 16:35:11 +09:00
PSeitz	f9171a3981	fix clippy (#1725 ) * fix clippy * fix clippy fastfield codecs * fix clippy bitpacker * fix clippy common * fix clippy stacker * fix clippy sstable * fmt	2022-12-20 07:30:06 +01:00
Paul Masurel	136a8f4124	Isolating sstable and stacker in independent crates. (#1718 ) Both crates will be used in the new (optional + dynamic) fastfield work.	2022-12-13 11:44:17 +09:00

29 Commits