tantivy

mirror of https://github.com/quickwit-oss/tantivy.git synced 2026-05-22 03:00:42 +00:00

Author	SHA1	Message	Date
Pascal Seitz	3cbcb5d0aa	Abstract tantivy's data storage behind traits for pluggable backends Extract trait interfaces from tantivy's core reader types so that alternative storage backends (e.g. Quickwit) can provide their own implementations while tantivy's query engine works through dynamic dispatch. Reader trait extraction: - SegmentReader is now a trait; the concrete implementation is renamed to TantivySegmentReader. - DynInvertedIndexReader trait for object-safe dynamic dispatch, plus a typed InvertedIndexReader trait with associated Postings/DocSet types for static dispatch. The concrete reader becomes TantivyInvertedIndexReader. - StoreReader is now a trait; the concrete implementation is renamed to TantivyStoreReader. get() returns TantivyDocument directly instead of requiring a generic DocumentDeserialize bound. Typed downcast for performance-critical paths: - try_downcast_and_call() + TypedInvertedIndexReaderCb allow query weights (TermWeight, PhraseWeight) to attempt a downcast to the concrete TantivyInvertedIndexReader, obtaining typed postings for zero-cost scoring, and falling back to the dynamic path otherwise. - TermScorer<TPostings> is now generic over its postings type. - PostingsWithBlockMax trait enables block-max WAND acceleration through the trait boundary. - block_wand() and block_wand_single_scorer() are generic over PostingsWithBlockMax, and for_each_pruning is dispatched through the SegmentReader trait so custom backends can provide their own block-max implementations. Searcher decoupled from Index: - New SearcherContext holds schema, executor, and tokenizers. - Searcher can be constructed from Vec<Arc<dyn SegmentReader>> via Searcher::from_segment_readers(), without needing an Index. - Searcher::index() is deprecated in favor of Searcher::context(). Postings and DocSet changes: - Postings trait gains doc_freq() -> DocFreq (Exact/Approximate) and has_freq(). 
- RawPostingsData struct carries raw postings bytes across the trait boundary for custom reader implementations. - BlockSegmentPostings::open() takes OwnedBytes instead of FileSlice. - DocSet gains fill_bitset() method. Scorer improvements: - Scorer trait absorbs for_each, for_each_pruning, and explain (previously free functions or on Weight). - box_scorer() helper avoids double-boxing Box<dyn Scorer>. - BoxedTermScorer wraps a type-erased term scorer. - BufferedUnionScorer initialization fixed to avoid an extra advance() on construction. Other changes: - Document::to_json() now returns serde_json::Value; the old string serialization is renamed to to_serialized_json(). - DocumentDeserialize removed from the store reader public API.	2026-03-30 12:57:53 +08:00
PSeitz	945af922d1	clippy (#2661) * clippy * use readable version --------- Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>	2025-07-02 11:25:03 +02:00
PSeitz	eea70030bf	cleanup top level exports (#2382) remove some top level exports	2024-05-07 09:59:41 +02:00
PSeitz	7ce950f141	add method to fetch block of first vals in columnar (#2330) * add method to fetch block of first vals in columnar add method to fetch block of first vals in columnar (this is way faster than single calls for full columns) add benchmark fix import warnings ``` test bench_get_block_first_on_full_column ... bench: 56 ns/iter (+/- 26) test bench_get_block_first_on_full_column_single_calls ... bench: 311 ns/iter (+/- 6) test bench_get_block_first_on_multi_column ... bench: 378 ns/iter (+/- 15) test bench_get_block_first_on_multi_column_single_calls ... bench: 546 ns/iter (+/- 13) test bench_get_block_first_on_optional_column ... bench: 291 ns/iter (+/- 6) test bench_get_block_first_on_optional_column_single_calls ... bench: 362 ns/iter (+/- 8) ``` * use remainder	2024-03-15 08:01:47 +01:00
giovannicuccu	ef603c8c7e	rename ReloadPolicy onCommit to onCommitWithDelay (#2235) * rename ReloadPolicy onCommit to onCommitWithDelay * fix format issues --------- Co-authored-by: Giovanni Cuccu <gcuccu@imolainformatica.it>	2023-11-03 12:22:10 +01:00
PSeitz	2d7390341c	increase min memory to 15MB for indexing (#2176) With tantivy 0.20 the minimum memory consumption per SegmentWriter increased to 12MB. 7MB are for the different fast field collectors types (they could be lazily created). Increase the minimum memory from 3MB to 15MB. Change memory variable naming from arena to budget. closes #2156	2023-09-13 07:38:34 +02:00
ethever.eth	ee6a7c2bbb	fix a small typo (#2165) Co-authored-by: famouscat <onismaa@gmail.com>	2023-08-30 20:14:26 +02:00
PSeitz	0f20787917	fix doc store cache docs (#1821) * fix doc store cache docs addresses an issue reported in #1820 * rename doc_store_cache_size	2023-01-23 07:06:49 +01:00
Bruce Mitchener	c2f1c250f9	doc: Remove reference to `Searcher` pool. (#1598) The pool of searchers was removed in `23fe73a6` as part of #1411.	2022-10-06 00:04:11 +09:00
Bruce Mitchener	cf02e32578	Improvements to doc linking, grammar, etc.	2022-09-19 18:10:22 +07:00
Bruce Mitchener	6a88ac3fe3	Documentation improvements. Fix some linking, some grammar, some typos, etc.	2022-09-18 18:05:37 +07:00
PSeitz	bb01e99e05	Fixes race condition in Searcher (#1464) Fixes a race condition in Searcher, by avoiding repeated calls to open_segment_readers and passing them instead as argument Closes #1461	2022-08-24 21:17:37 +09:00
Kian-Meng Ang	625bcb4877	Fix typos and markdowns Found via these commands: codespell -L crate,ser,panting,beauti,hart,ue,atleast,childs,ond,pris,hel,mot markdownlint .md doc/src/.md --disable MD013 MD025 MD033 MD001 MD024 MD036 MD041 MD003	2022-08-13 18:25:47 +08:00
PSeitz	23fe73a6c0	remove searcher pool and make Searcher cloneable (#1411) * remove searcher pool and make Searcher cloneable closes #1410 * use SearcherInner in InnerIndexReader	2022-07-12 18:07:48 +09:00
PSeitz	2406d9278b	allow set doc store cache size on IndexReaderBuilder (#1407)	2022-07-06 14:40:35 +09:00
Pascal Seitz	5750224d4c	set docstore cache size at construction	2022-07-04 14:27:55 +08:00
Pascal Seitz	9db2f0e82b	expose doc store cache size expose lru doc store cache size optimize doc store cache size	2022-07-04 13:54:41 +08:00
Paul Masurel	f0a2b1cc44	Bumped tantivy and subcrate versions.	2022-05-25 22:50:33 +09:00
PSeitz	1232af7928	fix docs (#1288)	2022-02-21 23:15:58 +09:00
Shikhar Bhushan	505e6a440c	Remove test assertion sensitive to background segment merging (#1274)	2022-02-17 10:59:46 +09:00
Paul Masurel	2069e3e52b	Fixing clippy comments	2022-02-01 10:24:05 +09:00
Paul Masurel	eca6628b3c	Minor refactoring (#1266)	2022-01-28 15:55:55 +09:00
Shikhar Bhushan	5a2497b6fd	Avoid exposing TrackedObject from Warmer API (#1264)	2022-01-25 10:04:08 +09:00
Shikhar Bhushan	99d4b1a177	Searcher Warming API (#1261) Adds an API to register Warmers in the IndexReader. Co-authored-by: Paul Masurel <paul@quickwit.io>	2022-01-20 23:40:25 +09:00
Paul Masurel	3ea6800ac5	Pleasing clippy (#1253)	2022-01-06 16:41:24 +09:00
Paul Masurel	ffe4446d90	Minor lint comments (#1166)	2021-10-06 11:27:48 +09:00
Pascal Seitz	2de249af74	clippy fixes	2021-07-01 17:37:37 +02:00
Paul Masurel	316d65d7c6	removed deprecated compare_and_swap	2021-03-09 10:30:02 +09:00
Paul Masurel	eef348004e	Closes #930 Minor bug. Watch callback could be called if the last watch handle was dropped shortly before meta.json is called.	2020-11-11 15:51:23 +09:00
Paul Masurel	9cc1661ce2	Updating crossbeam (#909)	2020-10-13 10:55:50 +09:00
Paul Masurel	c23a03ad81	Large API Change in the Directory API. (#901) Tantivy used to assume that all files could be somehow memory mapped. After this change, Directory returns a `FileSlice` that can be reduced and eventually read into an `OwnedBytes` object. Long and blocking I/O operations are still required, but they do not span the entire file.	2020-10-08 16:36:51 +09:00
Paul Masurel	0b583b8130	Plastic changes	2020-08-22 21:29:12 +09:00
lyj	5300cb5da0	Update mod.rs (#845)	2020-07-01 10:25:26 +09:00
Paul Masurel	c55db83609	Closes #805 (#820) Added TryInto implementation for IndexReaderBuilder	2020-04-27 12:01:17 +09:00
Paul Masurel	7d6cfa58e1	[WIP] Alternative take on boosted queries (#772) * Alternative take on boosted queries * Fixing unit test * Added boosting to the query grammar. * Made BoostQuery public. * Added support for boosting field in QueryParser Closes #547	2020-02-19 11:04:38 +09:00
Paul Masurel	72f7cc1569	Closes #777 (#779)	2020-02-17 09:53:38 +09:00
Paul Masurel	ae14022bf0	Removed `use::Result`. (#771)	2020-01-31 18:47:02 +09:00
Paul Masurel	42756c7474	Removing futures-cpupool and upgrading to futures-0.3	2019-11-15 18:35:31 +09:00
Paul Masurel	280ea1209c	Changes required for python binding (#610)	2019-08-01 17:26:21 +09:00
Paul Masurel	efd1af1325	Closes #544. (#607) Prepare for release 0.10.1	2019-07-30 13:38:06 +09:00
Paul Masurel	462774b15c	Tiqb feature/2018 (#583) * rust 2018 * Added CHANGELOG comment	2019-07-01 10:01:46 +09:00
Paul Masurel	663dd89c05	Feature/reader (#517) Adding IndexReader to the API. Making it possible to watch for changes. * Closes #500	2019-03-20 08:39:22 +09:00

42 Commits