tantivy

mirror of https://github.com/quickwit-oss/tantivy.git synced 2026-06-03 00:50:41 +00:00

Author	SHA1	Message	Date
Stu Hood	7183ac6cbc	fix: Use smaller buffers during merging (#71 ) `MergeOptimizedInvertedIndexReader` was added in #32 in order to avoid making small reads to our underlying `FileHandle`. It did so by reading the entire content of the posting lists and positions at open time. As that PR says: > A likely downside to this approach is that now pg_search will be, indirectly, holding onto a lot of heap-allocated memory that was read from its block storage. Perhaps in the (near) future we can further optimize the new `MergeOptimizedInvertedIndexReader` such that it pages in blocks of a few megabytes at a time, on demand, rather than the whole file. This PR makes that change. But it additionally removes code that was later added in #47 to borrow individual entries rather than creating `OwnedBytes` for them. I believe that this code was added due to a misunderstanding: `OwnedBytes` is a total misnomer: the bytes are not "owned": they are immutably borrowed and reference counted. An `OwnedBytes` object can be created for any type which derefs to a slice of bytes, and can be cheaply cloned and sliced. So there is no need to actually borrow _or_ copy the buffer under the `OwnedBytes`. Removing the code that was doing so allows us to safely recreate our buffer without worrying about the lifetimes of buffers that we've handed out.	2025-12-10 10:17:28 -08:00
Ming	f9e4a8413b	make the directory `BufWriter` capacity configurable (#52 )	2025-12-10 10:17:27 -08:00
Eric Ridge	936d6af471	feat: ability to directly merge segments in the foregound (#36 ) This adds new public-facing (and internal) APIs for being able to merge a list of segments in the foreground, without using any threads. It's largely a cut-n-paste of the existing background merge code. For pg_search, this is beneficial because it allows us to merge directly using our `MVCCDirectory` rather than going through the `ChannelDirectory`, which has quite a bit of overhead.	2025-12-10 10:17:26 -08:00
Eric Ridge	75a8384c2b	feat: remove `Directory::reconsider_merge_policy()` and add other niceties to Directory API (#33 ) This removes `Directory::reconsider_merge_policy()`. After reconsidering this, it's better to make this decision ahead of time. Also adds a `Directory::log(message: &str)` function along with passing a `Directory` reference to `MergePolicy::compute_merge_candidates()`. It also hits some `#[derive(Debug)]` and `#[derive(Serialize)]` on a couple of structs that can benefit.	2025-12-10 10:17:26 -08:00
Eric Ridge	8b7db36c99	feat: Add `Directory::wants_cancel()` function (#31 ) This adds a function named `wants_cancel() -> bool` to the `Directory` trait. It allows a Directory implementation to indicate that it would like Tantivy to cancel an operation. Right now, querying this function only happens during key points of index merging, but _could_ be used in other places. Technically, segment merging is the only "black box" in tantivy that users don't otherwise have the direct ability to control. The default implementaiton of `wants_cancel()` returns false, so there's no fear of default tantivy spuriously cancelling a merge. The cancels happen "cleanly" such that if `wants_cancel()` returns true an `Err(TantivyError::Cancelled)` is returned from the calling function at that point, and the error result will be propogated up the stack. No panics are raised.	2025-12-10 10:17:26 -08:00
Eric Ridge	eabe589814	feat: ability to assign a panic handler to a Directory (#30 ) Tantivy creates thread pools for some of its background work, specifically committing and merging. It's possible if one of the thread workers panics that rayon will simply abort the process. This is terrible from pg_search as that takes down the entire Postgres cluster. These changes allow a Directory to assign a panic handler that gets called in such cases. Which allows pg_search to gracefully rollback the current transaction, while presenting the panic message to the user.	2025-12-10 10:17:26 -08:00
Eric Ridge	65d3574dfd	feat: make garbage collection opt-out (#28 )	2025-12-10 10:17:26 -08:00
Eric Ridge	16da31cf06	perf: make the footer fixed width (#23 ) Prior to this commit, the Footer tantivy serialized at the end of every file included a json blob that could be an arbitrary size. This changes the Footer to be exactly 24 bytes (6 u32s), making sure to keep the `crc` value. The other change we make here is to not actually read/validate the footer bytes when opening a file. From pg_search's perspective, this is quite unnecessary Postgres buffer cache I/O and increases index/segment opening overhead, which is something pg_search does often for each query. Two tests are ignored here as they test physical index files stored here in the repo that this change completely breaks.	2025-12-10 10:17:25 -08:00
Neil Hansen	a35e3dcb5a	suppress warnings after rebase	2025-12-10 10:17:25 -08:00
Ming Ying	2584325e0d	add reconsider_merge_policy to directory	2025-12-10 10:17:24 -08:00
Eric Ridge	91db6909d1	Add a `payload: &mut (dyn Any + '_)` argument to `Directory::save_meta()` (#17 )	2025-12-10 10:17:24 -08:00
Ming Ying	8d29f19110	make save_metas provide previous metas	2025-12-10 10:17:24 -08:00
Ming Ying	3adc85c017	Directory trait can read/write meta/managed	2025-12-10 10:17:24 -08:00
Ming	5503cfb8ef	Fix managed paths (#5 )	2025-12-10 10:17:23 -08:00
PSeitz-dd	c6912ce89a	Handle JSON fields and columnar in space_usage (#2761 ) return field names in space_usage instead of `Field` more detailed info for columns	2025-12-10 20:33:33 +08:00
Ang	08a92675dc	Fix typos again (#2753 ) Found via `codespell -S benches,stopwords.rs -L womens,parth,abd,childs,ond,ser,ue,mot,hel,atleast,pris,claus,allo`	2025-12-01 12:15:41 +01:00
PSeitz-dd	2340dca628	fix compiler warnings (#2699 ) * fix compiler warnings * fix import	2025-09-19 15:55:04 +02:00
Paul M.	f2c77f06c5	Update fs4 to latest (0.13.1) (#2654 ) - One change was needed to handle the `Result<bool>` that now returns from `try_lock_exclusive` Co-authored-by: Paul M. <prov223@tutanota.com>	2025-07-14 11:26:19 +08:00
PSeitz	945af922d1	clippy (#2661 ) * clippy * use readable version --------- Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>	2025-07-02 11:25:03 +02:00
Pascal Seitz	e7daf69de9	use usize in bitpacker use usize in bitpacker to enable larger columns in the columnar store Godbolt comparison with u32 vs u64 for get access: https://godbolt.org/z/cjf7nenYP Add a mini-tool to inspect columnar files created by tantivy. (very basic functionality which can be extended later)	2025-02-20 15:39:10 +01:00
PSeitz	21d057059e	clippy (#2527 ) * clippy * clippy * clippy * clippy * convert allow to expect and remove unused * cargo fmt * cleanup * export sample * clippy	2024-10-22 09:26:54 +08:00
trinity-1686a	85395d942a	fix clippy lints from 1.80-1.81 (#2488 ) * fix some clippy lints * fix clippy::doc_lazy_continuation * fix some lints for 1.82	2024-09-05 14:33:05 +02:00
Hamir Mahal	0c634adbe1	style: simplify strings with string interpolation (#2412 ) * style: simplify strings with string interpolation * fix: formatting	2024-05-27 09:16:47 +02:00
PSeitz	74940e9345	clippy (#2349 ) * fix clippy * fix clippy * fix duplicate imports	2024-04-09 07:54:44 +02:00
PSeitz	79b041f81f	clippy (#2314 )	2024-02-13 05:56:31 +01:00
PSeitz	48630ceec9	move into new index module (#2259 ) move core modules to index module	2024-01-31 10:30:04 +01:00
Adam Reichold	72002e8a89	Make test builds Clippy clean. (#2277 )	2024-01-31 02:47:06 +01:00
trinity-1686a	7a0064db1f	bump index version (#2237 ) * bump index version and add constant for lowest supported version * use range instead of handcoded bounds	2023-11-06 19:02:37 +01:00
giovannicuccu	ef603c8c7e	rename ReloadPolicy onCommit to onCommitWithDelay (#2235 ) * rename ReloadPolicy onCommit to onCommitWithDelay * fix format issues --------- Co-authored-by: Giovanni Cuccu <gcuccu@imolainformatica.it>	2023-11-03 12:22:10 +01:00
PSeitz	bf6544cf28	fix mmap::Advice reexport (#2230 )	2023-10-27 14:09:25 +09:00
PSeitz	c2b0469180	improve docs, rework exports (#2220 ) * rework exports move snippet and advice make indexer pub, remove indexer reexports * add deprecation warning * add architecture overview	2023-10-18 09:22:24 +02:00
Harrison Burt	1c7c6fd591	POC: Tantivy documents as a trait (#2071 ) * fix windows build (#1) * Fix windows build * Add doc traits * Add field value iter * Add value and serialization * Adjust order * Fix bug * Correct type * Fix generic bugs * Reformat code * Add generic to index writer which I forgot about * Fix missing generics on single segment writer * Add missing type export * Add default methods for convenience * Cleanup * Fix more-like-this query to use standard types * Update API and fix tests * Add doc traits * Add field value iter * Add value and serialization * Adjust order * Fix bug * Correct type * Rebase main and fix conflicts * Reformat code * Merge upstream * Fix missing generics on single segment writer * Add missing type export * Add default methods for convenience * Cleanup * Fix more-like-this query to use standard types * Update API and fix tests * Add tokenizer improvements from previous commits * Add tokenizer improvements from previous commits * Reformat * Fix unit tests * Fix unit tests * Use enum in changes * Stage changes * Add new deserializer logic * Add serializer integration * Add document deserializer * Implement new (de)serialization api for existing types * Fix bugs and type errors * Add helper implementations * Fix errors * Reformat code * Add unit tests and some code organisation for serialization * Add unit tests to deserializer * Add some small docs * Add support for deserializing serde values * Reformat * Fix typo * Fix typo * Change repr of facet * Remove unused trait methods * Add child value type * Resolve comments * Fix build * Fix more build errors * Fix more build errors * Fix the tests I missed * Fix examples * fix numerical order, serialize PreTok Str * fix coverage * rename Document to TantivyDocument, rename DocumentAccess to Document add Binary prefix to binary de/serialization * fix coverage --------- Co-authored-by: Pascal Seitz <pascal.seitz@gmail.com>	2023-10-02 10:01:16 +02:00
ethever.eth	90586bc1e2	chore: remove unused Seek impl for Writers (#2187 ) (#2189 ) Co-authored-by: famouscat <onismaa@gmail.com>	2023-09-26 17:03:28 +09:00
Adam Reichold	c805f08ca7	Fix a few more upcoming Clippy lints (#2133 )	2023-07-24 17:07:57 +09:00
PSeitz	44850e1036	move fail dep to dev only (#2094 ) wasm compilation fails with dep only	2023-06-22 06:59:11 +02:00
Harrison Burt	7220df8a09	Fix building on windows with mmap (#2070 ) * Fix windows build * Make pub * Update docs * Re arrange * Fix compilation error on unix * Fix unix borrows * Revert "Fix unix borrows" This reverts commit `c1d94fd12b`. * Fix unix borrows and revert original change * Fix warning * Cleaner code. --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2023-06-10 18:32:39 +02:00
Paul Masurel	fe3ecf9567	Added support for madvise (#2036 ) Added support for madvise	2023-05-11 05:39:17 +02:00
Yuri Astrakhan	74275b76a6	Inline format arguments where makes sense (#2038 ) Applied this command to the code, making it a bit shorter and slightly more readable. ``` cargo +nightly clippy --all-features --benches --tests --workspace --fix -- -A clippy::all -W clippy::uninlined_format_args cargo +nightly fmt --all ```	2023-05-10 18:03:59 +09:00
PSeitz	80df1d9835	Handle error for exists on MMapDirectory (#1988 ) `exists` will return false in case of other io errors, like permission denied	2023-04-25 09:20:33 +02:00
Till Wegmüller	1a35f6573d	Switch fs2 to fs4 as it is now unmaintained and does not support illumos (#1944 ) Signed-off-by: Till Wegmueller <toasterson@gmail.com>	2023-03-22 13:48:49 +09:00
PSeitz	9e2faecf5b	add memory limit for aggregations (#1942 ) * add memory limit for aggregations introduce AggregationLimits to set memory consumption limit and bucket limits memory limit is checked during aggregation, bucket limit is checked before returning the aggregation request. * Apply suggestions from code review Co-authored-by: Paul Masurel <paul@quickwit.io> * add ByteCount with human readable format --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2023-03-16 06:21:07 +01:00
Paul Masurel	097fd6138d	Fix clippy comments (#1872 )	2023-02-14 23:12:45 +09:00
Yukun Guo	dfe4e95fde	Make index compatible with virtual drives on Windows (#1843 ) * Make index compatible with virtual drives on Windows * Get rid of normpath	2023-02-14 16:41:48 +09:00
Paul Masurel	bd5eea9852	Integrated columnar work.	2023-02-09 13:14:31 +01:00
Adrien Guillo	14222a47a3	Fix typo (#1776 )	2023-01-11 00:49:13 +09:00
Adam Reichold	2080c370c2	Enable usage of FuzzyTermQuery for specific fields via QueryParser (#1750 ) * Make nightly Clippy mostly happy. * Document how to produce TermSetQuery queries using QueryParser. * Enable construction of queries using FuzzyTermQuery via the QueryParser * Use FxHashMap instead of HashMap in the QueryParser as these hash tables are not exposed to DoS attacks. * Use a struct instead of a tuple to improve readability.	2023-01-04 18:11:27 +09:00
Paul Masurel	f39165e1e7	Moving FileSlice to tantivy-common (#1729 )	2022-12-21 16:35:11 +09:00
Paul Masurel	32cb1d22da	Removed AsyncIoResult. (#1728 )	2022-12-21 16:01:17 +09:00
PSeitz	f9171a3981	fix clippy (#1725 ) * fix clippy * fix clippy fastfield codecs * fix clippy bitpacker * fix clippy common * fix clippy stacker * fix clippy sstable * fmt	2022-12-20 07:30:06 +01:00
Pascal Seitz	38ad46e580	fix clippy	2022-11-07 16:09:55 +08:00

1 2 3 4 5 ...

300 Commits