tantivy

mirror of https://github.com/quickwit-oss/tantivy.git synced 2026-07-03 15:50:44 +00:00

Author	SHA1	Message	Date
Ming	384f31d350	feat: Restore index sorting (#2959)

We ([ParadeDB](https://github.com/paradedb/paradedb)) have restored and been using the removed [index sorting](https://github.com/quickwit-oss/tantivy/issues/2352) feature in our Tantivy fork. Our use case is sorting the index by Postgres' internal `ctid` identifier. Results returned from Tantivy must be checked against Postgres' visibility map, and checking them in ctid order is much more cache friendly, resulting in up to 80% speedups for certain queries.

This PR is split into 5 commits, corresponding to the index sorting reversal plus bug fixes we uncovered during our usage of index sorting.

| Commit | Maps to | What it does |
|---|---|---|
| `2aea0ad9f` | foundation ([#104](https://github.com/paradedb/tantivy/pull/104)) | Restore `SegmentComponent::TempStore` (revert of upstream #2815). Subsumes fork PR [#104](https://github.com/paradedb/tantivy/pull/104)'s CI fix. |
| `9205bcb0c` | [#92](https://github.com/paradedb/tantivy/pull/92) | Restore sort-by-field (single-segment + merge paths). |
| `39c790f0f` | [#101](https://github.com/paradedb/tantivy/pull/101) | Enable `sort_by` for `Str`/`Bytes` fast fields. |
| `9c4341a87` | [#105](https://github.com/paradedb/tantivy/pull/105) | Native typed numeric sort-key comparison (precision/NULL fix). |
| `2d9ba2418` | [#106](https://github.com/paradedb/tantivy/pull/106) | Preserve NULL ordering in numeric segment merges. |

We have discussed this with the Tantivy maintainers, and they indicated they would be open to this PR. Another motivation for landing this PR is that we are planning on contributing a significant refactor that makes Tantivy's segment components extensible, and landing that without index sorting leads to too many conflicts.	2026-06-22 11:22:25 -07:00
Paul Masurel	545169c0d8	Composite agg merge (#2856) Add composite aggregation Co-authored-by: Remi Dettai <remi.dettai@sekoia.io> Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>	2026-03-18 17:28:59 +01:00
PSeitz	57fe659fff	make serializer pub (#2835) some changes on the posting list serializer to make it usable in other contexts. Improve errors Signed-off-by: Pascal Seitz <pascal.seitz@gmail.com>	2026-02-11 14:37:42 +01:00
Paul Masurel	0ae94baef5	Remove temp file (#2815) Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>	2026-01-27 09:22:11 +01:00
Paul Masurel	77505c3d03	Making stemming optional. (#2791) Fixed code and CI to run on no default features. Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>	2026-01-02 12:40:42 +01:00
PSeitz-dd	c6912ce89a	Handle JSON fields and columnar in space_usage (#2761) return field names in space_usage instead of `Field` more detailed info for columns	2025-12-10 20:33:33 +08:00
Ang	08a92675dc	Fix typos again (#2753) Found via `codespell -S benches,stopwords.rs -L womens,parth,abd,childs,ond,ser,ue,mot,hel,atleast,pris,claus,allo`	2025-12-01 12:15:41 +01:00
PSeitz-dd	2340dca628	fix compiler warnings (#2699) * fix compiler warnings * fix import	2025-09-19 15:55:04 +02:00
Paul M.	f2c77f06c5	Update fs4 to latest (0.13.1) (#2654) - One change was needed to handle the `Result<bool>` that now returns from `try_lock_exclusive` Co-authored-by: Paul M. <prov223@tutanota.com>	2025-07-14 11:26:19 +08:00
PSeitz	945af922d1	clippy (#2661) * clippy * use readable version --------- Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>	2025-07-02 11:25:03 +02:00
Pascal Seitz	e7daf69de9	use usize in bitpacker use usize in bitpacker to enable larger columns in the columnar store Godbolt comparison with u32 vs u64 for get access: https://godbolt.org/z/cjf7nenYP Add a mini-tool to inspect columnar files created by tantivy. (very basic functionality which can be extended later)	2025-02-20 15:39:10 +01:00
PSeitz	21d057059e	clippy (#2527) * clippy * clippy * clippy * clippy * convert allow to expect and remove unused * cargo fmt * cleanup * export sample * clippy	2024-10-22 09:26:54 +08:00
trinity-1686a	85395d942a	fix clippy lints from 1.80-1.81 (#2488) * fix some clippy lints * fix clippy::doc_lazy_continuation * fix some lints for 1.82	2024-09-05 14:33:05 +02:00
Hamir Mahal	0c634adbe1	style: simplify strings with string interpolation (#2412) * style: simplify strings with string interpolation * fix: formatting	2024-05-27 09:16:47 +02:00
PSeitz	74940e9345	clippy (#2349) * fix clippy * fix clippy * fix duplicate imports	2024-04-09 07:54:44 +02:00
PSeitz	79b041f81f	clippy (#2314)	2024-02-13 05:56:31 +01:00
PSeitz	48630ceec9	move into new index module (#2259) move core modules to index module	2024-01-31 10:30:04 +01:00
Adam Reichold	72002e8a89	Make test builds Clippy clean. (#2277)	2024-01-31 02:47:06 +01:00
trinity-1686a	7a0064db1f	bump index version (#2237) * bump index version and add constant for lowest supported version * use range instead of handcoded bounds	2023-11-06 19:02:37 +01:00
giovannicuccu	ef603c8c7e	rename ReloadPolicy onCommit to onCommitWithDelay (#2235) * rename ReloadPolicy onCommit to onCommitWithDelay * fix format issues --------- Co-authored-by: Giovanni Cuccu <gcuccu@imolainformatica.it>	2023-11-03 12:22:10 +01:00
PSeitz	bf6544cf28	fix mmap::Advice reexport (#2230)	2023-10-27 14:09:25 +09:00
PSeitz	c2b0469180	improve docs, rework exports (#2220) * rework exports move snippet and advice make indexer pub, remove indexer reexports * add deprecation warning * add architecture overview	2023-10-18 09:22:24 +02:00
Harrison Burt	1c7c6fd591	POC: Tantivy documents as a trait (#2071) * fix windows build (#1) * Fix windows build * Add doc traits * Add field value iter * Add value and serialization * Adjust order * Fix bug * Correct type * Fix generic bugs * Reformat code * Add generic to index writer which I forgot about * Fix missing generics on single segment writer * Add missing type export * Add default methods for convenience * Cleanup * Fix more-like-this query to use standard types * Update API and fix tests * Add doc traits * Add field value iter * Add value and serialization * Adjust order * Fix bug * Correct type * Rebase main and fix conflicts * Reformat code * Merge upstream * Fix missing generics on single segment writer * Add missing type export * Add default methods for convenience * Cleanup * Fix more-like-this query to use standard types * Update API and fix tests * Add tokenizer improvements from previous commits * Add tokenizer improvements from previous commits * Reformat * Fix unit tests * Fix unit tests * Use enum in changes * Stage changes * Add new deserializer logic * Add serializer integration * Add document deserializer * Implement new (de)serialization api for existing types * Fix bugs and type errors * Add helper implementations * Fix errors * Reformat code * Add unit tests and some code organisation for serialization * Add unit tests to deserializer * Add some small docs * Add support for deserializing serde values * Reformat * Fix typo * Fix typo * Change repr of facet * Remove unused trait methods * Add child value type * Resolve comments * Fix build * Fix more build errors * Fix more build errors * Fix the tests I missed * Fix examples * fix numerical order, serialize PreTok Str * fix coverage * rename Document to TantivyDocument, rename DocumentAccess to Document add Binary prefix to binary de/serialization * fix coverage --------- Co-authored-by: Pascal Seitz <pascal.seitz@gmail.com>	2023-10-02 10:01:16 +02:00
ethever.eth	90586bc1e2	chore: remove unused Seek impl for Writers (#2187) (#2189) Co-authored-by: famouscat <onismaa@gmail.com>	2023-09-26 17:03:28 +09:00
Adam Reichold	c805f08ca7	Fix a few more upcoming Clippy lints (#2133)	2023-07-24 17:07:57 +09:00
PSeitz	44850e1036	move fail dep to dev only (#2094) wasm compilation fails with dep only	2023-06-22 06:59:11 +02:00
Harrison Burt	7220df8a09	Fix building on windows with mmap (#2070) * Fix windows build * Make pub * Update docs * Re arrange * Fix compilation error on unix * Fix unix borrows * Revert "Fix unix borrows" This reverts commit `c1d94fd12b`. * Fix unix borrows and revert original change * Fix warning * Cleaner code. --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2023-06-10 18:32:39 +02:00
Paul Masurel	fe3ecf9567	Added support for madvise (#2036) Added support for madvise	2023-05-11 05:39:17 +02:00
Yuri Astrakhan	74275b76a6	Inline format arguments where makes sense (#2038) Applied this command to the code, making it a bit shorter and slightly more readable.
```
cargo +nightly clippy --all-features --benches --tests --workspace --fix -- -A clippy::all -W clippy::uninlined_format_args
cargo +nightly fmt --all
```
	2023-05-10 18:03:59 +09:00
PSeitz	80df1d9835	Handle error for exists on MMapDirectory (#1988) `exists` will return false in case of other io errors, like permission denied	2023-04-25 09:20:33 +02:00
Till Wegmüller	1a35f6573d	Switch fs2 to fs4 as it is now unmaintained and does not support illumos (#1944) Signed-off-by: Till Wegmueller <toasterson@gmail.com>	2023-03-22 13:48:49 +09:00
PSeitz	9e2faecf5b	add memory limit for aggregations (#1942) * add memory limit for aggregations introduce AggregationLimits to set memory consumption limit and bucket limits memory limit is checked during aggregation, bucket limit is checked before returning the aggregation request. * Apply suggestions from code review Co-authored-by: Paul Masurel <paul@quickwit.io> * add ByteCount with human readable format --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2023-03-16 06:21:07 +01:00
Paul Masurel	097fd6138d	Fix clippy comments (#1872)	2023-02-14 23:12:45 +09:00
Yukun Guo	dfe4e95fde	Make index compatible with virtual drives on Windows (#1843) * Make index compatible with virtual drives on Windows * Get rid of normpath	2023-02-14 16:41:48 +09:00
Paul Masurel	bd5eea9852	Integrated columnar work.	2023-02-09 13:14:31 +01:00
Adrien Guillo	14222a47a3	Fix typo (#1776)	2023-01-11 00:49:13 +09:00
Adam Reichold	2080c370c2	Enable usage of FuzzyTermQuery for specific fields via QueryParser (#1750) * Make nightly Clippy mostly happy. * Document how to produce TermSetQuery queries using QueryParser. * Enable construction of queries using FuzzyTermQuery via the QueryParser * Use FxHashMap instead of HashMap in the QueryParser as these hash tables are not exposed to DoS attacks. * Use a struct instead of a tuple to improve readability.	2023-01-04 18:11:27 +09:00
Paul Masurel	f39165e1e7	Moving FileSlice to tantivy-common (#1729)	2022-12-21 16:35:11 +09:00
Paul Masurel	32cb1d22da	Removed AsyncIoResult. (#1728)	2022-12-21 16:01:17 +09:00
PSeitz	f9171a3981	fix clippy (#1725) * fix clippy * fix clippy fastfield codecs * fix clippy bitpacker * fix clippy common * fix clippy stacker * fix clippy sstable * fmt	2022-12-20 07:30:06 +01:00
Pascal Seitz	38ad46e580	fix clippy	2022-11-07 16:09:55 +08:00
Paul Masurel	07393c2fa0	Attempt to fix race condition in test. (#1619) Close #1550	2022-10-14 10:56:37 +09:00
PSeitz	2100ec5d26	Merge pull request #1593 from waywardmonkeys/doc-improvements Documentation improvements.	2022-10-05 15:50:08 +08:00
Bruce Mitchener	b3bf9a5716	Documentation improvements.	2022-10-05 14:18:10 +07:00
Paul Masurel	0dc8c458e0	Flaky unit test. (#1592)	2022-10-05 16:15:48 +09:00
Bruce Mitchener	a24ae8d924	clippy: Fix needless-borrow warnings. (#1581) These show on nightly clippy.	2022-10-03 14:15:09 +09:00
Bruce Mitchener	f842da758c	Move ArcBytes,WeakArcBytes to mmap_directory. (#1555) When building without default features (so without mmap, etc), there are some warnings about unused things. This fixes the ones related to `ArcBytes` and `WeakArcBytes`, which are only used with the `mmap_directory` code.	2022-09-27 09:57:28 +09:00
Bruce Mitchener	cb252a42af	docs: "associated to" -> "associated with" (#1557) This reads better this way.	2022-09-26 20:23:37 +09:00
Bruce Mitchener	d9609dd6b6	POLLING_INTERVAL needn't be pub. (#1556) This is only used within the file watcher and is const, so it can't be configured.	2022-09-26 20:22:55 +09:00
trinity-1686a	fa3d786a2f	Add support for deleting all documents matching query (#1535) * add support for deleting all documents matching query #1494	2022-09-22 21:26:09 +09:00

291 Commits