tantivy

mirror of https://github.com/quickwit-oss/tantivy.git synced 2026-05-13 23:00:42 +00:00

Author	SHA1	Message	Date
Pascal Seitz	d1555fe9f8	SegmentReader as trait	2026-01-27 13:56:40 +01:00
Paul Masurel	b86caeefe2	Major bugfix in intersection A bug was added with the `seek_into_the_danger_zone()` optimization (Spotted and fixed by Stu) The contract says seek_into_the_danger_zone returns true if do is part of the docset. The blanket implementation goes like this. ``` let current_doc = self.doc(); if current_doc < target { self.seek(target); } self.doc() == target ``` So it will return true if target is TERMINATED, where really TERMINATED does not belong to the docset. The fix tries to clarify the contracts and fixes the intersection algorithm. We observe a small but all over the board improvement in intersection performance. --------- Co-authored-by: Stu Hood <stuhood@gmail.com> Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>	2026-01-23 18:44:10 +01:00
ChangRui-Ryan	abf1e64f4d	add benchmark for string search and get (#2795 )	2026-01-19 11:50:41 +01:00
trinity-1686a	12977bc7c4	upgrade some dependancies (#2802 ) including rand, which had a few breaking changes	2026-01-14 10:19:09 +01:00
trinity-1686a	0c94eb94c3	Merge pull request #2799 from jollygreenlaser/lru	2026-01-13 22:47:35 +01:00
Paul Masurel	c92e831dde	Minor refactoring in PostingsSerializer (#2801 ) Removes the Write generics argument in PostingsSerializer. This removes useless generic. Prepares the path for codecs. Removes one useless CountingWrite layer. etc. Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>	2026-01-12 13:53:43 +01:00
Alex Lazar	947c0d5f40	Bump lru to 0.16.3 per dependabot	2026-01-09 23:25:51 -08:00
Paul Masurel	d904630e6a	Bumped bitpacking version (#2797 ) Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>	2026-01-08 15:50:22 +01:00
PSeitz-dd	65b5a1a306	one collector per agg request instead per bucket (#2759 ) * improve bench * add more tests for new collection type * one collector per agg request instead per bucket In this refactoring a collector knows in which bucket of the parent their data is in. This allows to convert the previous approach of one collector per bucket to one collector per request. low card bucket optimization * reduce dynamic dispatch, faster term agg * use radix map, fix prepare_max_bucket use paged term map in term agg use special no sub agg term map impl * specialize columntype in stats * remove stacktrace bloat, use &mut helper increase cache to 2048 * cleanup remove clone move data in term req, single doc opt for stats * add comment * share column block accessor * simplify fetch block in column_block_accessor * split subaggcache into two trait impls * move partitions to heap * fix name, add comment --------- Co-authored-by: Pascal Seitz <pascal.seitz@gmail.com>	2026-01-06 11:50:55 +01:00
ChangRui-Ryan	db2ecc6057	fix Column.first method parameter type (#2792 )	2026-01-05 10:03:01 +01:00
Paul Masurel	77505c3d03	Making stemming optional. (#2791 ) Fixed code and CI to run on no default features. Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>	2026-01-02 12:40:42 +01:00
PSeitz	735c588f4f	fix union performance regression (#2790 ) * add inlines * fix union performance regression Remove unwrap from hotpath generates better assembly. closes #2788	2026-01-02 12:06:51 +01:00
PSeitz	242a1531bf	fix flaky test (#2784 ) Signed-off-by: Pascal Seitz <pascal.seitz@gmail.com>	2026-01-02 11:30:51 +01:00
trinity-1686a	6443b63177	document 1bit hole and some queries supporting running with just fastfield (#2779 ) * add small doc on some queries using fast field when not indexed * document 1 unused bit in skiplist	2026-01-02 10:32:37 +01:00
Stu Hood	4987495ee4	Add an erased `SortKeyComputer` to sort on types which are not known until runtime (#2770 ) * Remove PartialOrd bound on compared values. * Fix declared `SortKey` type of `impl<..> SortKeyComputer for (HeadSortKeyComputer, TailSortKeyComputer)` * Add a SortByOwnedValue implementation to provide a type-erased column. * Add support for comparing mismatched `OwnedValue` types. * Support JSON columns. * Refer to https://github.com/quickwit-oss/tantivy/issues/2776 * Rename to `SortByErasedType`. * Comment on transitivity. Co-authored-by: Paul Masurel <paul@quickwit.io> * Fix clippy warnings in new code. --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2026-01-02 10:28:47 +01:00
Paul Masurel	b11605f045	Addressing clippy comments (#2789 ) Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>	2025-12-31 18:02:00 +01:00
ChangRui-Ryan	75d7989cc6	add benchmark for boolean query with range sub query (#2787 )	2025-12-31 12:00:53 +01:00
PSeitz	923f0508f2	seek_exact + cost based intersection (#2538 ) * seek_exact + cost based intersection Adds `seek_exact` and `cost` to `DocSet` for a more efficient intersection. Unlike `seek`, `seek_exact` does not require the DocSet to advance to the next hit, if the target does not exist. `cost` allows to address the different DocSet types and their cost model and is used to determine the DocSet that drives the intersection. E.g. fast field range queries may do a full scan. Phrase queries load the positions to check if a we have a hit. They both have a higher cost than their size_hint would suggest. Improves `size_hint` estimation for intersection and union, by having a estimation based on random distribution with a co-location factor. Refactor range query benchmark. Closes #2531 Future Work Implement `seek_exact` for BufferedUnionScorer and RangeDocSet (fast field range queries) Evaluate replacing `seek` with `seek_exact` to reduce code complexity * Apply suggestions from code review Co-authored-by: Paul Masurel <paul@quickwit.io> * add API contract verfication * impl seek_exact on union * rename seek_exact * add mixed AND OR test, fix buffered_union * Add a proptest of BooleanQuery. (#2690) * fix build * Increase the document count. * fix merge conflict * fix debug assert * Fix compilation errors after rebase - Remove duplicate proptest_boolean_query module - Remove duplicate cost() method implementations - Fix TopDocs API usage (add .order_by_score()) - Remove duplicate imports - Remove unused variable assignments --------- Co-authored-by: Paul Masurel <paul@quickwit.io> Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com> Co-authored-by: Stu Hood <stuhood@gmail.com>	2025-12-30 14:43:25 +01:00
ChangRui-Ryan	e0b62e00ac	optimize RangeDocSet for non-overlapping query ranges (#2783 )	2025-12-29 16:55:28 +01:00
Stu Hood	ce97beb86f	Add support for natural-order-with-none-highest in `TopDocs::order_by` (#2780 ) * Add `ComparatorEnum::NaturalNoneHigher`. * Fix comments.	2025-12-23 09:22:20 +01:00
Stu Hood	c0f21a45ae	Use a strict comparison in TopNComputer (#2777 ) * Remove `(Partial)Ord` from `ComparableDoc`, and unify comparison between `TopNComputer` and `Comparator`. * Doc cleanups. * Require Ord for `ComparableDoc`. * Semantics are actually _ascending_ DocId order. * Adjust docs again for ascending DocId order. * minor change --------- Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>	2025-12-18 12:13:23 +01:00
Moe	73657dff77	fix: fixed integer overflow in ExpUnrolledLinkedList for large datasets (#2735 ) * Fixed the overflow issue. * Fixed lint issues. * Applied PR fixes. * Fixed a lint issue.	2025-12-16 22:57:12 +01:00
Moe	e3c9be1f92	fix: boolean query incorrectly dropping documents when AllScorer is present (#2760 ) * Fixed the range issue. * Fixed the second all scorer issue * Improved docs + tests * Improved code. * Fixed lint issues. * Improved tests + logic based on PR comments. * Fixed lint issues. * Increase the document count. * Improved the prop-tests * Expand the index size, and remove unused parameter. --------- Co-authored-by: Stu Hood <stuhood@gmail.com>	2025-12-16 22:52:02 +01:00
Ming	ba61ed6ef3	fix: vint buffer can overflow (#2778 ) * fix vint overflow * comment	2025-12-16 22:50:41 +01:00
trinity-1686a	d0e1600135	fix bug with minimum_should_match and AllScorer (#2774 )	2025-12-14 10:10:45 +01:00
PSeitz-dd	e9020d17d4	fix coverage (#2769 )	2025-12-11 11:35:58 +01:00
PSeitz-dd	5ba0031f7d	move rand_distr to dev_dep (#2772 )	2025-12-11 18:23:50 +08:00
Philippe Noël	22dde8f9ae	chore: Make some delete-related functions public (#46 ) (#2766 ) Co-authored-by: Ming <ming.ying.nyc@gmail.com>	2025-12-11 01:22:15 +01:00
Philippe Noël	14cc24614e	Make DeleteMeta pub (#2765 ) Co-authored-by: Ming Ying <ming.ying.nyc@gmail.com>	2025-12-11 00:11:03 +01:00
Philippe Noël	8a1079b2dc	expose AddOperation and with_max_doc (#7 ) (#2762 ) Co-authored-by: Ming <ming.ying.nyc@gmail.com>	2025-12-11 00:10:42 +01:00
Philippe Noël	794ff1ffc9	chore: Make `Language` hashable (#79 ) (#2763 ) Co-authored-by: Ming <ming.ying.nyc@gmail.com>	2025-12-10 15:38:43 +01:00
PSeitz-dd	c6912ce89a	Handle JSON fields and columnar in space_usage (#2761 ) return field names in space_usage instead of `Field` more detailed info for columns	2025-12-10 20:33:33 +08:00
PSeitz	618e3bd11b	Term and IndexingTerm cleanup (#2750 ) * refactor term * add deprecated functions --------- Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>	2025-12-05 09:48:40 +08:00
PSeitz	b2f99c6217	add term->histogram benchmark (#2758 ) * add term->histogram benchmark * add more term aggs --------- Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>	2025-12-04 02:29:37 +01:00
PSeitz	76de5bab6f	fix unsafe warnings (#2757 )	2025-12-03 20:15:21 +08:00
rustmailer	b7eb31162b	docs: add usage example to README (#2743 )	2025-12-02 21:56:57 +01:00
Paul Masurel	63c66005db	Lazy scorers (#2726 ) * Refactoring of the score tweaker into `SortKeyComputer`s to unlock two features. - Allow lazy evaluation of score. As soon as we identified that a doc won't reach the topK threshold, we can stop the evaluation. - Allow for a different segment level score, segment level score and their conversion. This PR breaks public API, but fixing code is straightforward. * Bumping tantivy version --------- Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>	2025-12-01 15:38:57 +01:00
Paul Masurel	7d513a44c5	Added some benchmark for top K by a fast field (#2754 ) Also removed query parsing from the bench code. Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>	2025-12-01 14:58:29 +01:00
Stu Hood	ca87fcd454	Implement `collect_block` for `Collector`s which wrap other `Collector`s (#2727 ) * Implement `collect_block` for tuple Collectors, and for MultiCollector. * Two more.	2025-12-01 12:26:29 +01:00
Ang	08a92675dc	Fix typos again (#2753 ) Found via `codespell -S benches,stopwords.rs -L womens,parth,abd,childs,ond,ser,ue,mot,hel,atleast,pris,claus,allo`	2025-12-01 12:15:41 +01:00
Raphaël Cohen	f7f4b354d6	fix: Handle phrase prefixed with star (#2751 ) Signed-off-by: Darkheir <raphael.cohen@sekoia.io>	2025-12-01 11:43:25 +01:00
Paul Masurel	25d44fcec8	Revert "remove unused columnar api (#2742 )" (#2748 ) * Revert "remove unused columnar api (#2742)" This reverts commit `8725594d47`. * Clippy comment + removing fill_vals --------- Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>	2025-11-26 17:44:02 +01:00
PSeitz-dd	842fe9295f	split Term in Term and IndexingTerm (#2744 ) * split Term in Term and IndexingTerm * add append_json_path to JsonTermSerializer	2025-11-26 16:48:59 +01:00
Paul Masurel	f88b7200b2	Optimization when posting list are saturated. (#2745 ) * Optimization when posting list are saturated. If a posting list doc freq is the segment reader's max_doc, and if scoring does not matter, we can replace it by a AllScorer. In turn, in a boolean query, we can dismiss all scorers and empty scorers, to accelerate the request. * Added range query optimization * CR comment * CR comments * CR comment --------- Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>	2025-11-26 15:50:57 +01:00
PSeitz-dd	8725594d47	remove unused columnar api (#2742 )	2025-11-21 18:07:25 +01:00
PSeitz	43a784671a	clippy (#2741 ) Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>	2025-11-21 18:07:03 +01:00
Paul Masurel	c363bbd23d	Optimize term aggregation with low cardinality + some refactoring (#2740 ) This introduce an optimization of top level term aggregation on field with a low cardinality. We then use a Vec as the underlying map. In addition, we buffer subaggregations. --------- Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com> Co-authored-by: Paul Masurel <paul@quickwit.io>	2025-11-21 14:46:29 +01:00
Moe	70e591e230	feat: added filter aggregation (#2711 ) * Initial impl * Added `Filter` impl in `build_single_agg_segment_collector_with_reader` + Added tests * Added `Filter(FilterBucketResult)` + Made tests work. * Fixed type issues. * Fixed a test. * 8a7a73a: Pass `segment_reader` * Added more tests. * Improved parsing + tests * refactoring * Added more tests. * refactoring: moved parsing code under QueryParser * Use Tantivy syntax instead of ES * Added a sanity check test. * Simplified impl + tests * Added back tests in a more maintable way * nitz. * nitz * implemented very simple fast-path * improved a comment * implemented fast field support * Used `BoundsRange` * Improved fast field impl + tests * Simplified execution. * Fixed exports + nitz * Improved the tests to check to the expected result. * Improved test by checking the whole result JSON * Removed brittle perf checks. * Added efficiency verification tests. * Added one more efficiency check test. * Improved the efficiency tests. * Removed unnecessary parsing code + added direct Query obj * Fixed tests. * Improved tests * Fixed code structure * Fixed lint issues * nitz. * nitz * nitz. * nitz. * nitz. * Added an example * Fixed PR comments. * Applied PR comments + nitz * nitz. * Improved the code. * Fixed a perf issue. * Added batch processing. * Made the example more interesting * Fixed bucket count * Renamed Direct to CustomQuery * Fixed lint issues. * No need for scorer to be an `Option` * nitz * Used BitSet * Added an optimization for AllQuery * Fixed merge issues. * Fixed lint issues. * Added benchmark for FILTER * Removed the Option wrapper. * nitz. * Applied PR comments. * Fixed the AllQuery optimization * Applied PR comments. * feat: used `erased_serde` to allow filter query to be serialized * further improved a comment * Added back tests. * removed an unused method * removed an unused method * Added documentation * nitz. * Added query builder. * Fixed a comment. * Applied PR comments. * Fixed doctest issues. * Added ser/de * Removed bench in test * Fixed a lint issue.	2025-11-18 20:54:31 +01:00
Arthur	5277367cb0	remove duplicated call to `index_writer.commit()` in example (#2732 )	2025-11-12 14:52:44 +01:00
Paul Masurel	8b02bff9b8	Removing obsolete benchmark screenshot (#2730 ) Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>	2025-11-05 09:55:13 +01:00

1 2 3 4 5 ...

3437 Commits