tantivy

mirror of https://github.com/quickwit-oss/tantivy.git synced 2026-07-06 09:10:48 +00:00

Author	SHA1	Message	Date
Philippe Noël	ddd169b77c	chore: Don't do codecov (#21 )	2025-12-10 10:17:25 -08:00
Eric Ridge	bb4c4b8522	perf: push `FileSlice`s down through most of fast fields (#19 ) This PR modifies internal API signatures and implementation details so that `FileSlice`s are passed down into the innards of (at least) the `BlockwiseLinearCodec`. This allows tantivy to defer dereferencing large slices of bytes when reading numeric fast fields, and instead dereference only the slice of bytes it needs for any given compressed Block. The motivation here is for external `Directory` implementations where it's not exactly efficient to dereference large slices of bytes.	2025-12-10 10:17:25 -08:00
Neil Hansen	ffa558e3a9	fix: tests in ci (#18 )	2025-12-10 10:17:25 -08:00
Neil Hansen	a35e3dcb5a	suppress warnings after rebase	2025-12-10 10:17:25 -08:00
Neil Hansen	1e3998fbad	implement fuzzy scoring in sstable	2025-12-10 10:17:25 -08:00
Neil Hansen	f3df079d6b	chore: point tantivy-fst to paradedb fork to fix regex	2025-12-10 10:17:24 -08:00
Ming Ying	f7c0335857	comments	2025-12-10 10:17:24 -08:00
Ming Ying	2584325e0d	add reconsider_merge_policy to directory	2025-12-10 10:17:24 -08:00
Eric B. Ridge	1f2c2d0c8a	fix compilation warnings on rust v1.83	2025-12-10 10:17:24 -08:00
Eric Ridge	91db6909d1	Add a `payload: &mut (dyn Any + '_)` argument to `Directory::save_meta()` (#17 )	2025-12-10 10:17:24 -08:00
Ming Ying	7639b47615	small changes to make MVCC work with delete	2025-12-10 10:17:24 -08:00
Ming Ying	8b55f0f355	Make DeleteMeta pub	2025-12-10 10:17:24 -08:00
Ming Ying	8d29f19110	make save_metas provide previous metas	2025-12-10 10:17:24 -08:00
Ming Ying	d742d3277a	undo changes to segment_updater.rs	2025-12-10 10:17:24 -08:00
Eric B. Ridge	3afe3714a2	no pgrx, please	2025-12-10 10:17:24 -08:00
Ming Ying	67ea8e53a8	quickwit compiles	2025-12-10 10:17:24 -08:00
Ming Ying	3adc85c017	Directory trait can read/write meta/managed	2025-12-10 10:17:24 -08:00
Ming	6bb3a22c98	expose AddOperation and with_max_doc (#7 )	2025-12-10 10:17:23 -08:00
Ming	5503cfb8ef	Fix managed paths (#5 )	2025-12-10 10:17:23 -08:00
Alexander Alexandrov	ea0e88ae4b	feat: implement `TokenFilter` for `Option<F>` (#4 )	2025-12-10 10:17:23 -08:00
Neil Hansen	dee2dd3f21	Use Levenshtein distance to score documents in fuzzy term queries	2025-12-10 10:17:19 -08:00
Philippe Noël	794ff1ffc9	chore: Make `Language` hashable (#79 ) (#2763 ) Co-authored-by: Ming <ming.ying.nyc@gmail.com>	2025-12-10 15:38:43 +01:00
PSeitz-dd	c6912ce89a	Handle JSON fields and columnar in space_usage (#2761 ) return field names in space_usage instead of `Field` more detailed info for columns	2025-12-10 20:33:33 +08:00
PSeitz	618e3bd11b	Term and IndexingTerm cleanup (#2750 ) * refactor term * add deprecated functions --------- Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>	2025-12-05 09:48:40 +08:00
PSeitz	b2f99c6217	add term->histogram benchmark (#2758 ) * add term->histogram benchmark * add more term aggs --------- Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>	2025-12-04 02:29:37 +01:00
PSeitz	76de5bab6f	fix unsafe warnings (#2757 )	2025-12-03 20:15:21 +08:00
rustmailer	b7eb31162b	docs: add usage example to README (#2743 )	2025-12-02 21:56:57 +01:00
Paul Masurel	63c66005db	Lazy scorers (#2726 ) * Refactoring of the score tweaker into `SortKeyComputer`s to unlock two features. - Allow lazy evaluation of score. As soon as we identified that a doc won't reach the topK threshold, we can stop the evaluation. - Allow for a different segment level score, segment level score and their conversion. This PR breaks public API, but fixing code is straightforward. * Bumping tantivy version --------- Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>	2025-12-01 15:38:57 +01:00
Paul Masurel	7d513a44c5	Added some benchmark for top K by a fast field (#2754 ) Also removed query parsing from the bench code. Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>	2025-12-01 14:58:29 +01:00
Stu Hood	ca87fcd454	Implement `collect_block` for `Collector`s which wrap other `Collector`s (#2727 ) * Implement `collect_block` for tuple Collectors, and for MultiCollector. * Two more.	2025-12-01 12:26:29 +01:00
Ang	08a92675dc	Fix typos again (#2753 ) Found via `codespell -S benches,stopwords.rs -L womens,parth,abd,childs,ond,ser,ue,mot,hel,atleast,pris,claus,allo`	2025-12-01 12:15:41 +01:00
Raphaël Cohen	f7f4b354d6	fix: Handle phrase prefixed with star (#2751 ) Signed-off-by: Darkheir <raphael.cohen@sekoia.io>	2025-12-01 11:43:25 +01:00
Paul Masurel	25d44fcec8	Revert "remove unused columnar api (#2742 )" (#2748 ) * Revert "remove unused columnar api (#2742)" This reverts commit `8725594d47`. * Clippy comment + removing fill_vals --------- Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>	2025-11-26 17:44:02 +01:00
PSeitz-dd	842fe9295f	split Term in Term and IndexingTerm (#2744 ) * split Term in Term and IndexingTerm * add append_json_path to JsonTermSerializer	2025-11-26 16:48:59 +01:00
Paul Masurel	f88b7200b2	Optimization when posting list are saturated. (#2745 ) * Optimization when posting list are saturated. If a posting list doc freq is the segment reader's max_doc, and if scoring does not matter, we can replace it by a AllScorer. In turn, in a boolean query, we can dismiss all scorers and empty scorers, to accelerate the request. * Added range query optimization * CR comment * CR comments * CR comment --------- Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>	2025-11-26 15:50:57 +01:00
PSeitz-dd	8725594d47	remove unused columnar api (#2742 )	2025-11-21 18:07:25 +01:00
PSeitz	43a784671a	clippy (#2741 ) Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>	2025-11-21 18:07:03 +01:00
Paul Masurel	c363bbd23d	Optimize term aggregation with low cardinality + some refactoring (#2740 ) This introduce an optimization of top level term aggregation on field with a low cardinality. We then use a Vec as the underlying map. In addition, we buffer subaggregations. --------- Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com> Co-authored-by: Paul Masurel <paul@quickwit.io>	2025-11-21 14:46:29 +01:00
Moe	70e591e230	feat: added filter aggregation (#2711 ) * Initial impl * Added `Filter` impl in `build_single_agg_segment_collector_with_reader` + Added tests * Added `Filter(FilterBucketResult)` + Made tests work. * Fixed type issues. * Fixed a test. * 8a7a73a: Pass `segment_reader` * Added more tests. * Improved parsing + tests * refactoring * Added more tests. * refactoring: moved parsing code under QueryParser * Use Tantivy syntax instead of ES * Added a sanity check test. * Simplified impl + tests * Added back tests in a more maintable way * nitz. * nitz * implemented very simple fast-path * improved a comment * implemented fast field support * Used `BoundsRange` * Improved fast field impl + tests * Simplified execution. * Fixed exports + nitz * Improved the tests to check to the expected result. * Improved test by checking the whole result JSON * Removed brittle perf checks. * Added efficiency verification tests. * Added one more efficiency check test. * Improved the efficiency tests. * Removed unnecessary parsing code + added direct Query obj * Fixed tests. * Improved tests * Fixed code structure * Fixed lint issues * nitz. * nitz * nitz. * nitz. * nitz. * Added an example * Fixed PR comments. * Applied PR comments + nitz * nitz. * Improved the code. * Fixed a perf issue. * Added batch processing. * Made the example more interesting * Fixed bucket count * Renamed Direct to CustomQuery * Fixed lint issues. * No need for scorer to be an `Option` * nitz * Used BitSet * Added an optimization for AllQuery * Fixed merge issues. * Fixed lint issues. * Added benchmark for FILTER * Removed the Option wrapper. * nitz. * Applied PR comments. * Fixed the AllQuery optimization * Applied PR comments. * feat: used `erased_serde` to allow filter query to be serialized * further improved a comment * Added back tests. * removed an unused method * removed an unused method * Added documentation * nitz. * Added query builder. * Fixed a comment. * Applied PR comments. * Fixed doctest issues. * Added ser/de * Removed bench in test * Fixed a lint issue.	2025-11-18 20:54:31 +01:00
Arthur	5277367cb0	remove duplicated call to `index_writer.commit()` in example (#2732 )	2025-11-12 14:52:44 +01:00
Paul Masurel	8b02bff9b8	Removing obsolete benchmark screenshot (#2730 ) Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>	2025-11-05 09:55:13 +01:00
PSeitz	60225bdd45	cleanup (#2724 ) Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>	2025-10-23 10:23:34 +02:00
PSeitz	938bfec8b7	use FxHashMap for Aggregations Request (#2722 ) Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>	2025-10-21 15:59:18 +02:00
PSeitz	dabcaa5809	fix merge intermediate aggregation results (#2719 ) Previously the merging relied on the order of the results, which is invalid since https://github.com/quickwit-oss/tantivy/pull/2035. This bug is only hit in specific scenarios, when the aggregation collectors are built in a different order on different segments. Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>	2025-10-17 12:41:31 +02:00
PSeitz	d410a3b0c0	Add Filtering for Term Aggregations (#2717 ) * Add Filtering for Term Aggregations Closes #2702 * add AggregationsSegmentCtx memory consumption --------- Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>	2025-10-15 17:39:53 +02:00
Remi	fc93391d0e	Minor clarifications on the AggregationsWithAccessor refacto (#2716 )	2025-10-14 19:59:33 +02:00
PSeitz	f8e79271ab	Replace AggregationsWithAccessor (#2715 ) * add nested histogram-termagg benchmark * Replace AggregationsWithAccessor with AggData With AggregationsWithAccessor pre-computation and caching was done on the collector level. If you have 10000 sub collectors (e.g. a term aggregation with sub aggregations) this is very inefficient. `AggData` instead moves the data from the collector to a node which reflects the cardinality of the request tree instead of the cardinality of the segment collector. It also moves the global struct shared with all aggregations in to aggregation specific structs. So each aggregation has its own space to store cached data and aggregation specific information. This also breaks up the dependency to the elastic search aggregation structure somewhat. Due to lifetime issues, we move the agg request specific object out of `AggData` during the collection and move it back at the end (for now). That's some unnecessary work, which costs CPU. This allows better caching and will also pave the way for another potential optimization, by separating the collector and its storage. Currently we allocate a new collector for each sub aggregation bucket (for nested aggregations), but ideally we would have just one collector instance. * renames * move request data to agg request files --------- Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>	2025-10-14 09:22:11 +02:00
PSeitz	33835b6a01	Add DocSet::cost() (#2707 ) * query: add DocSet cost hint and use it for intersection ordering - Add DocSet::cost() - Use cost() instead of size_hint() to order scorers in intersect_scorers This isolates cost-related changes without the new seek APIs from PR #2538 * add comments --------- Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>	2025-10-13 16:25:49 +02:00
PSeitz	270ca5123c	refactor postings (#2709 ) rename shallow_seek to seek_block remove full_block from public postings API This is as preparation to optionally handle Bitsets in the postings	2025-10-08 16:55:25 +02:00
Mustafa S. Moiz	714366d3b9	docs: correct grammar (#2704 ) Correct phrasing for a single line in the docs (`one documents` -> `a document`).	2025-10-08 16:47:09 +02:00

3428 Commits