tantivy

mirror of https://github.com/quickwit-oss/tantivy.git synced 2025-12-23 02:29:57 +00:00

Author	SHA1	Message	Date
Paul Masurel	8b02bff9b8	Removing obsolete benchmark screenshot (#2730 ) Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>	2025-11-05 09:55:13 +01:00
PSeitz	60225bdd45	cleanup (#2724 ) Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>	2025-10-23 10:23:34 +02:00
PSeitz	938bfec8b7	use FxHashMap for Aggregations Request (#2722 ) Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>	2025-10-21 15:59:18 +02:00
PSeitz	dabcaa5809	fix merge intermediate aggregation results (#2719 ) Previously the merging relied on the order of the results, which is invalid since https://github.com/quickwit-oss/tantivy/pull/2035. This bug is only hit in specific scenarios, when the aggregation collectors are built in a different order on different segments. Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>	2025-10-17 12:41:31 +02:00
PSeitz	d410a3b0c0	Add Filtering for Term Aggregations (#2717 ) * Add Filtering for Term Aggregations Closes #2702 * add AggregationsSegmentCtx memory consumption --------- Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>	2025-10-15 17:39:53 +02:00
Remi	fc93391d0e	Minor clarifications on the AggregationsWithAccessor refacto (#2716 )	2025-10-14 19:59:33 +02:00
PSeitz	f8e79271ab	Replace AggregationsWithAccessor (#2715 ) * add nested histogram-termagg benchmark * Replace AggregationsWithAccessor with AggData With AggregationsWithAccessor pre-computation and caching was done on the collector level. If you have 10000 sub collectors (e.g. a term aggregation with sub aggregations) this is very inefficient. `AggData` instead moves the data from the collector to a node which reflects the cardinality of the request tree instead of the cardinality of the segment collector. It also moves the global struct shared with all aggregations in to aggregation specific structs. So each aggregation has its own space to store cached data and aggregation specific information. This also breaks up the dependency to the elastic search aggregation structure somewhat. Due to lifetime issues, we move the agg request specific object out of `AggData` during the collection and move it back at the end (for now). That's some unnecessary work, which costs CPU. This allows better caching and will also pave the way for another potential optimization, by separating the collector and its storage. Currently we allocate a new collector for each sub aggregation bucket (for nested aggregations), but ideally we would have just one collector instance. * renames * move request data to agg request files --------- Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>	2025-10-14 09:22:11 +02:00
PSeitz	33835b6a01	Add DocSet::cost() (#2707 ) * query: add DocSet cost hint and use it for intersection ordering - Add DocSet::cost() - Use cost() instead of size_hint() to order scorers in intersect_scorers This isolates cost-related changes without the new seek APIs from PR #2538 * add comments --------- Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>	2025-10-13 16:25:49 +02:00
PSeitz	270ca5123c	refactor postings (#2709 ) rename shallow_seek to seek_block remove full_block from public postings API This is as preparation to optionally handle Bitsets in the postings	2025-10-08 16:55:25 +02:00
Mustafa S. Moiz	714366d3b9	docs: correct grammar (#2704 ) Correct phrasing for a single line in the docs (`one documents` -> `a document`).	2025-10-08 16:47:09 +02:00
PSeitz-dd	40659d4d07	improve naming in buffered_union (#2705 )	2025-09-24 10:58:46 +02:00
PSeitz	e1e131a804	add and/or queries benchmark (#2701 )	2025-09-22 16:32:49 +02:00
PSeitz-dd	70da310b2d	perf: deduplicate queries (#2698 ) * deduplicate queries Deduplicate queries in the UserInputAst after parsing queries * add return type	2025-09-22 12:16:58 +02:00
PSeitz	85010b589a	clippy (#2700 ) * clippy * clippy * clippy * clippy + fmt --------- Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>	2025-09-19 18:04:25 +02:00
PSeitz-dd	2340dca628	fix compiler warnings (#2699 ) * fix compiler warnings * fix import	2025-09-19 15:55:04 +02:00
Remi	71a26d5b24	Fix CI with rust 1.90 (#2696 ) * Empty commit * Fix dead code lint error	2025-09-18 23:06:33 +02:00
PSeitz-dd	203751f2fe	Optimize ExistsQuery for a high number of dynamic columns (#2694 ) * Optimize ExistsQuery for a high number of dynamic columns The previous algorithm checked _each_ doc in _each_ column for existence. This causes huge cost on JSON fields with e.g. 100k columns. Compute a bitset instead if we have more than one column. add `iter_docs` to the multivalued_index * add benchmark subfields=1 exists_json_union Memory: 89.3 KB (+2.01%) Avg: 0.4865ms (-26.03%) Median: 0.4865ms (-26.03%) [0.4865ms .. 0.4865ms] subfields=2 exists_json_union Memory: 68.1 KB Avg: 1.7048ms (-0.46%) Median: 1.7048ms (-0.46%) [1.7048ms .. 1.7048ms] subfields=3 exists_json_union Memory: 61.8 KB Avg: 2.0742ms (-2.22%) Median: 2.0742ms (-2.22%) [2.0742ms .. 2.0742ms] subfields=4 exists_json_union Memory: 119.8 KB (+103.44%) Avg: 3.9500ms (+42.62%) Median: 3.9500ms (+42.62%) [3.9500ms .. 3.9500ms] subfields=5 exists_json_union Memory: 120.4 KB (+107.65%) Avg: 3.9610ms (+20.65%) Median: 3.9610ms (+20.65%) [3.9610ms .. 3.9610ms] subfields=6 exists_json_union Memory: 120.6 KB (+107.49%) Avg: 3.8903ms (+3.11%) Median: 3.8903ms (+3.11%) [3.8903ms .. 3.8903ms] subfields=7 exists_json_union Memory: 120.9 KB (+106.93%) Avg: 3.6220ms (-16.22%) Median: 3.6220ms (-16.22%) [3.6220ms .. 3.6220ms] subfields=8 exists_json_union Memory: 121.3 KB (+106.23%) Avg: 4.0981ms (-15.97%) Median: 4.0981ms (-15.97%) [4.0981ms .. 4.0981ms] subfields=16 exists_json_union Memory: 123.1 KB (+103.09%) Avg: 4.3483ms (-92.26%) Median: 4.3483ms (-92.26%) [4.3483ms .. 4.3483ms] subfields=256 exists_json_union Memory: 204.6 KB (+19.85%) Avg: 3.8874ms (-99.01%) Median: 3.8874ms (-99.01%) [3.8874ms .. 3.8874ms] subfields=4096 exists_json_union Memory: 2.0 MB Avg: 3.5571ms (-99.90%) Median: 3.5571ms (-99.90%) [3.5571ms .. 3.5571ms] subfields=65536 exists_json_union Memory: 28.3 MB Avg: 14.4417ms (-99.97%) Median: 14.4417ms (-99.97%) [14.4417ms .. 14.4417ms] subfields=262144 exists_json_union Memory: 113.3 MB Avg: 66.2860ms (-99.95%) Median: 66.2860ms (-99.95%) [66.2860ms .. 66.2860ms] * rename methods	2025-09-16 18:21:03 +02:00
PSeitz-dd	7963b0b4aa	Add fast field fallback for term query if not indexed (#2693 ) * Add fast field fallback for term query if not indexed * only fallback without scores	2025-09-12 14:58:21 +02:00
Paul Masurel	d5eefca11d	Merge pull request #2692 from quickwit-oss/paul.masurel/coerce-floats-too-in-search-too This PR changes the logic used on the ingestion of floats.	2025-09-10 09:46:54 +02:00
Paul Masurel	5d6c8de23e	Align search float search logic to the columnar coercion rules It applies the same logic on floats as for u64 or i64. In all cases, the idea is (for the inverted index) to coerce numbers to their canonical representation, before indexing and before searching. That way a document with the float 1.0 will be searchable when the user searches for 1. Note that contrary to the columnar, we do not attempt to coerce all of the terms associated with a given json path to a single numerical type. We simply rely on this "point-wise" canonicalization.	2025-09-09 19:28:17 +02:00
PSeitz	a06365f39f	Update CHANGELOG.md for bugfixes (#2674 ) * Update CHANGELOG.md * Update CHANGELOG.md	2025-09-04 11:51:00 +02:00
Raphaël Cohen	f4b374110f	feat: Regex query grammar (#2677 ) * feat: Regex query grammar * feat: Disable regexes by default * chore: Apply formatting	2025-09-03 10:07:04 +02:00
PSeitz-dd	c37af9c1ff	update release instructions (#2687 )	2025-08-22 07:57:48 +08:00
PSeitz	33794a114c	chore: Release (#2686 ) Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>	2025-08-20 18:29:37 +08:00
PSeitz-dd	8676a1f57b	prepare release: update Changelog (#2685 )	2025-08-20 16:07:53 +08:00
PSeitz-dd	021ff2ad63	move bench to binggan (#2684 )	2025-08-14 17:02:44 +08:00
Paul Masurel	39e027667b	per field size details (#2679 ) * Added per-field size details. This also does a bunch of refactoring. merging field metadata does not silently asserts that arguments should be sorted. merging does not set `stored`. We do not rely on a hashmap to group fields, but instead rely on the fact that the term dictionary is sorted. The inverted level method that exposes field metadata is not exposed as public anymore. * CR comment --------- Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>	2025-08-13 13:12:22 +02:00
PSeitz-dd	a1d65c3df3	test stable ordering with pagination (#2683 )	2025-08-13 15:36:28 +08:00
trinity-1686a	2e4615c2d3	Merge pull request #2678 from Darkheir/feat/query_grammar_space_between_field_and_value feat: Support spaces between field name and value	2025-08-11 09:57:23 +02:00
Darkheir	610091e2c4	feat: Applies PR review suggestion	2025-08-04 10:12:51 +02:00
trinity-1686a	c301e7b1c4	Merge pull request #2673 from paradedb/stuhood.fix-order-by-dup-string Fix `TopDocs::order_by_string_fast_field` for duplicates	2025-07-30 18:25:03 +02:00
Stu Hood	d9eb093368	Attempt to clarify `sorted_ords_to_term_cb`.	2025-07-29 21:56:31 -07:00
Darkheir	d4b090124c	feat: Support spaces between field name and value	2025-07-23 11:12:13 +02:00
PSeitz-dd	811c68cdb2	fix field_names in top_hits aggregation (#2675 )	2025-07-21 12:19:30 +08:00
trinity-1686a	bc1c789897	Merge pull request #2676 from quickwit-oss/trinity.pointard/allow-partial-default-field-success ignore failure to parse query when other default field succeeded	2025-07-18 14:20:41 +02:00
trinity Pointard	e7c8c331bd	ignore failure to parse query when other default field succeeded	2025-07-17 14:47:28 +02:00
Eric Ridge	2f01152a3c	adjust `Dictionary::sorted_ords_to_term_cb()` to allow duplicates	2025-07-16 13:38:43 -07:00
PSeitz	4e84c70387	Fix TopNComputer for reverse order (#2672 ) Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>	2025-07-16 21:44:04 +08:00
Paul M.	f2c77f06c5	Update fs4 to latest (0.13.1) (#2654 ) - One change was needed to handle the `Result<bool>` that now returns from `try_lock_exclusive` Co-authored-by: Paul M. <prov223@tutanota.com>	2025-07-14 11:26:19 +08:00
MassimilianoBaglioni	74334f9c9a	Fixed typo in documentation (#2629 ) Co-authored-by: Massimiliano Baglioni <massimilianobaglioni@MacBook-Air-di-Massimiliano.local>	2025-07-11 14:45:59 +08:00
Parth	cc4beb61ba	update CHANGELOG (#2670 ) * update CHANGELOG * Update CHANGELOG.md Co-authored-by: PSeitz <PSeitz@users.noreply.github.com> * Update CHANGELOG.md --------- Co-authored-by: PSeitz <PSeitz@users.noreply.github.com>	2025-07-11 11:33:11 +08:00
Dale Seo	6742e5981b	fix a typo in the comment (#2668 )	2025-07-10 07:14:57 +02:00
Philippe Noël	b128299976	Update ParadeDB logo (#2669 )	2025-07-10 07:14:35 +02:00
PSeitz	945af922d1	clippy (#2661 ) * clippy * use readable version --------- Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>	2025-07-02 11:25:03 +02:00
PSeitz-dd	295d07e55c	fix union performance regression (#2663 ) closes https://github.com/quickwit-oss/tantivy/issues/2656	2025-07-01 20:32:25 +02:00
PSeitz	080fa4d1f4	add docs/example and Vec<u32> values to sstable (#2660 )	2025-07-01 15:40:02 +02:00
PSeitz-dd	988c2b35e7	fix import in test (#2657 )	2025-06-24 12:55:34 +02:00
PSeitz	bf3cc12610	update CHANGELOG (#2621 ) Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>	2025-06-24 11:58:44 +02:00
Stu Hood	a2400f4e73	Add string fast field support to `TopDocs`. (#2642 ) * Add string fast field support to `TopDocs`. * Remove unnecessary generics, and review feedback. * Use actual/less-ambiguous cities. * Review feedback	2025-06-20 10:27:14 +02:00
Zhang.Jinrui	436ec6caea	fix typo for the comments of search_with_executor() (#2653 ) Co-authored-by: Zhang Jinrui <zhangjinrui@microsoft.com>	2025-06-19 09:53:21 +02:00
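
Commit 5d6c8de23e in the list above describes "point-wise" canonicalization: each number is coerced to a canonical representation before indexing and before searching, so that a document containing the float 1.0 matches a search for 1. The following is a minimal sketch of that idea only, not tantivy's code. The `CanonicalNumber` enum, the `canonicalize_f64` function, and the preference order (i64 first, then u64, then f64) are assumptions made for illustration.

```rust
/// Hypothetical canonical form of a number (not a tantivy type).
#[derive(Debug, PartialEq)]
enum CanonicalNumber {
    I64(i64),
    U64(u64),
    F64(f64),
}

/// Coerce a float to an integer representation when that loses no information.
/// Assumed preference order: i64, then u64, then f64.
fn canonicalize_f64(value: f64) -> CanonicalNumber {
    // Only finite values without a fractional part can become integers.
    if value.is_finite() && value.fract() == 0.0 {
        // i64 covers [-2^63, 2^63); both bounds are exactly representable as f64.
        if value >= -9_223_372_036_854_775_808.0 && value < 9_223_372_036_854_775_808.0 {
            return CanonicalNumber::I64(value as i64);
        }
        // u64 covers [0, 2^64); only values of 2^63 and above reach this branch.
        if value >= 0.0 && value < 18_446_744_073_709_551_616.0 {
            return CanonicalNumber::U64(value as u64);
        }
    }
    // Fractional, out-of-range, infinite and NaN values stay floats.
    CanonicalNumber::F64(value)
}
```

Applying the same function to the indexed value and to the query value is what makes the two sides comparable: `canonicalize_f64(1.0)` yields `CanonicalNumber::I64(1)`, the same form an integer query for 1 would take, while `canonicalize_f64(1.5)` stays `CanonicalNumber::F64(1.5)`.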


3388 Commits