tantivy

mirror of https://github.com/quickwit-oss/tantivy.git synced 2026-01-06 09:12:55 +00:00

Author	SHA1	Message	Date
PSeitz-dd	40659d4d07	improve naming in buffered_union (#2705)	2025-09-24 10:58:46 +02:00
PSeitz-dd	70da310b2d	perf: deduplicate queries (#2698) * deduplicate queries Deduplicate queries in the UserInputAst after parsing queries * add return type	2025-09-22 12:16:58 +02:00
PSeitz-dd	2340dca628	fix compiler warnings (#2699) * fix compiler warnings * fix import	2025-09-19 15:55:04 +02:00
Remi	71a26d5b24	Fix CI with rust 1.90 (#2696) * Empty commit * Fix dead code lint error	2025-09-18 23:06:33 +02:00
PSeitz-dd	203751f2fe	Optimize ExistsQuery for a high number of dynamic columns (#2694) * Optimize ExistsQuery for a high number of dynamic columns The previous algorithm checked _each_ doc in _each_ column for existence. This causes huge cost on JSON fields with e.g. 100k columns. Compute a bitset instead if we have more than one column. add `iter_docs` to the multivalued_index * add benchmark subfields=1 exists_json_union Memory: 89.3 KB (+2.01%) Avg: 0.4865ms (-26.03%) Median: 0.4865ms (-26.03%) [0.4865ms .. 0.4865ms] subfields=2 exists_json_union Memory: 68.1 KB Avg: 1.7048ms (-0.46%) Median: 1.7048ms (-0.46%) [1.7048ms .. 1.7048ms] subfields=3 exists_json_union Memory: 61.8 KB Avg: 2.0742ms (-2.22%) Median: 2.0742ms (-2.22%) [2.0742ms .. 2.0742ms] subfields=4 exists_json_union Memory: 119.8 KB (+103.44%) Avg: 3.9500ms (+42.62%) Median: 3.9500ms (+42.62%) [3.9500ms .. 3.9500ms] subfields=5 exists_json_union Memory: 120.4 KB (+107.65%) Avg: 3.9610ms (+20.65%) Median: 3.9610ms (+20.65%) [3.9610ms .. 3.9610ms] subfields=6 exists_json_union Memory: 120.6 KB (+107.49%) Avg: 3.8903ms (+3.11%) Median: 3.8903ms (+3.11%) [3.8903ms .. 3.8903ms] subfields=7 exists_json_union Memory: 120.9 KB (+106.93%) Avg: 3.6220ms (-16.22%) Median: 3.6220ms (-16.22%) [3.6220ms .. 3.6220ms] subfields=8 exists_json_union Memory: 121.3 KB (+106.23%) Avg: 4.0981ms (-15.97%) Median: 4.0981ms (-15.97%) [4.0981ms .. 4.0981ms] subfields=16 exists_json_union Memory: 123.1 KB (+103.09%) Avg: 4.3483ms (-92.26%) Median: 4.3483ms (-92.26%) [4.3483ms .. 4.3483ms] subfields=256 exists_json_union Memory: 204.6 KB (+19.85%) Avg: 3.8874ms (-99.01%) Median: 3.8874ms (-99.01%) [3.8874ms .. 3.8874ms] subfields=4096 exists_json_union Memory: 2.0 MB Avg: 3.5571ms (-99.90%) Median: 3.5571ms (-99.90%) [3.5571ms .. 3.5571ms] subfields=65536 exists_json_union Memory: 28.3 MB Avg: 14.4417ms (-99.97%) Median: 14.4417ms (-99.97%) [14.4417ms .. 14.4417ms] subfields=262144 exists_json_union Memory: 113.3 MB Avg: 66.2860ms (-99.95%) Median: 66.2860ms (-99.95%) [66.2860ms .. 66.2860ms] * rename methods	2025-09-16 18:21:03 +02:00
PSeitz-dd	7963b0b4aa	Add fast field fallback for term query if not indexed (#2693) * Add fast field fallback for term query if not indexed * only fallback without scores	2025-09-12 14:58:21 +02:00
Paul Masurel	5d6c8de23e	Align search float search logic to the columnar coercion rules It applies the same logic on floats as for u64 or i64. In all case, the idea is (for the inverted index) to coerce number to their canonical representation, before indexing and before searching. That way a document with the float 1.0 will be searchable when the user searches for 1. Note that contrary to the columnar, we do not attempt to coerce all of the terms associated to a given json path to a single numerical type. We simply rely on this "point-wise" canonicalization.	2025-09-09 19:28:17 +02:00
Raphaël Cohen	f4b374110f	feat: Regex query grammar (#2677) * feat: Regex query grammar * feat: Disable regexes by default * chore: Apply formatting	2025-09-03 10:07:04 +02:00
Paul Masurel	39e027667b	per field size details (#2679) * Added per-field size details. This also does a bunch of refactoring. merging field metadata does not silently asserts that arguments should be sorted. merging does not set `stored`. We do not rely on a hashmap to group fields, but instead rely on the fact that the term dictionary is sorted. The inverted level method that exposes field metadata is not exposed as public anymore. * CR comment --------- Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>	2025-08-13 13:12:22 +02:00
PSeitz-dd	a1d65c3df3	test stable ordering with pagination (#2683)	2025-08-13 15:36:28 +08:00
trinity-1686a	2e4615c2d3	Merge pull request #2678 from Darkheir/feat/query_grammar_space_between_field_and_value feat: Support spaces between field name and value	2025-08-11 09:57:23 +02:00
trinity-1686a	c301e7b1c4	Merge pull request #2673 from paradedb/stuhood.fix-order-by-dup-string Fix `TopDocs::order_by_string_fast_field` for duplicates	2025-07-30 18:25:03 +02:00
Darkheir	d4b090124c	feat: Support spaces between field name and value	2025-07-23 11:12:13 +02:00
PSeitz-dd	811c68cdb2	fix field_names in top_hits aggregation (#2675)	2025-07-21 12:19:30 +08:00
trinity-1686a	bc1c789897	Merge pull request #2676 from quickwit-oss/trinity.pointard/allow-partial-default-field-success ignore failure to parse query when other default field suceeded	2025-07-18 14:20:41 +02:00
trinity Pointard	e7c8c331bd	ignore failure to parse query when other default field suceeded	2025-07-17 14:47:28 +02:00
Eric Ridge	2f01152a3c	adjust `Dictionary::sorted_ords_to_term_cb()` to allow duplicates	2025-07-16 13:38:43 -07:00
PSeitz	4e84c70387	Fix TopNComputer for reverse order (#2672) Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>	2025-07-16 21:44:04 +08:00
Paul M.	f2c77f06c5	Update fs4 to latest (0.13.1) (#2654) - One change was needed to handle the `Result<bool>` that now returns from `try_lock_exclusive` Co-authored-by: Paul M. <prov223@tutanota.com>	2025-07-14 11:26:19 +08:00
PSeitz	945af922d1	clippy (#2661) * clippy * use readable version --------- Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>	2025-07-02 11:25:03 +02:00
PSeitz-dd	295d07e55c	fix union performance regression (#2663) closes https://github.com/quickwit-oss/tantivy/issues/2656	2025-07-01 20:32:25 +02:00
Stu Hood	a2400f4e73	Add string fast field support to `TopDocs`. (#2642) * Add string fast field support to `TopDocs`. * Remove unnecessary generics, and review feedback. * Use actual/less-ambiguous cities. * Review feedback	2025-06-20 10:27:14 +02:00
Zhang.Jinrui	436ec6caea	fix typo for the comments of search_with_executor() (#2653) Co-authored-by: Zhang Jinrui <zhangjinrui@microsoft.com>	2025-06-19 09:53:21 +02:00
PSeitz	2b668bd2bf	readability improvement on executor (#2615)	2025-04-08 18:28:49 +02:00
Remi Dettai	b681ec9335	Fix compilation stability	2025-04-01 09:33:33 +02:00
trinity Pointard	9426d5be7b	fix agg Key PartialEq impl	2025-03-14 14:57:45 +01:00
Paul Masurel	519e5d2ed1	clippy warnings	2025-03-05 11:15:06 +01:00
Paul Masurel	0afabad494	Cargo fmt	2025-03-05 11:07:46 +01:00
Remi Dettai	89b052cd42	Catch panics during merges (#2582) * Adding panic handler for the rayon merge thread pool * Return panic message in error --------- Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>	2025-03-05 10:36:48 +01:00
SteveLauC	c48c649436	refactor: use std AtomicU64 and remove wrapper (#2585)	2025-02-24 03:56:15 +01:00
Paul Masurel	58c0739953	Merge pull request #2581 from quickwit-oss/merge_dict_column_repro use usize in bitpacker	2025-02-21 10:53:07 +09:00
Pascal Seitz	e7daf69de9	use usize in bitpacker use usize in bitpacker to enable larger columns in the columnar store Godbolt comparison with u32 vs u64 for get access: https://godbolt.org/z/cjf7nenYP Add a mini-tool to inspect columnar files created by tantivy. (very basic functionality which can be extended later)	2025-02-20 15:39:10 +01:00
trinity Pointard	0368162ef0	make DateHistogramAggregationReq buildable	2025-02-18 11:45:24 +01:00
trinity-1686a	d281ca3e65	Merge pull request #2559 from quickwit-oss/trinity/sstable-partial-automaton allow warming partially an sstable for an automaton	2025-01-08 16:35:35 +01:00
trinity Pointard	be17daf658	split iterator	2025-01-08 16:24:34 +01:00
trinity Pointard	6ca84a61fa	make termdict always clone	2025-01-08 16:19:54 +01:00
trinity Pointard	037d12c9c9	fix deadlocking on automaton warmup	2025-01-06 11:58:58 +01:00
Remi Dettai	71cf19870b	Exist queries match subpath fields (#2558) * Exist queries match subpath fields * Make subpath check optional * Add async subpath listing	2025-01-06 10:17:39 +01:00
trinity Pointard	175a529c41	use executor for cpu-heavy sstable decompression for automaton	2025-01-03 19:14:07 +01:00
trinity Pointard	fe0c7c5408	change rangebound style	2025-01-02 11:56:05 +01:00
Harrison Burt	148594f0f9	Improve `IndexWriter` customisation via builder (#2562) * Improve `IndexWriter` customisation via builder * Remove change noise from PR * Correct documentation * Resolve comments and add test	2025-01-02 09:43:22 +01:00
trinity Pointard	dfff5f3bcb	rename merge_holes_under => merge_holes_under_bytes	2024-12-23 16:17:44 +01:00
trinity-1686a	ebf4d84553	add comment about cpu-intensive operation in async context	2024-12-20 12:23:49 +01:00
trinity-1686a	a1447cc9c2	remove breaking change in sstable public api	2024-12-19 17:30:05 +01:00
trinity-1686a	c39d91f827	Merge pull request #2547 from quickwit-oss/trinity/count-str add support for counting non integer in aggregation	2024-12-17 15:27:30 +01:00
trinity Pointard	32b6e9711b	add tests	2024-12-13 16:06:24 +01:00
trinity-1686a	24c5dc2398	allow warming up automaton	2024-12-10 13:32:12 +01:00
Pierre Barre	6e02c5cb25	Make `NUM_MERGE_THREADS` configurable (#2535) * Make `NUM_MERGE_THREADS` configurable * Remove unused import * Reword comment src/index/index.rs Co-authored-by: PSeitz <PSeitz@users.noreply.github.com> --------- Co-authored-by: PSeitz <PSeitz@users.noreply.github.com>	2024-12-09 16:53:11 +08:00
trinity-1686a	0bac391291	add support for counting non integer in aggregation	2024-11-28 19:52:47 +01:00
Paul Masurel	c35a782747	Updating rustc-hash and clippy fixes (#2532) * Updating rustc-hash and clippy fixes * fix terms_aggregation_min_doc_count_special_case --------- Co-authored-by: Pascal Seitz <pascal.seitz@gmail.com>	2024-11-01 13:46:26 +08:00
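Commit 203751f2fe describes replacing a check of each doc in each column with a bitset when an ExistsQuery spans more than one column. The sketch below illustrates only that general idea and is not tantivy's implementation. It assumes each column is modeled as a hypothetical sorted `Vec<u32>` of doc ids, all below `max_doc`; the names `exists_bitset` and `iter_set_bits` are invented for illustration.

```rust
/// Hypothetical helper (not tantivy's API): sets one bit per doc id that has
/// a value in at least one column. Assumes every doc id is < max_doc,
/// otherwise the indexing below panics.
fn exists_bitset(columns: &[Vec<u32>], max_doc: u32) -> Vec<u64> {
    let mut words = vec![0u64; (max_doc as usize + 63) / 64];
    for column in columns {
        for &doc in column {
            // Word index is doc / 64, bit position inside the word is doc % 64.
            words[(doc / 64) as usize] |= 1u64 << (doc % 64);
        }
    }
    words
}

/// Yields the set bits in ascending order, i.e. the matching doc ids.
/// A doc present in several columns is yielded only once.
fn iter_set_bits(words: &[u64]) -> impl Iterator<Item = u32> + '_ {
    words.iter().enumerate().flat_map(|(i, &word)| {
        let mut w = word;
        std::iter::from_fn(move || {
            if w == 0 {
                return None;
            }
            let bit = w.trailing_zeros();
            w &= w - 1; // clear the lowest set bit
            Some(i as u32 * 64 + bit)
        })
    })
}

fn main() {
    let columns: Vec<Vec<u32>> = vec![vec![1, 5, 70], vec![5, 64], vec![]];
    let docs: Vec<u32> = iter_set_bits(&exists_bitset(&columns, 100)).collect();
    assert_eq!(docs, vec![1, 5, 64, 70]);
    println!("{:?}", docs); // prints [1, 5, 64, 70]
}
```

Each (column, doc) entry is touched once, so the work grows with the total number of entries rather than with docs multiplied by columns. The commit message states that the bitset is only computed when there is more than one column; with a single column, its doc ids can be iterated directly.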

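Commit 5d6c8de23e states that numbers are coerced to a canonical representation both before indexing and before searching, so that a document holding the float 1.0 is found by a search for 1. The sketch below shows that "point-wise" canonicalization with a hypothetical `Num` enum and `canonicalize` function. The exact rule used here (integral floats become integers, non-negative integers become `u64`) is an assumption for illustration, not tantivy's code.

```rust
/// Hypothetical numeric value type (not tantivy's API).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Num {
    U64(u64),
    I64(i64),
    F64(f64),
}

/// Assumed rule: prefer u64, then i64, and keep f64 only when the value
/// has a fractional part or does not fit an integer type.
fn canonicalize(n: Num) -> Num {
    match n {
        Num::I64(v) if v >= 0 => Num::U64(v as u64),
        // fract() is NaN for infinities and NaN, so those stay F64.
        Num::F64(f) if f.fract() == 0.0 => {
            // 2^64 and -2^63 are exactly representable as f64, so both bounds are exact.
            if f >= 0.0 && f < 18_446_744_073_709_551_616.0 {
                Num::U64(f as u64)
            } else if f >= -9_223_372_036_854_775_808.0 && f < 0.0 {
                Num::I64(f as i64)
            } else {
                Num::F64(f)
            }
        }
        other => other,
    }
}

fn main() {
    // Index side and query side apply the same rule, so 1.0 and 1 map to the same term.
    let indexed = canonicalize(Num::F64(1.0));
    let queried = canonicalize(Num::I64(1));
    assert_eq!(indexed, queried);
    println!("{:?}", indexed); // prints U64(1)
}
```

As the commit message notes, this does not coerce all terms under one JSON path to a single numerical type; each value is canonicalized on its own.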

2490 Commits