PSeitz-dd
40659d4d07
improve naming in buffered_union (#2705)
2025-09-24 10:58:46 +02:00
PSeitz-dd
70da310b2d
perf: deduplicate queries (#2698)
* deduplicate queries
Deduplicate queries in the UserInputAst after parsing queries
* add return type
2025-09-22 12:16:58 +02:00
PSeitz-dd
2340dca628
fix compiler warnings (#2699)
* fix compiler warnings
* fix import
2025-09-19 15:55:04 +02:00
Remi
71a26d5b24
Fix CI with rust 1.90 (#2696)
* Empty commit
* Fix dead code lint error
2025-09-18 23:06:33 +02:00
PSeitz-dd
203751f2fe
Optimize ExistsQuery for a high number of dynamic columns (#2694)
* Optimize ExistsQuery for a high number of dynamic columns
The previous algorithm checked _each_ doc in _each_ column for
existence. This causes huge cost on JSON fields with e.g. 100k columns.
Compute a bitset instead if we have more than one column.
add `iter_docs` to the multivalued_index
* add benchmark
subfields=1
exists_json_union Memory: 89.3 KB (+2.01%) Avg: 0.4865ms (-26.03%) Median: 0.4865ms (-26.03%) [0.4865ms .. 0.4865ms]
subfields=2
exists_json_union Memory: 68.1 KB Avg: 1.7048ms (-0.46%) Median: 1.7048ms (-0.46%) [1.7048ms .. 1.7048ms]
subfields=3
exists_json_union Memory: 61.8 KB Avg: 2.0742ms (-2.22%) Median: 2.0742ms (-2.22%) [2.0742ms .. 2.0742ms]
subfields=4
exists_json_union Memory: 119.8 KB (+103.44%) Avg: 3.9500ms (+42.62%) Median: 3.9500ms (+42.62%) [3.9500ms .. 3.9500ms]
subfields=5
exists_json_union Memory: 120.4 KB (+107.65%) Avg: 3.9610ms (+20.65%) Median: 3.9610ms (+20.65%) [3.9610ms .. 3.9610ms]
subfields=6
exists_json_union Memory: 120.6 KB (+107.49%) Avg: 3.8903ms (+3.11%) Median: 3.8903ms (+3.11%) [3.8903ms .. 3.8903ms]
subfields=7
exists_json_union Memory: 120.9 KB (+106.93%) Avg: 3.6220ms (-16.22%) Median: 3.6220ms (-16.22%) [3.6220ms .. 3.6220ms]
subfields=8
exists_json_union Memory: 121.3 KB (+106.23%) Avg: 4.0981ms (-15.97%) Median: 4.0981ms (-15.97%) [4.0981ms .. 4.0981ms]
subfields=16
exists_json_union Memory: 123.1 KB (+103.09%) Avg: 4.3483ms (-92.26%) Median: 4.3483ms (-92.26%) [4.3483ms .. 4.3483ms]
subfields=256
exists_json_union Memory: 204.6 KB (+19.85%) Avg: 3.8874ms (-99.01%) Median: 3.8874ms (-99.01%) [3.8874ms .. 3.8874ms]
subfields=4096
exists_json_union Memory: 2.0 MB Avg: 3.5571ms (-99.90%) Median: 3.5571ms (-99.90%) [3.5571ms .. 3.5571ms]
subfields=65536
exists_json_union Memory: 28.3 MB Avg: 14.4417ms (-99.97%) Median: 14.4417ms (-99.97%) [14.4417ms .. 14.4417ms]
subfields=262144
exists_json_union Memory: 113.3 MB Avg: 66.2860ms (-99.95%) Median: 66.2860ms (-99.95%) [66.2860ms .. 66.2860ms]
* rename methods
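
The bitset approach described in this commit can be sketched as follows. This is a minimal illustration under stated assumptions, not tantivy's actual code: `Column`, `BitSet`, and `exists_bitset` are hypothetical names, and each column is modelled as a plain list of the doc ids that hold a value.

```rust
/// Hypothetical stand-in for a column: the doc ids that hold a value.
struct Column {
    docs_with_value: Vec<u32>,
}

/// A fixed-size bitset with one bit per document in the segment.
struct BitSet {
    words: Vec<u64>,
}

impl BitSet {
    fn with_max_doc(max_doc: u32) -> Self {
        let num_words = (max_doc as usize + 63) / 64;
        BitSet { words: vec![0u64; num_words] }
    }

    fn insert(&mut self, doc: u32) {
        self.words[(doc / 64) as usize] |= 1u64 << (doc % 64);
    }

    fn contains(&self, doc: u32) -> bool {
        self.words[(doc / 64) as usize] & (1u64 << (doc % 64)) != 0
    }

    /// Number of documents marked as existing.
    fn len(&self) -> usize {
        self.words.iter().map(|w| w.count_ones() as usize).sum()
    }
}

/// Visit each column once and mark its docs, instead of probing every
/// doc against every column.
fn exists_bitset(columns: &[Column], max_doc: u32) -> BitSet {
    let mut bitset = BitSet::with_max_doc(max_doc);
    for column in columns {
        for &doc in &column.docs_with_value {
            if doc < max_doc {
                bitset.insert(doc);
            }
        }
    }
    bitset
}
```

In this sketch the work is proportional to the total number of values across columns plus the bitset size, rather than to docs times columns, which is consistent with the gains reported above for large subfield counts.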
2025-09-16 18:21:03 +02:00
PSeitz-dd
7963b0b4aa
Add fast field fallback for term query if not indexed (#2693)
* Add fast field fallback for term query if not indexed
* only fallback without scores
2025-09-12 14:58:21 +02:00
Paul Masurel
5d6c8de23e
Align float search logic to the columnar coercion rules
It applies the same logic on floats as for u64 or i64.
In all cases, the idea is (for the inverted index) to coerce numbers
to their canonical representation, before indexing and before searching.
That way a document with the float 1.0 will be searchable when the user
searches for 1.
Note that, unlike the columnar store, we do not attempt to coerce all of the
terms associated with a given JSON path to a single numerical type.
We simply rely on this "point-wise" canonicalization.
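
A minimal sketch of this point-wise canonicalization is shown below. The `Number` enum, the `canonicalize` function, and the preference order (i64 first, then u64, then f64) are assumptions made for illustration; the commit message does not spell out the actual order or types used in tantivy.

```rust
/// Hypothetical numeric value as it might appear in a document or a query.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Number {
    I64(i64),
    U64(u64),
    F64(f64),
}

/// Map a number to one canonical representation so that 1, 1u64 and 1.0
/// all produce the same term. Assumed order: i64, then u64, then f64.
fn canonicalize(n: Number) -> Number {
    match n {
        Number::I64(_) => n,
        Number::U64(v) => {
            if v <= i64::MAX as u64 {
                Number::I64(v as i64)
            } else {
                n
            }
        }
        Number::F64(f) => {
            // Only integer-valued, finite floats can be coerced.
            if f.is_finite() && f.fract() == 0.0 {
                // 2^63 and 2^64 are exactly representable as f64.
                if f >= -9223372036854775808.0 && f < 9223372036854775808.0 {
                    return Number::I64(f as i64);
                }
                if f >= 0.0 && f < 18446744073709551616.0 {
                    return Number::U64(f as u64);
                }
            }
            n
        }
    }
}
```

Applying the same function at indexing time and at query time is what makes a document containing `1.0` match a search for `1` in this sketch; values such as `1.5` keep their float form.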
2025-09-09 19:28:17 +02:00
Raphaël Cohen
f4b374110f
feat: Regex query grammar (#2677)
* feat: Regex query grammar
* feat: Disable regexes by default
* chore: Apply formatting
2025-09-03 10:07:04 +02:00
Paul Masurel
39e027667b
per field size details (#2679)
* Added per-field size details.
This also does a bunch of refactoring.
Merging field metadata does not silently assert that arguments should be sorted.
merging does not set `stored`.
We do not rely on a hashmap to group fields, but instead rely on the fact that
the term dictionary is sorted.
The inverted level method that exposes field metadata is not exposed
as public anymore.
* CR comment
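
The idea of grouping by relying on sorted order rather than a hashmap can be sketched as below. This is an illustrative assumption, not the code from this commit: `group_sorted_sizes` is a hypothetical helper, and the input is assumed to be `(field_name, size_in_bytes)` pairs already sorted by field name, as they would be when read from a sorted term dictionary.

```rust
/// Sum sizes per field from entries that are already sorted by field name.
/// Because equal keys are adjacent, only the last group needs to be checked.
fn group_sorted_sizes(entries: &[(&str, u64)]) -> Vec<(String, u64)> {
    let mut grouped: Vec<(String, u64)> = Vec::new();
    for &(field, size) in entries {
        if let Some(last) = grouped.last_mut() {
            if last.0 == field {
                // Same field as the previous entry: extend the current group.
                last.1 += size;
                continue;
            }
        }
        // First entry, or a new field: start a new group.
        grouped.push((field.to_string(), size));
    }
    grouped
}
```

The output preserves the sorted order of the input, and no hashing or extra lookup structure is needed.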
---------
Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>
2025-08-13 13:12:22 +02:00
PSeitz-dd
a1d65c3df3
test stable ordering with pagination (#2683)
2025-08-13 15:36:28 +08:00
trinity-1686a
2e4615c2d3
Merge pull request #2678 from Darkheir/feat/query_grammar_space_between_field_and_value
feat: Support spaces between field name and value
2025-08-11 09:57:23 +02:00
trinity-1686a
c301e7b1c4
Merge pull request #2673 from paradedb/stuhood.fix-order-by-dup-string
Fix `TopDocs::order_by_string_fast_field` for duplicates
2025-07-30 18:25:03 +02:00
Darkheir
d4b090124c
feat: Support spaces between field name and value
2025-07-23 11:12:13 +02:00
PSeitz-dd
811c68cdb2
fix field_names in top_hits aggregation (#2675)
2025-07-21 12:19:30 +08:00
trinity-1686a
bc1c789897
Merge pull request #2676 from quickwit-oss/trinity.pointard/allow-partial-default-field-success
ignore failure to parse query when other default field succeeded
2025-07-18 14:20:41 +02:00
trinity Pointard
e7c8c331bd
ignore failure to parse query when other default field succeeded
2025-07-17 14:47:28 +02:00
Eric Ridge
2f01152a3c
adjust Dictionary::sorted_ords_to_term_cb() to allow duplicates
2025-07-16 13:38:43 -07:00
PSeitz
4e84c70387
Fix TopNComputer for reverse order (#2672)
Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>
2025-07-16 21:44:04 +08:00
Paul M.
f2c77f06c5
Update fs4 to latest (0.13.1) (#2654)
- One change was needed to handle the `Result<bool>` that `try_lock_exclusive` now returns
Co-authored-by: Paul M. <prov223@tutanota.com>
2025-07-14 11:26:19 +08:00
PSeitz
945af922d1
clippy (#2661)
* clippy
* use readable version
---------
Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>
2025-07-02 11:25:03 +02:00
PSeitz-dd
295d07e55c
fix union performance regression (#2663)
closes https://github.com/quickwit-oss/tantivy/issues/2656
2025-07-01 20:32:25 +02:00
Stu Hood
a2400f4e73
Add string fast field support to TopDocs. (#2642)
* Add string fast field support to `TopDocs`.
* Remove unnecessary generics, and review feedback.
* Use actual/less-ambiguous cities.
* Review feedback
2025-06-20 10:27:14 +02:00
Zhang.Jinrui
436ec6caea
fix typo for the comments of search_with_executor() (#2653)
Co-authored-by: Zhang Jinrui <zhangjinrui@microsoft.com>
2025-06-19 09:53:21 +02:00
PSeitz
2b668bd2bf
readability improvement on executor (#2615)
2025-04-08 18:28:49 +02:00
Remi Dettai
b681ec9335
Fix compilation stability
2025-04-01 09:33:33 +02:00
trinity Pointard
9426d5be7b
fix agg Key PartialEq impl
2025-03-14 14:57:45 +01:00
Paul Masurel
519e5d2ed1
clippy warnings
2025-03-05 11:15:06 +01:00
Paul Masurel
0afabad494
Cargo fmt
2025-03-05 11:07:46 +01:00
Remi Dettai
89b052cd42
Catch panics during merges (#2582)
* Adding panic handler for the rayon merge thread pool
* Return panic message in error
---------
Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>
2025-03-05 10:36:48 +01:00
SteveLauC
c48c649436
refactor: use std AtomicU64 and remove wrapper (#2585)
2025-02-24 03:56:15 +01:00
Paul Masurel
58c0739953
Merge pull request #2581 from quickwit-oss/merge_dict_column_repro
use usize in bitpacker
2025-02-21 10:53:07 +09:00
Pascal Seitz
e7daf69de9
use usize in bitpacker
use usize in bitpacker to enable larger columns in the columnar store
Godbolt comparison with u32 vs u64 for get access: https://godbolt.org/z/cjf7nenYP
Add a mini-tool to inspect columnar files created by tantivy. (very basic functionality which can be extended later)
2025-02-20 15:39:10 +01:00
trinity Pointard
0368162ef0
make DateHistogramAggregationReq buildable
2025-02-18 11:45:24 +01:00
trinity-1686a
d281ca3e65
Merge pull request #2559 from quickwit-oss/trinity/sstable-partial-automaton
allow partially warming an sstable for an automaton
2025-01-08 16:35:35 +01:00
trinity Pointard
be17daf658
split iterator
2025-01-08 16:24:34 +01:00
trinity Pointard
6ca84a61fa
make termdict always clone
2025-01-08 16:19:54 +01:00
trinity Pointard
037d12c9c9
fix deadlocking on automaton warmup
2025-01-06 11:58:58 +01:00
Remi Dettai
71cf19870b
Exist queries match subpath fields (#2558)
* Exist queries match subpath fields
* Make subpath check optional
* Add async subpath listing
2025-01-06 10:17:39 +01:00
trinity Pointard
175a529c41
use executor for cpu-heavy sstable decompression for automaton
2025-01-03 19:14:07 +01:00
trinity Pointard
fe0c7c5408
change rangebound style
2025-01-02 11:56:05 +01:00
Harrison Burt
148594f0f9
Improve IndexWriter customisation via builder (#2562)
* Improve `IndexWriter` customisation via builder
* Remove change noise from PR
* Correct documentation
* Resolve comments and add test
2025-01-02 09:43:22 +01:00
trinity Pointard
dfff5f3bcb
rename merge_holes_under => merge_holes_under_bytes
2024-12-23 16:17:44 +01:00
trinity-1686a
ebf4d84553
add comment about cpu-intensive operation in async context
2024-12-20 12:23:49 +01:00
trinity-1686a
a1447cc9c2
remove breaking change in sstable public api
2024-12-19 17:30:05 +01:00
trinity-1686a
c39d91f827
Merge pull request #2547 from quickwit-oss/trinity/count-str
add support for counting non-integer values in aggregation
2024-12-17 15:27:30 +01:00
trinity Pointard
32b6e9711b
add tests
2024-12-13 16:06:24 +01:00
trinity-1686a
24c5dc2398
allow warming up automaton
2024-12-10 13:32:12 +01:00
Pierre Barre
6e02c5cb25
Make NUM_MERGE_THREADS configurable (#2535)
* Make `NUM_MERGE_THREADS` configurable
* Remove unused import
* Reword comment src/index/index.rs
Co-authored-by: PSeitz <PSeitz@users.noreply.github.com>
---------
Co-authored-by: PSeitz <PSeitz@users.noreply.github.com>
2024-12-09 16:53:11 +08:00
trinity-1686a
0bac391291
add support for counting non-integer values in aggregation
2024-11-28 19:52:47 +01:00
Paul Masurel
c35a782747
Updating rustc-hash and clippy fixes (#2532)
* Updating rustc-hash and clippy fixes
* fix terms_aggregation_min_doc_count_special_case
---------
Co-authored-by: Pascal Seitz <pascal.seitz@gmail.com>
2024-11-01 13:46:26 +08:00