PSeitz
72f61ff89c
remove index sorting ( #2434 )
...
closes https://github.com/quickwit-oss/tantivy/issues/2352
2024-06-13 15:51:53 +08:00
PSeitz
a141c3ec59
add columnar format compatibility tests ( #2433 )
...
* add columnar format compatibility tests
* always try to write current format
2024-06-13 15:04:52 +08:00
PSeitz
e90e7a25ae
add access benchmark for columnar ( #2432 )
2024-06-12 14:29:15 +08:00
PSeitz
c3b92a5412
fix compiler warning, cleanup ( #2393 )
...
fix compiler warning for missing feature flag
remove unused variables
cleanup unused methods
2024-06-11 16:03:50 +08:00
PSeitz
2f55511064
extend indexwriter proptests ( #2342 )
...
* index random values in proptest
* add proptest with multiple docs
2024-06-11 16:02:57 +08:00
trinity-1686a
08b9fc0b31
fix de-escaping too much in query parser ( #2427 )
...
* fix de-escaping too much in query parser
2024-06-10 11:19:01 +02:00
PSeitz
714f363d43
add bench & test for columnar merging ( #2428 )
...
* add merge columnar proptest
* add columnar merge benchmark
2024-06-10 16:26:16 +08:00
PSeitz
93ff7365b0
reduce top hits aggregation memory consumption ( #2426 )
...
move the request structure out of the top hits aggregation collector and read it
from the passed-in structure instead
full
terms_many_with_top_hits Memory: 58.2 MB (-43.64%) Avg: 425.9680ms (-21.38%) Median: 415.1097ms (-23.56%) [395.5303ms .. 484.6325ms]
dense
terms_many_with_top_hits Memory: 58.2 MB (-43.64%) Avg: 440.0817ms (-19.68%) Median: 432.2286ms (-21.10%) [403.5632ms .. 497.7541ms]
sparse
terms_many_with_top_hits Memory: 13.1 MB (-49.31%) Avg: 33.3568ms (-32.19%) Median: 33.0834ms (-31.86%) [32.5126ms .. 35.7397ms]
multivalue
terms_many_with_top_hits Memory: 58.2 MB (-43.64%) Avg: 414.2340ms (-25.44%) Median: 413.4144ms (-25.64%) [403.9919ms .. 430.3170ms]
2024-06-06 22:32:58 +08:00
Adam Reichold
8151925068
Panicking in spawned Rayon tasks will abort the process by default. ( #2409 )
2024-06-04 17:04:30 +09:00
dependabot[bot]
b960e40bc8
Update sketches-ddsketch requirement from 0.2.1 to 0.3.0 ( #2423 )
...
Updates the requirements on [sketches-ddsketch](https://github.com/mheffner/rust-sketches-ddsketch ) to permit the latest version.
- [Release notes](https://github.com/mheffner/rust-sketches-ddsketch/releases )
- [Commits](https://github.com/mheffner/rust-sketches-ddsketch/compare/v0.2.1...v0.3.0 )
---
updated-dependencies:
- dependency-name: sketches-ddsketch
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-06-04 15:50:23 +08:00
giovannicuccu
1095c9b073
Issue 1787 extended stats ( #2247 )
...
* first version of extended stats along with its tests
* using IntermediateExtendStats instead of IntermediateStats with all tests passing
* Created struct for request and response
* first test with extended_stats
* kahan summation and tests with approximate equality
* version ready for merge
* removed approx dependency
* refactor for using ExtendedStats only when needed
* interim version
* refined version with code formatted
* refactored a struct
* cosmetic refactor
* fix after merge
* fix format
* added extended_stat bench
* merge and new benchmark for extended stats
* split stat segment collectors
* wrapped intermediate extended stat with a box to limit memory usage
* Revert "wrapped intermediate extended stat with a box to limit memory usage"
This reverts commit 5b4aa9f393 .
* some code reformat, commented kahan summation
* refactor after review
* refactor after code review
* fix after incorrectly restoring kahan summation
* modifications for code review + bug fix in merge_fruit
* refactor assert_nearly_equals macro
* update after code review
---------
Co-authored-by: Giovanni Cuccu <gcuccu@imolainformatica.it >
2024-06-04 14:25:17 +08:00
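
The extended stats commit above ( #2247 ) mentions Kahan summation and tests based on approximate equality. As a rough illustration of the technique, and not the code from that PR, compensated summation could be sketched as follows. The names `kahan_sum` and `nearly_equal` are hypothetical; `nearly_equal` only stands in for the idea behind the `assert_nearly_equals` macro.

```rust
/// Compensated (Kahan) summation: keeps a running estimate of the low-order
/// bits lost by each floating-point addition and feeds it back into the next step.
pub fn kahan_sum(values: &[f64]) -> f64 {
    let mut sum = 0.0_f64;
    // Running estimate of the rounding error accumulated so far.
    let mut compensation = 0.0_f64;
    for &value in values {
        // Apply the correction from the previous step before adding.
        let corrected = value - compensation;
        let new_sum = sum + corrected;
        // (new_sum - sum) is what was actually added; subtracting `corrected`
        // leaves the rounding error introduced by this addition.
        compensation = (new_sum - sum) - corrected;
        sum = new_sum;
    }
    sum
}

/// Hypothetical helper: relative comparison with an absolute floor of 1.0.
pub fn nearly_equal(a: f64, b: f64, epsilon: f64) -> bool {
    (a - b).abs() <= epsilon * a.abs().max(b.abs()).max(1.0)
}
```

Comparing aggregated floating-point results with a tolerance, for example `nearly_equal(kahan_sum(&values), expected, 1e-12)`, avoids tests that depend on the exact order of additions.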
PSeitz
c0686515a9
update one_shot ( #2420 )
2024-05-31 11:07:35 +08:00
trinity-1686a
455156f51c
improve query parser ( #2416 )
...
* support escape sequences in more places
and fix bug with single-quoted strings
* add query parser test for range query on default field
2024-05-30 17:29:27 +02:00
Meng Zhang
4143d31865
chore: fix build as the rev is gone ( #2417 )
2024-05-29 09:49:16 +08:00
Hamir Mahal
0c634adbe1
style: simplify strings with string interpolation ( #2412 )
...
* style: simplify strings with string interpolation
* fix: formatting
2024-05-27 09:16:47 +02:00
PSeitz
2e3641c2ae
return CompactDocValue instead of trait ( #2410 )
...
The CompactDocValue is easier to handle than the trait in some cases like comparison
and conversion
2024-05-27 07:33:50 +02:00
Paul Masurel
b806122c81
Fixing flaky test ( #2407 )
2024-05-22 10:10:55 +09:00
PSeitz
e1679f3fb9
compact doc ( #2402 )
...
* compact doc
* add any value type
* pass references when building CompactDoc
* remove OwnedValue from API
* clippy
* clippy
* fail on large documents
* fmt
* cleanup
* cleanup
* implement Value for different types
fix serde_json date Value implementation
* fmt
* cleanup
* fmt
* cleanup
* store positions instead of pos+len
* remove nodes array
* remove mediumvec
* cleanup
* infallible serialize into vec
* remove positions indirection
* remove 24MB limitation in document
use u32 for Addr
Remove the 3 byte addressing limitation and use VInt instead
* cleanup
* extend test
* cleanup, add comments
* rename, remove pub
2024-05-21 10:16:08 +02:00
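
The compact doc commit above ( #2402 ) replaces 3-byte addressing with `u32` addresses stored as variable-length integers. A minimal sketch of variable-length encoding for a `u32` is shown below. It assumes the common LEB128 convention (continuation bit set on every byte except the last), which may differ from the byte layout tantivy's own `VInt` uses; `write_vint` and `read_vint` are hypothetical names.

```rust
/// Appends `value` to `out` using 7 payload bits per byte.
/// The high bit of a byte is set when more bytes follow (LEB128 convention).
pub fn write_vint(mut value: u32, out: &mut Vec<u8>) {
    while value >= 0x80 {
        out.push((value as u8 & 0x7F) | 0x80);
        value >>= 7;
    }
    out.push(value as u8);
}

/// Decodes one value from the start of `bytes`.
/// Returns the value and the number of bytes consumed, or `None` if the
/// input ends early or no terminating byte appears within 5 bytes.
pub fn read_vint(bytes: &[u8]) -> Option<(u32, usize)> {
    let mut result = 0u32;
    for (i, &byte) in bytes.iter().enumerate().take(5) {
        result |= u32::from(byte & 0x7F) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((result, i + 1));
        }
    }
    None
}
```

Small addresses take a single byte, while the full `u32` range remains addressable at a cost of up to 5 bytes, which is the trade-off that makes a fixed 3-byte limit unnecessary.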
dependabot[bot]
5a80420b10
--- ( #2406 )
...
updated-dependencies:
- dependency-name: binggan
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-21 04:36:32 +02:00
dependabot[bot]
aa26ff5029
Update binggan requirement from 0.6.2 to 0.7.0 ( #2401 )
...
---
updated-dependencies:
- dependency-name: binggan
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-17 02:53:25 +02:00
dependabot[bot]
e197b59258
Update itertools requirement from 0.12.0 to 0.13.0 ( #2400 )
...
Updates the requirements on [itertools](https://github.com/rust-itertools/itertools ) to permit the latest version.
- [Changelog](https://github.com/rust-itertools/itertools/blob/master/CHANGELOG.md )
- [Commits](https://github.com/rust-itertools/itertools/compare/v0.12.0...v0.13.0 )
---
updated-dependencies:
- dependency-name: itertools
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-17 02:53:02 +02:00
PSeitz
5b7cca13e5
lower contention on AggregationLimits ( #2394 )
...
PR https://github.com/quickwit-oss/quickwit/pull/4962 fixes an issue
where the AggregationLimits are not passed correctly. Now that the
AggregationLimits are shared properly, we run into contention issues.
This PR includes some straightforward improvements to reduce contention,
by only updating the limits if the memory changed and avoiding the second read.
We probably need some sharding with multiple counters, or local caching that only
updates the global counter after some threshold.
2024-05-15 12:25:40 +02:00
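
The two contention reductions described in the commit above ( #2394 ) can be illustrated with a small sketch. This is not tantivy's `AggregationLimits` implementation; `SharedMemoryLimit` and `LocalTracker` are hypothetical types, and the sketch assumes a single shared atomic byte counter.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical shared counter guarding a global memory budget.
pub struct SharedMemoryLimit {
    consumed: AtomicU64,
    limit: u64,
}

impl SharedMemoryLimit {
    pub fn new(limit: u64) -> Self {
        Self { consumed: AtomicU64::new(0), limit }
    }

    pub fn consumed(&self) -> u64 {
        self.consumed.load(Ordering::Relaxed)
    }

    /// Adds `delta` bytes. The previous value returned by `fetch_add` is
    /// reused for the limit check, so no second atomic read is needed.
    pub fn add(&self, delta: u64) -> Result<(), u64> {
        let previous = self.consumed.fetch_add(delta, Ordering::Relaxed);
        let total = previous + delta;
        if total > self.limit { Err(total) } else { Ok(()) }
    }
}

/// Hypothetical per-collector view that touches the shared counter
/// only when its own memory consumption actually changed.
pub struct LocalTracker {
    shared: Arc<SharedMemoryLimit>,
    last_reported: u64,
}

impl LocalTracker {
    pub fn new(shared: Arc<SharedMemoryLimit>) -> Self {
        Self { shared, last_reported: 0 }
    }

    /// Reports the collector's current memory usage in bytes.
    /// Returns `Err(total)` if the shared total exceeds the limit.
    pub fn report(&mut self, current_bytes: u64) -> Result<(), u64> {
        if current_bytes == self.last_reported {
            return Ok(()); // unchanged: skip the shared atomic entirely
        }
        let previous = self.last_reported;
        self.last_reported = current_bytes;
        if current_bytes > previous {
            self.shared.add(current_bytes - previous)
        } else {
            self.shared.consumed.fetch_sub(previous - current_bytes, Ordering::Relaxed);
            Ok(())
        }
    }
}
```

The sharding or threshold-based local caching mentioned as future work would go one step further, for example by letting `report` accumulate deltas locally and flush them only once they exceed some size.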
dependabot[bot]
a79590477e
Update binggan requirement from 0.5.2 to 0.6.2 ( #2399 )
...
---
updated-dependencies:
- dependency-name: binggan
dependency-type: direct:production
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-05-15 05:40:37 +02:00
Paul Masurel
6181c1eb5e
Small changes in the Executor API. ( #2391 )
...
Warning: this change is mildly backward-incompatible,
so I bumped tantivy's version.
2024-05-10 17:19:12 +09:00
Adam Reichold
1ee5f90761
Give allocation control to the caller instead of forcing a clone ( #2389 )
...
Achieved by moving the boxes out of the temporary reference wrappers which are
cloneable themselves, i.e. if required the caller can clone them already or
consume them to reuse existing allocations.
2024-05-09 16:01:13 +09:00
PSeitz
71f3b4e4e3
fix ReferenceValue API flaw ( #2372 )
...
* fix ReferenceValue API flaw
Remove `Facet` and `TokenizedString` values from the `ReferenceValue` API,
as this requires the trait value to have them stored somewhere.
Since `TokenizedString` is quite niche, I just copy it into a Box,
instead of designing a reference API around it.
* fix comment link
2024-05-09 06:14:42 +02:00
trinity-1686a
8cd7ddc535
run block decompression from executor ( #2386 )
...
* run block decompression from executor
* add a wrapper with is_closed to oneshot channel
* add cancelation test to Executor::spawn_blocking
2024-05-08 12:22:44 +02:00
Paul Masurel
2b76335a95
Removed usage of num_cpus ( #2387 )
...
* Removed usage of num_cpus
* handling error
2024-05-08 13:32:52 +09:00
PSeitz
c6b213d8f0
use binggan for agg benchmark ( #2378 )
...
* use binggan for agg benchmark
use binggan for the agg benchmark, which also reports memory consumption
Output:
```
full
histogram Memory: 15.8 KB Avg: 10.9322ms (+5.44%) Median: 10.8790ms (+9.28%) Min: 10.7470ms Max: 11.3263ms
histogram_hard_bounds Memory: 15.5 KB Avg: 5.1939ms (+6.61%) Median: 5.1722ms (+10.98%) Min: 5.0432ms Max: 5.3910ms
histogram_with_avg_sub_agg Memory: 48.7 KB Avg: 23.8165ms (+4.57%) Median: 23.7264ms (+10.06%) Min: 23.4995ms Max: 24.8107ms
dense
histogram Memory: 17.3 KB Avg: 15.6810ms (-8.54%) Median: 15.6174ms (-8.89%) Min: 15.4953ms Max: 16.0702ms
histogram_hard_bounds Memory: 15.4 KB Avg: 10.0720ms (-7.33%) Median: 10.0572ms (-7.06%) Min: 9.8500ms Max: 10.4819ms
histogram_with_avg_sub_agg Memory: 50.1 KB Avg: 33.0993ms (-7.04%) Median: 32.9499ms (-6.86%) Min: 32.8284ms Max: 34.0529ms
sparse
histogram Memory: 16.3 KB Avg: 19.2325ms (-0.44%) Median: 19.1211ms (-1.26%) Min: 19.0348ms Max: 19.7902ms
histogram_hard_bounds Memory: 16.1 KB Avg: 18.5179ms (-0.61%) Median: 18.4552ms (-0.90%) Min: 18.3799ms Max: 19.0535ms
histogram_with_avg_sub_agg Memory: 34.7 KB Avg: 21.2589ms (-0.69%) Median: 21.1867ms (-1.05%) Min: 21.0342ms Max: 21.9900ms
```
* add more bench with term as sub agg
2024-05-07 11:29:49 +02:00
PSeitz
eea70030bf
cleanup top level exports ( #2382 )
...
remove some top level exports
2024-05-07 09:59:41 +02:00
PSeitz
92b5526310
allow more JSON values, fix i64 special case ( #2383 )
...
This changes three things:
- Reuse positions_per_path hashmap instead of allocating one per
indexed JSON value
- Try to cast u64 values to i64 to streamline with search behaviour
- Allow top level json values to be of any type, instead of limiting it
to JSON objects. Remove special JSON object handling method.
TODO: We probably should also try to check f64 to i64 and u64 when
indexing, as values may get converted to f64 by the JSON parser
2024-05-01 12:08:12 +02:00
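
The second point of the commit above ( #2383 ), casting `u64` values to `i64` where possible, can be sketched as below. The `Number` enum and `normalize_u64` function are hypothetical and only illustrate the idea; they are not tantivy's internal types.

```rust
/// Hypothetical numeric value, used only for this illustration.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Number {
    I64(i64),
    U64(u64),
}

/// Prefers the i64 representation whenever the value fits, so the same
/// number is treated identically at indexing time and at search time.
/// Values above i64::MAX stay u64.
pub fn normalize_u64(value: u64) -> Number {
    match i64::try_from(value) {
        Ok(signed) => Number::I64(signed),
        Err(_) => Number::U64(value),
    }
}
```

With this normalization, `normalize_u64(42)` yields `Number::I64(42)`, while `normalize_u64(u64::MAX)` remains `Number::U64(u64::MAX)`.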
PSeitz
99a59ad37e
remove zero byte check ( #2379 )
...
remove zero byte checks in columnar. zero bytes are converted during serialization now.
unify code paths
extend test for expected column names
2024-04-26 06:03:28 +02:00
trinity-1686a
6a66a71cbb
modify fastfield range query heuristic ( #2375 )
2024-04-25 10:06:11 +02:00
PSeitz
ff40764204
make convert_to_fast_value_and_append_to_json_term pub ( #2370 )
...
* make convert_to_fast_value_and_append_to_json_term pub
* clippy
2024-04-23 04:05:41 +02:00
PSeitz
047da20b5b
add json path constructor to term ( #2367 )
2024-04-22 12:23:35 +02:00
PSeitz
1417eaf3a7
fix coverage ( #2368 )
2024-04-22 12:23:15 +02:00
PSeitz
4f8493d2de
improve document docs ( #2359 )
2024-04-22 12:05:16 +02:00
Paul Masurel
8861366137
Owned value relying on Vec instead of BTreeMap ( #2364 )
...
* Owned value relying on Vec instead of BTreeMap
* fmt
* fix build
* fix serialization
---------
Co-authored-by: Pascal Seitz <pascal.seitz@gmail.com >
2024-04-22 09:38:05 +02:00
PSeitz
0e9fced336
remove JsonTermWriter ( #2238 )
...
* remove JsonTermWriter
remove path truncation logic, add assertion
* fix json_path_writer add sep logic
2024-04-18 16:28:05 +02:00
PSeitz
b257b960b3
validate sort by field type ( #2336 )
...
* validate sort by field type
* Update src/index/index.rs
Co-authored-by: Adam Reichold <adamreichold@users.noreply.github.com >
---------
Co-authored-by: Adam Reichold <adamreichold@users.noreply.github.com >
2024-04-16 04:42:24 +02:00
Adam Reichold
4708171a32
Fix some of the things current Clippy complains about ( #2363 )
2024-04-16 04:27:06 +02:00
Adam Reichold
b493743f8d
Fix trait bound of StoreReader::iter ( #2360 )
...
* Fix trait bound of StoreReader::iter
Similar to `StoreReader::get`, `StoreReader::iter` should only require
`DocumentDeserialize` and not `Document`.
* Mark the iterator returned by SegmentReader::doc_ids_alive as Send so it can be used in impls of Stream/AsyncIterator.
2024-04-15 15:50:02 +02:00
trinity-1686a
d2955a3fd2
extend field grouping ( #2333 )
...
* extend field grouping
2024-04-15 10:36:32 +02:00
PSeitz
17d5869ad6
update CHANGELOG, use github API in cliff ( #2354 )
...
* update CHANGELOG, use github API in cliff
* reset version to 0.21.1, before release
* chore: Release
* remove unreleased from CHANGELOG
0.22.0
2024-04-15 10:07:20 +02:00
PSeitz
dfa3aed32d
check unsupported parameters top_hits ( #2351 )
...
* check unsupported parameters top_hits
* move to function
2024-04-10 08:20:52 +02:00
PSeitz
398817ce7b
add index sorting deprecation warning ( #2353 )
...
* add index sorting deprecation warning
* remove deprecated IntOptions and DatePrecision
2024-04-10 08:09:09 +02:00
PSeitz
74940e9345
clippy ( #2349 )
...
* fix clippy
* fix clippy
* fix duplicate imports
2024-04-09 07:54:44 +02:00
PSeitz
1e9fc51535
update ahash ( #2344 )
2024-04-09 06:35:39 +02:00
PSeitz
92c32979d2
fix postcard compatibility for top_hits, add postcard test ( #2346 )
...
* fix postcard compatibility for top_hits, add postcard test
* fix top_hits naming, delay data fetch
closes #2347
* fix import
2024-04-09 06:17:25 +02:00
PSeitz
b644d78a32
fix null byte handling in JSON paths ( #2345 )
...
* fix null byte handling in JSON paths
closes https://github.com/quickwit-oss/tantivy/issues/2193
closes https://github.com/quickwit-oss/tantivy/issues/2340
* avoid repeated term truncation
* fix test
* Apply suggestions from code review
Co-authored-by: Paul Masurel <paul@quickwit.io >
* add comment
---------
Co-authored-by: Paul Masurel <paul@quickwit.io >
2024-04-05 09:53:35 +02:00