Commit Graph

3239 Commits

Author SHA1 Message Date
dependabot[bot]
99be20cedd Update binggan requirement from 0.10.0 to 0.12.0 (#2519)
* Update binggan requirement from 0.10.0 to 0.12.0

---
updated-dependencies:
- dependency-name: binggan
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

* fix build

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Pascal Seitz <pascal.seitz@gmail.com>
2024-10-16 11:36:04 +08:00
Bruce Mitchener
5f026901b8 Update MSRV to 1.75 (#2515)
This is required by the `fs4` dependency. There are other
things that need something later than 1.66.

Both quickwit and the Python binding already require something
newer.
2024-10-16 10:32:16 +08:00
baishen
6dfa2df06f fix OwnedBytes debug panic (#2512) 2024-10-16 10:31:40 +08:00
Bruce Mitchener
c17e513377 Reduce typo count. (#2510) 2024-10-10 09:55:37 +08:00
PSeitz
2f5a269c70 update packages (#2500)
fixes some warnings
2024-09-25 17:46:18 +08:00
PSeitz
50532260e3 update changelog (#2496) 2024-09-25 10:28:53 +08:00
Tri
8bd6eb06e6 feat: make SegmentMeta.with_max_doc public (#2499)
* chore: add container

* feat: make max doc editable externally

* chore: expose another method

* chore: remove comments

* remove unused devcontainer

* chore: manually match nightly format

* chore: change weird formating

* revert format change

* fix: format with nightly
2024-09-23 12:39:36 +08:00
PSeitz
55b0b52457 Fix AggregationLimits (#2495)
* change AggregationLimits behavior

This fixes an issue encountered with the current behaviour of
AggregationLimits.
Previously we had AggregationLimits and RessourceLimitGuard, which both
track the memory, but only RessourceLimitGuard released memory when
dropped, while AggregationLimits did not.

This PR changes AggregationLimits to be a guard itself and removes the
RessourceLimitGuard.

* rename AggregationLimits to AggregationLimitsGuard
2024-09-17 14:25:47 +08:00
dependabot[bot]
56fc56c5b9 Update binggan requirement from 0.8.0 to 0.10.0 (#2493)
* Update binggan requirement from 0.8.0 to 0.10.0

---
updated-dependencies:
- dependency-name: binggan
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

* update PR

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Pascal Seitz <pascal.seitz@gmail.com>
2024-09-10 14:26:06 +08:00
trinity-1686a
85395d942a fix clippy lints from 1.80-1.81 (#2488)
* fix some clippy lints

* fix clippy::doc_lazy_continuation

* fix some lints for 1.82
2024-09-05 14:33:05 +02:00
PSeitz
a206c3ccd3 add compat tests (#2485) 2024-09-04 18:26:57 +08:00
Chaya
dc5d31c116 grammar and misspellings (#2483)
* grammar

* grammar

* misspelling
2024-09-04 12:45:31 +08:00
gezihuzi
95a4ddea3e Fix: Improve collapse_overlapped_ranges function (#2474)
* Fix: Improve collapse_overlapped_ranges function

- Refactor into separate sort_and_deduplicate_ranges and merge_overlapping_ranges functions
- Enhance sorting to consider both start and end of ranges
- Optimize merging logic to handle adjacent ranges
- Add comprehensive examples in function documentation
- Ensure proper handling of duplicate and unsorted input ranges
- Improve overall efficiency and readability of range collapsing algorithm

* move debug_assert

---------

Co-authored-by: PSeitz <PSeitz@users.noreply.github.com>
2024-09-04 12:39:13 +08:00
trinity-1686a
ab5125d3dc remove unused trait bounds and outdated doc comment (#2478) 2024-09-03 16:31:51 +02:00
trinity-1686a
9f81d59ecd make find_field_with_default return json fields without path (#2476)
* make find_field_with_default return json fields without path

* add tests for find_field_with_default
2024-08-19 15:25:29 +02:00
PSeitz
c71ec8086d add FastFieldRangeQuery, rename (#2477)
* add FastFieldRangeQuery, rename

* remove Query impl
2024-08-19 09:02:00 +02:00
PSeitz
27be6aed91 lift clauses in LogicalAst (#2449)
(a OR b) OR (c OR d) can be simplified to (a OR b OR c OR d)
(a AND b) AND (c AND d) can be simplified to (a AND b AND c AND d)

This directly affects how queries are executed

remove unused SumWithCoordsCombiner
the number of fields is unused and private
2024-08-14 19:21:26 +02:00
PSeitz
3d1c4b313a support ff range queries on json fields (#2456)
* support ff range queries on json fields

* fix term date truncation

* use inverted index range query for phrase prefix queries

* rename to InvertedIndexRangeQuery

* fix column filter, add mixed column test
2024-08-02 00:06:50 +08:00
PSeitz
0d4e319965 add Key::I64 and Key::U64 variants in aggregation (#2468)
* add Key::I64 and Key::U64 variants in aggregation

Currently all `Key` numerical values are returned as f64. This causes problems in some
cases with the precision and the way f64 is serialized.

This PR adds `Key::I64` and `Key::U64` variants and uses them in the term
aggregation.

* add clarification comment
2024-07-31 20:29:32 +08:00
PSeitz
75dc3eb298 extend custom order deserialization (#2451)
allow arrays
improve validation
closes https://github.com/quickwit-oss/tantivy/issues/2435
2024-07-30 18:36:08 +08:00
PSeitz
3f6d225086 fix potential endless loop in merge (#2457)
avoid single segments lists without deletes as merge candidates, as they will be moved
to a merge operation and filtered for merging in the next
consider_merge_options call. In rare cases this may end up in a endless
merge loop where only single segments where nothing is to be done are
merged.
2024-07-30 16:37:20 +08:00
PSeitz
d8843c608c make FastFieldRangeWeight::new pub (#2460) 2024-07-29 10:39:27 +08:00
PSeitz
7ebcc15b17 add support for str fast field range query (#2453)
* add support for str fast field range query

Add support for range queries on fast fields, by converting term bounds to
term ordinals bounds.

closes https://github.com/quickwit-oss/tantivy/issues/2023

* extend tests, rename

* update comment

* update comment
2024-07-17 09:31:42 +08:00
PSeitz
1b4076691f refactor fast field query (#2452)
As preparation of #2023 and #1709

* Use Term to pass parameters
* merge u64 and ip fast field range query

Side note: I did not rename range_query_u64_fastfield, because then git can't track the changes.
2024-07-15 18:08:05 +08:00
Robert Caulk
eab660873a doc: fix typo in readme (#2450) 2024-07-09 15:12:22 +08:00
PSeitz
232f37126e fix coverage (#2448) 2024-07-05 12:04:18 +08:00
PSeitz
13e9885dfd faster term aggregation fetch terms (#2447)
big impact for term aggregations with large `size` parameter (e.g. 1000)
add top 1000 term agg bench

full
terms_few                                      Memory: 27.3 KB (+79.09%)    Avg: 3.8058ms (+2.40%)      Median: 3.7192ms (+3.47%)       [3.6224ms .. 4.3721ms]
terms_many                                     Memory: 6.9 MB               Avg: 12.6102ms (-4.70%)     Median: 12.1389ms (-6.58%)      [10.2847ms .. 15.4857ms]
terms_many_top_1000                            Memory: 6.9 MB               Avg: 15.8216ms (-83.19%)    Median: 15.4899ms (-83.46%)     [13.4250ms .. 20.6897ms]
terms_many_order_by_term                       Memory: 6.9 MB               Avg: 14.7820ms (-3.95%)     Median: 14.2236ms (-4.28%)      [12.6669ms .. 21.0968ms]
terms_many_with_top_hits                       Memory: 58.2 MB              Avg: 551.6218ms (+7.18%)    Median: 549.8826ms (+11.01%)    [496.7371ms .. 592.1299ms]
terms_many_with_avg_sub_agg                    Memory: 27.8 MB              Avg: 197.7029ms (+2.66%)    Median: 190.1564ms (+0.64%)     [167.9226ms .. 245.6651ms]
terms_many_json_mixed_type_with_avg_sub_agg    Memory: 42.0 MB (+0.00%)     Avg: 242.0121ms (+0.92%)    Median: 237.7084ms (-2.85%)     [201.9959ms .. 302.2136ms]
terms_few_with_cardinality_agg                 Memory: 10.6 MB              Avg: 122.6036ms (+1.21%)    Median: 119.0033ms (+2.60%)     [109.2859ms .. 161.5858ms]
range_agg_with_term_agg_few                    Memory: 45.4 KB (+39.75%)    Avg: 24.5454ms (+2.14%)     Median: 24.2861ms (+2.44%)      [23.5109ms .. 27.8406ms]
range_agg_with_term_agg_many                   Memory: 6.9 MB               Avg: 56.8049ms (+3.01%)     Median: 50.9706ms (+1.52%)      [41.4517ms .. 90.3934ms]
dense
terms_few                                      Memory: 28.8 KB (+81.74%)    Avg: 8.9092ms (-2.24%)      Median: 8.7143ms (-1.31%)      [8.6148ms .. 10.3868ms]
terms_many                                     Memory: 6.9 MB (-0.00%)      Avg: 17.9604ms (-10.18%)    Median: 17.1552ms (-11.93%)    [14.8979ms .. 26.2779ms]
terms_many_top_1000                            Memory: 6.9 MB               Avg: 21.4963ms (-78.90%)    Median: 21.2924ms (-78.98%)    [18.2033ms .. 28.0087ms]
terms_many_order_by_term                       Memory: 6.9 MB               Avg: 20.4167ms (-9.13%)     Median: 19.5596ms (-11.37%)    [17.5153ms .. 29.5987ms]
terms_many_with_top_hits                       Memory: 58.2 MB              Avg: 518.4474ms (-6.41%)    Median: 514.9180ms (-9.44%)    [471.5550ms .. 579.0220ms]
terms_many_with_avg_sub_agg                    Memory: 27.8 MB              Avg: 263.6702ms (-2.78%)    Median: 260.8775ms (-2.55%)    [239.5754ms .. 304.6669ms]
terms_many_json_mixed_type_with_avg_sub_agg    Memory: 42.0 MB              Avg: 299.9791ms (-2.01%)    Median: 302.2180ms (-3.08%)    [239.2080ms .. 346.3649ms]
terms_few_with_cardinality_agg                 Memory: 10.6 MB              Avg: 136.3303ms (-3.12%)    Median: 132.3831ms (-2.88%)    [123.7564ms .. 164.7914ms]
range_agg_with_term_agg_few                    Memory: 47.1 KB (+37.81%)    Avg: 35.4538ms (+0.66%)     Median: 34.8754ms (-0.56%)     [34.2287ms .. 40.0884ms]
range_agg_with_term_agg_many                   Memory: 6.9 MB               Avg: 72.2269ms (-4.38%)     Median: 66.1174ms (-4.98%)     [55.5125ms .. 124.1622ms]
sparse
terms_few                                      Memory: 27.3 KB (+69.68%)    Avg: 19.6053ms (-1.15%)     Median: 19.4543ms (-0.38%)     [19.3056ms .. 24.0547ms]
terms_many                                     Memory: 1.8 MB               Avg: 21.2886ms (-6.28%)     Median: 21.1287ms (-6.65%)     [20.6640ms .. 24.6144ms]
terms_many_top_1000                            Memory: 2.6 MB               Avg: 23.4869ms (-85.53%)    Median: 23.3393ms (-85.61%)    [22.7789ms .. 25.0896ms]
terms_many_order_by_term                       Memory: 1.8 MB               Avg: 21.7437ms (-7.78%)     Median: 21.6272ms (-7.66%)     [21.0409ms .. 23.6517ms]
terms_many_with_top_hits                       Memory: 13.1 MB              Avg: 43.7926ms (-2.76%)     Median: 44.3602ms (+0.01%)     [37.8039ms .. 51.0451ms]
terms_many_with_avg_sub_agg                    Memory: 7.5 MB               Avg: 34.6307ms (+3.72%)     Median: 33.4522ms (+1.16%)     [32.4418ms .. 41.4196ms]
terms_many_json_mixed_type_with_avg_sub_agg    Memory: 7.4 MB               Avg: 46.4318ms (+1.16%)     Median: 46.4050ms (+2.03%)     [44.5986ms .. 48.5142ms]
terms_few_with_cardinality_agg                 Memory: 680.0 KB (-0.04%)    Avg: 35.4410ms (+2.05%)     Median: 35.1384ms (+1.19%)     [34.4402ms .. 39.1082ms]
range_agg_with_term_agg_few                    Memory: 45.7 KB (+39.44%)    Avg: 22.7760ms (+0.44%)     Median: 22.5152ms (-0.35%)     [22.3078ms .. 26.1567ms]
range_agg_with_term_agg_many                   Memory: 1.8 MB               Avg: 25.7696ms (-4.45%)     Median: 25.4009ms (-5.61%)     [24.7874ms .. 29.6434ms]
multivalue
terms_few                                      Memory: 244.4 KB            Avg: 15.1253ms (-2.85%)     Median: 15.0988ms (-0.54%)     [14.8790ms .. 15.8193ms]
terms_many                                     Memory: 6.9 MB (-0.00%)     Avg: 26.3019ms (-6.24%)     Median: 26.3662ms (-4.94%)     [21.3553ms .. 31.0564ms]
terms_many_top_1000                            Memory: 6.9 MB              Avg: 29.5212ms (-72.90%)    Median: 29.4257ms (-72.84%)    [24.2645ms .. 35.1607ms]
terms_many_order_by_term                       Memory: 6.9 MB              Avg: 28.6076ms (-4.93%)     Median: 28.1059ms (-6.64%)     [24.0845ms .. 34.1493ms]
terms_many_with_top_hits                       Memory: 58.3 MB             Avg: 570.1548ms (+1.52%)    Median: 572.7759ms (+0.53%)    [525.9567ms .. 617.0862ms]
terms_many_with_avg_sub_agg                    Memory: 27.8 MB             Avg: 305.5207ms (+0.24%)    Median: 296.0101ms (-0.22%)    [277.8579ms .. 373.5914ms]
terms_many_json_mixed_type_with_avg_sub_agg    Memory: 42.0 MB (-0.00%)    Avg: 324.7342ms (-2.51%)    Median: 319.0025ms (-2.58%)    [298.7122ms .. 368.6144ms]
terms_few_with_cardinality_agg                 Memory: 10.8 MB             Avg: 151.6126ms (-2.54%)    Median: 149.0616ms (-0.32%)    [136.5592ms .. 181.8942ms]
range_agg_with_term_agg_few                    Memory: 248.2 KB            Avg: 49.5225ms (+3.11%)     Median: 48.3994ms (+3.18%)     [46.4134ms .. 60.5989ms]
range_agg_with_term_agg_many                   Memory: 6.9 MB              Avg: 85.9824ms (-3.66%)     Median: 78.4266ms (-3.85%)     [64.1231ms .. 128.5279ms]
2024-07-03 12:42:59 +08:00
PSeitz
56d79cb203 fix cardinality aggregation performance (#2446)
* fix cardinality aggregation performance

fix cardinality performance by fetching multiple terms at once. This
avoids decompressing the same block and keeps the buffer state between
terms.

add cardinality aggregation benchmark

bump rust version to 1.66

Performance comparison to before (AllQuery)
```
full
cardinality_agg                   Memory: 3.5 MB (-0.00%)    Avg: 21.2256ms (-97.78%)    Median: 21.0042ms (-97.82%)    [20.4717ms .. 23.6206ms]
terms_few_with_cardinality_agg    Memory: 10.6 MB            Avg: 81.9293ms (-97.37%)    Median: 81.5526ms (-97.38%)    [79.7564ms .. 88.0374ms]
dense
cardinality_agg                   Memory: 3.6 MB (-0.00%)    Avg: 25.9372ms (-97.24%)    Median: 25.7744ms (-97.25%)    [24.7241ms .. 27.8793ms]
terms_few_with_cardinality_agg    Memory: 10.6 MB            Avg: 93.9897ms (-96.91%)    Median: 92.7821ms (-96.94%)    [90.3312ms .. 117.4076ms]
sparse
cardinality_agg                   Memory: 895.4 KB (-0.00%)    Avg: 22.5113ms (-95.01%)    Median: 22.5629ms (-94.99%)    [22.1628ms .. 22.9436ms]
terms_few_with_cardinality_agg    Memory: 680.2 KB             Avg: 26.4250ms (-94.85%)    Median: 26.4135ms (-94.86%)    [26.3210ms .. 26.6774ms]
```

* clippy

* assert for sorted ordinals
2024-07-02 15:29:00 +08:00
Paul Masurel
0f4c2e27cf Fixes bug that causes out-of-order sstable key. (#2445)
The previous way to address the problem was to replace \u{0000}
with 0 in different places.

This logic had several flaws:
Done on the serializer side (like it was for the columnar), there was
a collision problem.

If a document in the segment contained a json field with a \0 and
antoher doc contained the same json field but `0` then we were sending
the same field path twice to the serializer.

Another option would have been to normalizes all values on the writer
side.

This PR simplifies the logic and simply ignore json path containing a
\0, both in the columnar and the inverted index.

Closes #2442
2024-07-01 15:40:07 +08:00
落叶乌龟
f9ae295507 feat(query): Make BooleanQuery supports minimum_number_should_match (#2405)
* feat(query): Make `BooleanQuery` supports `minimum_number_should_match`. see issue #2398

In this commit, a novel scorer named DisjunctionScorer is introduced, which performs the union of inverted chains with the minimal required elements. BTW, it's implemented via a min-heap. Necessary modifications on `BooleanQuery` and `BooleanWeight` are performed as well.

* fixup! fix test

* fixup!: refactor code.

1. More meaningful names.
2. Add Cache for `Disjunction`'s scorers, and fix bug.
3. Optimize `BooleanWeight::complex_scorer`

Thanks
 Paul Masurel <paul@quickwit.io>

* squash!: come up with better variable naming.

* squash!: fix naming issues.

* squash!: fix typo.

* squash!: Remove CombinationMethod::FullIntersection
2024-07-01 15:39:41 +08:00
Raphael Coeffic
d9db5302d9 feat: cardinality aggregation (#2337)
* WiP: cardinality aggregation

* Collect unique entries first, then insert into HyperLogLog

* Handle `missing`

* Hybrid approach

* Review changes

- insert `missing` value at most once
- `term_id` -> `term_ord`
- iterate directly over entries without collecting first

* Use salted hasher to include column type

* fix: formatting

* More review fixes

* Add cardinality to test_aggregation_flushing

* Formatting
2024-07-01 07:49:42 +08:00
Paul Masurel
e453848134 Recycling buffer in PrefixPhraseScorer (#2443) 2024-06-24 17:11:53 +09:00
PSeitz
59084143ef use optional index in multivalued index (#2439)
* use optional index in multivalued index

For mostly empty multivalued indices there was a large overhead during
creation when iterating all docids. This is alleviated by placing an
optional index in the multivalued index to mark documents that have values.

There's some performance overhead when accessing values in a multivalued
index. The accessing cost is now optional index + multivalue index. The
sparse codec performs relatively bad with the binary_search when accessing
data. This is reflected in the benchmarks below.

This changes the format of columnar to v2, but code is added to handle the v1
formats.

```
     Running benches/bench_access.rs (/home/pascal/Development/tantivy/optional_multivalues/target/release/deps/bench_access-ea323c028db88db4)
multi sparse 1/13
access_values_for_doc        Avg: 42.8946ms (+241.80%)    Median: 42.8869ms (+244.10%)    [42.7484ms .. 43.1074ms]
access_first_vals            Avg: 42.8022ms (+421.93%)    Median: 42.7553ms (+439.84%)    [42.6794ms .. 43.7404ms]
multi 2x
access_values_for_doc        Avg: 31.1244ms (+24.17%)    Median: 30.8339ms (+23.46%)    [30.7192ms .. 33.6059ms]
access_first_vals            Avg: 24.3070ms (+70.92%)    Median: 24.0966ms (+70.18%)    [23.9328ms .. 26.4851ms]
sparse 1/13
access_values_for_doc        Avg: 42.2490ms (+0.61%)    Median: 42.2346ms (+2.28%)    [41.8988ms .. 43.7821ms]
access_first_vals            Avg: 43.6272ms (+0.23%)    Median: 43.6197ms (+1.78%)    [43.4920ms .. 43.9009ms]
dense 1/12
access_values_for_doc        Avg: 8.6184ms (+23.18%)    Median: 8.6126ms (+23.78%)    [8.5843ms .. 8.7527ms]
access_first_vals            Avg: 6.8112ms (+4.47%)     Median: 6.8002ms (+4.55%)     [6.7887ms .. 6.8991ms]
full
access_values_for_doc        Avg: 9.4073ms (-5.09%)    Median: 9.4023ms (-2.23%)    [9.3694ms .. 9.4568ms]
access_first_vals            Avg: 4.9531ms (+6.24%)    Median: 4.9502ms (+7.85%)    [4.9423ms .. 4.9718ms]
```

```
     Running benches/bench_merge.rs (/home/pascal/Development/tantivy/optional_multivalues/target/release/deps/bench_merge-475697dfceb3639f)
merge_multi 2x_and_multi 2x                          Avg: 20.2280ms (+34.33%)    Median: 20.1829ms (+35.33%)    [19.9933ms .. 20.8806ms]
merge_multi sparse 1/13_and_multi sparse 1/13        Avg: 0.8961ms (-78.04%)     Median: 0.8943ms (-77.61%)     [0.8899ms .. 0.9272ms]
merge_dense 1/12_and_dense 1/12                      Avg: 0.6619ms (-1.26%)      Median: 0.6616ms (+2.20%)      [0.6473ms .. 0.6837ms]
merge_sparse 1/13_and_sparse 1/13                    Avg: 0.5508ms (-0.85%)      Median: 0.5508ms (+2.80%)      [0.5420ms .. 0.5634ms]
merge_sparse 1/13_and_dense 1/12                     Avg: 0.6046ms (-4.64%)      Median: 0.6038ms (+2.80%)      [0.5939ms .. 0.6296ms]
merge_multi sparse 1/13_and_dense 1/12               Avg: 0.9111ms (-83.48%)     Median: 0.9063ms (-83.50%)     [0.9047ms .. 0.9663ms]
merge_multi sparse 1/13_and_sparse 1/13              Avg: 0.8451ms (-89.49%)     Median: 0.8428ms (-89.43%)     [0.8411ms .. 0.8563ms]
merge_multi 2x_and_dense 1/12                        Avg: 10.6624ms (-4.82%)     Median: 10.6568ms (-4.49%)     [10.5738ms .. 10.8353ms]
merge_multi 2x_and_sparse 1/13                       Avg: 10.6336ms (-22.95%)    Median: 10.5925ms (-22.33%)    [10.5149ms .. 11.5657ms]
```

* Update columnar/src/columnar/format_version.rs

Co-authored-by: Paul Masurel <paul@quickwit.io>

* Update columnar/src/column_index/mod.rs

Co-authored-by: Paul Masurel <paul@quickwit.io>

---------

Co-authored-by: Paul Masurel <paul@quickwit.io>
2024-06-19 14:54:12 +08:00
PSeitz
511b027350 update columnar bench (#2438)
* update columnar bench

* fix compile
2024-06-14 10:42:35 +08:00
Philippe Noël
322f47eb47 Add ParadeDB to Companies List (#1) (#2437)
* Add ParadeDB logo
2024-06-14 09:12:58 +09:00
PSeitz
72f61ff89c remove index sorting (#2434)
closes https://github.com/quickwit-oss/tantivy/issues/2352
2024-06-13 15:51:53 +08:00
PSeitz
a141c3ec59 add columnar format compatibiliy tests (#2433)
* add columnar format compatibiliy tests

* always try to write current format
2024-06-13 15:04:52 +08:00
PSeitz
e90e7a25ae add access benchmark for columnar (#2432) 2024-06-12 14:29:15 +08:00
PSeitz
c3b92a5412 fix compiler warning, cleanup (#2393)
fix compiler warning for missing feature flag
remove unused variables
cleanup unused methods
2024-06-11 16:03:50 +08:00
PSeitz
2f55511064 extend indexwriter proptests (#2342)
* index random values in proptest

* add proptest with multiple docs
2024-06-11 16:02:57 +08:00
trinity-1686a
08b9fc0b31 fix de-escaping too much in query parser (#2427)
* fix de-escaping too much in query parser
2024-06-10 11:19:01 +02:00
PSeitz
714f363d43 add bench & test for columnar merging (#2428)
* add merge columnar proptest

* add columnar merge benchmark
2024-06-10 16:26:16 +08:00
PSeitz
93ff7365b0 reduce top hits aggregation memory consumption (#2426)
move request structure out of top hits aggregation collector and use from the
passed structure instead

full
terms_many_with_top_hits    Memory: 58.2 MB (-43.64%)    Avg: 425.9680ms (-21.38%)    Median: 415.1097ms (-23.56%)    [395.5303ms .. 484.6325ms]
dense
terms_many_with_top_hits    Memory: 58.2 MB (-43.64%)    Avg: 440.0817ms (-19.68%)    Median: 432.2286ms (-21.10%)    [403.5632ms .. 497.7541ms]
sparse
terms_many_with_top_hits    Memory: 13.1 MB (-49.31%)    Avg: 33.3568ms (-32.19%)    Median: 33.0834ms (-31.86%)    [32.5126ms .. 35.7397ms]
multivalue
terms_many_with_top_hits    Memory: 58.2 MB (-43.64%)    Avg: 414.2340ms (-25.44%)    Median: 413.4144ms (-25.64%)    [403.9919ms .. 430.3170ms]
2024-06-06 22:32:58 +08:00
Adam Reichold
8151925068 Panicking in spawned Rayon tasks will abort the process by default. (#2409) 2024-06-04 17:04:30 +09:00
dependabot[bot]
b960e40bc8 Update sketches-ddsketch requirement from 0.2.1 to 0.3.0 (#2423)
Updates the requirements on [sketches-ddsketch](https://github.com/mheffner/rust-sketches-ddsketch) to permit the latest version.
- [Release notes](https://github.com/mheffner/rust-sketches-ddsketch/releases)
- [Commits](https://github.com/mheffner/rust-sketches-ddsketch/compare/v0.2.1...v0.3.0)

---
updated-dependencies:
- dependency-name: sketches-ddsketch
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-06-04 15:50:23 +08:00
giovannicuccu
1095c9b073 Issue 1787 extended stats (#2247)
* first version of extended stats along with its tests

* using IntermediateExtendStats instead of IntermediateStats with all tests passing

* Created struct for request and response

* first test with extended_stats

* kahan summation and tests with approximate equality

* version ready for merge

* removed approx dependency

* refactor for using ExtendedStats only when needed

* interim version

* refined version with code formatted

* refactored a struct

* cosmetic refactor

* fix after merge

* fix format

* added extended_stat bench

* merge and new benchmark for extended stats

* split stat segment collectors

* wrapped intermediate extended stat with a box to limit memory usage

* Revert "wrapped intermediate extended stat with a box to limit memory usage"

This reverts commit 5b4aa9f393.

* some code reformat, commented kahan summation

* refactor after review

* refactor after code review

* fix after incorrectly restoring kahan summation

* modifications for code review + bug fix in merge_fruit

* refactor assert_nearly_equals macro

* update after code review

---------

Co-authored-by: Giovanni Cuccu <gcuccu@imolainformatica.it>
2024-06-04 14:25:17 +08:00
PSeitz
c0686515a9 update one_shot (#2420) 2024-05-31 11:07:35 +08:00
trinity-1686a
455156f51c improve query parser (#2416)
* support escape sequence in more place

and fix bug with singlequoted strings

* add query parser test for range query on default field
2024-05-30 17:29:27 +02:00
Meng Zhang
4143d31865 chore: fix build as the rev is gone (#2417) 2024-05-29 09:49:16 +08:00
Hamir Mahal
0c634adbe1 style: simplify strings with string interpolation (#2412)
* style: simplify strings with string interpolation

* fix: formatting
2024-05-27 09:16:47 +02:00