tantivy

mirror of https://github.com/quickwit-oss/tantivy.git synced 2026-05-28 06:00:40 +00:00

Author	SHA1	Message	Date
cong.xie	698f073f88	fix fmt	2026-02-11 15:52:39 -05:00
cong.xie	cdd24b7ee5	Replace hyperloglogplus with Apache DataSketches HLL (lg_k=11) Switch tantivy's cardinality aggregation from the hyperloglogplus crate (HyperLogLog++ with p=16) to the official Apache DataSketches HLL implementation (datasketches crate v0.2.0 with lg_k=11, Hll4). This enables returning raw HLL sketch bytes from pomsky to Datadog's event query, where they can be properly deserialized and merged using the same DataSketches library (Java). The previous implementation required pomsky to fabricate fake HLL sketches from scalar cardinality estimates, which produced incorrect results when merged. Changes: - Cargo.toml: hyperloglogplus 0.4.1 -> datasketches 0.2.0 - CardinalityCollector: HyperLogLogPlus<u64, BuildSaltedHasher> -> HllSketch - Custom Serde impl using HllSketch binary format (cross-shard compat) - New to_sketch_bytes() for external consumers (pomsky) - Salt preserved via (salt, value) tuple hashing for column type disambiguation - Removed BuildSaltedHasher struct - Added 4 new unit tests (serde roundtrip, merge, binary compat, salt)	2026-02-11 08:49:46 -05:00
Metin Dumandag	09b6ececa7	Export fields of the PercentileValuesVecEntry (#2833 ) Otherwise, there is no way to access these fields when not using the json serialized form of the aggregation results. This simple data struct is part of the public api, so its fields should be accessible as well.	2026-02-11 11:31:07 +01:00
cong.xie	bb141abe22	feat(aggregation): add keys() accessor to IntermediateAggregationResults	2026-02-09 15:38:35 -05:00
cong.xie	f1c29ba972	resolve conflcit	2026-02-06 14:23:11 -05:00
cong.xie	ae0554a6a5	feat(aggregation): add public accessors for intermediate aggregation results Add accessor methods to allow external crates to read intermediate aggregation results without accessing pub(crate) fields: - IntermediateAggregationResults: get(), remove() - IntermediateTermBucketResult: entries(), sum_other_doc_count(), doc_count_error_upper_bound() - IntermediateAverage: stats() - IntermediateStats: count(), sum() - IntermediateKey: Display impl for string conversion	2026-02-06 11:12:20 -05:00
cong.xie	0d7abe5d23	feat(aggregation): add public accessors for intermediate aggregation results Add accessor methods to allow external crates to read intermediate aggregation results without accessing pub(crate) fields: - IntermediateAggregationResults: get(), get_mut(), remove() - IntermediateTermBucketResult: entries(), sum_other_doc_count(), doc_count_error_upper_bound() - IntermediateAverage: stats() - IntermediateStats: count(), sum() - IntermediateKey: Display impl for string conversion	2026-02-06 10:28:59 -05:00
Paul Masurel	4a89e74597	Fix rfc3339 typos and add Claude Code skills (#2823 ) Closes #2817	2026-01-30 12:00:28 +01:00
PSeitz-dd	65b5a1a306	one collector per agg request instead per bucket (#2759 ) * improve bench * add more tests for new collection type * one collector per agg request instead per bucket In this refactoring a collector knows in which bucket of the parent their data is in. This allows to convert the previous approach of one collector per bucket to one collector per request. low card bucket optimization * reduce dynamic dispatch, faster term agg * use radix map, fix prepare_max_bucket use paged term map in term agg use special no sub agg term map impl * specialize columntype in stats * remove stacktrace bloat, use &mut helper increase cache to 2048 * cleanup remove clone move data in term req, single doc opt for stats * add comment * share column block accessor * simplify fetch block in column_block_accessor * split subaggcache into two trait impls * move partitions to heap * fix name, add comment --------- Co-authored-by: Pascal Seitz <pascal.seitz@gmail.com>	2026-01-06 11:50:55 +01:00
Paul Masurel	63c66005db	Lazy scorers (#2726 ) * Refactoring of the score tweaker into `SortKeyComputer`s to unlock two features. - Allow lazy evaluation of score. As soon as we identified that a doc won't reach the topK threshold, we can stop the evaluation. - Allow for a different segment level score, segment level score and their conversion. This PR breaks public API, but fixing code is straightforward. * Bumping tantivy version --------- Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>	2025-12-01 15:38:57 +01:00
Ang	08a92675dc	Fix typos again (#2753 ) Found via `codespell -S benches,stopwords.rs -L womens,parth,abd,childs,ond,ser,ue,mot,hel,atleast,pris,claus,allo`	2025-12-01 12:15:41 +01:00
Paul Masurel	f88b7200b2	Optimization when posting list are saturated. (#2745 ) * Optimization when posting list are saturated. If a posting list doc freq is the segment reader's max_doc, and if scoring does not matter, we can replace it by a AllScorer. In turn, in a boolean query, we can dismiss all scorers and empty scorers, to accelerate the request. * Added range query optimization * CR comment * CR comments * CR comment --------- Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>	2025-11-26 15:50:57 +01:00
Paul Masurel	c363bbd23d	Optimize term aggregation with low cardinality + some refactoring (#2740 ) This introduce an optimization of top level term aggregation on field with a low cardinality. We then use a Vec as the underlying map. In addition, we buffer subaggregations. --------- Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com> Co-authored-by: Paul Masurel <paul@quickwit.io>	2025-11-21 14:46:29 +01:00
Moe	70e591e230	feat: added filter aggregation (#2711 ) * Initial impl * Added `Filter` impl in `build_single_agg_segment_collector_with_reader` + Added tests * Added `Filter(FilterBucketResult)` + Made tests work. * Fixed type issues. * Fixed a test. * 8a7a73a: Pass `segment_reader` * Added more tests. * Improved parsing + tests * refactoring * Added more tests. * refactoring: moved parsing code under QueryParser * Use Tantivy syntax instead of ES * Added a sanity check test. * Simplified impl + tests * Added back tests in a more maintable way * nitz. * nitz * implemented very simple fast-path * improved a comment * implemented fast field support * Used `BoundsRange` * Improved fast field impl + tests * Simplified execution. * Fixed exports + nitz * Improved the tests to check to the expected result. * Improved test by checking the whole result JSON * Removed brittle perf checks. * Added efficiency verification tests. * Added one more efficiency check test. * Improved the efficiency tests. * Removed unnecessary parsing code + added direct Query obj * Fixed tests. * Improved tests * Fixed code structure * Fixed lint issues * nitz. * nitz * nitz. * nitz. * nitz. * Added an example * Fixed PR comments. * Applied PR comments + nitz * nitz. * Improved the code. * Fixed a perf issue. * Added batch processing. * Made the example more interesting * Fixed bucket count * Renamed Direct to CustomQuery * Fixed lint issues. * No need for scorer to be an `Option` * nitz * Used BitSet * Added an optimization for AllQuery * Fixed merge issues. * Fixed lint issues. * Added benchmark for FILTER * Removed the Option wrapper. * nitz. * Applied PR comments. * Fixed the AllQuery optimization * Applied PR comments. * feat: used `erased_serde` to allow filter query to be serialized * further improved a comment * Added back tests. * removed an unused method * removed an unused method * Added documentation * nitz. * Added query builder. * Fixed a comment. * Applied PR comments. * Fixed doctest issues. * Added ser/de * Removed bench in test * Fixed a lint issue.	2025-11-18 20:54:31 +01:00
PSeitz	60225bdd45	cleanup (#2724 ) Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>	2025-10-23 10:23:34 +02:00
PSeitz	938bfec8b7	use FxHashMap for Aggregations Request (#2722 ) Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>	2025-10-21 15:59:18 +02:00
PSeitz	dabcaa5809	fix merge intermediate aggregation results (#2719 ) Previously the merging relied on the order of the results, which is invalid since https://github.com/quickwit-oss/tantivy/pull/2035. This bug is only hit in specific scenarios, when the aggregation collectors are built in a different order on different segments. Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>	2025-10-17 12:41:31 +02:00
PSeitz	d410a3b0c0	Add Filtering for Term Aggregations (#2717 ) * Add Filtering for Term Aggregations Closes #2702 * add AggregationsSegmentCtx memory consumption --------- Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>	2025-10-15 17:39:53 +02:00
Remi	fc93391d0e	Minor clarifications on the AggregationsWithAccessor refacto (#2716 )	2025-10-14 19:59:33 +02:00
PSeitz	f8e79271ab	Replace AggregationsWithAccessor (#2715 ) * add nested histogram-termagg benchmark * Replace AggregationsWithAccessor with AggData With AggregationsWithAccessor pre-computation and caching was done on the collector level. If you have 10000 sub collectors (e.g. a term aggregation with sub aggregations) this is very inefficient. `AggData` instead moves the data from the collector to a node which reflects the cardinality of the request tree instead of the cardinality of the segment collector. It also moves the global struct shared with all aggregations in to aggregation specific structs. So each aggregation has its own space to store cached data and aggregation specific information. This also breaks up the dependency to the elastic search aggregation structure somewhat. Due to lifetime issues, we move the agg request specific object out of `AggData` during the collection and move it back at the end (for now). That's some unnecessary work, which costs CPU. This allows better caching and will also pave the way for another potential optimization, by separating the collector and its storage. Currently we allocate a new collector for each sub aggregation bucket (for nested aggregations), but ideally we would have just one collector instance. * renames * move request data to agg request files --------- Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>	2025-10-14 09:22:11 +02:00
PSeitz-dd	2340dca628	fix compiler warnings (#2699 ) * fix compiler warnings * fix import	2025-09-19 15:55:04 +02:00
PSeitz-dd	811c68cdb2	fix field_names in top_hits aggregation (#2675 )	2025-07-21 12:19:30 +08:00
PSeitz	945af922d1	clippy (#2661 ) * clippy * use readable version --------- Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>	2025-07-02 11:25:03 +02:00
trinity Pointard	9426d5be7b	fix agg Key PartialEq impl	2025-03-14 14:57:45 +01:00
trinity Pointard	0368162ef0	make DateHistogramAggregationReq buildable	2025-02-18 11:45:24 +01:00
trinity Pointard	32b6e9711b	add tests	2024-12-13 16:06:24 +01:00
trinity-1686a	0bac391291	add support for counting non integer in aggregation	2024-11-28 19:52:47 +01:00
Paul Masurel	c35a782747	Updating rustc-hash and clippy fixes (#2532 ) * Updating rustc-hash and clippy fixes * fix terms_aggregation_min_doc_count_special_case --------- Co-authored-by: Pascal Seitz <pascal.seitz@gmail.com>	2024-11-01 13:46:26 +08:00
PSeitz	21d057059e	clippy (#2527 ) * clippy * clippy * clippy * clippy * convert allow to expect and remove unused * cargo fmt * cleanup * export sample * clippy	2024-10-22 09:26:54 +08:00
Bruce Mitchener	c17e513377	Reduce typo count. (#2510 )	2024-10-10 09:55:37 +08:00
PSeitz	55b0b52457	Fix AggregationLimits (#2495 ) * change AggregationLimits behavior This fixes an issue encountered with the current behaviour of AggregationLimits. Previously we had AggregationLimits and RessourceLimitGuard, which both track the memory, but only RessourceLimitGuard released memory when dropped, while AggregationLimits did not. This PR changes AggregationLimits to be a guard itself and removes the RessourceLimitGuard. * rename AggregationLimits to AggregationLimitsGuard	2024-09-17 14:25:47 +08:00
PSeitz	0d4e319965	add Key::I64 and Key::U64 variants in aggregation (#2468 ) * add Key::I64 and Key::U64 variants in aggregation Currently all `Key` numerical values are returned as f64. This causes problems in some cases with the precision and the way f64 is serialized. This PR adds `Key::I64` and `Key::U64` variants and uses them in the term aggregation. * add clarification comment	2024-07-31 20:29:32 +08:00
PSeitz	75dc3eb298	extend custom order deserialization (#2451 ) allow arrays improve validation closes https://github.com/quickwit-oss/tantivy/issues/2435	2024-07-30 18:36:08 +08:00
PSeitz	13e9885dfd	faster term aggregation fetch terms (#2447 ) big impact for term aggregations with large `size` parameter (e.g. 1000) add top 1000 term agg bench full terms_few Memory: 27.3 KB (+79.09%) Avg: 3.8058ms (+2.40%) Median: 3.7192ms (+3.47%) [3.6224ms .. 4.3721ms] terms_many Memory: 6.9 MB Avg: 12.6102ms (-4.70%) Median: 12.1389ms (-6.58%) [10.2847ms .. 15.4857ms] terms_many_top_1000 Memory: 6.9 MB Avg: 15.8216ms (-83.19%) Median: 15.4899ms (-83.46%) [13.4250ms .. 20.6897ms] terms_many_order_by_term Memory: 6.9 MB Avg: 14.7820ms (-3.95%) Median: 14.2236ms (-4.28%) [12.6669ms .. 21.0968ms] terms_many_with_top_hits Memory: 58.2 MB Avg: 551.6218ms (+7.18%) Median: 549.8826ms (+11.01%) [496.7371ms .. 592.1299ms] terms_many_with_avg_sub_agg Memory: 27.8 MB Avg: 197.7029ms (+2.66%) Median: 190.1564ms (+0.64%) [167.9226ms .. 245.6651ms] terms_many_json_mixed_type_with_avg_sub_agg Memory: 42.0 MB (+0.00%) Avg: 242.0121ms (+0.92%) Median: 237.7084ms (-2.85%) [201.9959ms .. 302.2136ms] terms_few_with_cardinality_agg Memory: 10.6 MB Avg: 122.6036ms (+1.21%) Median: 119.0033ms (+2.60%) [109.2859ms .. 161.5858ms] range_agg_with_term_agg_few Memory: 45.4 KB (+39.75%) Avg: 24.5454ms (+2.14%) Median: 24.2861ms (+2.44%) [23.5109ms .. 27.8406ms] range_agg_with_term_agg_many Memory: 6.9 MB Avg: 56.8049ms (+3.01%) Median: 50.9706ms (+1.52%) [41.4517ms .. 90.3934ms] dense terms_few Memory: 28.8 KB (+81.74%) Avg: 8.9092ms (-2.24%) Median: 8.7143ms (-1.31%) [8.6148ms .. 10.3868ms] terms_many Memory: 6.9 MB (-0.00%) Avg: 17.9604ms (-10.18%) Median: 17.1552ms (-11.93%) [14.8979ms .. 26.2779ms] terms_many_top_1000 Memory: 6.9 MB Avg: 21.4963ms (-78.90%) Median: 21.2924ms (-78.98%) [18.2033ms .. 28.0087ms] terms_many_order_by_term Memory: 6.9 MB Avg: 20.4167ms (-9.13%) Median: 19.5596ms (-11.37%) [17.5153ms .. 29.5987ms] terms_many_with_top_hits Memory: 58.2 MB Avg: 518.4474ms (-6.41%) Median: 514.9180ms (-9.44%) [471.5550ms .. 579.0220ms] terms_many_with_avg_sub_agg Memory: 27.8 MB Avg: 263.6702ms (-2.78%) Median: 260.8775ms (-2.55%) [239.5754ms .. 304.6669ms] terms_many_json_mixed_type_with_avg_sub_agg Memory: 42.0 MB Avg: 299.9791ms (-2.01%) Median: 302.2180ms (-3.08%) [239.2080ms .. 346.3649ms] terms_few_with_cardinality_agg Memory: 10.6 MB Avg: 136.3303ms (-3.12%) Median: 132.3831ms (-2.88%) [123.7564ms .. 164.7914ms] range_agg_with_term_agg_few Memory: 47.1 KB (+37.81%) Avg: 35.4538ms (+0.66%) Median: 34.8754ms (-0.56%) [34.2287ms .. 40.0884ms] range_agg_with_term_agg_many Memory: 6.9 MB Avg: 72.2269ms (-4.38%) Median: 66.1174ms (-4.98%) [55.5125ms .. 124.1622ms] sparse terms_few Memory: 27.3 KB (+69.68%) Avg: 19.6053ms (-1.15%) Median: 19.4543ms (-0.38%) [19.3056ms .. 24.0547ms] terms_many Memory: 1.8 MB Avg: 21.2886ms (-6.28%) Median: 21.1287ms (-6.65%) [20.6640ms .. 24.6144ms] terms_many_top_1000 Memory: 2.6 MB Avg: 23.4869ms (-85.53%) Median: 23.3393ms (-85.61%) [22.7789ms .. 25.0896ms] terms_many_order_by_term Memory: 1.8 MB Avg: 21.7437ms (-7.78%) Median: 21.6272ms (-7.66%) [21.0409ms .. 23.6517ms] terms_many_with_top_hits Memory: 13.1 MB Avg: 43.7926ms (-2.76%) Median: 44.3602ms (+0.01%) [37.8039ms .. 51.0451ms] terms_many_with_avg_sub_agg Memory: 7.5 MB Avg: 34.6307ms (+3.72%) Median: 33.4522ms (+1.16%) [32.4418ms .. 41.4196ms] terms_many_json_mixed_type_with_avg_sub_agg Memory: 7.4 MB Avg: 46.4318ms (+1.16%) Median: 46.4050ms (+2.03%) [44.5986ms .. 48.5142ms] terms_few_with_cardinality_agg Memory: 680.0 KB (-0.04%) Avg: 35.4410ms (+2.05%) Median: 35.1384ms (+1.19%) [34.4402ms .. 39.1082ms] range_agg_with_term_agg_few Memory: 45.7 KB (+39.44%) Avg: 22.7760ms (+0.44%) Median: 22.5152ms (-0.35%) [22.3078ms .. 26.1567ms] range_agg_with_term_agg_many Memory: 1.8 MB Avg: 25.7696ms (-4.45%) Median: 25.4009ms (-5.61%) [24.7874ms .. 29.6434ms] multivalue terms_few Memory: 244.4 KB Avg: 15.1253ms (-2.85%) Median: 15.0988ms (-0.54%) [14.8790ms .. 15.8193ms] terms_many Memory: 6.9 MB (-0.00%) Avg: 26.3019ms (-6.24%) Median: 26.3662ms (-4.94%) [21.3553ms .. 31.0564ms] terms_many_top_1000 Memory: 6.9 MB Avg: 29.5212ms (-72.90%) Median: 29.4257ms (-72.84%) [24.2645ms .. 35.1607ms] terms_many_order_by_term Memory: 6.9 MB Avg: 28.6076ms (-4.93%) Median: 28.1059ms (-6.64%) [24.0845ms .. 34.1493ms] terms_many_with_top_hits Memory: 58.3 MB Avg: 570.1548ms (+1.52%) Median: 572.7759ms (+0.53%) [525.9567ms .. 617.0862ms] terms_many_with_avg_sub_agg Memory: 27.8 MB Avg: 305.5207ms (+0.24%) Median: 296.0101ms (-0.22%) [277.8579ms .. 373.5914ms] terms_many_json_mixed_type_with_avg_sub_agg Memory: 42.0 MB (-0.00%) Avg: 324.7342ms (-2.51%) Median: 319.0025ms (-2.58%) [298.7122ms .. 368.6144ms] terms_few_with_cardinality_agg Memory: 10.8 MB Avg: 151.6126ms (-2.54%) Median: 149.0616ms (-0.32%) [136.5592ms .. 181.8942ms] range_agg_with_term_agg_few Memory: 248.2 KB Avg: 49.5225ms (+3.11%) Median: 48.3994ms (+3.18%) [46.4134ms .. 60.5989ms] range_agg_with_term_agg_many Memory: 6.9 MB Avg: 85.9824ms (-3.66%) Median: 78.4266ms (-3.85%) [64.1231ms .. 128.5279ms]	2024-07-03 12:42:59 +08:00
PSeitz	56d79cb203	fix cardinality aggregation performance (#2446 ) * fix cardinality aggregation performance fix cardinality performance by fetching multiple terms at once. This avoids decompressing the same block and keeps the buffer state between terms. add cardinality aggregation benchmark bump rust version to 1.66 Performance comparison to before (AllQuery) ``` full cardinality_agg Memory: 3.5 MB (-0.00%) Avg: 21.2256ms (-97.78%) Median: 21.0042ms (-97.82%) [20.4717ms .. 23.6206ms] terms_few_with_cardinality_agg Memory: 10.6 MB Avg: 81.9293ms (-97.37%) Median: 81.5526ms (-97.38%) [79.7564ms .. 88.0374ms] dense cardinality_agg Memory: 3.6 MB (-0.00%) Avg: 25.9372ms (-97.24%) Median: 25.7744ms (-97.25%) [24.7241ms .. 27.8793ms] terms_few_with_cardinality_agg Memory: 10.6 MB Avg: 93.9897ms (-96.91%) Median: 92.7821ms (-96.94%) [90.3312ms .. 117.4076ms] sparse cardinality_agg Memory: 895.4 KB (-0.00%) Avg: 22.5113ms (-95.01%) Median: 22.5629ms (-94.99%) [22.1628ms .. 22.9436ms] terms_few_with_cardinality_agg Memory: 680.2 KB Avg: 26.4250ms (-94.85%) Median: 26.4135ms (-94.86%) [26.3210ms .. 26.6774ms] ``` * clippy * assert for sorted ordinals	2024-07-02 15:29:00 +08:00
Raphael Coeffic	d9db5302d9	feat: cardinality aggregation (#2337 ) * WiP: cardinality aggregation * Collect unique entries first, then insert into HyperLogLog * Handle `missing` * Hybrid approach * Review changes - insert `missing` value at most once - `term_id` -> `term_ord` - iterate directly over entries without collecting first * Use salted hasher to include column type * fix: formatting * More review fixes * Add cardinality to test_aggregation_flushing * Formatting	2024-07-01 07:49:42 +08:00
PSeitz	93ff7365b0	reduce top hits aggregation memory consumption (#2426 ) move request structure out of top hits aggregation collector and use from the passed structure instead full terms_many_with_top_hits Memory: 58.2 MB (-43.64%) Avg: 425.9680ms (-21.38%) Median: 415.1097ms (-23.56%) [395.5303ms .. 484.6325ms] dense terms_many_with_top_hits Memory: 58.2 MB (-43.64%) Avg: 440.0817ms (-19.68%) Median: 432.2286ms (-21.10%) [403.5632ms .. 497.7541ms] sparse terms_many_with_top_hits Memory: 13.1 MB (-49.31%) Avg: 33.3568ms (-32.19%) Median: 33.0834ms (-31.86%) [32.5126ms .. 35.7397ms] multivalue terms_many_with_top_hits Memory: 58.2 MB (-43.64%) Avg: 414.2340ms (-25.44%) Median: 413.4144ms (-25.64%) [403.9919ms .. 430.3170ms]	2024-06-06 22:32:58 +08:00
giovannicuccu	1095c9b073	Issue 1787 extended stats (#2247 ) * first version of extended stats along with its tests * using IntermediateExtendStats instead of IntermediateStats with all tests passing * Created struct for request and response * first test with extended_stats * kahan summation and tests with approximate equality * version ready for merge * removed approx dependency * refactor for using ExtendedStats only when needed * interim version * refined version with code formatted * refactored a struct * cosmetic refactor * fix after merge * fix format * added extended_stat bench * merge and new benchmark for extended stats * split stat segment collectors * wrapped intermediate extended stat with a box to limit memory usage * Revert "wrapped intermediate extended stat with a box to limit memory usage" This reverts commit `5b4aa9f393`. * some code reformat, commented kahan summation * refactor after review * refactor after code review * fix after incorrectly restoring kahan summation * modifications for code review + bug fix in merge_fruit * refactor assert_nearly_equals macro * update after code review --------- Co-authored-by: Giovanni Cuccu <gcuccu@imolainformatica.it>	2024-06-04 14:25:17 +08:00
Hamir Mahal	0c634adbe1	style: simplify strings with string interpolation (#2412 ) * style: simplify strings with string interpolation * fix: formatting	2024-05-27 09:16:47 +02:00
PSeitz	5b7cca13e5	lower contention on AggregationLimits (#2394 ) PR https://github.com/quickwit-oss/quickwit/pull/4962 fixes an issue where the AggregationLimits are not passed correctly. Since the AggregationLimits are shared properly we run into contention issues. This PR includes some straightforward improvement to reduce contention, by only calling if the memory changed and avoiding the second read. We probably need some sharding with multiple counters or local caching before updating the global after some threshold.	2024-05-15 12:25:40 +02:00
PSeitz	c6b213d8f0	use bingang for agg benchmark (#2378 ) * use bingang for agg benchmark use bingang for agg benchmark, which includes memory consumption Output: ``` full histogram Memory: 15.8 KB Avg: 10.9322ms (+5.44%) Median: 10.8790ms (+9.28%) Min: 10.7470ms Max: 11.3263ms histogram_hard_bounds Memory: 15.5 KB Avg: 5.1939ms (+6.61%) Median: 5.1722ms (+10.98%) Min: 5.0432ms Max: 5.3910ms histogram_with_avg_sub_agg Memory: 48.7 KB Avg: 23.8165ms (+4.57%) Median: 23.7264ms (+10.06%) Min: 23.4995ms Max: 24.8107ms dense histogram Memory: 17.3 KB Avg: 15.6810ms (-8.54%) Median: 15.6174ms (-8.89%) Min: 15.4953ms Max: 16.0702ms histogram_hard_bounds Memory: 15.4 KB Avg: 10.0720ms (-7.33%) Median: 10.0572ms (-7.06%) Min: 9.8500ms Max: 10.4819ms histogram_with_avg_sub_agg Memory: 50.1 KB Avg: 33.0993ms (-7.04%) Median: 32.9499ms (-6.86%) Min: 32.8284ms Max: 34.0529ms sparse histogram Memory: 16.3 KB Avg: 19.2325ms (-0.44%) Median: 19.1211ms (-1.26%) Min: 19.0348ms Max: 19.7902ms histogram_hard_bounds Memory: 16.1 KB Avg: 18.5179ms (-0.61%) Median: 18.4552ms (-0.90%) Min: 18.3799ms Max: 19.0535ms histogram_with_avg_sub_agg Memory: 34.7 KB Avg: 21.2589ms (-0.69%) Median: 21.1867ms (-1.05%) Min: 21.0342ms Max: 21.9900ms ``` * add more bench with term as sub agg	2024-05-07 11:29:49 +02:00
PSeitz	eea70030bf	cleanup top level exports (#2382 ) remove some top level exports	2024-05-07 09:59:41 +02:00
PSeitz	0e9fced336	remove JsonTermWriter (#2238 ) * remove JsonTermWriter remove JsonTermWriter remove path truncation logic, add assertion * fix json_path_writer add sep logic	2024-04-18 16:28:05 +02:00
Adam Reichold	4708171a32	Fix some of the things current Clippy complains about (#2363 )	2024-04-16 04:27:06 +02:00
PSeitz	dfa3aed32d	check unsupported parameters top_hits (#2351 ) * check unsupported parameters top_hits * move to function	2024-04-10 08:20:52 +02:00
PSeitz	74940e9345	clippy (#2349 ) * fix clippy * fix clippy * fix duplicate imports	2024-04-09 07:54:44 +02:00
PSeitz	92c32979d2	fix postcard compatibility for top_hits, add postcard test (#2346 ) * fix postcard compatibility for top_hits, add postcard test * fix top_hits naming, delay data fetch closes #2347 * fix import	2024-04-09 06:17:25 +02:00
PSeitz	67ebba3c3c	expose collect_block buffer size (#2326 ) * expose buffer of collect_block * flip shard_size segment_size	2024-03-15 08:02:08 +01:00
PSeitz	7ce950f141	add method to fetch block of first vals in columnar (#2330 ) * add method to fetch block of first vals in columnar add method to fetch block of first vals in columnar (this is way faster than single calls for full columns) add benchmark fix import warnings ``` test bench_get_block_first_on_full_column ... bench: 56 ns/iter (+/- 26) test bench_get_block_first_on_full_column_single_calls ... bench: 311 ns/iter (+/- 6) test bench_get_block_first_on_multi_column ... bench: 378 ns/iter (+/- 15) test bench_get_block_first_on_multi_column_single_calls ... bench: 546 ns/iter (+/- 13) test bench_get_block_first_on_optional_column ... bench: 291 ns/iter (+/- 6) test bench_get_block_first_on_optional_column_single_calls ... bench: 362 ns/iter (+/- 8) ``` * use remainder	2024-03-15 08:01:47 +01:00
PSeitz	b0e65560a1	handle ip adresses in term aggregation (#2319 ) * handle ip adresses in term aggregation Stores IpAdresses during the segment term aggregation via u64 representation and convert to u128(IpV6Adress) via downcast when converting to intermediate results. Enable Downcasting on `ColumnValues` Expose u64 variant for u128 encoded data via `open_u64_lenient` method. Remove lifetime in VecColumn, to avoid 'static lifetime requirement coming from downcast trait. * rename method	2024-03-14 09:41:18 +01:00

1 2 3 4 5

233 Commits