* first version of extended stats along with its tests
* using IntermediateExtendStats instead of IntermediateStats with all tests passing
* Created struct for request and response
* first test with extended_stats
* kahan summation and tests with approximate equality
* version ready for merge
* removed approx dependency
* refactor for using ExtendedStats only when needed
* interim version
* refined version with code formatted
* refactored a struct
* cosmetic refactor
* fix after merge
* fix format
* added extended_stat bench
* merge and new benchmark for extended stats
* split stat segment collectors
* wrapped intermediate extended stat with a box to limit memory usage
* Revert "wrapped intermediate extended stat with a box to limit memory usage"
This reverts commit 5b4aa9f393.
* some code reformat, commented kahan summation
* refactor after review
* refactor after code review
* fix after incorrectly restoring kahan summation
* modifications for code review + bug fix in merge_fruit
* refactor assert_nearly_equals macro
* update after code review
---------
Co-authored-by: Giovanni Cuccu <gcuccu@imolainformatica.it>
Contributing
When adding a new bucket aggregation, make sure to extend the "test_aggregation_flushing" test for at least 2 levels.
Code Organization
Tantivy's aggregations are designed to mimic Elasticsearch's aggregations.
The code is organized in submodules:
bucket
Contains all bucket aggregations, such as the range aggregation. Bucket aggregations group documents into buckets and can contain sub-aggregations.
metric
Contains all metric aggregations, such as the average aggregation. Metric aggregations do not have sub-aggregations.
agg_req
agg_req contains the user's aggregation request. Deserialization from JSON is compatible with Elasticsearch aggregation requests.
agg_req_with_accessor
agg_req_with_accessor contains the user's aggregation request enriched with fast field accessors and similar data, which are used during collection.
segment_agg_result
segment_agg_result contains the aggregation result tree used to collect a single segment. The tree from agg_req_with_accessor is passed in during collection.
intermediate_agg_result
intermediate_agg_result contains the aggregation tree for merging with other trees.
agg_result
agg_result contains the final aggregation tree.
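The flow through the last three submodules (per-segment collection, merging of intermediate trees, conversion to a final tree) can be illustrated with a minimal, std-only sketch. Everything in it is hypothetical and is not Tantivy's actual API: the types IntermediateAvg and IntermediateRangeBuckets, the functions bucket_key, collect_segment and aggregate, and the three fixed ranges are assumptions chosen for illustration. It models a range bucket aggregation with an average metric as its sub-aggregation.

```rust
use std::collections::BTreeMap;

/// Hypothetical intermediate state of an average metric. It keeps sum and
/// count (not the average itself) so results from segments can be merged.
#[derive(Debug, Default, Clone, PartialEq)]
struct IntermediateAvg {
    sum: f64,
    count: u64,
}

impl IntermediateAvg {
    fn collect(&mut self, value: f64) {
        self.sum += value;
        self.count += 1;
    }

    fn merge(&mut self, other: &IntermediateAvg) {
        self.sum += other.sum;
        self.count += other.count;
    }

    /// Final value: None when no document was collected.
    fn finalize(&self) -> Option<f64> {
        if self.count == 0 {
            None
        } else {
            Some(self.sum / self.count as f64)
        }
    }
}

/// Hypothetical intermediate tree: bucket key -> (doc count, avg sub-aggregation).
#[derive(Debug, Default, Clone)]
struct IntermediateRangeBuckets {
    buckets: BTreeMap<&'static str, (u64, IntermediateAvg)>,
}

/// Assumed fixed ranges for this sketch: below 3, from 3 to below 7, 7 and above.
fn bucket_key(value: f64) -> &'static str {
    if value < 3.0 {
        "*-3"
    } else if value < 7.0 {
        "3-7"
    } else {
        "7-*"
    }
}

/// Segment-level collection: builds one intermediate tree per segment.
fn collect_segment(values: &[f64]) -> IntermediateRangeBuckets {
    let mut tree = IntermediateRangeBuckets::default();
    for &value in values {
        let entry = tree.buckets.entry(bucket_key(value)).or_default();
        entry.0 += 1;
        entry.1.collect(value);
    }
    tree
}

impl IntermediateRangeBuckets {
    /// Merges another intermediate tree into this one, bucket by bucket.
    fn merge(&mut self, other: &IntermediateRangeBuckets) {
        for (key, (count, avg)) in &other.buckets {
            let entry = self.buckets.entry(*key).or_default();
            entry.0 += count;
            entry.1.merge(avg);
        }
    }

    /// Converts the merged tree into final (key, doc count, average) rows.
    fn into_final(self) -> Vec<(&'static str, u64, Option<f64>)> {
        self.buckets
            .into_iter()
            .map(|(key, (count, avg))| (key, count, avg.finalize()))
            .collect()
    }
}

/// Runs the whole pipeline over several segments.
fn aggregate(segments: &[Vec<f64>]) -> Vec<(&'static str, u64, Option<f64>)> {
    let mut merged = IntermediateRangeBuckets::default();
    for segment in segments {
        merged.merge(&collect_segment(segment));
    }
    merged.into_final()
}
```

Calling aggregate with one Vec of values per segment returns one row per non-empty bucket. The sketch keeps sum and count in the intermediate form because an average cannot be merged from two averages alone, which is one reason to separate an intermediate tree from the final one.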