Contributing
When adding a new bucket aggregation, make sure to extend the "test_aggregation_flushing" test for at least 2 levels.
Code Organization
Tantivy's aggregations are designed to mimic the aggregations of Elasticsearch.
The code is organized in submodules:
bucket
Contains all bucket aggregations, like the range aggregation. Bucket aggregations group documents into buckets and can contain sub-aggregations.
metric
Contains all metric aggregations, like the average aggregation. Metric aggregations do not have sub-aggregations.
agg_req
agg_req contains the user's aggregation request. Deserialization from JSON is compatible with Elasticsearch aggregation requests.
agg_data
agg_data contains the user's aggregation request enriched with fast field accessors and similar data that is used during collection.
segment_agg_result
segment_agg_result contains the aggregation result tree that is used to collect a single segment. agg_data is passed in during collection.
intermediate_agg_result
intermediate_agg_result contains the aggregation tree for merging with other trees.
agg_result
agg_result contains the final aggregation tree.
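
The way a request moves through the stages listed above can be illustrated with a small, self-contained sketch. This is not tantivy's code: every type and function name below (RangeAvgRequest, IntermediateAvg, collect_segment, merge, finalize) is hypothetical, and a plain slice of f64 values stands in for the fast field accessors that agg_data would provide. The sketch assumes a range bucket aggregation with an average metric as its only sub-aggregation, and it shows why a separate intermediate form is useful: an average can only be merged across segments if the sum and count are kept apart until the end.

```rust
// Illustrative sketch only: every type and function here is hypothetical
// and much simpler than tantivy's real aggregation code.

/// Request stage (role of agg_req): a range bucket aggregation with an
/// average metric as its sub-aggregation. Each range is `[from, to)`.
pub struct RangeAvgRequest {
    pub ranges: Vec<(f64, f64)>,
}

/// Mergeable state of an average: sum and count are kept separately,
/// because two averages cannot be merged without their counts.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct IntermediateAvg {
    pub sum: f64,
    pub count: u64,
}

/// Intermediate stage (role of intermediate_agg_result): one entry per bucket.
#[derive(Clone, Debug, PartialEq)]
pub struct IntermediateResult {
    pub buckets: Vec<IntermediateAvg>,
}

/// Final stage (role of agg_result): doc count and average per bucket.
#[derive(Debug, PartialEq)]
pub struct FinalBucket {
    pub doc_count: u64,
    pub avg: Option<f64>,
}

/// Segment stage (role of segment_agg_result): collect the values of one
/// segment. `values` stands in for a fast field column.
pub fn collect_segment(req: &RangeAvgRequest, values: &[f64]) -> IntermediateResult {
    let mut buckets = vec![IntermediateAvg::default(); req.ranges.len()];
    for &value in values {
        for (bucket, &(from, to)) in buckets.iter_mut().zip(&req.ranges) {
            if value >= from && value < to {
                bucket.sum += value;
                bucket.count += 1;
            }
        }
    }
    IntermediateResult { buckets }
}

/// Merge the intermediate result of another segment into `left`.
pub fn merge(left: &mut IntermediateResult, right: &IntermediateResult) {
    for (l, r) in left.buckets.iter_mut().zip(&right.buckets) {
        l.sum += r.sum;
        l.count += r.count;
    }
}

/// Turn the merged intermediate result into the final result.
pub fn finalize(merged: IntermediateResult) -> Vec<FinalBucket> {
    merged
        .buckets
        .into_iter()
        .map(|b| FinalBucket {
            doc_count: b.count,
            // An empty bucket has no meaningful average.
            avg: if b.count == 0 { None } else { Some(b.sum / b.count as f64) },
        })
        .collect()
}
```

In this sketch, each segment is collected independently with collect_segment, the per-segment results are combined with merge, and finalize is called once at the end. Averaging inside collect_segment instead would make the per-segment results impossible to merge correctly, which is the reason for keeping the intermediate tree separate from the final one.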