* chore!:drop JSON support on intermediate agg result
add support for other formats by removing skip_serialize and untagged
JSON support is broken anyway due it's lack on f64::INF etc. handling
* Update src/aggregation/intermediate_agg_result.rs
Co-authored-by: Paul Masurel <paul@quickwit.io>
* move from impl
---------
Co-authored-by: Paul Masurel <paul@quickwit.io>
* tokenizer option on text fastfield
allow to set tokenizer option on text fastfield (fixes#1901)
handle PreTokenized strings in fast field
* change visibility
* remove custom de/serialization
* Better mixed types support in aggs and fix serialization issue
- Improve support for mixed types in JSON field aggregations (pick the right field, #1913)
- Resolve the issue with JSON serialization for numeric keys (fixes#1967)
- Add JSON round-trip test for term buckets
- Remove `u64_lenient`, as this is a footgun without the type
- move aggregation benchmarks
* remove shadowing
* add memory limit for aggregations
introduce AggregationLimits to set memory consumption limit and bucket limits
memory limit is checked during aggregation, bucket limit is checked before returning the aggregation request.
* Apply suggestions from code review
Co-authored-by: Paul Masurel <paul@quickwit.io>
* add ByteCount with human readable format
---------
Co-authored-by: Paul Masurel <paul@quickwit.io>
* handle missing column for aggs
add empty column fallback for missing column in aggs.
Fix sort for term agg on sub-agg with missing value (null is smallest)
* add error when field is not fast
dynamic dispatch seems to be really expensive, move the buffer in front of the dynamic dispatch, to reduce the number of calls into the dynamic dispatched collector.
* feat: add support for u64,i64,f64 fields in term aggregation
* hash enum values
* fix build
* Apply suggestions from code review
Co-authored-by: Paul Masurel <paul@quickwit.io>
---------
Co-authored-by: Paul Masurel <paul@quickwit.io>
* switch to sparse collection for histogram
Replaces histogram vec collection with a hashmap. This approach works much better for sparse data and enables use cases like drill downs (filter + small interval).
It is slower for dense cases (1.3x-2x slower). This can be alleviated with a specialized hashmap in the future.
closes#1704closes#1370
* refactor, clippy
* fix bucket_pos overflow issue
The new fast field code, based on columnar, had a larger minimum memory
footprint, causing the first docuemnt to trigger a flush of the asegment
in this unit test.
This PR prevents the allocation of a large capacity for the different hashmap tables
using in the columnar writer.
Closes#1859
* Fix max bucket limit in histogram
The max bucket limit in histogram was broken, since some code introduced temporary filtering of buckets, which then resulted into an incorrect increment on the bucket count.
The provided solution covers more scenarios, but there are still some scenarios unhandled (See #1702).
* Apply suggestions from code review
Co-authored-by: Paul Masurel <paul@quickwit.io>
Co-authored-by: Paul Masurel <paul@quickwit.io>
* add aggregation support for date type
fixes#1332
* serialize key_as_string as rfc3339 in date histogram
* update docs
* enable date for range aggregation