Commit Graph

41 Commits

Author SHA1 Message Date
PSeitz
92c32979d2 fix postcard compatibility for top_hits, add postcard test (#2346)
* fix postcard compatibility for top_hits, add postcard test

* fix top_hits naming, delay data fetch

closes #2347

* fix import
2024-04-09 06:17:25 +02:00
Tushar
0e04ec3136 feat(aggregators/metric): Add a top_hits aggregator (#2198)
* feat(aggregators/metric): Implement a top_hits aggregator

* fix: Expose get_fields

* fix: Serializer for top_hits request

Also removes extraneous the extraneous third-party
serialization helper.

* chore: Avert panick on parsing invalid top_hits query

* refactor: Allow multiple field names from aggregations

* perf: Replace binary heap with TopNComputer

* fix: Avoid comparator inversion by ComparableDoc

* fix: Rank missing field values lower than present values

* refactor: Make KeyOrder a struct

* feat: Rough attempt at docvalue_fields

* feat: Complete stab at docvalue_fields

- Rename "SearchResult*" => "Retrieval*"
- Revert Vec => HashMap for aggregation accessors.
- Split accessors for core aggregation and field retrieval.
- Resolve globbed field names in docvalue_fields retrieval.
- Handle strings/bytes and other column types with DynamicColumn

* test(unit): Add tests for top_hits aggregator

* fix: docfield_value field globbing

* test(unit): Include dynamic fields

* fix: Value -> OwnedValue

* fix: Use OwnedValue's native Null variant

* chore: Improve readability of test asserts

* chore: Remove DocAddress from top_hits result

* docs: Update aggregator doc

* revert: accidental doc test

* chore: enable time macros only for tests

* chore: Apply suggestions from review

* chore: Apply suggestions from review

* fix: Retrieve all values for fields

* test(unit): Update for multi-value retrieval

* chore: Assert term existence

* feat: Include all columns for a column name

Since a (name, type) constitutes a unique column.

* fix: Resolve json fields

Introduces a translation step to bridge the difference between
ColumnarReaders null `\0` separated json field keys to the common
`.` separated used by SegmentReader. Although, this should probably
be the default behavior for ColumnarReader's public API perhaps.

* chore: Address review on mutability

* chore: s/segment_id/segment_ordinal instances of SegmentOrdinal

* chore: Revert erroneous grammar change
2024-01-26 16:46:41 +01:00
PSeitz
b1d8b072db add missing aggregation part 2 (#2149)
* add missing aggregation part 2

Add missing support for:
- Mixed types columns
- Key of type string on numerical fields

The special aggregation is slower than the integrated one in TermsAggregation and therefore not
chosen by default, although it can cover all use cases.

* simplify, add num_docs to empty
2023-08-31 07:55:33 +02:00
PSeitz
73cb71762f add missing parameter for stats,min,max,count,sum,avg (#2151)
* add missing parameter for stats,min,max,count,sum,avg

add missing parameter for stats,min,max,count,sum,avg
closes #1913
partially #1789

* Apply suggestions from code review

Co-authored-by: Paul Masurel <paul@quickwit.io>

---------

Co-authored-by: Paul Masurel <paul@quickwit.io>
2023-08-28 08:59:51 +02:00
PSeitz
c2be6603a2 alternative mixed field aggregation collection (#2135)
* alternative mixed field aggregation collection

instead of having multiple accessor in one AggregationWithAccessor split it into
multiple independent AggregationWithAccessor

* Update src/aggregation/agg_req_with_accessor.rs

Co-authored-by: Paul Masurel <paul@quickwit.io>

---------

Co-authored-by: Paul Masurel <paul@quickwit.io>
2023-07-27 12:25:31 +02:00
Adam Reichold
c805f08ca7 Fix a few more upcoming Clippy lints (#2133) 2023-07-24 17:07:57 +09:00
PSeitz
6239697a02 switch to ms in histogram for date type (#2045)
* switch to ms in histogram for date type

switch to ms in histogram, by adding a normalization step that converts
to nanoseconds precision when creating the collector.

closes #2028
related to #2026

* add missing unit long variants

* use single thread to avoid handling test case

* fix docs

* revert CI

* cleanup

* improve docs

* Update src/aggregation/bucket/histogram/histogram.rs

Co-authored-by: Paul Masurel <paul@quickwit.io>

---------

Co-authored-by: Paul Masurel <paul@quickwit.io>
2023-05-19 08:15:44 +02:00
PSeitz
2dfe37940d handle multiple types in term aggregation (#2041) 2023-05-15 11:57:38 +02:00
PSeitz
45ff0e3c5c clear memory consumption in AggregationLimits (#2022)
* clear memory consumption in AggregationLimits

clear memory consumption in AggregationLimits at the end of segment collection

* switch to ResourceLimitGuard

* unduplicate code

* merge methods

* Apply suggestions from code review

Co-authored-by: Paul Masurel <paul@quickwit.io>

---------

Co-authored-by: Paul Masurel <paul@quickwit.io>
2023-05-08 10:15:09 +02:00
PSeitz
1f06997d04 fix single collector special case (#2014) 2023-04-27 09:30:19 +02:00
PSeitz
2e369db936 switch to Aggregation without serde_untagged (#2003)
* refactor result handling

* remove Internal stuff

* merge different accessors

* switch to Aggregation without serde_untagged

* fix doctests
2023-04-25 08:54:51 +02:00
PSeitz
41af70799d add percentiles aggregations (#1984)
* add percentiles aggregations

add percentiles aggregation
fix disabled agg benchmark

* Update src/aggregation/metric/percentiles.rs

Co-authored-by: Paul Masurel <paul@quickwit.io>

* Apply suggestions from code review

Co-authored-by: Paul Masurel <paul@quickwit.io>

* fix import

* fix import

---------

Co-authored-by: Paul Masurel <paul@quickwit.io>
2023-04-07 07:18:28 +02:00
PSeitz
da2804644f fetch blocks of vals in aggregation for all cardinality (#1950)
* fetch blocks of vals in aggregation for all cardinality

* move caching in common accessor
2023-03-23 08:41:11 +01:00
PSeitz
9e2faecf5b add memory limit for aggregations (#1942)
* add memory limit for aggregations

introduce AggregationLimits to set memory consumption limit and bucket limits
memory limit is checked during aggregation, bucket limit is checked before returning the aggregation request.

* Apply suggestions from code review

Co-authored-by: Paul Masurel <paul@quickwit.io>

* add ByteCount with human readable format

---------

Co-authored-by: Paul Masurel <paul@quickwit.io>
2023-03-16 06:21:07 +01:00
PSeitz
ca20bfa776 add date_histogram (#1900)
* add date_histogram

* add return result
2023-03-02 05:17:35 +01:00
PSeitz
bc36458334 move buffer in front of dynamic dispatch (#1915)
dynamic dispatch seems to be really expensive, move the buffer in front of the dynamic dispatch, to reduce the number of calls into the dynamic dispatched collector.
2023-02-28 13:07:50 +08:00
PSeitz
e510f699c8 feat: add support for u64,i64,f64 fields in term aggregation (#1883)
* feat: add support for u64,i64,f64 fields in term aggregation

* hash enum values

* fix build

* Apply suggestions from code review

Co-authored-by: Paul Masurel <paul@quickwit.io>

---------

Co-authored-by: Paul Masurel <paul@quickwit.io>
2023-02-27 15:04:41 +08:00
PSeitz
5f23bb7e65 switch to sparse collection for histogram (#1898)
* switch to sparse collection for histogram

Replaces histogram vec collection with a hashmap. This approach works much better for sparse data and enables use cases like drill downs (filter + small interval).
It is slower for dense cases (1.3x-2x slower). This can be alleviated with a specialized hashmap in the future.
closes #1704
closes #1370

* refactor, clippy

* fix bucket_pos overflow issue
2023-02-23 07:02:58 +01:00
Paul Masurel
e2aa5af075 Clippy warnings fixes (#1885) 2023-02-20 19:04:13 +09:00
PSeitz
74bf60b4f7 implement SegmentAggregationCollector on bucket aggs (#1878) 2023-02-17 12:53:29 +01:00
PSeitz
111f25a8f7 clippy (#1879)
* fix clippy

* fix clippy

* fmt
2023-02-17 11:34:21 +01:00
PSeitz
019db10e8e refactor aggregations (#1875)
* add specialized version for full cardinality

Pre Columnar
test aggregation::tests::bench::bench_aggregation_average_u64                                                            ... bench:   6,681,850 ns/iter (+/- 1,217,385)
test aggregation::tests::bench::bench_aggregation_average_u64_and_f64                                                    ... bench:  10,576,327 ns/iter (+/- 494,380)

Current
test aggregation::tests::bench::bench_aggregation_average_u64                                                            ... bench:  11,562,084 ns/iter (+/- 3,678,682)
test aggregation::tests::bench::bench_aggregation_average_u64_and_f64                                                    ... bench:  18,925,790 ns/iter (+/- 17,616,771)

Post Change
test aggregation::tests::bench::bench_aggregation_average_u64                                                            ... bench:   9,123,811 ns/iter (+/- 399,720)
test aggregation::tests::bench::bench_aggregation_average_u64_and_f64                                                    ... bench:  13,111,825 ns/iter (+/- 273,547)

* refactor aggregation collection

* add buffering collector
2023-02-16 13:15:16 +01:00
Paul Masurel
bd5eea9852 Integrated columnar work. 2023-02-09 13:14:31 +01:00
Adrien Guillo
f2dad194ea Add count, min, max, and sum aggregations 2023-01-16 12:22:20 -05:00
PSeitz
6ca9a477f3 reuse stats for average (#1785)
* reuse stats for average

* fix count type
2023-01-13 23:32:27 +08:00
PSeitz
f9171a3981 fix clippy (#1725)
* fix clippy

* fix clippy fastfield codecs

* fix clippy bitpacker

* fix clippy common

* fix clippy stacker

* fix clippy sstable

* fmt
2022-12-20 07:30:06 +01:00
Paul Masurel
8e775b6c3d Refactoring dyn Column (#1502) 2022-09-02 17:26:30 +09:00
Pascal Seitz
44ea7313ca set max bucket size as parameter 2022-05-13 13:21:52 +08:00
Pascal Seitz
11ac451250 abort aggregation when too many buckets are created
Validation happens on different phases depending on the aggregation
Term: During segment collection
Histogram: At the end when converting in intermediate buckets (we preallocate empty buckets for the range) Revisit after #1370
Range: When validating the request

update CHANGELOG
2022-05-12 12:26:43 +08:00
Pascal Seitz
6a4632211a forward error in aggregation collect 2022-05-12 12:26:43 +08:00
Pascal Seitz
1be6c6111c support order property on term aggregations
support order property on term aggregations
order can be by doc_count, key, or a metric sub_aggregation
2022-04-20 00:34:38 +08:00
Pascal Seitz
902d05ebec refactor getffreader function 2022-04-13 19:51:18 +08:00
Pascal Seitz
dd13dedaeb forward errors, remove unwrap 2022-04-13 19:51:18 +08:00
Pascal Seitz
24432bf523 add term aggregation 2022-04-13 19:51:18 +08:00
Pascal Seitz
0262e44bbd merge_fruits pass by value 2022-03-15 12:59:22 +08:00
Pascal Seitz
613aad7a8a vec optional, improve performance 2022-03-14 21:29:07 +08:00
Pascal Seitz
1aa88b0c51 improve performance 2022-03-14 20:28:08 +08:00
Pascal Seitz
564fa38085 move sub_aggregations to own vec, use itertools minmax 2022-03-14 16:20:26 +08:00
Pascal Seitz
226f577803 Add Histogram aggregation 2022-03-11 21:52:07 +08:00
PSeitz
c4f66eb185 improve validation in aggregation, extend invalid field test (#1292)
* improve validation in aggregation, extend invalid field test

improve validation in aggregation
extend invalid field test
Fixes #1291

* collect fast field names on request structure

* fix visibility of AggregationSegmentCollector
2022-02-25 15:21:19 +09:00
PSeitz
972cb6c26d Aggregation (#1276)
Added support for aggregation compatible with Elasticsearch's API.
2022-02-21 09:59:11 +09:00