Commit Graph

84 Commits

Author SHA1 Message Date
PSeitz
21d057059e clippy (#2527)
* clippy

* clippy

* clippy

* clippy

* convert allow to expect and remove unused

* cargo fmt

* cleanup

* export sample

* clippy
2024-10-22 09:26:54 +08:00
PSeitz
55b0b52457 Fix AggregationLimits (#2495)
* change AggregationLimits behavior

This fixes an issue encountered with the current behaviour of
AggregationLimits.
Previously we had AggregationLimits and RessourceLimitGuard, which both
track the memory, but only RessourceLimitGuard released memory when
dropped, while AggregationLimits did not.

This PR changes AggregationLimits to be a guard itself and removes the
RessourceLimitGuard.

* rename AggregationLimits to AggregationLimitsGuard
2024-09-17 14:25:47 +08:00
PSeitz
0d4e319965 add Key::I64 and Key::U64 variants in aggregation (#2468)
* add Key::I64 and Key::U64 variants in aggregation

Currently all `Key` numerical values are returned as f64. This causes problems in some
cases with the precision and the way f64 is serialized.

This PR adds `Key::I64` and `Key::U64` variants and uses them in the term
aggregation.

* add clarification comment
2024-07-31 20:29:32 +08:00
PSeitz
56d79cb203 fix cardinality aggregation performance (#2446)
* fix cardinality aggregation performance

fix cardinality performance by fetching multiple terms at once. This
avoids decompressing the same block and keeps the buffer state between
terms.

add cardinality aggregation benchmark

bump rust version to 1.66

Performance comparison to before (AllQuery)
```
full
cardinality_agg                   Memory: 3.5 MB (-0.00%)    Avg: 21.2256ms (-97.78%)    Median: 21.0042ms (-97.82%)    [20.4717ms .. 23.6206ms]
terms_few_with_cardinality_agg    Memory: 10.6 MB            Avg: 81.9293ms (-97.37%)    Median: 81.5526ms (-97.38%)    [79.7564ms .. 88.0374ms]
dense
cardinality_agg                   Memory: 3.6 MB (-0.00%)    Avg: 25.9372ms (-97.24%)    Median: 25.7744ms (-97.25%)    [24.7241ms .. 27.8793ms]
terms_few_with_cardinality_agg    Memory: 10.6 MB            Avg: 93.9897ms (-96.91%)    Median: 92.7821ms (-96.94%)    [90.3312ms .. 117.4076ms]
sparse
cardinality_agg                   Memory: 895.4 KB (-0.00%)    Avg: 22.5113ms (-95.01%)    Median: 22.5629ms (-94.99%)    [22.1628ms .. 22.9436ms]
terms_few_with_cardinality_agg    Memory: 680.2 KB             Avg: 26.4250ms (-94.85%)    Median: 26.4135ms (-94.86%)    [26.3210ms .. 26.6774ms]
```

* clippy

* assert for sorted ordinals
2024-07-02 15:29:00 +08:00
Hamir Mahal
0c634adbe1 style: simplify strings with string interpolation (#2412)
* style: simplify strings with string interpolation

* fix: formatting
2024-05-27 09:16:47 +02:00
PSeitz
c6b213d8f0 use bingang for agg benchmark (#2378)
* use bingang for agg benchmark

use bingang for agg benchmark, which includes memory consumption

Output:
```
full
histogram                     Memory: 15.8 KB              Avg: 10.9322ms  (+5.44%)    Median: 10.8790ms  (+9.28%)     Min: 10.7470ms    Max: 11.3263ms
histogram_hard_bounds         Memory: 15.5 KB              Avg: 5.1939ms  (+6.61%)     Median: 5.1722ms  (+10.98%)     Min: 5.0432ms     Max: 5.3910ms
histogram_with_avg_sub_agg    Memory: 48.7 KB              Avg: 23.8165ms  (+4.57%)    Median: 23.7264ms  (+10.06%)    Min: 23.4995ms    Max: 24.8107ms
dense
histogram                     Memory: 17.3 KB              Avg: 15.6810ms  (-8.54%)    Median: 15.6174ms  (-8.89%)    Min: 15.4953ms    Max: 16.0702ms
histogram_hard_bounds         Memory: 15.4 KB              Avg: 10.0720ms  (-7.33%)    Median: 10.0572ms  (-7.06%)    Min: 9.8500ms     Max: 10.4819ms
histogram_with_avg_sub_agg    Memory: 50.1 KB              Avg: 33.0993ms  (-7.04%)    Median: 32.9499ms  (-6.86%)    Min: 32.8284ms    Max: 34.0529ms
sparse
histogram                     Memory: 16.3 KB              Avg: 19.2325ms  (-0.44%)    Median: 19.1211ms  (-1.26%)    Min: 19.0348ms    Max: 19.7902ms
histogram_hard_bounds         Memory: 16.1 KB              Avg: 18.5179ms  (-0.61%)    Median: 18.4552ms  (-0.90%)    Min: 18.3799ms    Max: 19.0535ms
histogram_with_avg_sub_agg    Memory: 34.7 KB              Avg: 21.2589ms  (-0.69%)    Median: 21.1867ms  (-1.05%)    Min: 21.0342ms    Max: 21.9900ms
```

* add more bench with term as sub agg
2024-05-07 11:29:49 +02:00
Adam Reichold
4708171a32 Fix some of the things current Clippy complains about (#2363) 2024-04-16 04:27:06 +02:00
PSeitz
dfa3aed32d check unsupported parameters top_hits (#2351)
* check unsupported parameters top_hits

* move to function
2024-04-10 08:20:52 +02:00
PSeitz
74940e9345 clippy (#2349)
* fix clippy

* fix clippy

* fix duplicate imports
2024-04-09 07:54:44 +02:00
PSeitz
7e41d31c6e agg: support to deserialize f64 from string (#2311)
* agg: support to deserialize f64 from string

* remove visit_string

* disallow NaN
2024-03-05 05:49:41 +01:00
PSeitz
d57622d54b support bool type in term aggregation (#2318)
* support bool type in term aggregation

* add Bool to Intermediate Key
2024-02-20 03:22:22 +01:00
Harrison Burt
1c7c6fd591 POC: Tantivy documents as a trait (#2071)
* fix windows build (#1)

* Fix windows build

* Add doc traits

* Add field value iter

* Add value and serialization

* Adjust order

* Fix bug

* Correct type

* Fix generic bugs

* Reformat code

* Add generic to index writer which I forgot about

* Fix missing generics on single segment writer

* Add missing type export

* Add default methods for convenience

* Cleanup

* Fix more-like-this query to use standard types

* Update API and fix tests

* Add doc traits

* Add field value iter

* Add value and serialization

* Adjust order

* Fix bug

* Correct type

* Rebase main and fix conflicts

* Reformat code

* Merge upstream

* Fix missing generics on single segment writer

* Add missing type export

* Add default methods for convenience

* Cleanup

* Fix more-like-this query to use standard types

* Update API and fix tests

* Add tokenizer improvements from previous commits

* Add tokenizer improvements from previous commits

* Reformat

* Fix unit tests

* Fix unit tests

* Use enum in changes

* Stage changes

* Add new deserializer logic

* Add serializer integration

* Add document deserializer

* Implement new (de)serialization api for existing types

* Fix bugs and type errors

* Add helper implementations

* Fix errors

* Reformat code

* Add unit tests and some code organisation for serialization

* Add unit tests to deserializer

* Add some small docs

* Add support for deserializing serde values

* Reformat

* Fix typo

* Fix typo

* Change repr of facet

* Remove unused trait methods

* Add child value type

* Resolve comments

* Fix build

* Fix more build errors

* Fix more build errors

* Fix the tests I missed

* Fix examples

* fix numerical order, serialize PreTok Str

* fix coverage

* rename Document to TantivyDocument, rename DocumentAccess to Document

add Binary prefix to binary de/serialization

* fix coverage

---------

Co-authored-by: Pascal Seitz <pascal.seitz@gmail.com>
2023-10-02 10:01:16 +02:00
PSeitz
6239697a02 switch to ms in histogram for date type (#2045)
* switch to ms in histogram for date type

switch to ms in histogram, by adding a normalization step that converts
to nanoseconds precision when creating the collector.

closes #2028
related to #2026

* add missing unit long variants

* use single thread to avoid handling test case

* fix docs

* revert CI

* cleanup

* improve docs

* Update src/aggregation/bucket/histogram/histogram.rs

Co-authored-by: Paul Masurel <paul@quickwit.io>

---------

Co-authored-by: Paul Masurel <paul@quickwit.io>
2023-05-19 08:15:44 +02:00
PSeitz
ba3a885a3b handle multiple agg results (#2035)
handle multiple intermediate aggregation results with the same name.
2023-05-10 15:00:38 +02:00
Yuri Astrakhan
74275b76a6 Inline format arguments where makes sense (#2038)
Applied this command to the code, making it a bit shorter and slightly
more readable.

```
cargo +nightly clippy --all-features --benches --tests --workspace --fix -- -A clippy::all -W clippy::uninlined_format_args
cargo +nightly fmt --all
```
2023-05-10 18:03:59 +09:00
PSeitz
45ff0e3c5c clear memory consumption in AggregationLimits (#2022)
* clear memory consumption in AggregationLimits

clear memory consumption in AggregationLimits at the end of segment collection

* switch to ResourceLimitGuard

* unduplicate code

* merge methods

* Apply suggestions from code review

Co-authored-by: Paul Masurel <paul@quickwit.io>

---------

Co-authored-by: Paul Masurel <paul@quickwit.io>
2023-05-08 10:15:09 +02:00
PSeitz
cbf2bdc75b change bucket count type (#2013)
* change bucket count type

closes #2012

* Update src/aggregation/agg_limits.rs

Co-authored-by: Paul Masurel <paul@quickwit.io>

* Update src/directory/managed_directory.rs

Co-authored-by: Paul Masurel <paul@quickwit.io>

* fix test

---------

Co-authored-by: Paul Masurel <paul@quickwit.io>
2023-04-27 15:47:31 +08:00
PSeitz
c599bf3b6c chore!:drop JSON support on intermediate agg result (#1992)
* chore!:drop JSON support on intermediate agg result

add support for other formats by removing skip_serialize and untagged
JSON support is broken anyway due it's lack on f64::INF etc. handling

* Update src/aggregation/intermediate_agg_result.rs

Co-authored-by: Paul Masurel <paul@quickwit.io>

* move from impl

---------

Co-authored-by: Paul Masurel <paul@quickwit.io>
2023-04-26 13:05:16 +02:00
PSeitz
2e369db936 switch to Aggregation without serde_untagged (#2003)
* refactor result handling

* remove Internal stuff

* merge different accessors

* switch to Aggregation without serde_untagged

* fix doctests
2023-04-25 08:54:51 +02:00
PSeitz
e522163a1c use json in agg tests (#1998)
* switch to JSON in tests, add flat aggregation types

* use method

* clippy

* remove commented file
2023-04-17 14:08:48 +02:00
PSeitz
41af70799d add percentiles aggregations (#1984)
* add percentiles aggregations

add percentiles aggregation
fix disabled agg benchmark

* Update src/aggregation/metric/percentiles.rs

Co-authored-by: Paul Masurel <paul@quickwit.io>

* Apply suggestions from code review

Co-authored-by: Paul Masurel <paul@quickwit.io>

* fix import

* fix import

---------

Co-authored-by: Paul Masurel <paul@quickwit.io>
2023-04-07 07:18:28 +02:00
PSeitz
5c4ea6a708 tokenizer option on text fastfield (#1945)
* tokenizer option on text fastfield

allow to set tokenizer option on text fastfield (fixes #1901)
handle PreTokenized strings in fast field

* change visibility

* remove custom de/serialization
2023-03-31 10:03:38 +02:00
PSeitz
8f7f1d6be4 add Display for ByteCount (#1949)
* add Display for ByteCount

* export missing AggregationLimits
2023-03-21 08:02:35 +01:00
PSeitz
9e2faecf5b add memory limit for aggregations (#1942)
* add memory limit for aggregations

introduce AggregationLimits to set memory consumption limit and bucket limits
memory limit is checked during aggregation, bucket limit is checked before returning the aggregation request.

* Apply suggestions from code review

Co-authored-by: Paul Masurel <paul@quickwit.io>

* add ByteCount with human readable format

---------

Co-authored-by: Paul Masurel <paul@quickwit.io>
2023-03-16 06:21:07 +01:00
PSeitz
2fb3740cb0 handle missing column for aggs (#1920)
* handle missing column for aggs

add empty column fallback for missing column in aggs.
Fix sort for term agg on sub-agg with missing value (null is smallest)

* add error when field is not fast
2023-03-15 06:09:59 +01:00
PSeitz
61cfd8dc57 fix clippy (#1927) 2023-03-13 03:12:02 +01:00
PSeitz
ca20bfa776 add date_histogram (#1900)
* add date_histogram

* add return result
2023-03-02 05:17:35 +01:00
Paul Masurel
7fae4d98d7 Adapting for quickwit2 (#1912)
* Adapting tantivy to make it possible to be plugged to quickwit.

* Apply suggestions from code review

Co-authored-by: PSeitz <PSeitz@users.noreply.github.com>

* Added unit test

---------

Co-authored-by: PSeitz <PSeitz@users.noreply.github.com>
2023-03-01 16:27:46 +09:00
PSeitz
e510f699c8 feat: add support for u64,i64,f64 fields in term aggregation (#1883)
* feat: add support for u64,i64,f64 fields in term aggregation

* hash enum values

* fix build

* Apply suggestions from code review

Co-authored-by: Paul Masurel <paul@quickwit.io>

---------

Co-authored-by: Paul Masurel <paul@quickwit.io>
2023-02-27 15:04:41 +08:00
PSeitz
c7278b3258 remove schema in aggs (#1888)
* switch to ColumnType, move tests

* remove Schema dependency in agg
2023-02-22 04:50:28 +01:00
PSeitz
74bf60b4f7 implement SegmentAggregationCollector on bucket aggs (#1878) 2023-02-17 12:53:29 +01:00
PSeitz
019db10e8e refactor aggregations (#1875)
* add specialized version for full cardinality

Pre Columnar
test aggregation::tests::bench::bench_aggregation_average_u64                                                            ... bench:   6,681,850 ns/iter (+/- 1,217,385)
test aggregation::tests::bench::bench_aggregation_average_u64_and_f64                                                    ... bench:  10,576,327 ns/iter (+/- 494,380)

Current
test aggregation::tests::bench::bench_aggregation_average_u64                                                            ... bench:  11,562,084 ns/iter (+/- 3,678,682)
test aggregation::tests::bench::bench_aggregation_average_u64_and_f64                                                    ... bench:  18,925,790 ns/iter (+/- 17,616,771)

Post Change
test aggregation::tests::bench::bench_aggregation_average_u64                                                            ... bench:   9,123,811 ns/iter (+/- 399,720)
test aggregation::tests::bench::bench_aggregation_average_u64_and_f64                                                    ... bench:  13,111,825 ns/iter (+/- 273,547)

* refactor aggregation collection

* add buffering collector
2023-02-16 13:15:16 +01:00
PSeitz
347614c841 test error for avg agg on ip field (#1873)
closes #1835
2023-02-14 23:22:56 +08:00
PSeitz
7a9befd18d fix sort order test for term aggregation (#1858)
fix sort order test for term aggregation
fix invalid request test
2023-02-10 10:26:58 +01:00
Paul Masurel
bd5eea9852 Integrated columnar work. 2023-02-09 13:14:31 +01:00
PSeitz
a2ca12995e update aggregation docs (#1807) 2023-01-19 09:52:47 +01:00
Adrien Guillo
f2dad194ea Add count, min, max, and sum aggregations 2023-01-16 12:22:20 -05:00
Adam Reichold
8312c882a5 More cosmetic fixes for upcoming Clippy lints. (#1771) 2023-01-10 10:32:45 +01:00
PSeitz
f9171a3981 fix clippy (#1725)
* fix clippy

* fix clippy fastfield codecs

* fix clippy bitpacker

* fix clippy common

* fix clippy stacker

* fix clippy sstable

* fmt
2022-12-20 07:30:06 +01:00
PSeitz
ee1f2c1f28 add aggregation support for date type (#1693)
* add aggregation support for date type
fixes #1332

* serialize key_as_string as rfc3339 in date histogram
* update docs
* enable date for range aggregation
2022-11-28 09:12:08 +09:00
Pascal Seitz
952b048341 add term aggregation clarification 2022-10-14 16:12:19 +08:00
Bruce Mitchener
cf02e32578 Improvements to doc linking, grammar, etc. 2022-09-19 18:10:22 +07:00
PSeitz
45924711fd improve docs (#1514)
fix link alias after https://github.com/rust-lang/rustfmt/pull/5262 has been merged and released.
fix dead links
2022-09-08 22:33:59 +09:00
Paul Masurel
26876d41d7 Moving the serialization logic to the fastfield_codecs crate. 2022-09-03 00:29:52 +09:00
k-yomo
704d0a8d8b Refactor range aggregation tests 2022-07-28 06:31:25 +09:00
k-yomo
9b6b60cc2b Remove unnecessary keyed parameter setting 2022-07-27 18:43:52 +09:00
k-yomo
6444516a82 User serde default for the keyed params 2022-07-27 01:12:56 +09:00
k-yomo
a9b0d1a0ab Fix aggreagtion examples 2022-07-26 18:54:27 +09:00
k-yomo
2b333ca635 Fix keyed param type in the comment 2022-07-26 18:35:01 +09:00
k-yomo
5ab5f070ed Fix to use bool directory for the keyed parameter 2022-07-26 18:18:38 +09:00