Paul Masurel
8ea97e7d6b
Minor refactoring preparing for getting columnar integrated in quickwit. ( #1911 )
2023-02-27 14:23:30 +09:00
Paul Masurel
c838aa808b
Removed the extra nesting in unit test file ( #1907 )
2023-02-27 12:17:52 +09:00
Paul Masurel
06850719dc
Renaming .values(DocId) to .values_for_doc(DocId) ( #1906 )
2023-02-27 12:15:13 +09:00
PSeitz
5f23bb7e65
switch to sparse collection for histogram ( #1898 )
...
* switch to sparse collection for histogram
Replaces histogram vec collection with a hashmap. This approach works much better for sparse data and enables use cases like drill downs (filter + small interval).
It is slower for dense cases (1.3x-2x slower). This can be alleviated with a specialized hashmap in the future.
closes #1704
closes #1370
* refactor, clippy
* fix bucket_pos overflow issue
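The trade-off described above can be illustrated with a minimal sketch. This is not the code from this commit: `bucket_pos`, `sparse_histogram`, and `dense_histogram` are hypothetical helpers, and the interval/offset bucketing formula is an assumption made for illustration only.

```rust
use std::collections::HashMap;

/// Hypothetical helper: maps a value to its bucket index for a given
/// interval and offset (assumed formula, for illustration only).
fn bucket_pos(value: f64, interval: f64, offset: f64) -> i64 {
    ((value - offset) / interval).floor() as i64
}

/// Sparse collection: only buckets that receive at least one value are stored,
/// so memory scales with the number of non-empty buckets.
fn sparse_histogram(values: &[f64], interval: f64, offset: f64) -> HashMap<i64, u64> {
    let mut buckets: HashMap<i64, u64> = HashMap::new();
    for &value in values {
        *buckets.entry(bucket_pos(value, interval, offset)).or_insert(0) += 1;
    }
    buckets
}

/// Dense collection, for comparison: one counter per bucket between the
/// smallest and largest bucket index, including all empty ones in between.
/// Returns the index of the first bucket and the counters.
fn dense_histogram(values: &[f64], interval: f64, offset: f64) -> (i64, Vec<u64>) {
    let positions: Vec<i64> = values
        .iter()
        .map(|&v| bucket_pos(v, interval, offset))
        .collect();
    let (min_pos, max_pos) = match (positions.iter().min(), positions.iter().max()) {
        (Some(&min), Some(&max)) => (min, max),
        _ => return (0, Vec::new()),
    };
    let mut counts = vec![0u64; (max_pos - min_pos + 1) as usize];
    for pos in positions {
        counts[(pos - min_pos) as usize] += 1;
    }
    (min_pos, counts)
}
```

With the values `[1.0, 2.5, 1_000_000.0]` and an interval of `10.0`, the sparse variant stores two entries, while the dense variant allocates a counter for every bucket in the range. On dense data the vector avoids hashing on every insert, which is in line with the slowdown noted above.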
2023-02-23 07:02:58 +01:00
PSeitz
c7278b3258
remove schema in aggs ( #1888 )
...
* switch to ColumnType, move tests
* remove Schema dependency in agg
2023-02-22 04:50:28 +01:00
Paul Masurel
e2aa5af075
Clippy warnings fixes ( #1885 )
2023-02-20 19:04:13 +09:00
PSeitz
74bf60b4f7
implement SegmentAggregationCollector on bucket aggs ( #1878 )
2023-02-17 12:53:29 +01:00
PSeitz
111f25a8f7
clippy ( #1879 )
...
* fix clippy
* fix clippy
* fmt
2023-02-17 11:34:21 +01:00
PSeitz
019db10e8e
refactor aggregations ( #1875 )
...
* add specialized version for full cardinality
Pre Columnar
test aggregation::tests::bench::bench_aggregation_average_u64 ... bench: 6,681,850 ns/iter (+/- 1,217,385)
test aggregation::tests::bench::bench_aggregation_average_u64_and_f64 ... bench: 10,576,327 ns/iter (+/- 494,380)
Current
test aggregation::tests::bench::bench_aggregation_average_u64 ... bench: 11,562,084 ns/iter (+/- 3,678,682)
test aggregation::tests::bench::bench_aggregation_average_u64_and_f64 ... bench: 18,925,790 ns/iter (+/- 17,616,771)
Post Change
test aggregation::tests::bench::bench_aggregation_average_u64 ... bench: 9,123,811 ns/iter (+/- 399,720)
test aggregation::tests::bench::bench_aggregation_average_u64_and_f64 ... bench: 13,111,825 ns/iter (+/- 273,547)
* refactor aggregation collection
* add buffering collector
2023-02-16 13:15:16 +01:00
PSeitz
347614c841
test error for avg agg on ip field ( #1873 )
...
closes #1835
2023-02-14 23:22:56 +08:00
Paul Masurel
097fd6138d
Fix clippy comments ( #1872 )
2023-02-14 23:12:45 +09:00
Paul Masurel
60cc2644d6
Fixing test_fail_on_flush_segment_but_one_worker_remains ( #1869 )
...
The new fast field code, based on columnar, had a larger minimum memory
footprint, causing the first document to trigger a flush of the segment
in this unit test.
This PR prevents the allocation of a large capacity for the different hashmap tables
used in the columnar writer.
Closes #1859
2023-02-14 16:09:42 +09:00
Paul Masurel
10bccac61b
Bugfix in parse_into_milliseconds ( #1867 )
2023-02-14 15:06:40 +09:00
PSeitz
7a9befd18d
fix sort order test for term aggregation ( #1858 )
...
fix sort order test for term aggregation
fix invalid request test
2023-02-10 10:26:58 +01:00
Paul Masurel
bd5eea9852
Integrated columnar work.
2023-02-09 13:14:31 +01:00
PSeitz
a2ca12995e
update aggregation docs ( #1807 )
2023-01-19 09:52:47 +01:00
PSeitz
f687b3a5aa
start migrate Field to &str ( #1772 )
...
start migrate Field to &str in preparation of columnar
return Result for get_field
2023-01-18 16:12:07 +09:00
Adrien Guillo
c51d9f9f83
Fix some Clippy warnings
2023-01-17 10:17:51 -05:00
Adrien Guillo
0caaf13a90
Remove standard deviation from stats aggregation
2023-01-16 22:58:23 -05:00
Adrien Guillo
f2dad194ea
Add count, min, max, and sum aggregations
2023-01-16 12:22:20 -05:00
PSeitz
6ca9a477f3
reuse stats for average ( #1785 )
...
* reuse stats for average
* fix count type
2023-01-13 23:32:27 +08:00
Adam Reichold
8312c882a5
More cosmetic fixes for upcoming Clippy lints. ( #1771 )
2023-01-10 10:32:45 +01:00
PSeitz
f9171a3981
fix clippy ( #1725 )
...
* fix clippy
* fix clippy fastfield codecs
* fix clippy bitpacker
* fix clippy common
* fix clippy stacker
* fix clippy sstable
* fmt
2022-12-20 07:30:06 +01:00
PSeitz
2c50b02eb3
Fix max bucket limit in histogram ( #1703 )
...
* Fix max bucket limit in histogram
The max bucket limit in histogram was broken, since some code introduced temporary filtering of buckets, which then resulted in an incorrect increment of the bucket count.
The provided solution covers more scenarios, but there are still some scenarios unhandled (See #1702 ).
* Apply suggestions from code review
Co-authored-by: Paul Masurel <paul@quickwit.io>
2022-12-12 04:40:15 +01:00
PSeitz
ee1f2c1f28
add aggregation support for date type ( #1693 )
...
* add aggregation support for date type
fixes #1332
* serialize key_as_string as rfc3339 in date histogram
* update docs
* enable date for range aggregation
2022-11-28 09:12:08 +09:00
Pascal Seitz
279b1b28d3
switch to fx hashmap
2022-10-27 16:19:59 +08:00
Adam Reichold
bbb058d976
Replace FNV by rustc-hash
...
Both constructions have similar goals, but rustc-hash is better suited to
contemporary CPUs, as it works one word at a time instead of byte by byte.
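The difference between the two approaches can be sketched as follows. `fnv1a_64` follows the published FNV-1a 64-bit parameters. `word_at_a_time_hash` is a simplified, hypothetical illustration of a rotate-xor-multiply mix over 8-byte words in the style of rustc-hash. It is not the crate's actual implementation, and the multiplier constant is an assumption.

```rust
/// FNV-1a (64-bit): one XOR and one multiply per input byte.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf29ce484222325;
    const PRIME: u64 = 0x100000001b3;
    let mut hash = OFFSET_BASIS;
    for &byte in bytes {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(PRIME);
    }
    hash
}

/// Assumed multiplier for the sketch below (illustrative only).
const SEED: u64 = 0x517cc1b727220a95;

/// One mixing step over a whole 64-bit word: rotate, XOR, multiply.
fn mix(hash: u64, word: u64) -> u64 {
    (hash.rotate_left(5) ^ word).wrapping_mul(SEED)
}

/// Word-at-a-time hashing: one mixing step per 8 input bytes, with the
/// trailing bytes zero-padded into a final word.
fn word_at_a_time_hash(bytes: &[u8]) -> u64 {
    let mut hash = 0u64;
    let mut chunks = bytes.chunks_exact(8);
    for chunk in &mut chunks {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(chunk);
        hash = mix(hash, u64::from_le_bytes(buf));
    }
    let rest = chunks.remainder();
    if !rest.is_empty() {
        let mut buf = [0u8; 8];
        buf[..rest.len()].copy_from_slice(rest);
        hash = mix(hash, u64::from_le_bytes(buf));
    }
    hash
}
```

For an 8-byte key such as a `u64`, the FNV loop runs eight multiply steps while the word-based variant runs one.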
2022-10-27 00:35:09 +02:00
Pascal Seitz
e772d3170d
switch get_val() to u32
...
Fixes #1638
2022-10-24 19:05:57 +08:00
Pascal Seitz
952b048341
add term aggregation clarification
2022-10-14 16:12:19 +08:00
Bruce Mitchener
b3bf9a5716
Documentation improvements.
2022-10-05 14:18:10 +07:00
Bruce Mitchener
44e03791f9
Fix warnings when doc'ing private items. ( #1579 )
...
This also fixes a couple of typos, but plenty remain!
2022-10-03 14:24:00 +09:00
Bruce Mitchener
d231671fe2
clippy: Remove borrows that the compiler will do.
...
This started showing up with clippy in rust 1.64.
2022-09-22 22:38:23 +07:00
Bruce Mitchener
cf02e32578
Improvements to doc linking, grammar, etc.
2022-09-19 18:10:22 +07:00
PSeitz
45924711fd
improve docs ( #1514 )
...
fix link alias after https://github.com/rust-lang/rustfmt/pull/5262 has been merged and released.
fix dead links
2022-09-08 22:33:59 +09:00
Paul Masurel
26876d41d7
Moving the serialization logic to the fastfield_codecs crate.
2022-09-03 00:29:52 +09:00
Paul Masurel
8e775b6c3d
Refactoring dyn Column ( #1502 )
2022-09-02 17:26:30 +09:00
Paul Masurel
5331be800b
Introducing a column trait
2022-08-28 14:14:27 +02:00
Kian-Meng Ang
014b1adc3e
cargo +nightly fmt
2022-08-17 22:33:44 +08:00
Kian-Meng Ang
84295d5b35
cargo fmt
2022-08-15 21:07:01 +08:00
Kian-Meng Ang
625bcb4877
Fix typos and markdowns
...
Found via these commands:
codespell -L crate,ser,panting,beauti,hart,ue,atleast,childs,ond,pris,hel,mot
markdownlint *.md doc/src/*.md --disable MD013 MD025 MD033 MD001 MD024 MD036 MD041 MD003
2022-08-13 18:25:47 +08:00
Evance Soumaoro
fad3faefe2
added InvertedIndexReader::doc_freq_async and SnippetGenerator::new methods
2022-08-12 06:39:10 +00:00
k-yomo
099e626156
Refactor InternalRangeAggregationRange initialization with From trait
2022-07-29 05:41:29 +09:00
k-yomo
704d0a8d8b
Refactor range aggregation tests
2022-07-28 06:31:25 +09:00
k-yomo
195309a557
Add support for custom key param for range aggregation
2022-07-28 06:21:39 +09:00
k-yomo
9b6b60cc2b
Remove unnecessary keyed parameter setting
2022-07-27 18:43:52 +09:00
k-yomo
6444516a82
Use serde default for the keyed params
2022-07-27 01:12:56 +09:00
k-yomo
a9b0d1a0ab
Fix aggregation examples
2022-07-26 18:54:27 +09:00
k-yomo
2b333ca635
Fix keyed param type in the comment
2022-07-26 18:35:01 +09:00
k-yomo
80a1418284
Use FnvHashMap for keyed bucket entries
2022-07-26 18:24:54 +09:00
k-yomo
5ab5f070ed
Fix to use bool directly for the keyed parameter
2022-07-26 18:18:38 +09:00