tantivy

Mirror of https://github.com/quickwit-oss/tantivy.git, synced 2026-01-07 17:42:55 +00:00

Author	SHA1	Message	Date
PSeitz	92c32979d2	fix postcard compatibility for top_hits, add postcard test (#2346) * fix postcard compatibility for top_hits, add postcard test * fix top_hits naming, delay data fetch closes #2347 * fix import	2024-04-09 06:17:25 +02:00
Tushar	0e04ec3136	feat(aggregators/metric): Add a top_hits aggregator (#2198) * feat(aggregators/metric): Implement a top_hits aggregator * fix: Expose get_fields * fix: Serializer for top_hits request Also removes the extraneous third-party serialization helper. * chore: Avert panic on parsing invalid top_hits query * refactor: Allow multiple field names from aggregations * perf: Replace binary heap with TopNComputer * fix: Avoid comparator inversion by ComparableDoc * fix: Rank missing field values lower than present values * refactor: Make KeyOrder a struct * feat: Rough attempt at docvalue_fields * feat: Complete stab at docvalue_fields - Rename "SearchResult" => "Retrieval" - Revert Vec => HashMap for aggregation accessors. - Split accessors for core aggregation and field retrieval. - Resolve globbed field names in docvalue_fields retrieval. - Handle strings/bytes and other column types with DynamicColumn * test(unit): Add tests for top_hits aggregator * fix: docvalue_fields field globbing * test(unit): Include dynamic fields * fix: Value -> OwnedValue * fix: Use OwnedValue's native Null variant * chore: Improve readability of test asserts * chore: Remove DocAddress from top_hits result * docs: Update aggregator doc * revert: accidental doc test * chore: enable time macros only for tests * chore: Apply suggestions from review * chore: Apply suggestions from review * fix: Retrieve all values for fields * test(unit): Update for multi-value retrieval * chore: Assert term existence * feat: Include all columns for a column name Since a (name, type) constitutes a unique column. * fix: Resolve json fields Introduces a translation step to bridge the difference between ColumnarReader's null (`\0`) separated JSON field keys and the common `.` separated keys used by SegmentReader. Arguably, this should be the default behavior of ColumnarReader's public API. * chore: Address review on mutability * chore: s/segment_id/segment_ordinal instances of SegmentOrdinal * chore: Revert erroneous grammar change	2024-01-26 16:46:41 +01:00
PSeitz	b1d8b072db	add missing aggregation part 2 (#2149) * add missing aggregation part 2 Add missing support for: - Mixed types columns - Key of type string on numerical fields The special aggregation is slower than the integrated one in TermsAggregation and therefore not chosen by default, although it can cover all use cases. * simplify, add num_docs to empty	2023-08-31 07:55:33 +02:00
PSeitz	73cb71762f	add missing parameter for stats,min,max,count,sum,avg (#2151) * add missing parameter for stats,min,max,count,sum,avg add missing parameter for stats,min,max,count,sum,avg closes #1913 partially #1789 * Apply suggestions from code review Co-authored-by: Paul Masurel <paul@quickwit.io> --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2023-08-28 08:59:51 +02:00
PSeitz	c2be6603a2	alternative mixed field aggregation collection (#2135) * alternative mixed field aggregation collection instead of having multiple accessor in one AggregationWithAccessor split it into multiple independent AggregationWithAccessor * Update src/aggregation/agg_req_with_accessor.rs Co-authored-by: Paul Masurel <paul@quickwit.io> --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2023-07-27 12:25:31 +02:00
Adam Reichold	c805f08ca7	Fix a few more upcoming Clippy lints (#2133)	2023-07-24 17:07:57 +09:00
PSeitz	6239697a02	switch to ms in histogram for date type (#2045) * switch to ms in histogram for date type switch to ms in histogram, by adding a normalization step that converts to nanoseconds precision when creating the collector. closes #2028 related to #2026 * add missing unit long variants * use single thread to avoid handling test case * fix docs * revert CI * cleanup * improve docs * Update src/aggregation/bucket/histogram/histogram.rs Co-authored-by: Paul Masurel <paul@quickwit.io> --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2023-05-19 08:15:44 +02:00
PSeitz	2dfe37940d	handle multiple types in term aggregation (#2041)	2023-05-15 11:57:38 +02:00
PSeitz	45ff0e3c5c	clear memory consumption in AggregationLimits (#2022) * clear memory consumption in AggregationLimits clear memory consumption in AggregationLimits at the end of segment collection * switch to ResourceLimitGuard * unduplicate code * merge methods * Apply suggestions from code review Co-authored-by: Paul Masurel <paul@quickwit.io> --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2023-05-08 10:15:09 +02:00
PSeitz	1f06997d04	fix single collector special case (#2014)	2023-04-27 09:30:19 +02:00
PSeitz	2e369db936	switch to Aggregation without serde_untagged (#2003) * refactor result handling * remove Internal stuff * merge different accessors * switch to Aggregation without serde_untagged * fix doctests	2023-04-25 08:54:51 +02:00
PSeitz	41af70799d	add percentiles aggregations (#1984) * add percentiles aggregations add percentiles aggregation fix disabled agg benchmark * Update src/aggregation/metric/percentiles.rs Co-authored-by: Paul Masurel <paul@quickwit.io> * Apply suggestions from code review Co-authored-by: Paul Masurel <paul@quickwit.io> * fix import * fix import --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2023-04-07 07:18:28 +02:00
PSeitz	da2804644f	fetch blocks of vals in aggregation for all cardinality (#1950) * fetch blocks of vals in aggregation for all cardinality * move caching in common accessor	2023-03-23 08:41:11 +01:00
PSeitz	9e2faecf5b	add memory limit for aggregations (#1942) * add memory limit for aggregations introduce AggregationLimits to set memory consumption limit and bucket limits memory limit is checked during aggregation, bucket limit is checked before returning the aggregation request. * Apply suggestions from code review Co-authored-by: Paul Masurel <paul@quickwit.io> * add ByteCount with human readable format --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2023-03-16 06:21:07 +01:00
PSeitz	ca20bfa776	add date_histogram (#1900) * add date_histogram * add return result	2023-03-02 05:17:35 +01:00
PSeitz	bc36458334	move buffer in front of dynamic dispatch (#1915) dynamic dispatch seems to be really expensive, move the buffer in front of the dynamic dispatch, to reduce the number of calls into the dynamic dispatched collector.	2023-02-28 13:07:50 +08:00
PSeitz	e510f699c8	feat: add support for u64,i64,f64 fields in term aggregation (#1883) * feat: add support for u64,i64,f64 fields in term aggregation * hash enum values * fix build * Apply suggestions from code review Co-authored-by: Paul Masurel <paul@quickwit.io> --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2023-02-27 15:04:41 +08:00
PSeitz	5f23bb7e65	switch to sparse collection for histogram (#1898) * switch to sparse collection for histogram Replaces histogram vec collection with a hashmap. This approach works much better for sparse data and enables use cases like drill downs (filter + small interval). It is slower for dense cases (1.3x-2x slower). This can be alleviated with a specialized hashmap in the future. closes #1704 closes #1370 * refactor, clippy * fix bucket_pos overflow issue	2023-02-23 07:02:58 +01:00
Paul Masurel	e2aa5af075	Clippy warnings fixes (#1885)	2023-02-20 19:04:13 +09:00
PSeitz	74bf60b4f7	implement SegmentAggregationCollector on bucket aggs (#1878)	2023-02-17 12:53:29 +01:00
PSeitz	111f25a8f7	clippy (#1879) * fix clippy * fix clippy * fmt	2023-02-17 11:34:21 +01:00
PSeitz	019db10e8e	refactor aggregations (#1875) * add specialized version for full cardinality Pre Columnar test aggregation::tests::bench::bench_aggregation_average_u64 ... bench: 6,681,850 ns/iter (+/- 1,217,385) test aggregation::tests::bench::bench_aggregation_average_u64_and_f64 ... bench: 10,576,327 ns/iter (+/- 494,380) Current test aggregation::tests::bench::bench_aggregation_average_u64 ... bench: 11,562,084 ns/iter (+/- 3,678,682) test aggregation::tests::bench::bench_aggregation_average_u64_and_f64 ... bench: 18,925,790 ns/iter (+/- 17,616,771) Post Change test aggregation::tests::bench::bench_aggregation_average_u64 ... bench: 9,123,811 ns/iter (+/- 399,720) test aggregation::tests::bench::bench_aggregation_average_u64_and_f64 ... bench: 13,111,825 ns/iter (+/- 273,547) * refactor aggregation collection * add buffering collector	2023-02-16 13:15:16 +01:00
Paul Masurel	bd5eea9852	Integrated columnar work.	2023-02-09 13:14:31 +01:00
Adrien Guillo	f2dad194ea	Add count, min, max, and sum aggregations	2023-01-16 12:22:20 -05:00
PSeitz	6ca9a477f3	reuse stats for average (#1785) * reuse stats for average * fix count type	2023-01-13 23:32:27 +08:00
PSeitz	f9171a3981	fix clippy (#1725) * fix clippy * fix clippy fastfield codecs * fix clippy bitpacker * fix clippy common * fix clippy stacker * fix clippy sstable * fmt	2022-12-20 07:30:06 +01:00
Paul Masurel	8e775b6c3d	Refactoring dyn Column (#1502)	2022-09-02 17:26:30 +09:00
Pascal Seitz	44ea7313ca	set max bucket size as parameter	2022-05-13 13:21:52 +08:00
Pascal Seitz	11ac451250	abort aggregation when too many buckets are created Validation happens on different phases depending on the aggregation Term: During segment collection Histogram: At the end when converting in intermediate buckets (we preallocate empty buckets for the range) Revisit after #1370 Range: When validating the request update CHANGELOG	2022-05-12 12:26:43 +08:00
Pascal Seitz	6a4632211a	forward error in aggregation collect	2022-05-12 12:26:43 +08:00
Pascal Seitz	1be6c6111c	support order property on term aggregations support order property on term aggregations order can be by doc_count, key, or a metric sub_aggregation	2022-04-20 00:34:38 +08:00
Pascal Seitz	902d05ebec	refactor getffreader function	2022-04-13 19:51:18 +08:00
Pascal Seitz	dd13dedaeb	forward errors, remove unwrap	2022-04-13 19:51:18 +08:00
Pascal Seitz	24432bf523	add term aggregation	2022-04-13 19:51:18 +08:00
Pascal Seitz	0262e44bbd	merge_fruits pass by value	2022-03-15 12:59:22 +08:00
Pascal Seitz	613aad7a8a	vec optional, improve performance	2022-03-14 21:29:07 +08:00
Pascal Seitz	1aa88b0c51	improve performance	2022-03-14 20:28:08 +08:00
Pascal Seitz	564fa38085	move sub_aggregations to own vec, use itertools minmax	2022-03-14 16:20:26 +08:00
Pascal Seitz	226f577803	Add Histogram aggregation	2022-03-11 21:52:07 +08:00
PSeitz	c4f66eb185	improve validation in aggregation, extend invalid field test (#1292) * improve validation in aggregation, extend invalid field test improve validation in aggregation extend invalid field test Fixes #1291 * collect fast field names on request structure * fix visibility of AggregationSegmentCollector	2022-02-25 15:21:19 +09:00
PSeitz	972cb6c26d	Aggregation (#1276) Added support for aggregation compatible with Elasticsearch's API.	2022-02-21 09:59:11 +09:00

41 Commits