tantivy

mirror of https://github.com/quickwit-oss/tantivy.git synced 2026-05-25 12:40:41 +00:00

Author	SHA1	Message	Date
PSeitz	3af456972e	Fix min doc_count empty merge bug (#2057) This fixes an issue when min_doc==0 loads terms from the dictionary from one segment and merges the same term with a subaggregation from another segment. Previously the empty structure was not correctly initialized to contain the subaggregation, so the merge was incorrect.	2023-05-29 14:20:50 +08:00
PSeitz	6239697a02	switch to ms in histogram for date type (#2045) * switch to ms in histogram for date type switch to ms in histogram by adding a normalization step that converts to nanosecond precision when creating the collector. closes #2028 related to #2026 * add missing unit long variants * use single thread to avoid handling test case * fix docs * revert CI * cleanup * improve docs * Update src/aggregation/bucket/histogram/histogram.rs Co-authored-by: Paul Masurel <paul@quickwit.io> --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2023-05-19 08:15:44 +02:00
PSeitz	2dfe37940d	handle multiple types in term aggregation (#2041)	2023-05-15 11:57:38 +02:00
PSeitz	ba3a885a3b	handle multiple agg results (#2035) handle multiple intermediate aggregation results with the same name.	2023-05-10 15:00:38 +02:00
Yuri Astrakhan	74275b76a6	Inline format arguments where it makes sense (#2038) Applied this command to the code, making it a bit shorter and slightly more readable. ``` cargo +nightly clippy --all-features --benches --tests --workspace --fix -- -A clippy::all -W clippy::uninlined_format_args cargo +nightly fmt --all ```	2023-05-10 18:03:59 +09:00
PSeitz	45ff0e3c5c	clear memory consumption in AggregationLimits (#2022) * clear memory consumption in AggregationLimits clear memory consumption in AggregationLimits at the end of segment collection * switch to ResourceLimitGuard * deduplicate code * merge methods * Apply suggestions from code review Co-authored-by: Paul Masurel <paul@quickwit.io> --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2023-05-08 10:15:09 +02:00
François Massot	992f755298	Fix clippy.	2023-05-05 10:51:29 +02:00
François Massot	c8df843f96	Fix date histogram bounds and field name.	2023-05-05 00:52:55 +02:00
PSeitz	ba309e18a1	switch to nanosecond precision (#2016)	2023-05-01 03:32:20 +02:00
PSeitz	cbf2bdc75b	change bucket count type (#2013) * change bucket count type closes #2012 * Update src/aggregation/agg_limits.rs Co-authored-by: Paul Masurel <paul@quickwit.io> * Update src/directory/managed_directory.rs Co-authored-by: Paul Masurel <paul@quickwit.io> * fix test --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2023-04-27 15:47:31 +08:00
PSeitz	1f06997d04	fix single collector special case (#2014)	2023-04-27 09:30:19 +02:00
PSeitz	c599bf3b6c	chore!: drop JSON support on intermediate agg result (#1992) * chore!: drop JSON support on intermediate agg result add support for other formats by removing skip_serialize and untagged. JSON support is broken anyway due to its lack of f64::INF etc. handling * Update src/aggregation/intermediate_agg_result.rs Co-authored-by: Paul Masurel <paul@quickwit.io> * move from impl --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2023-04-26 13:05:16 +02:00
PSeitz	2e369db936	switch to Aggregation without serde_untagged (#2003) * refactor result handling * remove Internal stuff * merge different accessors * switch to Aggregation without serde_untagged * fix doctests	2023-04-25 08:54:51 +02:00
PSeitz	e522163a1c	use json in agg tests (#1998) * switch to JSON in tests, add flat aggregation types * use method * clippy * remove commented file	2023-04-17 14:08:48 +02:00
PSeitz	0ed13eeea8	add sparse to agg benchmark (#1986) * add sparse to agg benchmark * Update src/aggregation/agg_bench.rs Co-authored-by: Paul Masurel <paul@quickwit.io> --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2023-04-11 08:13:32 +02:00
PSeitz	41af70799d	add percentiles aggregations (#1984) * add percentiles aggregations add percentiles aggregation fix disabled agg benchmark * Update src/aggregation/metric/percentiles.rs Co-authored-by: Paul Masurel <paul@quickwit.io> * Apply suggestions from code review Co-authored-by: Paul Masurel <paul@quickwit.io> * fix import * fix import --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2023-04-07 07:18:28 +02:00
PSeitz	5c4ea6a708	tokenizer option on text fastfield (#1945) * tokenizer option on text fastfield allow to set tokenizer option on text fastfield (fixes #1901) handle PreTokenized strings in fast field * change visibility * remove custom de/serialization	2023-03-31 10:03:38 +02:00
PSeitz	5c380b76e7	Better mixed types support in aggs and fix serialization issue (#1971) * Better mixed types support in aggs and fix serialization issue - Improve support for mixed types in JSON field aggregations (pick the right field, #1913) - Resolve the issue with JSON serialization for numeric keys (fixes #1967) - Add JSON round-trip test for term buckets - Remove `u64_lenient`, as this is a footgun without the type - move aggregation benchmarks * remove shadowing	2023-03-31 05:52:11 +02:00
Paul Masurel	2b6a4da640	Exposing empty column builder. (#1959)	2023-03-24 16:34:41 +09:00
PSeitz	d6a95381ee	add memory check for term agg (#1957)	2023-03-24 06:47:45 +01:00
PSeitz	da2804644f	fetch blocks of vals in aggregation for all cardinality (#1950) * fetch blocks of vals in aggregation for all cardinality * move caching in common accessor	2023-03-23 08:41:11 +01:00
PSeitz	8f7f1d6be4	add Display for ByteCount (#1949) * add Display for ByteCount * export missing AggregationLimits	2023-03-21 08:02:35 +01:00
PSeitz	6a7a1106d6	work in batches of docs (#1937) * work in batches of docs * add fill_buffer test	2023-03-21 06:57:44 +01:00
PSeitz	9e2faecf5b	add memory limit for aggregations (#1942) * add memory limit for aggregations introduce AggregationLimits to set memory consumption limit and bucket limits. The memory limit is checked during aggregation; the bucket limit is checked before returning the aggregation request. * Apply suggestions from code review Co-authored-by: Paul Masurel <paul@quickwit.io> * add ByteCount with human readable format --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2023-03-16 06:21:07 +01:00
PSeitz	b6703f1b3c	fix validation in date histogram (#1936) fix validation in date histogram for parameters interval and date_interval	2023-03-15 06:10:43 +01:00
PSeitz	2fb3740cb0	handle missing column for aggs (#1920) * handle missing column for aggs add empty column fallback for missing column in aggs. Fix sort for term agg on sub-agg with missing value (null is smallest) * add error when field is not fast	2023-03-15 06:09:59 +01:00
PSeitz	8459efa32c	split term collection count and sub_agg (#1921) use unrolled ColumnValues::get_vals	2023-03-13 04:37:41 +01:00
PSeitz	61cfd8dc57	fix clippy (#1927)	2023-03-13 03:12:02 +01:00
Paul Masurel	364e321415	Clippy fix (#1926)	2023-03-06 10:37:17 +09:00
PSeitz	ca20bfa776	add date_histogram (#1900) * add date_histogram * add return result	2023-03-02 05:17:35 +01:00
PSeitz	850a0d7ae2	add agg benchmark for optional and multi value (#1916) closes #1870	2023-03-01 17:01:52 +09:00
Paul Masurel	7fae4d98d7	Adapting for quickwit2 (#1912) * Adapting tantivy to make it possible to be plugged into quickwit. * Apply suggestions from code review Co-authored-by: PSeitz <PSeitz@users.noreply.github.com> * Added unit test --------- Co-authored-by: PSeitz <PSeitz@users.noreply.github.com>	2023-03-01 16:27:46 +09:00
PSeitz	bc36458334	move buffer in front of dynamic dispatch (#1915) dynamic dispatch seems to be really expensive; move the buffer in front of the dynamic dispatch to reduce the number of calls into the dynamically dispatched collector.	2023-02-28 13:07:50 +08:00
PSeitz	e510f699c8	feat: add support for u64,i64,f64 fields in term aggregation (#1883) * feat: add support for u64,i64,f64 fields in term aggregation * hash enum values * fix build * Apply suggestions from code review Co-authored-by: Paul Masurel <paul@quickwit.io> --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2023-02-27 15:04:41 +08:00
Paul Masurel	d25fc155b2	Making some of the column/termdict operations async-friendly (#1902)	2023-02-27 15:34:47 +09:00
Paul Masurel	8ea97e7d6b	Minor refactoring preparing for getting columnar integrated in quickwit. (#1911)	2023-02-27 14:23:30 +09:00
Paul Masurel	c838aa808b	Removed the extra nesting in unit test file (#1907)	2023-02-27 12:17:52 +09:00
Paul Masurel	06850719dc	Renaming .values(DocId) to .values_for_doc(DocId) (#1906)	2023-02-27 12:15:13 +09:00
PSeitz	5f23bb7e65	switch to sparse collection for histogram (#1898) * switch to sparse collection for histogram Replaces histogram vec collection with a hashmap. This approach works much better for sparse data and enables use cases like drill downs (filter + small interval). It is slower for dense cases (1.3x-2x slower). This can be alleviated with a specialized hashmap in the future. closes #1704 closes #1370 * refactor, clippy * fix bucket_pos overflow issue	2023-02-23 07:02:58 +01:00
PSeitz	c7278b3258	remove schema in aggs (#1888) * switch to ColumnType, move tests * remove Schema dependency in agg	2023-02-22 04:50:28 +01:00
Paul Masurel	e2aa5af075	Clippy warnings fixes (#1885)	2023-02-20 19:04:13 +09:00
PSeitz	74bf60b4f7	implement SegmentAggregationCollector on bucket aggs (#1878)	2023-02-17 12:53:29 +01:00
PSeitz	111f25a8f7	clippy (#1879) * fix clippy * fix clippy * fmt	2023-02-17 11:34:21 +01:00
PSeitz	019db10e8e	refactor aggregations (#1875) * add specialized version for full cardinality Pre Columnar: test aggregation::tests::bench::bench_aggregation_average_u64 ... bench: 6,681,850 ns/iter (+/- 1,217,385) test aggregation::tests::bench::bench_aggregation_average_u64_and_f64 ... bench: 10,576,327 ns/iter (+/- 494,380) Current: test aggregation::tests::bench::bench_aggregation_average_u64 ... bench: 11,562,084 ns/iter (+/- 3,678,682) test aggregation::tests::bench::bench_aggregation_average_u64_and_f64 ... bench: 18,925,790 ns/iter (+/- 17,616,771) Post Change: test aggregation::tests::bench::bench_aggregation_average_u64 ... bench: 9,123,811 ns/iter (+/- 399,720) test aggregation::tests::bench::bench_aggregation_average_u64_and_f64 ... bench: 13,111,825 ns/iter (+/- 273,547) * refactor aggregation collection * add buffering collector	2023-02-16 13:15:16 +01:00
PSeitz	347614c841	test error for avg agg on ip field (#1873) closes #1835	2023-02-14 23:22:56 +08:00
Paul Masurel	097fd6138d	Fix clippy comments (#1872)	2023-02-14 23:12:45 +09:00
Paul Masurel	60cc2644d6	Fixing test_fail_on_flush_segment_but_one_worker_remains (#1869) The new fast field code, based on columnar, had a larger minimum memory footprint, causing the first document to trigger a flush of the segment in this unit test. This PR prevents the allocation of a large capacity for the different hashmap tables used in the columnar writer. Closes #1859	2023-02-14 16:09:42 +09:00
Paul Masurel	10bccac61b	Bugfix in parse_into_milliseconds (#1867)	2023-02-14 15:06:40 +09:00
PSeitz	7a9befd18d	fix sort order test for term aggregation (#1858) fix sort order test for term aggregation fix invalid request test	2023-02-10 10:26:58 +01:00
Paul Masurel	bd5eea9852	Integrated columnar work.	2023-02-09 13:14:31 +01:00


154 Commits