tantivy

mirror of https://github.com/quickwit-oss/tantivy.git synced 2026-01-06 01:02:55 +00:00

Author	SHA1	Message	Date
PSeitz	92c32979d2	fix postcard compatibility for top_hits, add postcard test (#2346 ) * fix postcard compatibility for top_hits, add postcard test * fix top_hits naming, delay data fetch closes #2347 * fix import	2024-04-09 06:17:25 +02:00
PSeitz	b0e65560a1	handle ip adresses in term aggregation (#2319 ) * handle ip adresses in term aggregation Stores IpAdresses during the segment term aggregation via u64 representation and convert to u128(IpV6Adress) via downcast when converting to intermediate results. Enable Downcasting on `ColumnValues` Expose u64 variant for u128 encoded data via `open_u64_lenient` method. Remove lifetime in VecColumn, to avoid 'static lifetime requirement coming from downcast trait. * rename method	2024-03-14 09:41:18 +01:00
PSeitz	d57622d54b	support bool type in term aggregation (#2318 ) * support bool type in term aggregation * add Bool to Intermediate Key	2024-02-20 03:22:22 +01:00
Tushar	0e04ec3136	feat(aggregators/metric): Add a top_hits aggregator (#2198 ) * feat(aggregators/metric): Implement a top_hits aggregator * fix: Expose get_fields * fix: Serializer for top_hits request Also removes extraneous the extraneous third-party serialization helper. * chore: Avert panick on parsing invalid top_hits query * refactor: Allow multiple field names from aggregations * perf: Replace binary heap with TopNComputer * fix: Avoid comparator inversion by ComparableDoc * fix: Rank missing field values lower than present values * refactor: Make KeyOrder a struct * feat: Rough attempt at docvalue_fields * feat: Complete stab at docvalue_fields - Rename "SearchResult" => "Retrieval" - Revert Vec => HashMap for aggregation accessors. - Split accessors for core aggregation and field retrieval. - Resolve globbed field names in docvalue_fields retrieval. - Handle strings/bytes and other column types with DynamicColumn * test(unit): Add tests for top_hits aggregator * fix: docfield_value field globbing * test(unit): Include dynamic fields * fix: Value -> OwnedValue * fix: Use OwnedValue's native Null variant * chore: Improve readability of test asserts * chore: Remove DocAddress from top_hits result * docs: Update aggregator doc * revert: accidental doc test * chore: enable time macros only for tests * chore: Apply suggestions from review * chore: Apply suggestions from review * fix: Retrieve all values for fields * test(unit): Update for multi-value retrieval * chore: Assert term existence * feat: Include all columns for a column name Since a (name, type) constitutes a unique column. * fix: Resolve json fields Introduces a translation step to bridge the difference between ColumnarReaders null `\0` separated json field keys to the common `.` separated used by SegmentReader. Although, this should probably be the default behavior for ColumnarReader's public API perhaps. * chore: Address review on mutability * chore: s/segment_id/segment_ordinal instances of SegmentOrdinal * chore: Revert erroneous grammar change	2024-01-26 16:46:41 +01:00
PSeitz	34920d31f5	Fix DateHistogram bucket gap (#2183 ) * Fix DateHistogram bucket gap Fixes a computation issue of the number of buckets needed in the DateHistogram. This is due to a missing normalization from request values (ms) to fast field values (ns), when converting an intermediate result to the final result. This results in a wrong computation by a factor 1_000_000. The Histogram normalizes values to nanoseconds, to make the user input like extended_bounds (ms precision) and the values from the fast field (ns precision for date type) compatible. This normalization happens only for date type fields, as other field types don't have precision settings. The normalization does not happen due a missing `column_type`, which is not correctly passed after merging an empty aggregation (which does not have a `column_type` set), with a regular aggregation. Another related issue is an empty aggregation, which will not have `column_type` set, will not convert the result to human readable format. This PR fixes the issue by: - Limit the allowed field types of DateHistogram to DateType - Instead of passing the column_type, which is only available on the segment level, we flag the aggregation as `is_date_agg`. - Fix the merge logic Add a flag to to normalization only once. This is not an issue currently, but it could become easily one. closes https://github.com/quickwit-oss/quickwit/issues/3837 * use older nightly for time crate (breaks build)	2023-09-21 10:41:35 +02:00
PSeitz	b1d8b072db	add missing aggregation part 2 (#2149 ) * add missing aggregation part 2 Add missing support for: - Mixed types columns - Key of type string on numerical fields The special aggregation is slower than the integrated one in TermsAggregation and therefore not chosen by default, although it can cover all use cases. * simplify, add num_docs to empty	2023-08-31 07:55:33 +02:00
Adam Reichold	c805f08ca7	Fix a few more upcoming Clippy lints (#2133 )	2023-07-24 17:07:57 +09:00
PSeitz	6239697a02	switch to ms in histogram for date type (#2045 ) * switch to ms in histogram for date type switch to ms in histogram, by adding a normalization step that converts to nanoseconds precision when creating the collector. closes #2028 related to #2026 * add missing unit long variants * use single thread to avoid handling test case * fix docs * revert CI * cleanup * improve docs * Update src/aggregation/bucket/histogram/histogram.rs Co-authored-by: Paul Masurel <paul@quickwit.io> --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2023-05-19 08:15:44 +02:00
PSeitz	ba3a885a3b	handle multiple agg results (#2035 ) handle multiple intermediate aggregation results with the same name.	2023-05-10 15:00:38 +02:00
PSeitz	cbf2bdc75b	change bucket count type (#2013 ) * change bucket count type closes #2012 * Update src/aggregation/agg_limits.rs Co-authored-by: Paul Masurel <paul@quickwit.io> * Update src/directory/managed_directory.rs Co-authored-by: Paul Masurel <paul@quickwit.io> * fix test --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2023-04-27 15:47:31 +08:00
PSeitz	c599bf3b6c	chore!:drop JSON support on intermediate agg result (#1992 ) * chore!:drop JSON support on intermediate agg result add support for other formats by removing skip_serialize and untagged JSON support is broken anyway due it's lack on f64::INF etc. handling * Update src/aggregation/intermediate_agg_result.rs Co-authored-by: Paul Masurel <paul@quickwit.io> * move from impl --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2023-04-26 13:05:16 +02:00
PSeitz	2e369db936	switch to Aggregation without serde_untagged (#2003 ) * refactor result handling * remove Internal stuff * merge different accessors * switch to Aggregation without serde_untagged * fix doctests	2023-04-25 08:54:51 +02:00
PSeitz	e522163a1c	use json in agg tests (#1998 ) * switch to JSON in tests, add flat aggregation types * use method * clippy * remove commented file	2023-04-17 14:08:48 +02:00
PSeitz	41af70799d	add percentiles aggregations (#1984 ) * add percentiles aggregations add percentiles aggregation fix disabled agg benchmark * Update src/aggregation/metric/percentiles.rs Co-authored-by: Paul Masurel <paul@quickwit.io> * Apply suggestions from code review Co-authored-by: Paul Masurel <paul@quickwit.io> * fix import * fix import --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2023-04-07 07:18:28 +02:00
PSeitz	5c380b76e7	Better mixed types support in aggs and fix serialization issue (#1971 ) * Better mixed types support in aggs and fix serialization issue - Improve support for mixed types in JSON field aggregations (pick the right field, #1913) - Resolve the issue with JSON serialization for numeric keys (fixes #1967) - Add JSON round-trip test for term buckets - Remove `u64_lenient`, as this is a footgun without the type - move aggregation benchmarks * remove shadowing	2023-03-31 05:52:11 +02:00
PSeitz	d6a95381ee	add memory check for term agg (#1957 )	2023-03-24 06:47:45 +01:00
PSeitz	9e2faecf5b	add memory limit for aggregations (#1942 ) * add memory limit for aggregations introduce AggregationLimits to set memory consumption limit and bucket limits memory limit is checked during aggregation, bucket limit is checked before returning the aggregation request. * Apply suggestions from code review Co-authored-by: Paul Masurel <paul@quickwit.io> * add ByteCount with human readable format --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2023-03-16 06:21:07 +01:00
PSeitz	2fb3740cb0	handle missing column for aggs (#1920 ) * handle missing column for aggs add empty column fallback for missing column in aggs. Fix sort for term agg on sub-agg with missing value (null is smallest) * add error when field is not fast	2023-03-15 06:09:59 +01:00
PSeitz	ca20bfa776	add date_histogram (#1900 ) * add date_histogram * add return result	2023-03-02 05:17:35 +01:00
PSeitz	e510f699c8	feat: add support for u64,i64,f64 fields in term aggregation (#1883 ) * feat: add support for u64,i64,f64 fields in term aggregation * hash enum values * fix build * Apply suggestions from code review Co-authored-by: Paul Masurel <paul@quickwit.io> --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2023-02-27 15:04:41 +08:00
PSeitz	c7278b3258	remove schema in aggs (#1888 ) * switch to ColumnType, move tests * remove Schema dependency in agg	2023-02-22 04:50:28 +01:00
PSeitz	74bf60b4f7	implement SegmentAggregationCollector on bucket aggs (#1878 )	2023-02-17 12:53:29 +01:00
PSeitz	7a9befd18d	fix sort order test for term aggregation (#1858 ) fix sort order test for term aggregation fix invalid request test	2023-02-10 10:26:58 +01:00
Paul Masurel	bd5eea9852	Integrated columnar work.	2023-02-09 13:14:31 +01:00
PSeitz	f687b3a5aa	start migrate Field to &str (#1772 ) start migrate Field to &str in preparation of columnar return Result for get_field	2023-01-18 16:12:07 +09:00
Adrien Guillo	f2dad194ea	Add count, min, max, and sum aggregations	2023-01-16 12:22:20 -05:00
PSeitz	6ca9a477f3	reuse stats for average (#1785 ) * reuse stats for average * fix count type	2023-01-13 23:32:27 +08:00
PSeitz	f9171a3981	fix clippy (#1725 ) * fix clippy * fix clippy fastfield codecs * fix clippy bitpacker * fix clippy common * fix clippy stacker * fix clippy sstable * fmt	2022-12-20 07:30:06 +01:00
PSeitz	ee1f2c1f28	add aggregation support for date type (#1693 ) * add aggregation support for date type fixes #1332 * serialize key_as_string as rfc3339 in date histogram * update docs * enable date for range aggregation	2022-11-28 09:12:08 +09:00
Pascal Seitz	279b1b28d3	switch to fx hashmap	2022-10-27 16:19:59 +08:00
Adam Reichold	bbb058d976	Replace FNV by rustc-hash Both construction have similar goals but rustc-hash ist better suited for contemporary CPU as it works one word at a time instead of byte per byte.	2022-10-27 00:35:09 +02:00
Bruce Mitchener	cf02e32578	Improvements to doc linking, grammar, etc.	2022-09-19 18:10:22 +07:00
Kian-Meng Ang	625bcb4877	Fix typos and markdowns Found via these commands: codespell -L crate,ser,panting,beauti,hart,ue,atleast,childs,ond,pris,hel,mot markdownlint .md doc/src/.md --disable MD013 MD025 MD033 MD001 MD024 MD036 MD041 MD003	2022-08-13 18:25:47 +08:00
k-yomo	80a1418284	Use FnvHashMap for keyed bucket entries	2022-07-26 18:24:54 +09:00
k-yomo	5ab5f070ed	Fix to use bool directory for the keyed parameter	2022-07-26 18:18:38 +09:00
k-yomo	5b564916f0	Add support for keyed parameter in range and histgram aggregations	2022-07-26 04:28:21 +09:00
Evance Soumaoro	f26b686a1c	expose IntermediateAggregationResults->into_final_bucket_result	2022-07-21 11:19:23 +00:00
Evance Soumaoro	a4be239d38	Updated DateTime to hold timestamp in microseconds, while making date field precision configurable (#1396 )	2022-07-12 10:04:28 +09:00
Pascal Seitz	1bd44a5f61	use total_cmp	2022-07-04 12:48:23 +08:00
PSeitz	db1836691e	fix visibility (#1398 )	2022-06-28 16:21:39 +09:00
Pascal Seitz	44ea7313ca	set max bucket size as parameter	2022-05-13 13:21:52 +08:00
Pascal Seitz	3f88718f38	refactor aggregations	2022-05-12 12:26:43 +08:00
Pascal Seitz	d11a8cce26	minor docs fix	2022-05-06 17:52:36 +08:00
Pascal Seitz	c45eb9a9fa	improve readability, add json test	2022-04-26 11:22:34 +08:00
Pascal Seitz	1be6c6111c	support order property on term aggregations support order property on term aggregations order can be by doc_count, key, or a metric sub_aggregation	2022-04-20 00:34:38 +08:00
Pascal Seitz	dd13dedaeb	forward errors, remove unwrap	2022-04-13 19:51:18 +08:00
Pascal Seitz	46724b4a05	add segment_size, add get term dict fields, add tests	2022-04-13 19:51:18 +08:00
Pascal Seitz	24432bf523	add term aggregation	2022-04-13 19:51:18 +08:00
Pascal Seitz	f619658e2c	rename	2022-03-17 16:37:57 +08:00
Pascal Seitz	47dcbdbeae	handle empty results, empty indices, add tests	2022-03-17 10:24:34 +08:00


57 Commits