# Contributing
When adding a new bucket aggregation, make sure to extend the `test_aggregation_flushing` test to cover at least two levels of nesting.
# Code Organization
Tantivy's aggregations are designed to mimic Elasticsearch's aggregations.
The code is organized in submodules:
## bucket
Contains all bucket aggregations, like the range aggregation. Bucket aggregations group documents into buckets and can contain sub-aggregations.
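For illustration, a range bucket aggregation with a nested metric sub-aggregation might look like this in the Elasticsearch-compatible JSON request format (the field names `score` and `price` and the aggregation names are hypothetical):

```json
{
  "score_ranges": {
    "range": {
      "field": "score",
      "ranges": [
        { "to": 3.0 },
        { "from": 3.0, "to": 7.0 },
        { "from": 7.0 }
      ]
    },
    "aggs": {
      "average_price": { "avg": { "field": "price" } }
    }
  }
}
```

Each document falls into exactly one range bucket, and the `avg` sub-aggregation is then computed per bucket.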
## metric
Contains all metric aggregations, like the average aggregation. Metric aggregations do not have sub-aggregations.
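As a sketch, a standalone metric aggregation request (with a hypothetical field name) could look like:

```json
{
  "average_score": { "avg": { "field": "score" } }
}
```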
## agg_req
agg_req contains the user's aggregation request. Deserialization from JSON is compatible with Elasticsearch aggregation requests.
## agg_req_with_accessor
agg_req_with_accessor contains the user's aggregation request enriched with fast field accessors and other data used during collection.
## segment_agg_result
segment_agg_result contains the aggregation result tree used during collection of a segment. The tree from agg_req_with_accessor is passed in during collection.
## intermediate_agg_result
intermediate_agg_result contains the aggregation result tree in a form that can be merged with result trees from other segments.
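To illustrate why an intermediate representation is needed, here is a minimal, self-contained sketch of segment merging. The type and method names are hypothetical, not tantivy's actual API: an average cannot be merged from final averages, so each segment keeps `(sum, doc_count)` and the division is deferred until the final result is produced.

```rust
// Hypothetical, simplified stand-in for an intermediate aggregation result.
// Each segment produces one of these; they are merged pairwise before being
// converted into the final result.
#[derive(Debug, Clone, Copy)]
struct IntermediateAverage {
    sum: f64,
    doc_count: u64,
}

impl IntermediateAverage {
    /// Merge the intermediate result of another segment into this one.
    fn merge(&mut self, other: &IntermediateAverage) {
        self.sum += other.sum;
        self.doc_count += other.doc_count;
    }

    /// Convert into the final result; `None` if no documents matched.
    fn into_final(self) -> Option<f64> {
        (self.doc_count > 0).then(|| self.sum / self.doc_count as f64)
    }
}

fn main() {
    // Intermediate results collected from two segments.
    let mut seg1 = IntermediateAverage { sum: 10.0, doc_count: 4 };
    let seg2 = IntermediateAverage { sum: 20.0, doc_count: 6 };
    seg1.merge(&seg2);
    println!("{:?}", seg1.into_final()); // Some(3.0)
}
```

Merging `(sum, doc_count)` pairs is associative, so segments can be merged in any order, which is what makes per-segment collection followed by a merge step work.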
## agg_result
agg_result contains the final aggregation result tree.