Property test for Comparator/ValueRange consistency, and fixes.

test_order_by_u64_prop
Use a Buffer generic scratch buffer parameter on TopNComputer and push directly from ColumnValues into a TopNComputer buffer in some cases.
2026-01-08 10:02:55 +00:00 · 2026-01-04 19:19:08 -08:00 · 2026-01-04 15:23:30 -08:00 · 2026-01-04 15:23:28 -08:00 · 2026-01-04 15:16:05 -08:00 · 2026-01-04 15:16:00 -08:00
205 changed files with 15670 additions and 4332 deletions
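The commit messages above describe pushing sort keys into a `TopNComputer` scratch buffer and property-testing ordering consistency. Below is a minimal std-only sketch of the general buffered top-N technique: a buffer of up to 2 * N candidates, truncated with `select_nth_unstable_by` plus a pruning threshold. It handles both sort directions, since the 0.24.2 changelog fix concerns `Order::Asc`. `TopN`, `Order`, `compare` and `reference_top_n` are hypothetical names, and this is not tantivy's actual `TopNComputer`. The accompanying check compares the result against a full sort, in the spirit of a property test, but uses a fixed pseudo-random sequence instead of `proptest`.

```rust
use std::cmp::Ordering;

/// Hypothetical sort direction for this sketch.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Order {
    Asc,
    Desc,
}

/// "Better" entries compare as `Less`; ties on the key are broken by ascending doc id,
/// so the ordering is total and results are deterministic.
fn compare(order: Order, a: &(u64, u32), b: &(u64, u32)) -> Ordering {
    let by_key = match order {
        Order::Desc => b.0.cmp(&a.0),
        Order::Asc => a.0.cmp(&b.0),
    };
    by_key.then(a.1.cmp(&b.1))
}

/// Hypothetical buffered top-N collector over (sort_key, doc_id) pairs.
struct TopN {
    top_n: usize,
    order: Order,
    // Scratch buffer holding up to 2 * top_n candidates between truncations.
    buffer: Vec<(u64, u32)>,
    // Key of the worst kept entry after the last truncation, if any.
    threshold: Option<u64>,
}

impl TopN {
    fn new(top_n: usize, order: Order) -> Self {
        TopN { top_n, order, buffer: Vec::with_capacity(2 * top_n), threshold: None }
    }

    fn push(&mut self, key: u64, doc: u32) {
        if self.top_n == 0 {
            return;
        }
        // Keys strictly worse than the threshold can never make the top N:
        // top_n kept entries already beat them. Equal keys are still pushed.
        if let Some(threshold) = self.threshold {
            let worse = match self.order {
                Order::Desc => key < threshold,
                Order::Asc => key > threshold,
            };
            if worse {
                return;
            }
        }
        self.buffer.push((key, doc));
        if self.buffer.len() == 2 * self.top_n {
            self.truncate_to_top_n();
        }
    }

    /// Partition so the first top_n entries are the best ones, then drop the rest.
    fn truncate_to_top_n(&mut self) {
        let (n, order) = (self.top_n, self.order);
        self.buffer.select_nth_unstable_by(n - 1, |a, b| compare(order, a, b));
        self.buffer.truncate(n);
        // Index n - 1 now holds the n-th best entry, i.e. the worst one kept.
        self.threshold = Some(self.buffer[n - 1].0);
    }

    fn into_sorted_vec(mut self) -> Vec<(u64, u32)> {
        let order = self.order;
        self.buffer.sort_unstable_by(|a, b| compare(order, a, b));
        self.buffer.truncate(self.top_n);
        self.buffer
    }
}

/// Reference implementation: sort everything and keep the first top_n entries.
fn reference_top_n(items: &[(u64, u32)], top_n: usize, order: Order) -> Vec<(u64, u32)> {
    let mut all = items.to_vec();
    all.sort_unstable_by(|a, b| compare(order, a, b));
    all.truncate(top_n);
    all
}

fn main() {
    let items = [(5, 0), (1, 1), (9, 2), (5, 3), (7, 4)];
    let mut asc = TopN::new(2, Order::Asc);
    for &(key, doc) in &items {
        asc.push(key, doc);
    }
    println!("{:?}", asc.into_sorted_vec()); // prints [(1, 1), (5, 0)]
    println!("{:?}", reference_top_n(&items, 2, Order::Desc)); // prints [(9, 2), (7, 4)]
}
```

Buffering 2 * N entries means the O(len) `select_nth_unstable_by` runs only once every N pushes, and the threshold lets most losing keys be rejected with a single comparison.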
--- a/.github/workflows/coverage.yml
+++ b/.github/workflows/coverage.yml
@@ -15,11 +15,11 @@ jobs:
    steps:
      - uses: actions/checkout@v4
      - name: Install Rust
-        run: rustup toolchain install nightly-2024-07-01 --profile minimal --component llvm-tools-preview
+        run: rustup toolchain install nightly-2025-12-01 --profile minimal --component llvm-tools-preview
      - uses: Swatinem/rust-cache@v2
      - uses: taiki-e/install-action@cargo-llvm-cov
      - name: Generate code coverage
-        run: cargo +nightly-2024-07-01 llvm-cov --all-features --workspace --doctests --lcov --output-path lcov.info
+        run: cargo +nightly-2025-12-01 llvm-cov --all-features --workspace --doctests --lcov --output-path lcov.info
      - name: Upload coverage to Codecov
        uses: codecov/codecov-action@v3
        continue-on-error: true
--- a/.github/workflows/test.yml
+++ b/.github/workflows/test.yml
@@ -39,11 +39,11 @@ jobs:

    - name: Check Formatting
      run: cargo +nightly fmt --all -- --check
-    
+
    - name: Check Stable Compilation
      run: cargo build --all-features

-    
+
    - name: Check Bench Compilation
      run: cargo +nightly bench --no-run --profile=dev --all-features

@@ -59,10 +59,10 @@ jobs:

    strategy:
      matrix:
-        features: [
-            { label: "all", flags: "mmap,stopwords,lz4-compression,zstd-compression,failpoints" },
-            { label: "quickwit", flags: "mmap,quickwit,failpoints" }
-        ]
+        features:
+          - { label: "all", flags: "mmap,stopwords,lz4-compression,zstd-compression,failpoints,stemmer" }
+          - { label: "quickwit", flags: "mmap,quickwit,failpoints" }
+          - { label: "none", flags: "" }

    name: test-${{ matrix.features.label}}

@@ -80,7 +80,21 @@ jobs:
    - uses: Swatinem/rust-cache@v2

    - name: Run tests
-      run: cargo +stable nextest run --features ${{ matrix.features.flags }} --verbose --workspace
+      run: |
+        # If matrix.features.flags is empty, run only the --lib targets to avoid compiling the examples
+        # (most of them rely on mmap); otherwise run everything.
+        if [ -z "${{ matrix.features.flags }}" ]; then
+          cargo +stable nextest run --lib --no-default-features --verbose --workspace
+        else
+          cargo +stable nextest run --features ${{ matrix.features.flags }} --no-default-features --verbose --workspace
+        fi

    - name: Run doctests
-      run: cargo +stable test --doc --features ${{ matrix.features.flags }} --verbose --workspace
+      run: |
+        # If matrix.features.flags is empty, skip the doctests (like the examples, most of them
+        # rely on default features such as mmap); otherwise run them.
+        if [ -z "${{ matrix.features.flags }}" ]; then
+          echo "no doctest for no feature flag"
+        else
+          cargo +stable test --doc --features ${{ matrix.features.flags }} --verbose --workspace
+        fi
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -14,6 +14,18 @@ Tantivy 0.25
 - Support mixed field types in query parser [#2676](https://github.com/quickwit-oss/tantivy/pull/2676)(@trinity-1686a)
 - Add per-field size details [#2679](https://github.com/quickwit-oss/tantivy/pull/2679)(@fulmicoton)

+Tantivy 0.24.2
+================================
+- Fix TopNComputer for reverse order. [#2672](https://github.com/quickwit-oss/tantivy/pull/2672)(@stuhood @PSeitz) 
+
+Affected queries are [order_by_fast_field](https://docs.rs/tantivy/latest/tantivy/collector/struct.TopDocs.html#method.order_by_fast_field) and
+[order_by_u64_field](https://docs.rs/tantivy/latest/tantivy/collector/struct.TopDocs.html#method.order_by_u64_field)
+when used with `Order::Asc`.
+
+Tantivy 0.24.1
+================================
+- Fix: bump required Rust version to 1.81
+
 Tantivy 0.24
 ================================
 Tantivy 0.24 will be backwards compatible with indices created with v0.22 and v0.21. The new minimum rust version will be 1.75. Tantivy 0.23 will be skipped.
@@ -66,7 +78,7 @@ This will slightly increase space and access time. [#2439](https://github.com/qu

 - **Store DateTime as nanoseconds in doc store** DateTime in the doc store was truncated to microseconds previously. This removes this truncation, while still keeping backwards compatibility. [#2486](https://github.com/quickwit-oss/tantivy/pull/2486)(@PSeitz)

- **Performace/Memory**
+- **Performance/Memory**
    - lift clauses in LogicalAst for optimized ast during execution [#2449](https://github.com/quickwit-oss/tantivy/pull/2449)(@PSeitz)
    - Use Vec instead of BTreeMap to back OwnedValue object [#2364](https://github.com/quickwit-oss/tantivy/pull/2364)(@fulmicoton)
    - Replace TantivyDocument with CompactDoc. CompactDoc is much smaller and provides similar performance. [#2402](https://github.com/quickwit-oss/tantivy/pull/2402)(@PSeitz)
@@ -96,6 +108,14 @@ This will slightly increase space and access time. [#2439](https://github.com/qu
 - Fix trait bound of StoreReader::iter [#2360](https://github.com/quickwit-oss/tantivy/pull/2360)(@adamreichold)
 - remove read_postings_no_deletes [#2526](https://github.com/quickwit-oss/tantivy/pull/2526)(@PSeitz)

+Tantivy 0.22.1
+================================
+- Fix TopNComputer for reverse order. [#2672](https://github.com/quickwit-oss/tantivy/pull/2672)(@stuhood @PSeitz) 
+
+Affected queries are [order_by_fast_field](https://docs.rs/tantivy/latest/tantivy/collector/struct.TopDocs.html#method.order_by_fast_field) and
+[order_by_u64_field](https://docs.rs/tantivy/latest/tantivy/collector/struct.TopDocs.html#method.order_by_u64_field)
+when used with `Order::Asc`.
+
 Tantivy 0.22
 ================================

--- a/Cargo.toml
+++ b/Cargo.toml
@@ -1,6 +1,6 @@
 [package]
 name = "tantivy"
-version = "0.24.0"
+version = "0.26.0"
 authors = ["Paul Masurel <paul.masurel@gmail.com>"]
 license = "MIT"
 categories = ["database-implementations", "data-structures"]
@@ -37,7 +37,7 @@ fs4 = { version = "0.13.1", optional = true }
 levenshtein_automata = "0.2.1"
 uuid = { version = "1.0.0", features = ["v4", "serde"] }
 crossbeam-channel = "0.5.4"
-rust-stemmers = "1.2.0"
+rust-stemmers = { version = "1.2.0", optional = true }
 downcast-rs = "2.0.1"
 bitpacking = { version = "0.9.2", default-features = false, features = [
    "bitpacker4x",
@@ -57,29 +57,30 @@ measure_time = "0.9.0"
 arc-swap = "1.5.0"
 bon = "3.3.1"

-columnar = { version = "0.5", path = "./columnar", package = "tantivy-columnar" }
-sstable = { version = "0.5", path = "./sstable", package = "tantivy-sstable", optional = true }
-stacker = { version = "0.5", path = "./stacker", package = "tantivy-stacker" }
-query-grammar = { version = "0.24.0", path = "./query-grammar", package = "tantivy-query-grammar" }
-tantivy-bitpacker = { version = "0.8", path = "./bitpacker" }
-common = { version = "0.9", path = "./common/", package = "tantivy-common" }
-tokenizer-api = { version = "0.5", path = "./tokenizer-api", package = "tantivy-tokenizer-api" }
+columnar = { version = "0.6", path = "./columnar", package = "tantivy-columnar" }
+sstable = { version = "0.6", path = "./sstable", package = "tantivy-sstable", optional = true }
+stacker = { version = "0.6", path = "./stacker", package = "tantivy-stacker" }
+query-grammar = { version = "0.25.0", path = "./query-grammar", package = "tantivy-query-grammar" }
+tantivy-bitpacker = { version = "0.9", path = "./bitpacker" }
+common = { version = "0.10", path = "./common/", package = "tantivy-common" }
+tokenizer-api = { version = "0.6", path = "./tokenizer-api", package = "tantivy-tokenizer-api" }
 sketches-ddsketch = { version = "0.3.0", features = ["use_serde"] }
 hyperloglogplus = { version = "0.4.1", features = ["const-loop"] }
 futures-util = { version = "0.3.28", optional = true }
 futures-channel = { version = "0.3.28", optional = true }
 fnv = "1.0.7"
+typetag = "0.2.21"

 [target.'cfg(windows)'.dependencies]
 winapi = "0.3.9"

 [dev-dependencies]
-binggan = "0.14.0"
+binggan = "0.14.2"
 rand = "0.8.5"
 maplit = "1.0.2"
 matches = "0.1.9"
 pretty_assertions = "1.2.1"
-proptest = "1.0.0"
+proptest = "1.7.0"
 test-log = "0.2.10"
 futures = "0.3.21"
 paste = "1.0.11"
@@ -87,7 +88,7 @@ more-asserts = "0.3.1"
 rand_distr = "0.4.3"
 time = { version = "0.3.10", features = ["serde-well-known", "macros"] }
 postcard = { version = "1.0.4", features = [
-  "use-std",
+    "use-std",
 ], default-features = false }

 [target.'cfg(not(windows))'.dev-dependencies]
@@ -112,7 +113,8 @@ debug-assertions = true
 overflow-checks = true

 [features]
-default = ["mmap", "stopwords", "lz4-compression", "columnar-zstd-compression"]
+default = ["mmap", "stopwords", "lz4-compression", "columnar-zstd-compression", "stemmer"]
+stemmer = ["rust-stemmers"]
 mmap = ["fs4", "tempfile", "memmap2"]
 stopwords = []

@@ -167,3 +169,23 @@ harness = false
 [[bench]]
 name = "agg_bench"
 harness = false
+
+[[bench]]
+name = "exists_json"
+harness = false
+
+[[bench]]
+name = "range_query"
+harness = false
+
+[[bench]]
+name = "and_or_queries"
+harness = false
+
+[[bench]]
+name = "range_queries"
+harness = false
+
+[[bench]]
+name = "bool_queries_with_range"
+harness = false
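With `rust-stemmers` now an optional dependency behind the new `stemmer` feature (on by default), any code that uses it has to be compiled only when that feature is enabled. Otherwise builds with `--no-default-features`, such as the new `none` CI matrix entry, would fail. Below is a minimal sketch of the usual Cargo feature-gating pattern. `stemming_enabled` is a hypothetical function, not tantivy's code; it shows the `#[cfg(feature = ...)]` attribute next to the equivalent `cfg!` macro.

```rust
/// Hypothetical helper: compiled in when the optional `stemmer` feature is on.
#[cfg(feature = "stemmer")]
fn stemming_enabled() -> bool {
    // Code that references the optional `rust-stemmers` crate belongs behind
    // this same `cfg`, because the crate is only built when the feature is on.
    true
}

/// Fallback compiled for builds without the feature (e.g. `--no-default-features`).
#[cfg(not(feature = "stemmer"))]
fn stemming_enabled() -> bool {
    false
}

fn main() {
    // `cfg!` evaluates the same condition as an expression instead of removing items.
    println!(
        "stemmer feature: attribute says {}, cfg! says {}",
        stemming_enabled(),
        cfg!(feature = "stemmer")
    );
}
```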
--- a/README.md
+++ b/README.md
@@ -23,8 +23,6 @@ performance for different types of queries/collections.

 Your mileage WILL vary depending on the nature of queries and their load.

-<img src="doc/assets/images/searchbenchmark.png">
-
 Details about the benchmark can be found at this [repository](https://github.com/quickwit-oss/search-benchmark-game).

 ## Features
@@ -125,6 +123,7 @@ You can also find other bindings on [GitHub](https://github.com/search?q=tantivy
 - [seshat](https://github.com/matrix-org/seshat/): A matrix message database/indexer
 - [tantiny](https://github.com/baygeldin/tantiny): Tiny full-text search for Ruby
 - [lnx](https://github.com/lnx-search/lnx): adaptable, typo tolerant search engine with a REST API
+- [Bichon](https://github.com/rustmailer/bichon): A lightweight, high-performance Rust email archiver with WebUI
 - and [more](https://github.com/search?q=tantivy)!

 ### On average, how much faster is Tantivy compared to Lucene?
--- a/RELEASE.md
+++ b/RELEASE.md
@@ -1,4 +1,4 @@
-# Release a new Tantivy Version
+# Releasing a new Tantivy Version

 ## Steps

@@ -10,12 +10,29 @@
 6. Set git tag with new version


-In conjucation with `cargo-release` Steps 1-4 (I'm not sure if the change detection works):
-Set new packages to version 0.0.0
+[`cargo-release`](https://github.com/crate-ci/cargo-release) will help us with steps 1-5:

 Replace prev-tag-name
 ```bash
-cargo release --workspace --no-publish -v --prev-tag-name 0.19 --push-remote origin minor --no-tag --execute
+cargo release --workspace --no-publish -v --prev-tag-name 0.24 --push-remote origin minor --no-tag
 ```

-no-tag or it will create tags for all the subpackages
+Pass `--no-tag`, otherwise it will create tags for all the subpackages.
+
+cargo release will _not_ ignore unchanged packages, but it will print warnings for them.
+e.g. "warning: updating ownedbytes to 0.10.0 despite no changes made since tag 0.24"
+
+These unchanged packages have to be excluded manually, e.g.:
+```bash
+cargo release --workspace --no-publish -v --prev-tag-name 0.24 --push-remote origin minor --no-tag --exclude tantivy-tokenizer-api
+```
+
+Add `--execute` to actually publish the packages, otherwise it will only print the commands that would be run.
+
+### Tag Version
+```bash
+git tag 0.25.0
+git push upstream tag 0.25.0
+```
+
+
--- a/TODO.txt
+++ b/TODO.txt
@@ -10,7 +10,7 @@ rename FastFieldReaders::open to load
 remove fast field reader

 find a way to unify the two DateTime.
-readd type check in the filter wrapper
+re-add type check in the filter wrapper

 add unit test on columnar list columns.

--- a/benches/agg_bench.rs
+++ b/benches/agg_bench.rs
@@ -1,5 +1,6 @@
 use binggan::plugins::PeakMemAllocPlugin;
 use binggan::{black_box, InputGroup, PeakMemAlloc, INSTRUMENTED_SYSTEM};
+use rand::distributions::WeightedIndex;
 use rand::prelude::SliceRandom;
 use rand::rngs::StdRng;
 use rand::{Rng, SeedableRng};
@@ -54,11 +55,19 @@ fn bench_agg(mut group: InputGroup<Index>) {
    register!(group, extendedstats_f64);
    register!(group, percentiles_f64);
    register!(group, terms_few);
+    register!(group, terms_all_unique);
    register!(group, terms_many);
    register!(group, terms_many_top_1000);
    register!(group, terms_many_order_by_term);
    register!(group, terms_many_with_top_hits);
+    register!(group, terms_all_unique_with_avg_sub_agg);
    register!(group, terms_many_with_avg_sub_agg);
+    register!(group, terms_few_with_avg_sub_agg);
+    register!(group, terms_status_with_avg_sub_agg);
+    register!(group, terms_status);
+    register!(group, terms_few_with_histogram);
+    register!(group, terms_status_with_histogram);
+
    register!(group, terms_many_json_mixed_type_with_avg_sub_agg);

    register!(group, cardinality_agg);
@@ -71,8 +80,15 @@ fn bench_agg(mut group: InputGroup<Index>) {
    register!(group, histogram);
    register!(group, histogram_hard_bounds);
    register!(group, histogram_with_avg_sub_agg);
+    register!(group, histogram_with_term_agg_few);
    register!(group, avg_and_range_with_avg_sub_agg);

+    // Filter aggregation benchmarks
+    register!(group, filter_agg_all_query_count_agg);
+    register!(group, filter_agg_term_query_count_agg);
+    register!(group, filter_agg_all_query_with_sub_aggs);
+    register!(group, filter_agg_term_query_with_sub_aggs);
+
    group.run();
 }

@@ -123,12 +139,12 @@ fn extendedstats_f64(index: &Index) {
 }
 fn percentiles_f64(index: &Index) {
    let agg_req = json!({
-      "mypercentiles": {
-        "percentiles": {
-          "field": "score_f64",
-          "percents": [ 95, 99, 99.9 ]
+        "mypercentiles": {
+            "percentiles": {
+                "field": "score_f64",
+                "percents": [ 95, 99, 99.9 ]
+            }
        }
-      }
    });
    execute_agg(index, agg_req);
 }
@@ -165,6 +181,19 @@ fn terms_few(index: &Index) {
    });
    execute_agg(index, agg_req);
 }
+fn terms_status(index: &Index) {
+    let agg_req = json!({
+        "my_texts": { "terms": { "field": "text_few_terms_status" } },
+    });
+    execute_agg(index, agg_req);
+}
+fn terms_all_unique(index: &Index) {
+    let agg_req = json!({
+        "my_texts": { "terms": { "field": "text_all_unique_terms" } },
+    });
+    execute_agg(index, agg_req);
+}
+
 fn terms_many(index: &Index) {
    let agg_req = json!({
        "my_texts": { "terms": { "field": "text_many_terms" } },
@@ -213,6 +242,63 @@ fn terms_many_with_avg_sub_agg(index: &Index) {
    });
    execute_agg(index, agg_req);
 }
+fn terms_all_unique_with_avg_sub_agg(index: &Index) {
+    let agg_req = json!({
+        "my_texts": {
+            "terms": { "field": "text_all_unique_terms" },
+            "aggs": {
+                "average_f64": { "avg": { "field": "score_f64" } }
+            }
+        },
+    });
+    execute_agg(index, agg_req);
+}
+fn terms_few_with_histogram(index: &Index) {
+    let agg_req = json!({
+        "my_texts": {
+            "terms": { "field": "text_few_terms" },
+            "aggs": {
+                "histo": {"histogram": { "field": "score_f64", "interval": 10 }}
+            }
+        }
+    });
+    execute_agg(index, agg_req);
+}
+fn terms_status_with_histogram(index: &Index) {
+    let agg_req = json!({
+        "my_texts": {
+            "terms": { "field": "text_few_terms_status" },
+            "aggs": {
+                "histo": {"histogram": { "field": "score_f64", "interval": 10 }}
+            }
+        }
+    });
+    execute_agg(index, agg_req);
+}
+
+fn terms_few_with_avg_sub_agg(index: &Index) {
+    let agg_req = json!({
+        "my_texts": {
+            "terms": { "field": "text_few_terms" },
+            "aggs": {
+                "average_f64": { "avg": { "field": "score_f64" } }
+            }
+        },
+    });
+    execute_agg(index, agg_req);
+}
+fn terms_status_with_avg_sub_agg(index: &Index) {
+    let agg_req = json!({
+        "my_texts": {
+            "terms": { "field": "text_few_terms_status" },
+            "aggs": {
+                "average_f64": { "avg": { "field": "score_f64" } }
+            }
+        },
+    });
+    execute_agg(index, agg_req);
+}
+
 fn terms_many_json_mixed_type_with_avg_sub_agg(index: &Index) {
    let agg_req = json!({
        "my_texts": {
@@ -339,6 +425,17 @@ fn histogram_with_avg_sub_agg(index: &Index) {
    });
    execute_agg(index, agg_req);
 }
+fn histogram_with_term_agg_few(index: &Index) {
+    let agg_req = json!({
+        "rangef64": {
+            "histogram": { "field": "score_f64", "interval": 10 },
+            "aggs": {
+                "my_texts": { "terms": { "field": "text_few_terms" } }
+            }
+        }
+    });
+    execute_agg(index, agg_req);
+}
 fn avg_and_range_with_avg_sub_agg(index: &Index) {
    let agg_req = json!({
        "rangef64": {
@@ -386,14 +483,21 @@ fn get_test_index_bench(cardinality: Cardinality) -> tantivy::Result<Index> {
        .set_stored();
    let text_field = schema_builder.add_text_field("text", text_fieldtype);
    let json_field = schema_builder.add_json_field("json", FAST);
+    let text_field_all_unique_terms =
+        schema_builder.add_text_field("text_all_unique_terms", STRING | FAST);
    let text_field_many_terms = schema_builder.add_text_field("text_many_terms", STRING | FAST);
    let text_field_few_terms = schema_builder.add_text_field("text_few_terms", STRING | FAST);
+    let text_field_few_terms_status =
+        schema_builder.add_text_field("text_few_terms_status", STRING | FAST);
    let score_fieldtype = tantivy::schema::NumericOptions::default().set_fast();
    let score_field = schema_builder.add_u64_field("score", score_fieldtype.clone());
    let score_field_f64 = schema_builder.add_f64_field("score_f64", score_fieldtype.clone());
    let score_field_i64 = schema_builder.add_i64_field("score_i64", score_fieldtype);
    let index = Index::create_from_tempdir(schema_builder.build())?;
    let few_terms_data = ["INFO", "ERROR", "WARN", "DEBUG"];
+    // Approximate production log proportions: INFO dominant, WARN and DEBUG occasional, ERROR rare.
+    let log_level_distribution = WeightedIndex::new([80u32, 3, 12, 5]).unwrap();

    let lg_norm = rand_distr::LogNormal::new(2.996f64, 0.979f64).unwrap();

@@ -409,15 +513,21 @@ fn get_test_index_bench(cardinality: Cardinality) -> tantivy::Result<Index> {
            index_writer.add_document(doc!())?;
        }
        if cardinality == Cardinality::Multivalued {
+            let log_level_sample_a = few_terms_data[log_level_distribution.sample(&mut rng)];
+            let log_level_sample_b = few_terms_data[log_level_distribution.sample(&mut rng)];
            index_writer.add_document(doc!(
                json_field => json!({"mixed_type": 10.0}),
                json_field => json!({"mixed_type": 10.0}),
                text_field => "cool",
                text_field => "cool",
+                text_field_all_unique_terms => "cool",
+                text_field_all_unique_terms => "coolo",
                text_field_many_terms => "cool",
                text_field_many_terms => "cool",
                text_field_few_terms => "cool",
                text_field_few_terms => "cool",
+                text_field_few_terms_status => log_level_sample_a,
+                text_field_few_terms_status => log_level_sample_b,
                score_field => 1u64,
                score_field => 1u64,
                score_field_f64 => lg_norm.sample(&mut rng),
@@ -442,8 +552,10 @@ fn get_test_index_bench(cardinality: Cardinality) -> tantivy::Result<Index> {
            index_writer.add_document(doc!(
                text_field => "cool",
                json_field => json,
+                text_field_all_unique_terms => format!("unique_term_{}", rng.gen::<u64>()),
                text_field_many_terms => many_terms_data.choose(&mut rng).unwrap().to_string(),
                text_field_few_terms => few_terms_data.choose(&mut rng).unwrap().to_string(),
+                text_field_few_terms_status => few_terms_data[log_level_distribution.sample(&mut rng)],
                score_field => val as u64,
                score_field_f64 => lg_norm.sample(&mut rng),
                score_field_i64 => val as i64,
@@ -460,3 +572,61 @@ fn get_test_index_bench(cardinality: Cardinality) -> tantivy::Result<Index> {

    Ok(index)
 }
+
+// Filter aggregation benchmarks
+
+fn filter_agg_all_query_count_agg(index: &Index) {
+    let agg_req = json!({
+        "filtered": {
+            "filter": "*",
+            "aggs": {
+                "count": { "value_count": { "field": "score" } }
+            }
+        }
+    });
+    execute_agg(index, agg_req);
+}
+
+fn filter_agg_term_query_count_agg(index: &Index) {
+    let agg_req = json!({
+        "filtered": {
+            "filter": "text:cool",
+            "aggs": {
+                "count": { "value_count": { "field": "score" } }
+            }
+        }
+    });
+    execute_agg(index, agg_req);
+}
+
+fn filter_agg_all_query_with_sub_aggs(index: &Index) {
+    let agg_req = json!({
+        "filtered": {
+            "filter": "*",
+            "aggs": {
+                "avg_score": { "avg": { "field": "score" } },
+                "stats_score": { "stats": { "field": "score_f64" } },
+                "terms_text": {
+                    "terms": { "field": "text_few_terms" }
+                }
+            }
+        }
+    });
+    execute_agg(index, agg_req);
+}
+
+fn filter_agg_term_query_with_sub_aggs(index: &Index) {
+    let agg_req = json!({
+        "filtered": {
+            "filter": "text:cool",
+            "aggs": {
+                "avg_score": { "avg": { "field": "score" } },
+                "stats_score": { "stats": { "field": "score_f64" } },
+                "terms_text": {
+                    "terms": { "field": "text_few_terms" }
+                }
+            }
+        }
+    });
+    execute_agg(index, agg_req);
+}
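The `log_level_distribution` in `get_test_index_bench` samples log levels with `rand::distributions::WeightedIndex`. As a result, `text_few_terms_status` gets a skewed term distribution (weights 80/3/12/5 for INFO/ERROR/WARN/DEBUG), while `text_few_terms` is filled uniformly via `choose`. The std-only sketch below shows the cumulative-weight (inverse CDF) lookup behind this kind of weighted sampling. `cumulative_weights` and `pick` are hypothetical helpers, not `rand`'s API. To make the counts checkable exactly, it walks every integer draw in `[0, 100)` once instead of drawing random numbers.

```rust
/// Hypothetical helper: running totals of the weights, e.g. [80, 3, 12, 5] -> [80, 83, 95, 100].
fn cumulative_weights(weights: &[u32]) -> Vec<u32> {
    weights
        .iter()
        .scan(0u32, |total, &w| {
            *total += w;
            Some(*total)
        })
        .collect()
}

/// Hypothetical helper: map a draw `u` in [0, total) to the first index whose
/// cumulative weight exceeds `u` (binary search via `partition_point`).
fn pick(cumulative: &[u32], u: u32) -> usize {
    cumulative.partition_point(|&c| c <= u)
}

fn main() {
    let levels = ["INFO", "ERROR", "WARN", "DEBUG"];
    let cumulative = cumulative_weights(&[80, 3, 12, 5]);
    let total = *cumulative.last().unwrap();
    // Enumerate every possible draw once, so each level is picked exactly
    // as many times as its weight.
    let mut counts = [0u32; 4];
    for u in 0..total {
        counts[pick(&cumulative, u)] += 1;
    }
    for (level, count) in levels.iter().zip(counts.iter()) {
        println!("{level}: {count}");
    }
}
```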
--- a/benches/and_or_queries.rs
+++ b/benches/and_or_queries.rs
@@ -0,0 +1,218 @@
+// Benchmarks boolean AND/OR queries using binggan.
+//
+// What’s measured:
+// - OR and AND queries with varying selectivity (leaf clauses are `Term` queries only, for now)
+// - Nested AND/OR combinations (on multiple fields)
+// - No-scoring path using the Count collector (focus on iterator/skip performance)
+// - Top-K retrieval (k=10) using the TopDocs collector, ordered by score, by the `score`
+//   fast field, and by the (`score`, `score2`) fast-field pair
+//
+// Corpus model:
+// - Synthetic docs; each token a/b/c is independently included per doc
+// - If none of a/b/c are included, emit a neutral filler token to keep doc length similar
+//
+// Notes:
+// - After optimization, when scoring is disabled Tantivy reads doc-only postings
+//   (IndexRecordOption::Basic), avoiding frequency decoding overhead.
+// - This bench isolates boolean iteration speed and intersection/union cost.
+// - Use `cargo bench --bench and_or_queries` to run.
+
+use binggan::{black_box, BenchGroup, BenchRunner};
+use rand::prelude::*;
+use rand::rngs::StdRng;
+use rand::SeedableRng;
+use tantivy::collector::sort_key::SortByStaticFastValue;
+use tantivy::collector::{Collector, Count, TopDocs};
+use tantivy::query::{Query, QueryParser};
+use tantivy::schema::{Schema, FAST, TEXT};
+use tantivy::{doc, Index, Order, ReloadPolicy, Searcher};
+
+#[derive(Clone)]
+struct BenchIndex {
+    #[allow(dead_code)]
+    index: Index,
+    searcher: Searcher,
+    query_parser: QueryParser,
+}
+
+/// Build a single index containing both fields (title, body) and
+/// return two BenchIndex views:
+/// - single_field: QueryParser defaults to only "body"
+/// - multi_field:  QueryParser defaults to ["title", "body"]
+fn build_shared_indices(num_docs: usize, p_a: f32, p_b: f32, p_c: f32) -> (BenchIndex, BenchIndex) {
+    // Unified schema (two text fields)
+    let mut schema_builder = Schema::builder();
+    let f_title = schema_builder.add_text_field("title", TEXT);
+    let f_body = schema_builder.add_text_field("body", TEXT);
+    let f_score = schema_builder.add_u64_field("score", FAST);
+    let f_score2 = schema_builder.add_u64_field("score2", FAST);
+    let schema = schema_builder.build();
+    let index = Index::create_in_ram(schema.clone());
+
+    // Populate index with stable RNG for reproducibility.
+    let mut rng = StdRng::from_seed([7u8; 32]);
+
+    // Populate: spread each present token 90/10 to body/title
+    {
+        let mut writer = index.writer_with_num_threads(1, 500_000_000).unwrap();
+        for _ in 0..num_docs {
+            let has_a = rng.gen_bool(p_a as f64);
+            let has_b = rng.gen_bool(p_b as f64);
+            let has_c = rng.gen_bool(p_c as f64);
+            let score = rng.gen_range(0u64..100u64);
+            let score2 = rng.gen_range(0u64..100_000u64);
+            let mut title_tokens: Vec<&str> = Vec::new();
+            let mut body_tokens: Vec<&str> = Vec::new();
+            if has_a {
+                if rng.gen_bool(0.1) {
+                    title_tokens.push("a");
+                } else {
+                    body_tokens.push("a");
+                }
+            }
+            if has_b {
+                if rng.gen_bool(0.1) {
+                    title_tokens.push("b");
+                } else {
+                    body_tokens.push("b");
+                }
+            }
+            if has_c {
+                if rng.gen_bool(0.1) {
+                    title_tokens.push("c");
+                } else {
+                    body_tokens.push("c");
+                }
+            }
+            if title_tokens.is_empty() && body_tokens.is_empty() {
+                body_tokens.push("z");
+            }
+            writer
+                .add_document(doc!(
+                    f_title=>title_tokens.join(" "),
+                    f_body=>body_tokens.join(" "),
+                    f_score=>score,
+                    f_score2=>score2,
+                ))
+                .unwrap();
+        }
+        writer.commit().unwrap();
+    }
+
+    // Prepare reader/searcher once.
+    let reader = index
+        .reader_builder()
+        .reload_policy(ReloadPolicy::Manual)
+        .try_into()
+        .unwrap();
+    let searcher = reader.searcher();
+
+    // Build two query parsers with different default fields.
+    let qp_single = QueryParser::for_index(&index, vec![f_body]);
+    let qp_multi = QueryParser::for_index(&index, vec![f_title, f_body]);
+
+    let single_view = BenchIndex {
+        index: index.clone(),
+        searcher: searcher.clone(),
+        query_parser: qp_single,
+    };
+    let multi_view = BenchIndex {
+        index,
+        searcher,
+        query_parser: qp_multi,
+    };
+    (single_view, multi_view)
+}
+
+fn main() {
+    // Prepare corpora with varying selectivity. Build one index per corpus
+    // and derive two views (single-field vs multi-field) from it.
+    let scenarios = vec![
+        (
+            "N=1M, p(a)=5%, p(b)=1%, p(c)=15%".to_string(),
+            1_000_000,
+            0.05,
+            0.01,
+            0.15,
+        ),
+        (
+            "N=1M, p(a)=1%, p(b)=1%, p(c)=15%".to_string(),
+            1_000_000,
+            0.01,
+            0.01,
+            0.15,
+        ),
+    ];
+
+    let queries = &["a", "+a +b", "+a +b +c", "a OR b", "a OR b OR c"];
+
+    let mut runner = BenchRunner::new();
+    for (label, n, pa, pb, pc) in scenarios {
+        let (single_view, multi_view) = build_shared_indices(n, pa, pb, pc);
+
+        for (view_name, bench_index) in [("single_field", single_view), ("multi_field", multi_view)]
+        {
+            // Single-field group: default field is body only
+            let mut group = runner.new_group();
+            group.set_name(format!("{} — {}", view_name, label));
+            for query_str in queries {
+                add_bench_task(&mut group, &bench_index, query_str, Count, "count");
+                add_bench_task(
+                    &mut group,
+                    &bench_index,
+                    query_str,
+                    TopDocs::with_limit(10).order_by_score(),
+                    "top10",
+                );
+                add_bench_task(
+                    &mut group,
+                    &bench_index,
+                    query_str,
+                    TopDocs::with_limit(10).order_by_fast_field::<u64>("score", Order::Asc),
+                    "top10_by_ff",
+                );
+                add_bench_task(
+                    &mut group,
+                    &bench_index,
+                    query_str,
+                    TopDocs::with_limit(10).order_by((
+                        SortByStaticFastValue::<u64>::for_field("score"),
+                        SortByStaticFastValue::<u64>::for_field("score2"),
+                    )),
+                    "top10_by_2ff",
+                );
+            }
+            group.run();
+        }
+    }
+}
+
+fn add_bench_task<C: Collector + 'static>(
+    bench_group: &mut BenchGroup,
+    bench_index: &BenchIndex,
+    query_str: &str,
+    collector: C,
+    collector_name: &str,
+) {
+    let task_name = format!("{}_{}", query_str.replace(" ", "_"), collector_name);
+    let query = bench_index.query_parser.parse_query(query_str).unwrap();
+    let search_task = SearchTask {
+        searcher: bench_index.searcher.clone(),
+        collector,
+        query,
+    };
+    bench_group.register(task_name, move |_| black_box(search_task.run()));
+}
+
+struct SearchTask<C: Collector> {
+    searcher: Searcher,
+    collector: C,
+    query: Box<dyn Query>,
+}
+
+impl<C: Collector> SearchTask<C> {
+    #[inline(never)]
+    pub fn run(&self) -> usize {
+        self.searcher.search(&self.query, &self.collector).unwrap();
+        1
+    }
+}
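The header comments of `and_or_queries.rs` describe the corpus: each of `a`, `b`, `c` is included independently, then placed in `title` with probability 0.1 and otherwise in `body`. Under that model, the expected (not measured) hit counts follow directly. An AND of terms matches with probability prod(p), an OR with 1 - prod(1 - p). The `single_field` view only sees the 90% of occurrences that land in `body`. The sketch below computes these expectations for the first scenario. The helper names are hypothetical and it is not part of the benchmark.

```rust
/// Probability that a doc contains all terms, assuming independent inclusion.
fn p_and(ps: &[f64]) -> f64 {
    ps.iter().product()
}

/// Probability that a doc contains at least one term: 1 - prod(1 - p).
fn p_or(ps: &[f64]) -> f64 {
    1.0 - ps.iter().map(|p| 1.0 - p).product::<f64>()
}

/// Single-field view: only `body` is searched, and each occurrence lands in
/// `body` with probability 0.9 (the other 10% go to `title`).
fn body_only(ps: &[f64]) -> Vec<f64> {
    ps.iter().map(|p| p * 0.9).collect()
}

fn match_prob(ps: &[f64], is_and: bool) -> f64 {
    if is_and {
        p_and(ps)
    } else {
        p_or(ps)
    }
}

fn main() {
    let n = 1_000_000.0;
    // First scenario: p(a)=5%, p(b)=1%, p(c)=15%.
    let (a, b, c) = (0.05, 0.01, 0.15);
    let cases: [(&str, Vec<f64>, bool); 5] = [
        ("a", vec![a], true),
        ("+a +b", vec![a, b], true),
        ("+a +b +c", vec![a, b, c], true),
        ("a OR b", vec![a, b], false),
        ("a OR b OR c", vec![a, b, c], false),
    ];
    for (query, ps, is_and) in &cases {
        println!(
            "{query:12} multi_field ~{:>9.0} docs, single_field ~{:>9.0} docs",
            match_prob(ps, *is_and) * n,
            match_prob(&body_only(ps), *is_and) * n
        );
    }
}
```

For example, `+a +b` is expected to match about 500 of the 1M docs in the multi-field view, while `a OR b OR c` matches about 200,575. This is why these queries stress intersection and union code paths so differently.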
--- a/benches/bool_queries_with_range.rs
+++ b/benches/bool_queries_with_range.rs
@@ -0,0 +1,288 @@
+use binggan::{black_box, BenchGroup, BenchRunner};
+use rand::prelude::*;
+use rand::rngs::StdRng;
+use rand::SeedableRng;
+use tantivy::collector::{Collector, Count, DocSetCollector, TopDocs};
+use tantivy::query::{Query, QueryParser};
+use tantivy::schema::{Schema, FAST, INDEXED, TEXT};
+use tantivy::{doc, Index, Order, ReloadPolicy, Searcher};
+
+#[derive(Clone)]
+struct BenchIndex {
+    #[allow(dead_code)]
+    index: Index,
+    searcher: Searcher,
+    query_parser: QueryParser,
+}
+
+fn build_shared_indices(num_docs: usize, p_title_a: f32, distribution: &str) -> BenchIndex {
+    // Unified schema
+    let mut schema_builder = Schema::builder();
+    let f_title = schema_builder.add_text_field("title", TEXT);
+    let f_num_rand = schema_builder.add_u64_field("num_rand", INDEXED);
+    let f_num_asc = schema_builder.add_u64_field("num_asc", INDEXED);
+    let f_num_rand_fast = schema_builder.add_u64_field("num_rand_fast", INDEXED | FAST);
+    let f_num_asc_fast = schema_builder.add_u64_field("num_asc_fast", INDEXED | FAST);
+    let schema = schema_builder.build();
+    let index = Index::create_in_ram(schema.clone());
+
+    // Populate index with stable RNG for reproducibility.
+    let mut rng = StdRng::from_seed([7u8; 32]);
+
+    {
+        let mut writer = index.writer_with_num_threads(1, 4_000_000_000).unwrap();
+
+        match distribution {
+            "dense" => {
+                for doc_id in 0..num_docs {
+                    // Always add title to avoid empty documents
+                    let title_token = if rng.gen_bool(p_title_a as f64) {
+                        "a"
+                    } else {
+                        "b"
+                    };
+
+                    let num_rand = rng.gen_range(0u64..1000u64);
+
+                    let num_asc = (doc_id / 10000) as u64;
+
+                    writer
+                        .add_document(doc!(
+                            f_title=>title_token,
+                            f_num_rand=>num_rand,
+                            f_num_asc=>num_asc,
+                            f_num_rand_fast=>num_rand,
+                            f_num_asc_fast=>num_asc,
+                        ))
+                        .unwrap();
+                }
+            }
+            "sparse" => {
+                for doc_id in 0..num_docs {
+                    // Always add title to avoid empty documents
+                    let title_token = if rng.gen_bool(p_title_a as f64) {
+                        "a"
+                    } else {
+                        "b"
+                    };
+
+                    let num_rand = rng.gen_range(0u64..10000000u64);
+
+                    let num_asc = doc_id as u64;
+
+                    writer
+                        .add_document(doc!(
+                            f_title=>title_token,
+                            f_num_rand=>num_rand,
+                            f_num_asc=>num_asc,
+                            f_num_rand_fast=>num_rand,
+                            f_num_asc_fast=>num_asc,
+                        ))
+                        .unwrap();
+                }
+            }
+            _ => {
+                panic!("Unsupported distribution type");
+            }
+        }
+        writer.commit().unwrap();
+    }
+
+    // Prepare reader/searcher once.
+    let reader = index
+        .reader_builder()
+        .reload_policy(ReloadPolicy::Manual)
+        .try_into()
+        .unwrap();
+    let searcher = reader.searcher();
+
+    // Build query parser for title field
+    let qp_title = QueryParser::for_index(&index, vec![f_title]);
+
+    BenchIndex {
+        index,
+        searcher,
+        query_parser: qp_title,
+    }
+}
+
+fn main() {
+    // Prepare corpora with varying scenarios
+    let scenarios = vec![
+        (
+            "dense and 99% a".to_string(),
+            10_000_000,
+            0.99,
+            "dense",
+            0,
+            9,
+        ),
+        (
+            "dense and 99% a".to_string(),
+            10_000_000,
+            0.99,
+            "dense",
+            990,
+            999,
+        ),
+        (
+            "sparse and 99% a".to_string(),
+            10_000_000,
+            0.99,
+            "sparse",
+            0,
+            9,
+        ),
+        (
+            "sparse and 99% a".to_string(),
+            10_000_000,
+            0.99,
+            "sparse",
+            9_999_990,
+            9_999_999,
+        ),
+    ];
+
+    let mut runner = BenchRunner::new();
+    for (scenario_id, n, p_title_a, num_rand_distribution, range_low, range_high) in scenarios {
+        // Build index for this scenario
+        let bench_index = build_shared_indices(n, p_title_a, num_rand_distribution);
+
+        // Create benchmark group
+        let mut group = runner.new_group();
+
+        // Now set the name (this moves scenario_id)
+        group.set_name(scenario_id);
+
+        // Define all four field types
+        let field_names = ["num_rand", "num_asc", "num_rand_fast", "num_asc_fast"];
+
+        // Define the three terms we want to test with
+        let terms = ["a", "b", "z"];
+
+        // Generate all combinations of terms and field names
+        let mut queries = Vec::new();
+        for &term in &terms {
+            for &field_name in &field_names {
+                let query_str = format!(
+                    "{} AND {}:[{} TO {}]",
+                    term, field_name, range_low, range_high
+                );
+                queries.push((query_str, field_name.to_string()));
+            }
+        }
+
+        let query_str = format!(
+            "{}:[{} TO {}] AND {}:[{} TO {}]",
+            "num_rand_fast", range_low, range_high, "num_asc_fast", range_low, range_high
+        );
+        queries.push((query_str, "num_asc_fast".to_string()));
+
+        // Run all benchmark tasks for each query and its corresponding field name
+        for (query_str, field_name) in queries {
+            run_benchmark_tasks(&mut group, &bench_index, &query_str, &field_name);
+        }
+
+        group.run();
+    }
+}
+
+/// Run all benchmark tasks for a given query string and field name
+fn run_benchmark_tasks(
+    bench_group: &mut BenchGroup,
+    bench_index: &BenchIndex,
+    query_str: &str,
+    field_name: &str,
+) {
+    // Test count
+    add_bench_task(bench_group, bench_index, query_str, Count, "count");
+
+    // Test all results
+    add_bench_task(
+        bench_group,
+        bench_index,
+        query_str,
+        DocSetCollector,
+        "all results",
+    );
+
+    // Test top 100 by the field (if it's a FAST field)
+    if field_name.ends_with("_fast") {
+        // Ascending order
+        {
+            let collector_name = format!("top100_by_{}_asc", field_name);
+            let field_name_owned = field_name.to_string();
+            add_bench_task(
+                bench_group,
+                bench_index,
+                query_str,
+                TopDocs::with_limit(100).order_by_fast_field::<u64>(field_name_owned, Order::Asc),
+                &collector_name,
+            );
+        }
+
+        // Descending order
+        {
+            let collector_name = format!("top100_by_{}_desc", field_name);
+            let field_name_owned = field_name.to_string();
+            add_bench_task(
+                bench_group,
+                bench_index,
+                query_str,
+                TopDocs::with_limit(100).order_by_fast_field::<u64>(field_name_owned, Order::Desc),
+                &collector_name,
+            );
+        }
+    }
+}
+
+fn add_bench_task<C: Collector + 'static>(
+    bench_group: &mut BenchGroup,
+    bench_index: &BenchIndex,
+    query_str: &str,
+    collector: C,
+    collector_name: &str,
+) {
+    let task_name = format!("{}_{}", query_str.replace(" ", "_"), collector_name);
+    let query = bench_index.query_parser.parse_query(query_str).unwrap();
+    let search_task = SearchTask {
+        searcher: bench_index.searcher.clone(),
+        collector,
+        query,
+    };
+    bench_group.register(task_name, move |_| black_box(search_task.run()));
+}
+
+struct SearchTask<C: Collector> {
+    searcher: Searcher,
+    collector: C,
+    query: Box<dyn Query>,
+}
+
+impl<C: Collector> SearchTask<C> {
+    #[inline(never)]
+    pub fn run(&self) -> usize {
+        let result = self.searcher.search(&self.query, &self.collector).unwrap();
+        if let Some(count) = (&result as &dyn std::any::Any).downcast_ref::<usize>() {
+            *count
+        } else if let Some(top_docs) = (&result as &dyn std::any::Any)
+            .downcast_ref::<Vec<(Option<u64>, tantivy::DocAddress)>>()
+        {
+            top_docs.len()
+        } else if let Some(top_docs) =
+            (&result as &dyn std::any::Any).downcast_ref::<Vec<(u64, tantivy::DocAddress)>>()
+        {
+            top_docs.len()
+        } else if let Some(doc_set) = (&result as &dyn std::any::Any)
+            .downcast_ref::<std::collections::HashSet<tantivy::DocAddress>>()
+        {
+            doc_set.len()
+        } else {
+            eprintln!(
+                "Unknown collector result type: {}",
+                std::any::type_name::<C::Fruit>()
+            );
+            0
+        }
+    }
+}
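`SearchTask::run` above reduces a generic `C::Fruit` to a hit count by trying `downcast_ref` against each concrete fruit type it knows about. The following is a minimal standalone sketch of that dispatch pattern using only `std::any::Any`. The `num_hits` helper is hypothetical, and `Vec<(Option<u64>, u32)>` and `HashSet<u32>` stand in for tantivy's top-docs and doc-set fruits (with `u32` replacing `DocAddress`) purely for illustration.

```rust
use std::any::Any;
use std::collections::HashSet;

/// Hypothetical helper: returns the number of hits carried by a collector-like
/// result, or `None` if its concrete type is not one we recognise.
fn num_hits<T: Any>(result: &T) -> Option<usize> {
    let any = result as &dyn Any;
    if let Some(count) = any.downcast_ref::<usize>() {
        // Count-style result: the value itself is the number of hits.
        Some(*count)
    } else if let Some(top) = any.downcast_ref::<Vec<(Option<u64>, u32)>>() {
        // Top-k result: one entry per returned hit.
        Some(top.len())
    } else if let Some(set) = any.downcast_ref::<HashSet<u32>>() {
        // Doc-set result: one entry per matching document.
        Some(set.len())
    } else {
        // Unknown type: report it instead of silently returning 0.
        None
    }
}

fn main() {
    let top: Vec<(Option<u64>, u32)> = vec![(Some(3), 0), (None, 1)];
    let docs: HashSet<u32> = [1, 2, 3].into_iter().collect();
    println!("{:?}", num_hits(&42usize)); // prints Some(42)
    println!("{:?}", num_hits(&top)); // prints Some(2)
    println!("{:?}", num_hits(&docs)); // prints Some(3)
    println!("{:?}", num_hits(&"unknown")); // prints None
}
```

Downcasting only matches the exact concrete type, so a fruit such as `Vec<(u64, DocAddress)>` versus `Vec<(Option<u64>, DocAddress)>` needs its own branch, which is presumably why the benchmark lists both.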
--- a/benches/exists_json.rs
+++ b/benches/exists_json.rs
@@ -0,0 +1,69 @@
+use binggan::plugins::PeakMemAllocPlugin;
+use binggan::{black_box, InputGroup, PeakMemAlloc, INSTRUMENTED_SYSTEM};
+use serde_json::json;
+use tantivy::collector::Count;
+use tantivy::query::ExistsQuery;
+use tantivy::schema::{Schema, FAST, TEXT};
+use tantivy::{doc, Index};
+
+#[global_allocator]
+pub static GLOBAL: &PeakMemAlloc<std::alloc::System> = &INSTRUMENTED_SYSTEM;
+
+fn main() {
+    let doc_count: usize = 500_000;
+    let subfield_counts: &[usize] = &[1, 2, 3, 4, 5, 6, 7, 8, 16, 256, 4096, 65536, 262144];
+
+    let indices: Vec<(String, Index)> = subfield_counts
+        .iter()
+        .map(|&sub_fields| {
+            (
+                format!("subfields={sub_fields}"),
+                build_index_with_json_subfields(doc_count, sub_fields),
+            )
+        })
+        .collect();
+
+    let mut group = InputGroup::new_with_inputs(indices);
+    group.add_plugin(PeakMemAllocPlugin::new(GLOBAL));
+
+    group.config().num_iter_group = Some(1);
+    group.config().num_iter_bench = Some(1);
+    group.register("exists_json", exists_json_union);
+
+    group.run();
+}
+
+fn exists_json_union(index: &Index) {
+    let reader = index.reader().expect("reader");
+    let searcher = reader.searcher();
+    let query = ExistsQuery::new("json".to_string(), true);
+    let count = searcher.search(&query, &Count).expect("exists search");
+    // Prevents optimizer from eliding the search
+    black_box(count);
+}
+
+fn build_index_with_json_subfields(num_docs: usize, num_subfields: usize) -> Index {
+    // Schema: a single JSON field, indexed as TEXT and declared FAST so that ExistsQuery can use it.
+    let mut schema_builder = Schema::builder();
+    let json_field = schema_builder.add_json_field("json", TEXT | FAST);
+    let schema = schema_builder.build();
+
+    let index = Index::create_from_tempdir(schema).expect("create index");
+    {
+        let mut index_writer = index
+            .writer_with_num_threads(1, 200_000_000)
+            .expect("writer");
+        for i in 0..num_docs {
+            let sub = i % num_subfields;
+            // Only one subpath set per document; rotate subpaths so that
+            // no single subpath is full, but the union covers all docs.
+            let v = json!({ format!("field_{sub}"): i as u64 });
+            index_writer
+                .add_document(doc!(json_field => v))
+                .expect("add_document");
+        }
+        index_writer.commit().expect("commit");
+    }
+
+    index
+}
--- a/benches/range_queries.rs
+++ b/benches/range_queries.rs
@@ -0,0 +1,365 @@
+use std::ops::Bound;
+
+use binggan::{black_box, BenchGroup, BenchRunner};
+use rand::prelude::*;
+use rand::rngs::StdRng;
+use rand::SeedableRng;
+use tantivy::collector::{Count, DocSetCollector, TopDocs};
+use tantivy::query::RangeQuery;
+use tantivy::schema::{Schema, FAST, INDEXED};
+use tantivy::{doc, Index, Order, ReloadPolicy, Searcher, Term};
+
+#[derive(Clone)]
+struct BenchIndex {
+    #[allow(dead_code)]
+    index: Index,
+    searcher: Searcher,
+}
+
+fn build_shared_indices(num_docs: usize, distribution: &str) -> BenchIndex {
+    // Schema with fast fields only
+    let mut schema_builder = Schema::builder();
+    let f_num_rand_fast = schema_builder.add_u64_field("num_rand_fast", INDEXED | FAST);
+    let f_num_asc_fast = schema_builder.add_u64_field("num_asc_fast", INDEXED | FAST);
+    let schema = schema_builder.build();
+    let index = Index::create_in_ram(schema.clone());
+
+    // Populate index with stable RNG for reproducibility.
+    let mut rng = StdRng::from_seed([7u8; 32]);
+
+    {
+        let mut writer = index.writer_with_num_threads(1, 4_000_000_000).unwrap();
+
+        match distribution {
+            "dense" => {
+                for doc_id in 0..num_docs {
+                    let num_rand = rng.gen_range(0u64..1000u64);
+                    let num_asc = (doc_id / 10000) as u64;
+
+                    writer
+                        .add_document(doc!(
+                            f_num_rand_fast=>num_rand,
+                            f_num_asc_fast=>num_asc,
+                        ))
+                        .unwrap();
+                }
+            }
+            "sparse" => {
+                for doc_id in 0..num_docs {
+                    let num_rand = rng.gen_range(0u64..10000000u64);
+                    let num_asc = doc_id as u64;
+
+                    writer
+                        .add_document(doc!(
+                            f_num_rand_fast=>num_rand,
+                            f_num_asc_fast=>num_asc,
+                        ))
+                        .unwrap();
+                }
+            }
+            _ => {
+                panic!("Unsupported distribution type");
+            }
+        }
+        writer.commit().unwrap();
+    }
+
+    // Prepare reader/searcher once.
+    let reader = index
+        .reader_builder()
+        .reload_policy(ReloadPolicy::Manual)
+        .try_into()
+        .unwrap();
+    let searcher = reader.searcher();
+
+    BenchIndex { index, searcher }
+}
+
+fn main() {
+    // Prepare corpora with varying scenarios
+    let scenarios = vec![
+        // Dense distribution - random values in small range (0-999)
+        (
+            "dense_values_search_low_value_range".to_string(),
+            10_000_000,
+            "dense",
+            0,
+            9,
+        ),
+        (
+            "dense_values_search_high_value_range".to_string(),
+            10_000_000,
+            "dense",
+            990,
+            999,
+        ),
+        (
+            "dense_values_search_out_of_range".to_string(),
+            10_000_000,
+            "dense",
+            1000,
+            1002,
+        ),
+        (
+            "sparse_values_search_low_value_range".to_string(),
+            10_000_000,
+            "sparse",
+            0,
+            9,
+        ),
+        (
+            "sparse_values_search_high_value_range".to_string(),
+            10_000_000,
+            "sparse",
+            9_999_990,
+            9_999_999,
+        ),
+        (
+            "sparse_values_search_out_of_range".to_string(),
+            10_000_000,
+            "sparse",
+            10_000_000,
+            10_000_002,
+        ),
+    ];
+
+    let mut runner = BenchRunner::new();
+    for (scenario_id, n, num_rand_distribution, range_low, range_high) in scenarios {
+        // Build index for this scenario
+        let bench_index = build_shared_indices(n, num_rand_distribution);
+
+        // Create benchmark group
+        let mut group = runner.new_group();
+
+        // Now set the name (this moves scenario_id)
+        group.set_name(scenario_id);
+
+        // Define fast field types
+        let field_names = ["num_rand_fast", "num_asc_fast"];
+
+        // Generate range queries for fast fields
+        for &field_name in &field_names {
+            // Create the range query
+            let field = bench_index.searcher.schema().get_field(field_name).unwrap();
+            let lower_term = Term::from_field_u64(field, range_low);
+            let upper_term = Term::from_field_u64(field, range_high);
+
+            let query = RangeQuery::new(Bound::Included(lower_term), Bound::Included(upper_term));
+
+            run_benchmark_tasks(
+                &mut group,
+                &bench_index,
+                query,
+                field_name,
+                range_low,
+                range_high,
+            );
+        }
+
+        group.run();
+    }
+}
+
+/// Run all benchmark tasks for a given range query and field name
+fn run_benchmark_tasks(
+    bench_group: &mut BenchGroup,
+    bench_index: &BenchIndex,
+    query: RangeQuery,
+    field_name: &str,
+    range_low: u64,
+    range_high: u64,
+) {
+    // Test count
+    add_bench_task_count(
+        bench_group,
+        bench_index,
+        query.clone(),
+        "count",
+        field_name,
+        range_low,
+        range_high,
+    );
+
+    // Test top 100 by the field (ascending order)
+    {
+        let collector_name = format!("top100_by_{}_asc", field_name);
+        let field_name_owned = field_name.to_string();
+        add_bench_task_top100_asc(
+            bench_group,
+            bench_index,
+            query.clone(),
+            &collector_name,
+            field_name,
+            range_low,
+            range_high,
+            field_name_owned,
+        );
+    }
+
+    // Test top 100 by the field (descending order)
+    {
+        let collector_name = format!("top100_by_{}_desc", field_name);
+        let field_name_owned = field_name.to_string();
+        add_bench_task_top100_desc(
+            bench_group,
+            bench_index,
+            query,
+            &collector_name,
+            field_name,
+            range_low,
+            range_high,
+            field_name_owned,
+        );
+    }
+}
+
+fn add_bench_task_count(
+    bench_group: &mut BenchGroup,
+    bench_index: &BenchIndex,
+    query: RangeQuery,
+    collector_name: &str,
+    field_name: &str,
+    range_low: u64,
+    range_high: u64,
+) {
+    let task_name = format!(
+        "range_{}_[{} TO {}]_{}",
+        field_name, range_low, range_high, collector_name
+    );
+
+    let search_task = CountSearchTask {
+        searcher: bench_index.searcher.clone(),
+        query,
+    };
+    bench_group.register(task_name, move |_| black_box(search_task.run()));
+}
+
+fn add_bench_task_docset(
+    bench_group: &mut BenchGroup,
+    bench_index: &BenchIndex,
+    query: RangeQuery,
+    collector_name: &str,
+    field_name: &str,
+    range_low: u64,
+    range_high: u64,
+) {
+    let task_name = format!(
+        "range_{}_[{} TO {}]_{}",
+        field_name, range_low, range_high, collector_name
+    );
+
+    let search_task = DocSetSearchTask {
+        searcher: bench_index.searcher.clone(),
+        query,
+    };
+    bench_group.register(task_name, move |_| black_box(search_task.run()));
+}
+
+fn add_bench_task_top100_asc(
+    bench_group: &mut BenchGroup,
+    bench_index: &BenchIndex,
+    query: RangeQuery,
+    collector_name: &str,
+    field_name: &str,
+    range_low: u64,
+    range_high: u64,
+    field_name_owned: String,
+) {
+    let task_name = format!(
+        "range_{}_[{} TO {}]_{}",
+        field_name, range_low, range_high, collector_name
+    );
+
+    let search_task = Top100AscSearchTask {
+        searcher: bench_index.searcher.clone(),
+        query,
+        field_name: field_name_owned,
+    };
+    bench_group.register(task_name, move |_| black_box(search_task.run()));
+}
+
+fn add_bench_task_top100_desc(
+    bench_group: &mut BenchGroup,
+    bench_index: &BenchIndex,
+    query: RangeQuery,
+    collector_name: &str,
+    field_name: &str,
+    range_low: u64,
+    range_high: u64,
+    field_name_owned: String,
+) {
+    let task_name = format!(
+        "range_{}_[{} TO {}]_{}",
+        field_name, range_low, range_high, collector_name
+    );
+
+    let search_task = Top100DescSearchTask {
+        searcher: bench_index.searcher.clone(),
+        query,
+        field_name: field_name_owned,
+    };
+    bench_group.register(task_name, move |_| black_box(search_task.run()));
+}
+
+struct CountSearchTask {
+    searcher: Searcher,
+    query: RangeQuery,
+}
+
+impl CountSearchTask {
+    #[inline(never)]
+    pub fn run(&self) -> usize {
+        self.searcher.search(&self.query, &Count).unwrap()
+    }
+}
+
+struct DocSetSearchTask {
+    searcher: Searcher,
+    query: RangeQuery,
+}
+
+impl DocSetSearchTask {
+    #[inline(never)]
+    pub fn run(&self) -> usize {
+        let result = self.searcher.search(&self.query, &DocSetCollector).unwrap();
+        result.len()
+    }
+}
+
+struct Top100AscSearchTask {
+    searcher: Searcher,
+    query: RangeQuery,
+    field_name: String,
+}
+
+impl Top100AscSearchTask {
+    #[inline(never)]
+    pub fn run(&self) -> usize {
+        let collector =
+            TopDocs::with_limit(100).order_by_fast_field::<u64>(&self.field_name, Order::Asc);
+        let result = self.searcher.search(&self.query, &collector).unwrap();
+        for (_score, doc_address) in &result {
+            let _doc: tantivy::TantivyDocument = self.searcher.doc(*doc_address).unwrap();
+        }
+        result.len()
+    }
+}
+
+struct Top100DescSearchTask {
+    searcher: Searcher,
+    query: RangeQuery,
+    field_name: String,
+}
+
+impl Top100DescSearchTask {
+    #[inline(never)]
+    pub fn run(&self) -> usize {
+        let collector =
+            TopDocs::with_limit(100).order_by_fast_field::<u64>(&self.field_name, Order::Desc);
+        let result = self.searcher.search(&self.query, &collector).unwrap();
+        for (_score, doc_address) in &result {
+            let _doc: tantivy::TantivyDocument = self.searcher.doc(*doc_address).unwrap();
+        }
+        result.len()
+    }
+}
--- a/benches/range_query.rs
+++ b/benches/range_query.rs
@@ -0,0 +1,260 @@
+use std::fmt::Display;
+use std::net::Ipv6Addr;
+use std::ops::RangeInclusive;
+
+use binggan::plugins::PeakMemAllocPlugin;
+use binggan::{black_box, BenchRunner, OutputValue, PeakMemAlloc, INSTRUMENTED_SYSTEM};
+use columnar::MonotonicallyMappableToU128;
+use rand::rngs::StdRng;
+use rand::{Rng, SeedableRng};
+use tantivy::collector::{Count, TopDocs};
+use tantivy::query::QueryParser;
+use tantivy::schema::*;
+use tantivy::{doc, Index};
+
+#[global_allocator]
+pub static GLOBAL: &PeakMemAlloc<std::alloc::System> = &INSTRUMENTED_SYSTEM;
+
+fn main() {
+    bench_range_query();
+}
+
+fn bench_range_query() {
+    let index = get_index_0_to_100();
+    let mut runner = BenchRunner::new();
+    runner.add_plugin(PeakMemAllocPlugin::new(GLOBAL));
+
+    runner.set_name("range_query on u64");
+    let field_name_and_descr: Vec<_> = vec![
+        ("id", "Single Valued Range Field"),
+        ("ids", "Multi Valued Range Field"),
+    ];
+    let range_num_hits = vec![
+        ("90_percent", get_90_percent()),
+        ("10_percent", get_10_percent()),
+        ("1_percent", get_1_percent()),
+    ];
+
+    test_range(&mut runner, &index, &field_name_and_descr, range_num_hits);
+
+    runner.set_name("range_query on ip");
+    let field_name_and_descr: Vec<_> = vec![
+        ("ip", "Single Valued Range Field"),
+        ("ips", "Multi Valued Range Field"),
+    ];
+    let range_num_hits = vec![
+        ("90_percent", get_90_percent_ip()),
+        ("10_percent", get_10_percent_ip()),
+        ("1_percent", get_1_percent_ip()),
+    ];
+
+    test_range(&mut runner, &index, &field_name_and_descr, range_num_hits);
+}
+
+fn test_range<T: Display>(
+    runner: &mut BenchRunner,
+    index: &Index,
+    field_name_and_descr: &[(&str, &str)],
+    range_num_hits: Vec<(&str, RangeInclusive<T>)>,
+) {
+    for (field, suffix) in field_name_and_descr {
+        let term_num_hits = vec![
+            ("", ""),
+            ("1_percent", "veryfew"),
+            ("10_percent", "few"),
+            ("90_percent", "most"),
+        ];
+        let mut group = runner.new_group();
+        group.set_name(suffix);
+        // all intersect combinations
+        for (range_name, range) in &range_num_hits {
+            for (term_name, term) in &term_num_hits {
+                let index = &index;
+                let test_name = if term_name.is_empty() {
+                    format!("id_range_hit_{}", range_name)
+                } else {
+                    format!(
+                        "id_range_hit_{}_intersect_with_term_{}",
+                        range_name, term_name
+                    )
+                };
+                group.register(test_name, move |_| {
+                    let query = if term_name.is_empty() {
+                        "".to_string()
+                    } else {
+                        format!("AND id_name:{}", term)
+                    };
+                    black_box(execute_query(field, range, &query, index));
+                });
+            }
+        }
+        group.run();
+    }
+}
+
+fn get_index_0_to_100() -> Index {
+    let mut rng = StdRng::from_seed([1u8; 32]);
+    let num_vals = 100_000;
+    let docs: Vec<_> = (0..num_vals)
+        .map(|_i| {
+            let id_name = if rng.gen_bool(0.01) {
+                "veryfew".to_string() // 1%
+            } else if rng.gen_bool(0.1) {
+                "few".to_string() // 9%
+            } else {
+                "most".to_string() // 90%
+            };
+            Doc {
+                id_name,
+                id: rng.gen_range(0..100),
+                // Multiply by 1000, so that we create most buckets in the compact space.
+                // The benches depend on this range to select n percent of the elements with
+                // the get_*_percent_ip methods below.
+                ip: Ipv6Addr::from_u128(rng.gen_range(0..100) * 1000),
+            }
+        })
+        .collect();
+
+    create_index_from_docs(&docs)
+}
+
+#[derive(Clone, Debug)]
+pub struct Doc {
+    pub id_name: String,
+    pub id: u64,
+    pub ip: Ipv6Addr,
+}
+
+pub fn create_index_from_docs(docs: &[Doc]) -> Index {
+    let mut schema_builder = Schema::builder();
+    let id_u64_field = schema_builder.add_u64_field("id", INDEXED | STORED | FAST);
+    let ids_u64_field =
+        schema_builder.add_u64_field("ids", NumericOptions::default().set_fast().set_indexed());
+
+    let id_f64_field = schema_builder.add_f64_field("id_f64", INDEXED | STORED | FAST);
+    let ids_f64_field = schema_builder.add_f64_field(
+        "ids_f64",
+        NumericOptions::default().set_fast().set_indexed(),
+    );
+
+    let id_i64_field = schema_builder.add_i64_field("id_i64", INDEXED | STORED | FAST);
+    let ids_i64_field = schema_builder.add_i64_field(
+        "ids_i64",
+        NumericOptions::default().set_fast().set_indexed(),
+    );
+
+    let text_field = schema_builder.add_text_field("id_name", STRING | STORED);
+    let text_field2 = schema_builder.add_text_field("id_name_fast", STRING | STORED | FAST);
+
+    let ip_field = schema_builder.add_ip_addr_field("ip", FAST);
+    let ips_field = schema_builder.add_ip_addr_field("ips", FAST);
+
+    let schema = schema_builder.build();
+
+    let index = Index::create_in_ram(schema);
+
+    {
+        let mut index_writer = index.writer_with_num_threads(1, 50_000_000).unwrap();
+        for doc in docs.iter() {
+            index_writer
+                .add_document(doc!(
+                    ids_i64_field => doc.id as i64,
+                    ids_i64_field => doc.id as i64,
+                    ids_f64_field => doc.id as f64,
+                    ids_f64_field => doc.id as f64,
+                    ids_u64_field => doc.id,
+                    ids_u64_field => doc.id,
+                    id_u64_field => doc.id,
+                    id_f64_field => doc.id as f64,
+                    id_i64_field => doc.id as i64,
+                    text_field => doc.id_name.to_string(),
+                    text_field2 => doc.id_name.to_string(),
+                    ips_field => doc.ip,
+                    ips_field => doc.ip,
+                    ip_field => doc.ip,
+                ))
+                .unwrap();
+        }
+
+        index_writer.commit().unwrap();
+    }
+    index
+}
+
+fn get_90_percent() -> RangeInclusive<u64> {
+    0..=90
+}
+
+fn get_10_percent() -> RangeInclusive<u64> {
+    0..=10
+}
+
+fn get_1_percent() -> RangeInclusive<u64> {
+    10..=10
+}
+
+fn get_90_percent_ip() -> RangeInclusive<Ipv6Addr> {
+    let start = Ipv6Addr::from_u128(0);
+    let end = Ipv6Addr::from_u128(90 * 1000);
+    start..=end
+}
+
+fn get_10_percent_ip() -> RangeInclusive<Ipv6Addr> {
+    let start = Ipv6Addr::from_u128(0);
+    let end = Ipv6Addr::from_u128(10 * 1000);
+    start..=end
+}
+
+fn get_1_percent_ip() -> RangeInclusive<Ipv6Addr> {
+    let start = Ipv6Addr::from_u128(10 * 1000);
+    let end = Ipv6Addr::from_u128(10 * 1000);
+    start..=end
+}
+
+struct NumHits {
+    count: usize,
+}
+impl OutputValue for NumHits {
+    fn column_title() -> &'static str {
+        "NumHits"
+    }
+    fn format(&self) -> Option<String> {
+        Some(self.count.to_string())
+    }
+}
+
+fn execute_query<T: Display>(
+    field: &str,
+    id_range: &RangeInclusive<T>,
+    suffix: &str,
+    index: &Index,
+) -> NumHits {
+    let gen_query_inclusive = |from: &T, to: &T| {
+        format!(
+            "{}:[{} TO {}] {}",
+            field,
+            &from.to_string(),
+            &to.to_string(),
+            suffix
+        )
+    };
+
+    let query = gen_query_inclusive(id_range.start(), id_range.end());
+    execute_query_(&query, index)
+}
+
+fn execute_query_(query: &str, index: &Index) -> NumHits {
+    let query_from_text = |text: &str| {
+        QueryParser::for_index(index, vec![])
+            .parse_query(text)
+            .unwrap()
+    };
+    let query = query_from_text(query);
+    let reader = index.reader().unwrap();
+    let searcher = reader.searcher();
+    let num_hits = searcher
+        .search(&query, &(TopDocs::with_limit(10).order_by_score(), Count))
+        .unwrap()
+        .1;
+    NumHits { count: num_hits }
+}
--- a/bitpacker/Cargo.toml
+++ b/bitpacker/Cargo.toml
@@ -1,6 +1,6 @@
 [package]
 name = "tantivy-bitpacker"
-version = "0.8.0"
+version = "0.9.0"
 edition = "2024"
 authors = ["Paul Masurel <paul.masurel@gmail.com>"]
 license = "MIT"
--- a/bitpacker/src/bitpacker.rs
+++ b/bitpacker/src/bitpacker.rs
@@ -48,7 +48,7 @@ impl BitPacker {

    pub fn flush<TWrite: io::Write + ?Sized>(&mut self, output: &mut TWrite) -> io::Result<()> {
        if self.mini_buffer_written > 0 {
-            let num_bytes = (self.mini_buffer_written + 7) / 8;
+            let num_bytes = self.mini_buffer_written.div_ceil(8);
            let bytes = self.mini_buffer.to_le_bytes();
            output.write_all(&bytes[..num_bytes])?;
            self.mini_buffer_written = 0;
@@ -138,7 +138,7 @@ impl BitUnpacker {

        // We use `usize` here to avoid overflow issues.
        let end_bit_read = (end_idx as usize) * self.num_bits;
-        let end_byte_read = (end_bit_read + 7) / 8;
+        let end_byte_read = end_bit_read.div_ceil(8);
        assert!(
            end_byte_read <= data.len(),
            "Requested index is out of bounds."
@@ -258,7 +258,7 @@ mod test {
            bitpacker.write(val, num_bits, &mut data).unwrap();
        }
        bitpacker.close(&mut data).unwrap();
-        assert_eq!(data.len(), ((num_bits as usize) * len + 7) / 8);
+        assert_eq!(data.len(), ((num_bits as usize) * len).div_ceil(8));
        let bitunpacker = BitUnpacker::new(num_bits);
        (bitunpacker, vals, data)
    }
@@ -304,7 +304,7 @@ mod test {
            bitpacker.write(val, num_bits, &mut buffer).unwrap();
        }
        bitpacker.flush(&mut buffer).unwrap();
-        assert_eq!(buffer.len(), (vals.len() * num_bits as usize + 7) / 8);
+        assert_eq!(buffer.len(), (vals.len() * num_bits as usize).div_ceil(8));
        let bitunpacker = BitUnpacker::new(num_bits);
        let max_val = if num_bits == 64 {
            u64::MAX
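The bitpacker hunks replace `(n + 7) / 8` with `n.div_ceil(8)`. Both compute the number of bytes needed to hold `n` bits. Here is a small sketch that checks the two forms agree over a range of inputs. `bytes_for_bits` is a hypothetical name, and `usize::div_ceil` is assumed to be available (it is stable on unsigned integers since Rust 1.73).

```rust
/// Number of bytes needed to store `num_bits` bits (hypothetical helper).
fn bytes_for_bits(num_bits: usize) -> usize {
    num_bits.div_ceil(8)
}

fn main() {
    for n in 0..=1024usize {
        // The manual round-up form used before the change.
        assert_eq!(bytes_for_bits(n), (n + 7) / 8);
    }
    println!("{}", bytes_for_bits(0)); // prints 0
    println!("{}", bytes_for_bits(1)); // prints 1
    println!("{}", bytes_for_bits(64)); // prints 8
    println!("{}", bytes_for_bits(65)); // prints 9
}
```

Besides reading more directly, `div_ceil` cannot overflow: `(n + 7) / 8` overflows for `n` within 7 of `usize::MAX`, while `n.div_ceil(8)` does not.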
--- a/bitpacker/src/blocked_bitpacker.rs
+++ b/bitpacker/src/blocked_bitpacker.rs
@@ -140,10 +140,10 @@ impl BlockedBitpacker {
    pub fn iter(&self) -> impl Iterator<Item = u64> + '_ {
        // todo performance: we could decompress a whole block and cache it instead
        let bitpacked_elems = self.offset_and_bits.len() * BLOCK_SIZE;
-        let iter = (0..bitpacked_elems)
+
+        (0..bitpacked_elems)
            .map(move |idx| self.get(idx))
-            .chain(self.buffer.iter().cloned());
-        iter
+            .chain(self.buffer.iter().cloned())
    }
 }

--- a/bitpacker/src/filter_vec/avx2.rs
+++ b/bitpacker/src/filter_vec/avx2.rs
@@ -19,7 +19,7 @@ fn u32_to_i32(val: u32) -> i32 {
 #[inline]
 unsafe fn u32_to_i32_avx2(vals_u32x8s: DataType) -> DataType {
    const HIGHEST_BIT_MASK: DataType = from_u32x8([HIGHEST_BIT; NUM_LANES]);
-    op_xor(vals_u32x8s, HIGHEST_BIT_MASK)
+    unsafe { op_xor(vals_u32x8s, HIGHEST_BIT_MASK) }
 }

 pub fn filter_vec_in_place(range: RangeInclusive<u32>, offset: u32, output: &mut Vec<u32>) {
@@ -66,17 +66,19 @@ unsafe fn filter_vec_avx2_aux(
    ]);
    const SHIFT: __m256i = from_u32x8([NUM_LANES as u32; NUM_LANES]);
    for _ in 0..num_words {
-        let word = load_unaligned(input);
-        let word = u32_to_i32_avx2(word);
-        let keeper_bitset = compute_filter_bitset(word, range_simd.clone());
-        let added_len = keeper_bitset.count_ones();
-        let filtered_doc_ids = compact(ids, keeper_bitset);
-        store_unaligned(output_tail as *mut __m256i, filtered_doc_ids);
-        output_tail = output_tail.offset(added_len as isize);
-        ids = op_add(ids, SHIFT);
-        input = input.offset(1);
+        unsafe {
+            let word = load_unaligned(input);
+            let word = u32_to_i32_avx2(word);
+            let keeper_bitset = compute_filter_bitset(word, range_simd.clone());
+            let added_len = keeper_bitset.count_ones();
+            let filtered_doc_ids = compact(ids, keeper_bitset);
+            store_unaligned(output_tail as *mut __m256i, filtered_doc_ids);
+            output_tail = output_tail.offset(added_len as isize);
+            ids = op_add(ids, SHIFT);
+            input = input.offset(1);
+        }
    }
-    output_tail.offset_from(output) as usize
+    unsafe { output_tail.offset_from(output) as usize }
 }

 #[inline]
@@ -92,8 +94,7 @@ unsafe fn compute_filter_bitset(val: __m256i, range: std::ops::RangeInclusive<__
    let too_low = op_greater(*range.start(), val);
    let too_high = op_greater(val, *range.end());
    let inside = op_or(too_low, too_high);
-    255 - std::arch::x86_64::_mm256_movemask_ps(std::mem::transmute::<DataType, __m256>(inside))
-        as u8
+    255 - std::arch::x86_64::_mm256_movemask_ps(_mm256_castsi256_ps(inside)) as u8
 }

 union U8x32 {
--- a/columnar/Cargo.toml
+++ b/columnar/Cargo.toml
@@ -1,6 +1,6 @@
 [package]
 name = "tantivy-columnar"
-version = "0.5.0"
+version = "0.6.0"
 edition = "2024"
 license = "MIT"
 homepage = "https://github.com/quickwit-oss/tantivy"
@@ -12,11 +12,11 @@ categories = ["database-implementations", "data-structures", "compression"]
 itertools = "0.14.0"
 fastdivide = "0.4.0"

-stacker = { version= "0.5", path = "../stacker", package="tantivy-stacker"}
-sstable = { version= "0.5", path = "../sstable", package = "tantivy-sstable" }
-common = { version= "0.9", path = "../common", package = "tantivy-common" }
-tantivy-bitpacker = { version= "0.8", path = "../bitpacker/" }
-serde = "1.0.152"
+stacker = { version= "0.6", path = "../stacker", package="tantivy-stacker"}
+sstable = { version= "0.6", path = "../sstable", package = "tantivy-sstable" }
+common = { version= "0.10", path = "../common", package = "tantivy-common" }
+tantivy-bitpacker = { version= "0.9", path = "../bitpacker/" }
+serde = { version = "1.0.152", features = ["derive"] }
 downcast-rs = "2.0.1"

 [dev-dependencies]
--- a/columnar/README.md
+++ b/columnar/README.md
@@ -73,7 +73,7 @@ The crate introduces the following concepts.
 `Columnar` is an equivalent of a dataframe.
 It maps `column_key` to `Column`.

-A `Column<T>` asssociates a `RowId` (u32) to any
+A `Column<T>` associates a `RowId` (u32) to any
 number of values.

 This is made possible by wrapping a `ColumnIndex` and a `ColumnValue` object.
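The paragraph above describes a `Column<T>` as two parts: a `ColumnIndex`, which maps a doc to its value rows, and a `ColumnValues`, which maps a row to its value. The toy model below sketches that split under simplifying assumptions and is not the crate's code. `ToyIndex` and `ToyColumn` are hypothetical names. The optional case binary-searches a sorted list of doc ids instead of using the crate's dense and sparse blocks, and the values sit in a plain `Vec<u64>`.

```rust
use std::ops::Range;

/// Hypothetical, simplified stand-in for `ColumnIndex`: maps a doc id to the
/// range of value rows that belong to it.
enum ToyIndex {
    /// Every doc has exactly one value, stored at row `doc`.
    Full,
    /// Sorted ids of the docs that have a value; the value row is the doc's rank.
    Optional(Vec<u32>),
    /// `starts[doc]..starts[doc + 1]` are the value rows of `doc`.
    Multivalued(Vec<u32>),
}

impl ToyIndex {
    fn value_row_ids(&self, doc: u32) -> Range<u32> {
        match self {
            ToyIndex::Full => doc..doc + 1,
            ToyIndex::Optional(non_null_docs) => match non_null_docs.binary_search(&doc) {
                Ok(rank) => {
                    let row = rank as u32;
                    row..row + 1
                }
                // Doc has no value: empty row range.
                Err(_) => 0..0,
            },
            ToyIndex::Multivalued(starts) => starts[doc as usize]..starts[doc as usize + 1],
        }
    }
}

/// Hypothetical stand-in for `Column<u64>`: an index plus a flat value array.
struct ToyColumn {
    index: ToyIndex,
    values: Vec<u64>,
}

impl ToyColumn {
    /// Resolve the doc's rows through the index, then read each row's value.
    fn values_for_doc(&self, doc: u32) -> impl Iterator<Item = u64> + '_ {
        self.index
            .value_row_ids(doc)
            .map(move |row| self.values[row as usize])
    }
}
```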
--- a/columnar/benches/bench_access.rs
+++ b/columnar/benches/bench_access.rs
@@ -1,6 +1,6 @@
 use binggan::{InputGroup, black_box};
 use common::*;
-use tantivy_columnar::Column;
+use tantivy_columnar::{Column, ValueRange};

 pub mod common;

@@ -46,16 +46,16 @@ fn bench_group(mut runner: InputGroup<Column>) {
    runner.register("access_first_vals", |column| {
        let mut sum = 0;
        const BLOCK_SIZE: usize = 32;
-        let mut docs = vec![0; BLOCK_SIZE];
-        let mut buffer = vec![None; BLOCK_SIZE];
+        let mut docs = Vec::with_capacity(BLOCK_SIZE);
+        let mut buffer = Vec::with_capacity(BLOCK_SIZE);
        for i in (0..NUM_DOCS).step_by(BLOCK_SIZE) {
-            // fill docs
-            #[allow(clippy::needless_range_loop)]
+            docs.clear();
            for idx in 0..BLOCK_SIZE {
-                docs[idx] = idx as u32 + i;
+                docs.push(idx as u32 + i);
            }

-            column.first_vals(&docs, &mut buffer);
+            buffer.clear();
+            column.first_vals_in_value_range(&docs, &mut buffer, ValueRange::All);
            for val in buffer.iter() {
-                let Some(val) = val else { continue };
-                sum += *val;
+                let Some(val) = val.sort_key else { continue };
+                sum += val;
--- a/columnar/benches/bench_first_vals.rs
+++ b/columnar/benches/bench_first_vals.rs
@@ -89,13 +89,6 @@ fn main() {
        black_box(sum);
    });

-    group.register("first_block_fetch", |column| {
-        let mut block: Vec<Option<u64>> = vec![None; 64];
-        let fetch_docids = (0..64).collect::<Vec<_>>();
-        column.first_vals(&fetch_docids, &mut block);
-        black_box(block[0]);
-    });
-
    group.register("first_block_single_calls", |column| {
        let mut block: Vec<Option<u64>> = vec![None; 64];
        let fetch_docids = (0..64).collect::<Vec<_>>();
--- a/columnar/src/column/mod.rs
+++ b/columnar/src/column/mod.rs
@@ -1,6 +1,7 @@
 mod dictionary_encoded;
 mod serialize;

+use std::cell::RefCell;
 use std::fmt::{self, Debug};
 use std::io::Write;
 use std::ops::{Range, RangeInclusive};
@@ -19,6 +20,11 @@ use crate::column_values::monotonic_mapping::StrictlyMonotonicMappingToInternal;
 use crate::column_values::{ColumnValues, monotonic_map_column};
 use crate::{Cardinality, DocId, EmptyColumnValues, MonotonicallyMappableToU64, RowId};

+thread_local! {
+    static ROWS: RefCell<Vec<RowId>> = const { RefCell::new(Vec::new()) };
+    static DOCS: RefCell<Vec<DocId>> = const { RefCell::new(Vec::new()) };
+}
+
 #[derive(Clone)]
 pub struct Column<T = u64> {
    pub index: ColumnIndex,
@@ -89,31 +95,6 @@ impl<T: PartialOrd + Copy + Debug + Send + Sync + 'static> Column<T> {
        self.values_for_doc(row_id).next()
    }

-    /// Load the first value for each docid in the provided slice.
-    #[inline]
-    pub fn first_vals(&self, docids: &[DocId], output: &mut [Option<T>]) {
-        match &self.index {
-            ColumnIndex::Empty { .. } => {}
-            ColumnIndex::Full => self.values.get_vals_opt(docids, output),
-            ColumnIndex::Optional(optional_index) => {
-                for (i, docid) in docids.iter().enumerate() {
-                    output[i] = optional_index
-                        .rank_if_exists(*docid)
-                        .map(|rowid| self.values.get_val(rowid));
-                }
-            }
-            ColumnIndex::Multivalued(multivalued_index) => {
-                for (i, docid) in docids.iter().enumerate() {
-                    let range = multivalued_index.range(*docid);
-                    let is_empty = range.start == range.end;
-                    if !is_empty {
-                        output[i] = Some(self.values.get_val(range.start));
-                    }
-                }
-            }
-        }
-    }
-
    /// Translates a block of docids to row_ids.
    ///
    /// returns the row_ids and the matching docids on the same index
@@ -131,6 +112,8 @@ impl<T: PartialOrd + Copy + Debug + Send + Sync + 'static> Column<T> {
        self.index.docids_to_rowids(doc_ids, doc_ids_out, row_ids)
    }

+    /// Get an iterator over the values for the provided docid.
+    #[inline]
    pub fn values_for_doc(&self, doc_id: DocId) -> impl Iterator<Item = T> + '_ {
        self.index
            .value_row_ids(doc_id)
@@ -141,7 +124,7 @@ impl<T: PartialOrd + Copy + Debug + Send + Sync + 'static> Column<T> {
    #[inline]
    pub fn get_docids_for_value_range(
        &self,
-        value_range: RangeInclusive<T>,
+        value_range: ValueRange<T>,
        selected_docid_range: Range<u32>,
        doc_ids: &mut Vec<u32>,
    ) {
@@ -158,15 +141,6 @@ impl<T: PartialOrd + Copy + Debug + Send + Sync + 'static> Column<T> {
            .select_batch_in_place(selected_docid_range.start, doc_ids);
    }

-    /// Fills the output vector with the (possibly multiple values that are associated_with
-    /// `row_id`.
-    ///
-    /// This method clears the `output` vector.
-    pub fn fill_vals(&self, row_id: RowId, output: &mut Vec<T>) {
-        output.clear();
-        output.extend(self.values_for_doc(row_id));
-    }
-
    pub fn first_or_default_col(self, default_value: T) -> Arc<dyn ColumnValues<T>> {
        Arc::new(FirstValueWithDefault {
            column: self,
@@ -175,6 +149,194 @@ impl<T: PartialOrd + Copy + Debug + Send + Sync + 'static> Column<T> {
    }
 }

+// Separate impl block for methods requiring `Default` for `T`.
+impl<T: PartialOrd + Copy + Debug + Send + Sync + 'static + Default> Column<T> {
+    /// Load the first value for each docid in `input_docs`, keeping only those in `value_range`.
+    ///
+    /// `input_docs` is not modified. For each doc whose first value matches, a `ComparableDoc`
+    /// with that value (`None` if the doc has no value and nulls match) is pushed onto `output`.
+    #[inline]
+    pub fn first_vals_in_value_range(
+        &self,
+        input_docs: &[DocId],
+        output: &mut Vec<crate::ComparableDoc<Option<T>, DocId>>,
+        value_range: ValueRange<T>,
+    ) {
+        match (&self.index, value_range) {
+            (ColumnIndex::Empty { .. }, value_range) => {
+                let nulls_match = match &value_range {
+                    ValueRange::All => true,
+                    ValueRange::Inclusive(_) => false,
+                    ValueRange::GreaterThan(_, nulls_match) => *nulls_match,
+                    ValueRange::GreaterThanOrEqual(_, nulls_match) => *nulls_match,
+                    ValueRange::LessThan(_, nulls_match) => *nulls_match,
+                    ValueRange::LessThanOrEqual(_, nulls_match) => *nulls_match,
+                };
+                if nulls_match {
+                    for &doc in input_docs {
+                        output.push(crate::ComparableDoc {
+                            doc,
+                            sort_key: None,
+                        });
+                    }
+                }
+            }
+            (ColumnIndex::Full, value_range) => {
+                self.values
+                    .get_vals_in_value_range(input_docs, input_docs, output, value_range);
+            }
+            (ColumnIndex::Optional(optional_index), value_range) => {
+                let nulls_match = match &value_range {
+                    ValueRange::All => true,
+                    ValueRange::Inclusive(_) => false,
+                    ValueRange::GreaterThan(_, nulls_match) => *nulls_match,
+                    ValueRange::GreaterThanOrEqual(_, nulls_match) => *nulls_match,
+                    ValueRange::LessThan(_, nulls_match) => *nulls_match,
+                    ValueRange::LessThanOrEqual(_, nulls_match) => *nulls_match,
+                };
+
+                let fallback_needed = ROWS.with(|rows_cell| {
+                    DOCS.with(|docs_cell| {
+                        let mut rows = rows_cell.borrow_mut();
+                        let mut docs = docs_cell.borrow_mut();
+                        rows.clear();
+                        docs.clear();
+
+                        let mut has_nulls = false;
+
+                        for &doc_id in input_docs {
+                            if let Some(row_id) = optional_index.rank_if_exists(doc_id) {
+                                rows.push(row_id);
+                                docs.push(doc_id);
+                            } else {
+                                has_nulls = true;
+                                if nulls_match {
+                                    break;
+                                }
+                            }
+                        }
+
+                        if !has_nulls || !nulls_match {
+                            self.values.get_vals_in_value_range(
+                                &rows,
+                                &docs,
+                                output,
+                                value_range.clone(),
+                            );
+                            return false;
+                        }
+                        true
+                    })
+                });
+
+                if fallback_needed {
+                    for &doc_id in input_docs {
+                        if let Some(row_id) = optional_index.rank_if_exists(doc_id) {
+                            let val = self.values.get_val(row_id);
+                            let value_matches = match &value_range {
+                                ValueRange::All => true,
+                                ValueRange::Inclusive(r) => r.contains(&val),
+                                ValueRange::GreaterThan(t, _) => val > *t,
+                                ValueRange::GreaterThanOrEqual(t, _) => val >= *t,
+                                ValueRange::LessThan(t, _) => val < *t,
+                                ValueRange::LessThanOrEqual(t, _) => val <= *t,
+                            };
+
+                            if value_matches {
+                                output.push(crate::ComparableDoc {
+                                    doc: doc_id,
+                                    sort_key: Some(val),
+                                });
+                            }
+                        } else if nulls_match {
+                            output.push(crate::ComparableDoc {
+                                doc: doc_id,
+                                sort_key: None,
+                            });
+                        }
+                    }
+                }
+            }
+            (ColumnIndex::Multivalued(multivalued_index), value_range) => {
+                let nulls_match = match &value_range {
+                    ValueRange::All => true,
+                    ValueRange::Inclusive(_) => false,
+                    ValueRange::GreaterThan(_, nulls_match) => *nulls_match,
+                    ValueRange::GreaterThanOrEqual(_, nulls_match) => *nulls_match,
+                    ValueRange::LessThan(_, nulls_match) => *nulls_match,
+                    ValueRange::LessThanOrEqual(_, nulls_match) => *nulls_match,
+                };
+                // Only the first value of each doc is checked against `value_range`.
+                for &docid in input_docs {
+                    let row_range = multivalued_index.range(docid);
+                    let is_empty = row_range.start == row_range.end;
+                    if !is_empty {
+                        let val = self.values.get_val(row_range.start);
+                        let matches = match &value_range {
+                            ValueRange::All => true,
+                            ValueRange::Inclusive(r) => r.contains(&val),
+                            ValueRange::GreaterThan(t, _) => val > *t,
+                            ValueRange::GreaterThanOrEqual(t, _) => val >= *t,
+                            ValueRange::LessThan(t, _) => val < *t,
+                            ValueRange::LessThanOrEqual(t, _) => val <= *t,
+                        };
+                        if matches {
+                            output.push(crate::ComparableDoc {
+                                doc: docid,
+                                sort_key: Some(val),
+                            });
+                        }
+                    } else if nulls_match {
+                        output.push(crate::ComparableDoc {
+                            doc: docid,
+                            sort_key: None,
+                        });
+                    }
+                }
+            }
+        }
+    }
+}
+
+/// A range of values.
+///
+/// This type is intended to be used in batch APIs, where the cost of unpacking the enum
+/// is outweighed by the time spent processing a batch.
+///
+/// Implementers should pattern match on the variants to use optimized loops for each case.
+#[derive(Clone, Debug)]
+pub enum ValueRange<T> {
+    /// A range that includes both start and end. Null values never match.
+    Inclusive(RangeInclusive<T>),
+    /// A range that matches all values, including nulls.
+    All,
+    /// A range that matches all values greater than the threshold.
+    /// The boolean flag indicates if null values should be included.
+    GreaterThan(T, bool),
+    /// A range that matches all values greater than or equal to the threshold.
+    /// The boolean flag indicates if null values should be included.
+    GreaterThanOrEqual(T, bool),
+    /// A range that matches all values less than the threshold.
+    /// The boolean flag indicates if null values should be included.
+    LessThan(T, bool),
+    /// A range that matches all values less than or equal to the threshold.
+    /// The boolean flag indicates if null values should be included.
+    LessThanOrEqual(T, bool),
+}
+
+impl<T: PartialOrd> ValueRange<T> {
+    pub fn intersects(&self, min: T, max: T) -> bool {
+        match self {
+            ValueRange::Inclusive(range) => *range.start() <= max && *range.end() >= min,
+            ValueRange::All => true,
+            ValueRange::GreaterThan(val, _) => max > *val,
+            ValueRange::GreaterThanOrEqual(val, _) => max >= *val,
+            ValueRange::LessThan(val, _) => min < *val,
+            ValueRange::LessThanOrEqual(val, _) => min <= *val,
+        }
+    }
+}
+
 impl BinarySerializable for Cardinality {
    fn serialize<W: Write + ?Sized>(&self, writer: &mut W) -> std::io::Result<()> {
        self.to_code().serialize(writer)
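The following standalone sketch shows the filtering semantics that `first_vals_in_value_range` implements:

- Only a doc's first value is tested.
- A doc without a value is kept only for `ValueRange::All`, or when the variant's null flag is `true`.
- `Inclusive` never matches nulls.

The enum is re-declared so that the example compiles on its own. The `matches` helper, the `toy_first_vals_in_value_range` function, and the `Vec<Vec<u64>>` column representation are hypothetical simplifications, not the crate's API.

```rust
use std::ops::RangeInclusive;

/// Standalone copy of the `ValueRange` enum from the hunk above.
#[derive(Clone, Debug)]
enum ValueRange<T> {
    Inclusive(RangeInclusive<T>),
    All,
    GreaterThan(T, bool),
    GreaterThanOrEqual(T, bool),
    LessThan(T, bool),
    LessThanOrEqual(T, bool),
}

impl<T: PartialOrd> ValueRange<T> {
    /// Hypothetical helper mirroring the `nulls_match` / value checks:
    /// `None` means the doc has no value.
    fn matches(&self, val: Option<&T>) -> bool {
        match (self, val) {
            (ValueRange::All, _) => true,
            (ValueRange::Inclusive(_), None) => false,
            (
                ValueRange::GreaterThan(_, nulls)
                | ValueRange::GreaterThanOrEqual(_, nulls)
                | ValueRange::LessThan(_, nulls)
                | ValueRange::LessThanOrEqual(_, nulls),
                None,
            ) => *nulls,
            (ValueRange::Inclusive(range), Some(v)) => range.contains(v),
            (ValueRange::GreaterThan(t, _), Some(v)) => v > t,
            (ValueRange::GreaterThanOrEqual(t, _), Some(v)) => v >= t,
            (ValueRange::LessThan(t, _), Some(v)) => v < t,
            (ValueRange::LessThanOrEqual(t, _), Some(v)) => v <= t,
        }
    }
}

/// Hypothetical toy version over a column stored as one `Vec<u64>` per doc
/// (empty = no value). Pushes `(doc, first_value)` for every accepted doc.
fn toy_first_vals_in_value_range(
    column: &[Vec<u64>],
    input_docs: &[u32],
    value_range: &ValueRange<u64>,
    output: &mut Vec<(u32, Option<u64>)>,
) {
    for &doc in input_docs {
        let first = column[doc as usize].first().copied();
        if value_range.matches(first.as_ref()) {
            output.push((doc, first));
        }
    }
}
```

The real method also batches the non-null rows of an optional column into thread-local scratch buffers before calling `get_vals_in_value_range`. The sketch keeps only the per-doc semantics.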
--- a/columnar/src/column_index/merge/stacked.rs
+++ b/columnar/src/column_index/merge/stacked.rs
@@ -56,7 +56,7 @@ fn get_doc_ids_with_values<'a>(
        ColumnIndex::Full => Box::new(doc_range),
        ColumnIndex::Optional(optional_index) => Box::new(
            optional_index
-                .iter_docs()
+                .iter_non_null_docs()
                .map(move |row| row + doc_range.start),
        ),
        ColumnIndex::Multivalued(multivalued_index) => match multivalued_index {
@@ -73,7 +73,7 @@ fn get_doc_ids_with_values<'a>(
            MultiValueIndex::MultiValueIndexV2(multivalued_index) => Box::new(
                multivalued_index
                    .optional_index
-                    .iter_docs()
+                    .iter_non_null_docs()
                    .map(move |row| row + doc_range.start),
            ),
        },
@@ -105,10 +105,11 @@ fn get_num_values_iterator<'a>(
 ) -> Box<dyn Iterator<Item = u32> + 'a> {
    match column_index {
        ColumnIndex::Empty { .. } => Box::new(std::iter::empty()),
-        ColumnIndex::Full => Box::new(std::iter::repeat(1u32).take(num_docs as usize)),
-        ColumnIndex::Optional(optional_index) => {
-            Box::new(std::iter::repeat(1u32).take(optional_index.num_non_nulls() as usize))
-        }
+        ColumnIndex::Full => Box::new(std::iter::repeat_n(1u32, num_docs as usize)),
+        ColumnIndex::Optional(optional_index) => Box::new(std::iter::repeat_n(
+            1u32,
+            optional_index.num_non_nulls() as usize,
+        )),
        ColumnIndex::Multivalued(multivalued_index) => Box::new(
            multivalued_index
                .get_start_index_column()
@@ -177,7 +178,7 @@ impl<'a> Iterable<RowId> for StackedOptionalIndex<'a> {
                        ColumnIndex::Full => Box::new(columnar_row_range),
                        ColumnIndex::Optional(optional_index) => Box::new(
                            optional_index
-                                .iter_docs()
+                                .iter_non_null_docs()
                                .map(move |row_id: RowId| columnar_row_range.start + row_id),
                        ),
                        ColumnIndex::Multivalued(_) => {
--- a/columnar/src/column_index/multivalued_index.rs
+++ b/columnar/src/column_index/multivalued_index.rs
@@ -215,6 +215,32 @@ impl MultiValueIndex {
        }
    }

+    /// Returns an iterator over document ids that have at least one value.
+    pub fn iter_non_null_docs(&self) -> Box<dyn Iterator<Item = DocId> + '_> {
+        match self {
+            MultiValueIndex::MultiValueIndexV1(idx) => {
+                let mut doc: DocId = 0u32;
+                let num_docs = idx.num_docs();
+                Box::new(std::iter::from_fn(move || {
+                    // This is not the most efficient way to do this, but it's legacy code.
+                    while doc < num_docs {
+                        let cur = doc;
+                        doc += 1;
+                        let start = idx.start_index_column.get_val(cur);
+                        let end = idx.start_index_column.get_val(cur + 1);
+                        if end > start {
+                            return Some(cur);
+                        }
+                    }
+                    None
+                }))
+            }
+            MultiValueIndex::MultiValueIndexV2(idx) => {
+                Box::new(idx.optional_index.iter_non_null_docs())
+            }
+        }
+    }
+
    /// Converts a list of ranks (row ids of values) in a 1:n index to the corresponding list of
    /// docids. Positions are converted inplace to docids.
    ///
@@ -307,7 +333,7 @@ mod tests {
    use std::ops::Range;

    use super::MultiValueIndex;
-    use crate::{ColumnarReader, DynamicColumn};
+    use crate::{ColumnarReader, DynamicColumn, ValueRange};

    fn index_to_pos_helper(
        index: &MultiValueIndex,
@@ -387,7 +413,7 @@ mod tests {
        assert_eq!(row_id_range, 0..4);

        let check = |range, expected| {
-            let full_range = 0..=u64::MAX;
+            let full_range = ValueRange::All;
            let mut docids = Vec::new();
            column.get_docids_for_value_range(full_range, range, &mut docids);
            assert_eq!(docids, expected);
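The `MultiValueIndexV1` branch of `iter_non_null_docs` scans the start-offset column and yields every doc whose row range is non-empty. Below is an equivalent sketch over a plain slice. It assumes that `start_index` holds `num_docs + 1` offsets, as the `get_val(cur + 1)` call implies. The free function is hypothetical, and it uses `Iterator::filter` in place of the hunk's `std::iter::from_fn` loop.

```rust
/// Hypothetical sketch: doc `d` owns the value rows
/// `start_index[d]..start_index[d + 1]`.
fn iter_non_null_docs(start_index: &[u32]) -> impl Iterator<Item = u32> + '_ {
    let num_docs = start_index.len().saturating_sub(1) as u32;
    (0..num_docs).filter(move |&doc| {
        // A doc has at least one value when its row range is non-empty.
        start_index[doc as usize + 1] > start_index[doc as usize]
    })
}
```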
--- a/columnar/src/column_index/optional_index/mod.rs
+++ b/columnar/src/column_index/optional_index/mod.rs
@@ -1,4 +1,4 @@
-use std::io::{self, Write};
+use std::io;
 use std::sync::Arc;

 mod set;
@@ -11,7 +11,7 @@ use set_block::{
 };

 use crate::iterable::Iterable;
-use crate::{DocId, InvalidData, RowId};
+use crate::{DocId, RowId};

 /// The threshold for for number of elements after which we switch to dense block encoding.
 ///
@@ -88,7 +88,7 @@ pub struct OptionalIndex {

 impl Iterable<u32> for &OptionalIndex {
    fn boxed_iter(&self) -> Box<dyn Iterator<Item = u32> + '_> {
-        Box::new(self.iter_docs())
+        Box::new(self.iter_non_null_docs())
    }
 }

@@ -280,8 +280,9 @@ impl OptionalIndex {
        self.num_non_null_docs
    }

-    pub fn iter_docs(&self) -> impl Iterator<Item = RowId> + '_ {
-        // TODO optimize
+    pub fn iter_non_null_docs(&self) -> impl Iterator<Item = RowId> + '_ {
+        // TODO optimize. We could iterate over the blocks directly.
+        // For now, we iterate over the dense ranks and map each one to its doc id via select.
        let mut select_batch = self.select_cursor();
        (0..self.num_non_null_docs).map(move |rank| select_batch.select(rank))
    }
@@ -334,38 +335,6 @@ enum Block<'a> {
    Sparse(SparseBlock<'a>),
 }

-#[derive(Debug, Copy, Clone)]
-enum OptionalIndexCodec {
-    Dense = 0,
-    Sparse = 1,
-}
-
-impl OptionalIndexCodec {
-    fn to_code(self) -> u8 {
-        self as u8
-    }
-
-    fn try_from_code(code: u8) -> Result<Self, InvalidData> {
-        match code {
-            0 => Ok(Self::Dense),
-            1 => Ok(Self::Sparse),
-            _ => Err(InvalidData),
-        }
-    }
-}
-
-impl BinarySerializable for OptionalIndexCodec {
-    fn serialize<W: Write + ?Sized>(&self, writer: &mut W) -> io::Result<()> {
-        writer.write_all(&[self.to_code()])
-    }
-
-    fn deserialize<R: io::Read>(reader: &mut R) -> io::Result<Self> {
-        let optional_codec_code = u8::deserialize(reader)?;
-        let optional_codec = Self::try_from_code(optional_codec_code)?;
-        Ok(optional_codec)
-    }
-}
-
 fn serialize_optional_index_block(block_els: &[u16], out: &mut impl io::Write) -> io::Result<()> {
    let is_sparse = is_sparse(block_els.len() as u32);
    if is_sparse {
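`OptionalIndex::iter_non_null_docs` walks the dense ranks `0..num_non_null_docs` and maps each one back to a doc id with `select`. `rank_if_exists` goes the other way. The sketch below illustrates that duality. It assumes a plain sorted `Vec<u32>` of non-null doc ids in place of the crate's dense/sparse block encoding. `ToyOptionalIndex` and its methods are hypothetical.

```rust
/// Hypothetical stand-in for `OptionalIndex`: the sorted ids of docs that have a value.
struct ToyOptionalIndex {
    non_null_docs: Vec<u32>,
}

impl ToyOptionalIndex {
    /// Doc id -> dense row id (its rank among non-null docs), if the doc has a value.
    fn rank_if_exists(&self, doc: u32) -> Option<u32> {
        self.non_null_docs.binary_search(&doc).ok().map(|rank| rank as u32)
    }

    /// Dense row id -> doc id; the inverse of `rank_if_exists`.
    fn select(&self, rank: u32) -> u32 {
        self.non_null_docs[rank as usize]
    }

    fn num_non_nulls(&self) -> u32 {
        self.non_null_docs.len() as u32
    }

    /// Same shape as `iter_non_null_docs` in the hunk: iterate ranks, select doc ids.
    fn iter_non_null_docs(&self) -> impl Iterator<Item = u32> + '_ {
        (0..self.num_non_nulls()).map(move |rank| self.select(rank))
    }
}
```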
--- a/columnar/src/column_index/optional_index/tests.rs
+++ b/columnar/src/column_index/optional_index/tests.rs
@@ -164,7 +164,11 @@ fn test_optional_index_large() {
 fn test_optional_index_iter_aux(row_ids: &[RowId], num_rows: RowId) {
    let optional_index = OptionalIndex::for_test(num_rows, row_ids);
    assert_eq!(optional_index.num_docs(), num_rows);
-    assert!(optional_index.iter_docs().eq(row_ids.iter().copied()));
+    assert!(
+        optional_index
+            .iter_non_null_docs()
+            .eq(row_ids.iter().copied())
+    );
 }

 #[test]
--- a/columnar/src/column_values/mod.rs
+++ b/columnar/src/column_values/mod.rs
@@ -7,13 +7,15 @@
 //! - Monotonically map values to u64/u128

 use std::fmt::Debug;
-use std::ops::{Range, RangeInclusive};
+use std::ops::Range;
 use std::sync::Arc;

 use downcast_rs::DowncastSync;
 pub use monotonic_mapping::{MonotonicallyMappableToU64, StrictlyMonotonicFn};
 pub use monotonic_mapping_u128::MonotonicallyMappableToU128;

+use crate::column::ValueRange;
+
 mod merge;
 pub(crate) mod monotonic_mapping;
 pub(crate) mod monotonic_mapping_u128;
@@ -109,6 +111,307 @@ pub trait ColumnValues<T: PartialOrd = u64>: Send + Sync + DowncastSync {
        }
    }

+    /// Load the values at `input_indexes` and push those within `value_range` onto `output`.
+    ///
+    /// Each value is paired with the doc id at the same position in `input_doc_ids`.
+    fn get_vals_in_value_range(
+        &self,
+        input_indexes: &[u32],
+        input_doc_ids: &[u32],
+        output: &mut Vec<crate::ComparableDoc<Option<T>, crate::DocId>>,
+        value_range: ValueRange<T>,
+    ) {
+        let len = input_indexes.len();
+        let mut read_head = 0;
+
+        match value_range {
+            ValueRange::All => {
+                while read_head + 3 < len {
+                    let idx0 = input_indexes[read_head];
+                    let idx1 = input_indexes[read_head + 1];
+                    let idx2 = input_indexes[read_head + 2];
+                    let idx3 = input_indexes[read_head + 3];
+
+                    let doc0 = input_doc_ids[read_head];
+                    let doc1 = input_doc_ids[read_head + 1];
+                    let doc2 = input_doc_ids[read_head + 2];
+                    let doc3 = input_doc_ids[read_head + 3];
+
+                    let val0 = self.get_val(idx0);
+                    let val1 = self.get_val(idx1);
+                    let val2 = self.get_val(idx2);
+                    let val3 = self.get_val(idx3);
+
+                    output.push(crate::ComparableDoc {
+                        doc: doc0,
+                        sort_key: Some(val0),
+                    });
+                    output.push(crate::ComparableDoc {
+                        doc: doc1,
+                        sort_key: Some(val1),
+                    });
+                    output.push(crate::ComparableDoc {
+                        doc: doc2,
+                        sort_key: Some(val2),
+                    });
+                    output.push(crate::ComparableDoc {
+                        doc: doc3,
+                        sort_key: Some(val3),
+                    });
+
+                    read_head += 4;
+                }
+            }
+            ValueRange::Inclusive(ref range) => {
+                while read_head + 3 < len {
+                    let idx0 = input_indexes[read_head];
+                    let idx1 = input_indexes[read_head + 1];
+                    let idx2 = input_indexes[read_head + 2];
+                    let idx3 = input_indexes[read_head + 3];
+
+                    let doc0 = input_doc_ids[read_head];
+                    let doc1 = input_doc_ids[read_head + 1];
+                    let doc2 = input_doc_ids[read_head + 2];
+                    let doc3 = input_doc_ids[read_head + 3];
+
+                    let val0 = self.get_val(idx0);
+                    let val1 = self.get_val(idx1);
+                    let val2 = self.get_val(idx2);
+                    let val3 = self.get_val(idx3);
+
+                    if range.contains(&val0) {
+                        output.push(crate::ComparableDoc {
+                            doc: doc0,
+                            sort_key: Some(val0),
+                        });
+                    }
+                    if range.contains(&val1) {
+                        output.push(crate::ComparableDoc {
+                            doc: doc1,
+                            sort_key: Some(val1),
+                        });
+                    }
+                    if range.contains(&val2) {
+                        output.push(crate::ComparableDoc {
+                            doc: doc2,
+                            sort_key: Some(val2),
+                        });
+                    }
+                    if range.contains(&val3) {
+                        output.push(crate::ComparableDoc {
+                            doc: doc3,
+                            sort_key: Some(val3),
+                        });
+                    }
+
+                    read_head += 4;
+                }
+            }
+            ValueRange::GreaterThan(ref threshold, _) => {
+                while read_head + 3 < len {
+                    let idx0 = input_indexes[read_head];
+                    let idx1 = input_indexes[read_head + 1];
+                    let idx2 = input_indexes[read_head + 2];
+                    let idx3 = input_indexes[read_head + 3];
+
+                    let doc0 = input_doc_ids[read_head];
+                    let doc1 = input_doc_ids[read_head + 1];
+                    let doc2 = input_doc_ids[read_head + 2];
+                    let doc3 = input_doc_ids[read_head + 3];
+
+                    let val0 = self.get_val(idx0);
+                    let val1 = self.get_val(idx1);
+                    let val2 = self.get_val(idx2);
+                    let val3 = self.get_val(idx3);
+
+                    if val0 > *threshold {
+                        output.push(crate::ComparableDoc {
+                            doc: doc0,
+                            sort_key: Some(val0),
+                        });
+                    }
+                    if val1 > *threshold {
+                        output.push(crate::ComparableDoc {
+                            doc: doc1,
+                            sort_key: Some(val1),
+                        });
+                    }
+                    if val2 > *threshold {
+                        output.push(crate::ComparableDoc {
+                            doc: doc2,
+                            sort_key: Some(val2),
+                        });
+                    }
+                    if val3 > *threshold {
+                        output.push(crate::ComparableDoc {
+                            doc: doc3,
+                            sort_key: Some(val3),
+                        });
+                    }
+
+                    read_head += 4;
+                }
+            }
+            ValueRange::GreaterThanOrEqual(ref threshold, _) => {
+                while read_head + 3 < len {
+                    let idx0 = input_indexes[read_head];
+                    let idx1 = input_indexes[read_head + 1];
+                    let idx2 = input_indexes[read_head + 2];
+                    let idx3 = input_indexes[read_head + 3];
+
+                    let doc0 = input_doc_ids[read_head];
+                    let doc1 = input_doc_ids[read_head + 1];
+                    let doc2 = input_doc_ids[read_head + 2];
+                    let doc3 = input_doc_ids[read_head + 3];
+
+                    let val0 = self.get_val(idx0);
+                    let val1 = self.get_val(idx1);
+                    let val2 = self.get_val(idx2);
+                    let val3 = self.get_val(idx3);
+
+                    if val0 >= *threshold {
+                        output.push(crate::ComparableDoc {
+                            doc: doc0,
+                            sort_key: Some(val0),
+                        });
+                    }
+                    if val1 >= *threshold {
+                        output.push(crate::ComparableDoc {
+                            doc: doc1,
+                            sort_key: Some(val1),
+                        });
+                    }
+                    if val2 >= *threshold {
+                        output.push(crate::ComparableDoc {
+                            doc: doc2,
+                            sort_key: Some(val2),
+                        });
+                    }
+                    if val3 >= *threshold {
+                        output.push(crate::ComparableDoc {
+                            doc: doc3,
+                            sort_key: Some(val3),
+                        });
+                    }
+
+                    read_head += 4;
+                }
+            }
+            ValueRange::LessThan(ref threshold, _) => {
+                while read_head + 3 < len {
+                    let idx0 = input_indexes[read_head];
+                    let idx1 = input_indexes[read_head + 1];
+                    let idx2 = input_indexes[read_head + 2];
+                    let idx3 = input_indexes[read_head + 3];
+
+                    let doc0 = input_doc_ids[read_head];
+                    let doc1 = input_doc_ids[read_head + 1];
+                    let doc2 = input_doc_ids[read_head + 2];
+                    let doc3 = input_doc_ids[read_head + 3];
+
+                    let val0 = self.get_val(idx0);
+                    let val1 = self.get_val(idx1);
+                    let val2 = self.get_val(idx2);
+                    let val3 = self.get_val(idx3);
+
+                    if val0 < *threshold {
+                        output.push(crate::ComparableDoc {
+                            doc: doc0,
+                            sort_key: Some(val0),
+                        });
+                    }
+                    if val1 < *threshold {
+                        output.push(crate::ComparableDoc {
+                            doc: doc1,
+                            sort_key: Some(val1),
+                        });
+                    }
+                    if val2 < *threshold {
+                        output.push(crate::ComparableDoc {
+                            doc: doc2,
+                            sort_key: Some(val2),
+                        });
+                    }
+                    if val3 < *threshold {
+                        output.push(crate::ComparableDoc {
+                            doc: doc3,
+                            sort_key: Some(val3),
+                        });
+                    }
+
+                    read_head += 4;
+                }
+            }
+            ValueRange::LessThanOrEqual(ref threshold, _) => {
+                while read_head + 3 < len {
+                    let idx0 = input_indexes[read_head];
+                    let idx1 = input_indexes[read_head + 1];
+                    let idx2 = input_indexes[read_head + 2];
+                    let idx3 = input_indexes[read_head + 3];
+
+                    let doc0 = input_doc_ids[read_head];
+                    let doc1 = input_doc_ids[read_head + 1];
+                    let doc2 = input_doc_ids[read_head + 2];
+                    let doc3 = input_doc_ids[read_head + 3];
+
+                    let val0 = self.get_val(idx0);
+                    let val1 = self.get_val(idx1);
+                    let val2 = self.get_val(idx2);
+                    let val3 = self.get_val(idx3);
+
+                    if val0 <= *threshold {
+                        output.push(crate::ComparableDoc {
+                            doc: doc0,
+                            sort_key: Some(val0),
+                        });
+                    }
+                    if val1 <= *threshold {
+                        output.push(crate::ComparableDoc {
+                            doc: doc1,
+                            sort_key: Some(val1),
+                        });
+                    }
+                    if val2 <= *threshold {
+                        output.push(crate::ComparableDoc {
+                            doc: doc2,
+                            sort_key: Some(val2),
+                        });
+                    }
+                    if val3 <= *threshold {
+                        output.push(crate::ComparableDoc {
+                            doc: doc3,
+                            sort_key: Some(val3),
+                        });
+                    }
+
+                    read_head += 4;
+                }
+            }
+        }
+        // Process remaining elements (0 to 3)
+        while read_head < len {
+            let idx = input_indexes[read_head];
+            let doc = input_doc_ids[read_head];
+            let val = self.get_val(idx);
+            let matches = match value_range {
+                // Arms bind with `ref`, so the outer `value_range` is borrowed, not moved, on every iteration.
+                ValueRange::All => true,
+                ValueRange::Inclusive(ref r) => r.contains(&val),
+                ValueRange::GreaterThan(ref t, _) => val > *t,
+                ValueRange::GreaterThanOrEqual(ref t, _) => val >= *t,
+                ValueRange::LessThan(ref t, _) => val < *t,
+                ValueRange::LessThanOrEqual(ref t, _) => val <= *t,
+            };
+            if matches {
+                output.push(crate::ComparableDoc {
+                    doc,
+                    sort_key: Some(val),
+                });
+            }
+            read_head += 1;
+        }
+    }
+
    /// Fills an output buffer with the fast field values
    /// associated with the `DocId` going from
    /// `start` to `start + output.len()`.
@@ -129,15 +432,54 @@ pub trait ColumnValues<T: PartialOrd = u64>: Send + Sync + DowncastSync {
    /// Note that position == docid for single value fast fields
    fn get_row_ids_for_value_range(
        &self,
-        value_range: RangeInclusive<T>,
+        value_range: ValueRange<T>,
        row_id_range: Range<RowId>,
        row_id_hits: &mut Vec<RowId>,
    ) {
        let row_id_range = row_id_range.start..row_id_range.end.min(self.num_vals());
-        for idx in row_id_range {
-            let val = self.get_val(idx);
-            if value_range.contains(&val) {
-                row_id_hits.push(idx);
+        match value_range {
+            ValueRange::Inclusive(range) => {
+                for idx in row_id_range {
+                    let val = self.get_val(idx);
+                    if range.contains(&val) {
+                        row_id_hits.push(idx);
+                    }
+                }
+            }
+            ValueRange::GreaterThan(threshold, _) => {
+                for idx in row_id_range {
+                    let val = self.get_val(idx);
+                    if val > threshold {
+                        row_id_hits.push(idx);
+                    }
+                }
+            }
+            ValueRange::GreaterThanOrEqual(threshold, _) => {
+                for idx in row_id_range {
+                    let val = self.get_val(idx);
+                    if val >= threshold {
+                        row_id_hits.push(idx);
+                    }
+                }
+            }
+            ValueRange::LessThan(threshold, _) => {
+                for idx in row_id_range {
+                    let val = self.get_val(idx);
+                    if val < threshold {
+                        row_id_hits.push(idx);
+                    }
+                }
+            }
+            ValueRange::LessThanOrEqual(threshold, _) => {
+                for idx in row_id_range {
+                    let val = self.get_val(idx);
+                    if val <= threshold {
+                        row_id_hits.push(idx);
+                    }
+                }
+            }
+            ValueRange::All => {
+                row_id_hits.extend(row_id_range);
            }
        }
    }
@@ -193,6 +535,17 @@ impl<T: PartialOrd + Default> ColumnValues<T> for EmptyColumnValues {
    fn num_vals(&self) -> u32 {
        0
    }
+
+    fn get_vals_in_value_range(
+        &self,
+        input_indexes: &[u32],
+        input_doc_ids: &[u32],
+        output: &mut Vec<crate::ComparableDoc<Option<T>, crate::DocId>>,
+        value_range: ValueRange<T>,
+    ) {
+        let _ = (input_indexes, input_doc_ids, output, value_range);
+        panic!("Internal Error: Called get_vals_in_value_range of empty column.")
+    }
 }

 impl<T: Copy + PartialOrd + Debug + 'static> ColumnValues<T> for Arc<dyn ColumnValues<T>> {
@@ -206,6 +559,18 @@ impl<T: Copy + PartialOrd + Debug + 'static> ColumnValues<T> for Arc<dyn ColumnV
        self.as_ref().get_vals_opt(indexes, output)
    }

+    #[inline(always)]
+    fn get_vals_in_value_range(
+        &self,
+        input_indexes: &[u32],
+        input_doc_ids: &[u32],
+        output: &mut Vec<crate::ComparableDoc<Option<T>, crate::DocId>>,
+        value_range: ValueRange<T>,
+    ) {
+        self.as_ref()
+            .get_vals_in_value_range(input_indexes, input_doc_ids, output, value_range)
+    }
+
    #[inline(always)]
    fn min_value(&self) -> T {
        self.as_ref().min_value()
@@ -234,7 +599,7 @@ impl<T: Copy + PartialOrd + Debug + 'static> ColumnValues<T> for Arc<dyn ColumnV
    #[inline(always)]
    fn get_row_ids_for_value_range(
        &self,
-        range: RangeInclusive<T>,
+        range: ValueRange<T>,
        doc_id_range: Range<u32>,
        positions: &mut Vec<u32>,
    ) {
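The `get_vals_in_value_range` loop in this file is unrolled by four, with one copy of the loop per `ValueRange` arm, and it finishes the 0 to 3 leftover elements with a scalar loop. Below is a minimal sketch of the same shape. It makes a few assumptions: a plain `&[T]` stands in for `get_val`, and `(doc, value)` tuples stand in for `ComparableDoc`. `ValueRangeSketch`, `filter_with` and `filter_in_range` are hypothetical names. The boolean field of the one-sided variants is left out because its meaning is not shown here.

```rust
use std::ops::RangeInclusive;

/// Hypothetical, simplified stand-in for `ValueRange<T>` (boolean field omitted).
#[derive(Clone, Debug)]
pub enum ValueRangeSketch<T> {
    All,
    Inclusive(RangeInclusive<T>),
    GreaterThan(T),
    GreaterThanOrEqual(T),
    LessThan(T),
    LessThanOrEqual(T),
}

impl<T: PartialOrd> ValueRangeSketch<T> {
    /// Scalar predicate, used as the reference behaviour.
    pub fn matches(&self, val: &T) -> bool {
        match self {
            ValueRangeSketch::All => true,
            ValueRangeSketch::Inclusive(r) => r.contains(val),
            ValueRangeSketch::GreaterThan(t) => val > t,
            ValueRangeSketch::GreaterThanOrEqual(t) => val >= t,
            ValueRangeSketch::LessThan(t) => val < t,
            ValueRangeSketch::LessThanOrEqual(t) => val <= t,
        }
    }
}

/// Pushes `(doc, value)` for each input whose value passes `pred`:
/// four lanes per iteration, then the 0 to 3 leftovers one at a time.
fn filter_with<T: Copy, F: Fn(&T) -> bool>(
    values: &[T],
    input_indexes: &[u32],
    input_doc_ids: &[u32],
    pred: F,
    output: &mut Vec<(u32, T)>,
) {
    let len = input_indexes.len().min(input_doc_ids.len());
    let mut read_head = 0;
    while read_head + 3 < len {
        // Load all four values before testing any of them.
        let vals = [
            values[input_indexes[read_head] as usize],
            values[input_indexes[read_head + 1] as usize],
            values[input_indexes[read_head + 2] as usize],
            values[input_indexes[read_head + 3] as usize],
        ];
        for (lane, val) in vals.iter().enumerate() {
            if pred(val) {
                output.push((input_doc_ids[read_head + lane], *val));
            }
        }
        read_head += 4;
    }
    // Tail: the remaining 0 to 3 elements.
    while read_head < len {
        let val = values[input_indexes[read_head] as usize];
        if pred(&val) {
            output.push((input_doc_ids[read_head], val));
        }
        read_head += 1;
    }
}

/// Matches on the variant once, outside the hot loop; each arm gets its own
/// monomorphized copy of `filter_with`.
pub fn filter_in_range<T: PartialOrd + Copy>(
    values: &[T],
    input_indexes: &[u32],
    input_doc_ids: &[u32],
    range: &ValueRangeSketch<T>,
    output: &mut Vec<(u32, T)>,
) {
    let (v, i, d) = (values, input_indexes, input_doc_ids);
    match range {
        ValueRangeSketch::All => filter_with(v, i, d, |_| true, output),
        ValueRangeSketch::Inclusive(r) => filter_with(v, i, d, |x| r.contains(x), output),
        ValueRangeSketch::GreaterThan(t) => filter_with(v, i, d, |x| x > t, output),
        ValueRangeSketch::GreaterThanOrEqual(t) => filter_with(v, i, d, |x| x >= t, output),
        ValueRangeSketch::LessThan(t) => filter_with(v, i, d, |x| x < t, output),
        ValueRangeSketch::LessThanOrEqual(t) => filter_with(v, i, d, |x| x <= t, output),
    }
}
```

Matching once and passing a closure lets the compiler specialize the loop for each comparison. That is likely the motivation for writing one unrolled loop per arm rather than matching inside the loop.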
--- a/columnar/src/column_values/monotonic_column.rs
+++ b/columnar/src/column_values/monotonic_column.rs
@@ -1,8 +1,9 @@
 use std::fmt::Debug;
 use std::marker::PhantomData;
-use std::ops::{Range, RangeInclusive};
+use std::ops::Range;

 use crate::ColumnValues;
+use crate::column::ValueRange;
 use crate::column_values::monotonic_mapping::StrictlyMonotonicFn;

 struct MonotonicMappingColumn<C, T, Input> {
@@ -80,16 +81,52 @@ where

    fn get_row_ids_for_value_range(
        &self,
-        range: RangeInclusive<Output>,
+        range: ValueRange<Output>,
        doc_id_range: Range<u32>,
        positions: &mut Vec<u32>,
    ) {
-        self.from_column.get_row_ids_for_value_range(
-            self.monotonic_mapping.inverse(range.start().clone())
-                ..=self.monotonic_mapping.inverse(range.end().clone()),
-            doc_id_range,
-            positions,
-        )
+        match range {
+            ValueRange::Inclusive(range) => self.from_column.get_row_ids_for_value_range(
+                ValueRange::Inclusive(
+                    self.monotonic_mapping.inverse(range.start().clone())
+                        ..=self.monotonic_mapping.inverse(range.end().clone()),
+                ),
+                doc_id_range,
+                positions,
+            ),
+            ValueRange::All => self.from_column.get_row_ids_for_value_range(
+                ValueRange::All,
+                doc_id_range,
+                positions,
+            ),
+            ValueRange::GreaterThan(threshold, _) => self.from_column.get_row_ids_for_value_range(
+                ValueRange::GreaterThan(self.monotonic_mapping.inverse(threshold), false),
+                doc_id_range,
+                positions,
+            ),
+            ValueRange::GreaterThanOrEqual(threshold, _) => {
+                self.from_column.get_row_ids_for_value_range(
+                    ValueRange::GreaterThanOrEqual(
+                        self.monotonic_mapping.inverse(threshold),
+                        false,
+                    ),
+                    doc_id_range,
+                    positions,
+                )
+            }
+            ValueRange::LessThan(threshold, _) => self.from_column.get_row_ids_for_value_range(
+                ValueRange::LessThan(self.monotonic_mapping.inverse(threshold), false),
+                doc_id_range,
+                positions,
+            ),
+            ValueRange::LessThanOrEqual(threshold, _) => {
+                self.from_column.get_row_ids_for_value_range(
+                    ValueRange::LessThanOrEqual(self.monotonic_mapping.inverse(threshold), false),
+                    doc_id_range,
+                    positions,
+                )
+            }
+        }
    }

    // We voluntarily do not implement get_range as it yields a regression,
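The `MonotonicMappingColumn` change pushes each one-sided bound down through `inverse` and does not decode every value. The sketch below shows why that works for a strictly increasing mapping. It assumes a hypothetical logical `i64` column stored as `u64` via a sign-bit flip, which is one common order-preserving encoding. `mapping`, `inverse`, `Cmp` and the row-id helpers are illustrative names, not tantivy's API.

```rust
/// Sign bit used by the hypothetical i64 <-> u64 encoding.
const SIGN_BIT: u64 = 1u64 << 63;

/// Stored (u64) -> logical (i64): a strictly increasing bijection.
pub fn mapping(raw: u64) -> i64 {
    (raw ^ SIGN_BIT) as i64
}

/// Logical (i64) -> stored (u64): the exact inverse of `mapping`.
pub fn inverse(val: i64) -> u64 {
    (val as u64) ^ SIGN_BIT
}

#[derive(Clone, Copy, Debug)]
pub enum Cmp {
    Gt,
    Ge,
    Lt,
    Le,
}

impl Cmp {
    pub fn test<T: PartialOrd>(self, val: T, threshold: T) -> bool {
        match self {
            Cmp::Gt => val > threshold,
            Cmp::Ge => val >= threshold,
            Cmp::Lt => val < threshold,
            Cmp::Le => val <= threshold,
        }
    }
}

/// Row ids whose stored value satisfies `cmp` against a stored-space threshold.
pub fn row_ids_stored(stored: &[u64], cmp: Cmp, stored_threshold: u64) -> Vec<u32> {
    stored
        .iter()
        .enumerate()
        .filter(|&(_, &raw)| cmp.test(raw, stored_threshold))
        .map(|(row, _)| row as u32)
        .collect()
}

/// Answers a query on logical values by mapping only the threshold back.
pub fn row_ids_logical(stored: &[u64], cmp: Cmp, threshold: i64) -> Vec<u32> {
    row_ids_stored(stored, cmp, inverse(threshold))
}
```

The equivalence is exact here only because the sign-bit flip is a bijection. Suppose instead that a mapping is strictly monotonic but does not reach every output value. A threshold that falls between two mapped values would then need rounding in the direction that suits each comparison, much like the `gcd` rounding in the bitpacked codec.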
--- a/columnar/src/column_values/monotonic_mapping_u128.rs
+++ b/columnar/src/column_values/monotonic_mapping_u128.rs
@@ -1,7 +1,7 @@
 use std::fmt::Debug;
 use std::net::Ipv6Addr;

-/// Montonic maps a value to u128 value space
+/// Monotonically maps a value to the u128 value space.
 /// Monotonic mapping enables `PartialOrd` on u128 space without conversion to original space.
 pub trait MonotonicallyMappableToU128: 'static + PartialOrd + Copy + Debug + Send + Sync {
    /// Converts a value to u128.
--- a/columnar/src/column_values/u128_based/compact_space/build_compact_space.rs
+++ b/columnar/src/column_values/u128_based/compact_space/build_compact_space.rs
@@ -185,10 +185,10 @@ impl CompactSpaceBuilder {
        let mut covered_space = Vec::with_capacity(self.blanks.len());

        // beginning of the blanks
-        if let Some(first_blank_start) = self.blanks.first().map(RangeInclusive::start) {
-            if *first_blank_start != 0 {
-                covered_space.push(0..=first_blank_start - 1);
-            }
+        if let Some(first_blank_start) = self.blanks.first().map(RangeInclusive::start)
+            && *first_blank_start != 0
+        {
+            covered_space.push(0..=first_blank_start - 1);
        }

        // Between the blanks
@@ -202,10 +202,10 @@ impl CompactSpaceBuilder {
        covered_space.extend(between_blanks);

        // end of the blanks
-        if let Some(last_blank_end) = self.blanks.last().map(RangeInclusive::end) {
-            if *last_blank_end != u128::MAX {
-                covered_space.push(last_blank_end + 1..=u128::MAX);
-            }
+        if let Some(last_blank_end) = self.blanks.last().map(RangeInclusive::end)
+            && *last_blank_end != u128::MAX
+        {
+            covered_space.push(last_blank_end + 1..=u128::MAX);
        }

        if covered_space.is_empty() {
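The `covered_space` computation in `CompactSpaceBuilder` takes the complement of the sorted blanks. That complement has three parts: a leading gap before the first blank, the gaps between consecutive blanks, and a trailing gap after the last blank. Below is a minimal standalone sketch that assumes the blanks are sorted and non-overlapping. The free function `covered_space` is hypothetical. It uses nested `if`s because the `if let … && …` let-chain form in the diff requires the Rust 2024 edition. The diff's `is_empty` branch is not shown, so returning the whole space when there are no blanks is this sketch's own assumption.

```rust
use std::ops::RangeInclusive;

/// Complement of `blanks` within the full u128 space.
/// Assumes `blanks` is sorted and non-overlapping.
pub fn covered_space(blanks: &[RangeInclusive<u128>]) -> Vec<RangeInclusive<u128>> {
    // Assumption of this sketch: no blanks means everything is covered.
    if blanks.is_empty() {
        return vec![0..=u128::MAX];
    }
    let mut covered = Vec::with_capacity(blanks.len() + 1);
    // Before the first blank.
    if let Some(first) = blanks.first() {
        if *first.start() != 0 {
            covered.push(0..=*first.start() - 1);
        }
    }
    // Strictly between consecutive blanks; adjacent blanks leave no gap.
    for pair in blanks.windows(2) {
        let gap_start = *pair[0].end() + 1;
        let gap_end = *pair[1].start() - 1;
        if gap_start <= gap_end {
            covered.push(gap_start..=gap_end);
        }
    }
    // After the last blank.
    if let Some(last) = blanks.last() {
        if *last.end() != u128::MAX {
            covered.push(*last.end() + 1..=u128::MAX);
        }
    }
    covered
}
```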
--- a/columnar/src/column_values/u128_based/compact_space/mod.rs
+++ b/columnar/src/column_values/u128_based/compact_space/mod.rs
@@ -25,6 +25,7 @@ use common::{BinarySerializable, CountingWriter, OwnedBytes, VInt, VIntU128};
 use tantivy_bitpacker::{BitPacker, BitUnpacker};

 use crate::RowId;
+use crate::column::ValueRange;
 use crate::column_values::ColumnValues;

 /// The cost per blank is quite hard actually, since blanks are delta encoded, the actual cost of
@@ -338,14 +339,48 @@ impl ColumnValues<u64> for CompactSpaceU64Accessor {
    #[inline]
    fn get_row_ids_for_value_range(
        &self,
-        value_range: RangeInclusive<u64>,
+        value_range: ValueRange<u64>,
        position_range: Range<u32>,
        positions: &mut Vec<u32>,
    ) {
-        let value_range = self.0.compact_to_u128(*value_range.start() as u32)
-            ..=self.0.compact_to_u128(*value_range.end() as u32);
-        self.0
-            .get_row_ids_for_value_range(value_range, position_range, positions)
+        match value_range {
+            ValueRange::Inclusive(value_range) => {
+                let value_range = ValueRange::Inclusive(
+                    self.0.compact_to_u128(*value_range.start() as u32)
+                        ..=self.0.compact_to_u128(*value_range.end() as u32),
+                );
+                self.0
+                    .get_row_ids_for_value_range(value_range, position_range, positions)
+            }
+            ValueRange::All => {
+                let position_range = position_range.start..position_range.end.min(self.num_vals());
+                positions.extend(position_range);
+            }
+            ValueRange::GreaterThan(threshold, _) => {
+                let value_range =
+                    ValueRange::GreaterThan(self.0.compact_to_u128(threshold as u32), false);
+                self.0
+                    .get_row_ids_for_value_range(value_range, position_range, positions)
+            }
+            ValueRange::GreaterThanOrEqual(threshold, _) => {
+                let value_range =
+                    ValueRange::GreaterThanOrEqual(self.0.compact_to_u128(threshold as u32), false);
+                self.0
+                    .get_row_ids_for_value_range(value_range, position_range, positions)
+            }
+            ValueRange::LessThan(threshold, _) => {
+                let value_range =
+                    ValueRange::LessThan(self.0.compact_to_u128(threshold as u32), false);
+                self.0
+                    .get_row_ids_for_value_range(value_range, position_range, positions)
+            }
+            ValueRange::LessThanOrEqual(threshold, _) => {
+                let value_range =
+                    ValueRange::LessThanOrEqual(self.0.compact_to_u128(threshold as u32), false);
+                self.0
+                    .get_row_ids_for_value_range(value_range, position_range, positions)
+            }
+        }
    }
 }

@@ -375,10 +410,47 @@ impl ColumnValues<u128> for CompactSpaceDecompressor {
    #[inline]
    fn get_row_ids_for_value_range(
        &self,
-        value_range: RangeInclusive<u128>,
+        value_range: ValueRange<u128>,
        position_range: Range<u32>,
        positions: &mut Vec<u32>,
    ) {
+        let value_range = match value_range {
+            ValueRange::Inclusive(value_range) => value_range,
+            ValueRange::All => {
+                let position_range = position_range.start..position_range.end.min(self.num_vals());
+                positions.extend(position_range);
+                return;
+            }
+            ValueRange::GreaterThan(threshold, _) => {
+                let max = self.max_value();
+                if threshold >= max {
+                    return;
+                }
+                (threshold + 1)..=max
+            }
+            ValueRange::GreaterThanOrEqual(threshold, _) => {
+                let max = self.max_value();
+                if threshold > max {
+                    return;
+                }
+                threshold..=max
+            }
+            ValueRange::LessThan(threshold, _) => {
+                let min = self.min_value();
+                if threshold <= min {
+                    return;
+                }
+                min..=(threshold - 1)
+            }
+            ValueRange::LessThanOrEqual(threshold, _) => {
+                let min = self.min_value();
+                if threshold < min {
+                    return;
+                }
+                min..=threshold
+            }
+        };
+
        if value_range.start() > value_range.end() {
            return;
        }
@@ -560,7 +632,7 @@ mod tests {
                    .collect::<Vec<_>>();
                let mut positions = Vec::new();
                decompressor.get_row_ids_for_value_range(
-                    range,
+                    ValueRange::Inclusive(range),
                    0..decompressor.num_vals(),
                    &mut positions,
                );
@@ -604,7 +676,11 @@ mod tests {
            let val = *val;
            let pos = pos as u32;
            let mut positions = Vec::new();
-            decomp.get_row_ids_for_value_range(val..=val, pos..pos + 1, &mut positions);
+            decomp.get_row_ids_for_value_range(
+                ValueRange::Inclusive(val..=val),
+                pos..pos + 1,
+                &mut positions,
+            );
            assert_eq!(positions, vec![pos]);
        }

@@ -746,7 +822,11 @@ mod tests {
        doc_id_range: Range<u32>,
    ) -> Vec<u32> {
        let mut positions = Vec::new();
-        column.get_row_ids_for_value_range(value_range, doc_id_range, &mut positions);
+        column.get_row_ids_for_value_range(
+            ValueRange::Inclusive(value_range),
+            doc_id_range,
+            &mut positions,
+        );
        positions
    }
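`CompactSpaceDecompressor` now rewrites each one-sided bound as an inclusive range. The open end of that range is the column's `min_value()` or `max_value()`, and the function returns early when nothing can match. The minimal sketch below shows that rewrite over plain `u128` values. It uses a hypothetical `Bound128` enum, with the diff's boolean field omitted, and a hypothetical `to_inclusive` helper.

```rust
use std::ops::RangeInclusive;

/// Hypothetical one-sided bound over u128 values.
#[derive(Clone, Copy, Debug)]
pub enum Bound128 {
    GreaterThan(u128),
    GreaterThanOrEqual(u128),
    LessThan(u128),
    LessThanOrEqual(u128),
}

impl Bound128 {
    /// Direct predicate, used as the reference.
    pub fn matches(self, v: u128) -> bool {
        match self {
            Bound128::GreaterThan(t) => v > t,
            Bound128::GreaterThanOrEqual(t) => v >= t,
            Bound128::LessThan(t) => v < t,
            Bound128::LessThanOrEqual(t) => v <= t,
        }
    }
}

/// Rewrites a one-sided bound as an inclusive range, using `min`/`max` as
/// the open end, or `None` when no value in `[min, max]` can match.
pub fn to_inclusive(bound: Bound128, min: u128, max: u128) -> Option<RangeInclusive<u128>> {
    match bound {
        // Reached only when t < max, so t + 1 cannot overflow.
        Bound128::GreaterThan(t) => if t >= max { None } else { Some(t + 1..=max) },
        Bound128::GreaterThanOrEqual(t) => if t > max { None } else { Some(t..=max) },
        // Reached only when t > min, so t - 1 cannot underflow.
        Bound128::LessThan(t) => if t <= min { None } else { Some(min..=t - 1) },
        Bound128::LessThanOrEqual(t) => if t < min { None } else { Some(min..=t) },
    }
}
```

The early returns also act as overflow guards. The code reaches `t + 1` only when `t < max`, and it reaches `t - 1` only when `t > min`.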

--- a/columnar/src/column_values/u64_based/bitpacked.rs
+++ b/columnar/src/column_values/u64_based/bitpacked.rs
@@ -6,6 +6,7 @@ use common::{BinarySerializable, OwnedBytes};
 use fastdivide::DividerU64;
 use tantivy_bitpacker::{BitPacker, BitUnpacker, compute_num_bits};

+use crate::column::ValueRange;
 use crate::column_values::u64_based::{ColumnCodec, ColumnCodecEstimator, ColumnStats};
 use crate::{ColumnValues, RowId};

@@ -41,12 +42,6 @@ fn transform_range_before_linear_transformation(
    if range.is_empty() {
        return None;
    }
-    if stats.min_value > *range.end() {
-        return None;
-    }
-    if stats.max_value < *range.start() {
-        return None;
-    }
    let shifted_range =
        range.start().saturating_sub(stats.min_value)..=range.end().saturating_sub(stats.min_value);
    let start_before_gcd_multiplication: u64 = div_ceil(*shifted_range.start(), stats.gcd);
@@ -72,24 +67,273 @@ impl ColumnValues for BitpackedReader {
        self.stats.num_rows
    }

+    fn get_vals_in_value_range(
+        &self,
+        input_indexes: &[u32],
+        input_doc_ids: &[u32],
+        output: &mut Vec<crate::ComparableDoc<Option<u64>, crate::DocId>>,
+        value_range: ValueRange<u64>,
+    ) {
+        match value_range {
+            ValueRange::All => {
+                for (&idx, &doc) in input_indexes.iter().zip(input_doc_ids.iter()) {
+                    output.push(crate::ComparableDoc {
+                        doc,
+                        sort_key: Some(self.get_val(idx)),
+                    });
+                }
+            }
+            ValueRange::Inclusive(range) => {
+                if let Some(transformed_range) =
+                    transform_range_before_linear_transformation(&self.stats, range)
+                {
+                    for (&idx, &doc) in input_indexes.iter().zip(input_doc_ids.iter()) {
+                        let raw_val = self.bit_unpacker.get(idx, &self.data);
+                        if transformed_range.contains(&raw_val) {
+                            output.push(crate::ComparableDoc {
+                                doc,
+                                sort_key: Some(
+                                    self.stats.min_value + self.stats.gcd.get() * raw_val,
+                                ),
+                            });
+                        }
+                    }
+                }
+            }
+            ValueRange::GreaterThan(threshold, _) => {
+                if threshold < self.stats.min_value {
+                    for (&idx, &doc) in input_indexes.iter().zip(input_doc_ids.iter()) {
+                        output.push(crate::ComparableDoc {
+                            doc,
+                            sort_key: Some(self.get_val(idx)),
+                        });
+                    }
+                } else if threshold >= self.stats.max_value {
+                    // All filtered out
+                } else {
+                    let raw_threshold = (threshold - self.stats.min_value) / self.stats.gcd.get();
+                    for (&idx, &doc) in input_indexes.iter().zip(input_doc_ids.iter()) {
+                        let raw_val = self.bit_unpacker.get(idx, &self.data);
+                        if raw_val > raw_threshold {
+                            output.push(crate::ComparableDoc {
+                                doc,
+                                sort_key: Some(
+                                    self.stats.min_value + self.stats.gcd.get() * raw_val,
+                                ),
+                            });
+                        }
+                    }
+                }
+            }
+            ValueRange::GreaterThanOrEqual(threshold, _) => {
+                if threshold <= self.stats.min_value {
+                    for (&idx, &doc) in input_indexes.iter().zip(input_doc_ids.iter()) {
+                        output.push(crate::ComparableDoc {
+                            doc,
+                            sort_key: Some(self.get_val(idx)),
+                        });
+                    }
+                } else if threshold > self.stats.max_value {
+                    // All filtered out
+                } else {
+                    let diff = threshold - self.stats.min_value;
+                    let gcd = self.stats.gcd.get();
+                    let raw_threshold = diff.div_ceil(gcd);
+                    for (&idx, &doc) in input_indexes.iter().zip(input_doc_ids.iter()) {
+                        let raw_val = self.bit_unpacker.get(idx, &self.data);
+                        if raw_val >= raw_threshold {
+                            output.push(crate::ComparableDoc {
+                                doc,
+                                sort_key: Some(
+                                    self.stats.min_value + self.stats.gcd.get() * raw_val,
+                                ),
+                            });
+                        }
+                    }
+                }
+            }
+            ValueRange::LessThan(threshold, _) => {
+                if threshold > self.stats.max_value {
+                    for (&idx, &doc) in input_indexes.iter().zip(input_doc_ids.iter()) {
+                        output.push(crate::ComparableDoc {
+                            doc,
+                            sort_key: Some(self.get_val(idx)),
+                        });
+                    }
+                } else if threshold <= self.stats.min_value {
+                    // All filtered out
+                } else {
+                    let diff = threshold - self.stats.min_value;
+                    let gcd = self.stats.gcd.get();
+                    let raw_threshold = if diff % gcd == 0 {
+                        diff / gcd
+                    } else {
+                        diff / gcd + 1
+                    };
+
+                    for (&idx, &doc) in input_indexes.iter().zip(input_doc_ids.iter()) {
+                        let raw_val = self.bit_unpacker.get(idx, &self.data);
+                        if raw_val < raw_threshold {
+                            output.push(crate::ComparableDoc {
+                                doc,
+                                sort_key: Some(
+                                    self.stats.min_value + self.stats.gcd.get() * raw_val,
+                                ),
+                            });
+                        }
+                    }
+                }
+            }
+            ValueRange::LessThanOrEqual(threshold, _) => {
+                if threshold >= self.stats.max_value {
+                    for (&idx, &doc) in input_indexes.iter().zip(input_doc_ids.iter()) {
+                        output.push(crate::ComparableDoc {
+                            doc,
+                            sort_key: Some(self.get_val(idx)),
+                        });
+                    }
+                } else if threshold < self.stats.min_value {
+                    // All filtered out
+                } else {
+                    let diff = threshold - self.stats.min_value;
+                    let gcd = self.stats.gcd.get();
+                    let raw_threshold = diff / gcd;
+
+                    for (&idx, &doc) in input_indexes.iter().zip(input_doc_ids.iter()) {
+                        let raw_val = self.bit_unpacker.get(idx, &self.data);
+                        if raw_val <= raw_threshold {
+                            output.push(crate::ComparableDoc {
+                                doc,
+                                sort_key: Some(
+                                    self.stats.min_value + self.stats.gcd.get() * raw_val,
+                                ),
+                            });
+                        }
+                    }
+                }
+            }
+        }
+    }
    fn get_row_ids_for_value_range(
        &self,
-        range: RangeInclusive<u64>,
+        range: ValueRange<u64>,
        doc_id_range: Range<u32>,
        positions: &mut Vec<u32>,
    ) {
-        let Some(transformed_range) =
-            transform_range_before_linear_transformation(&self.stats, range)
-        else {
-            positions.clear();
-            return;
-        };
-        self.bit_unpacker.get_ids_for_value_range(
-            transformed_range,
-            doc_id_range,
-            &self.data,
-            positions,
-        );
+        match range {
+            ValueRange::All => {
+                positions.extend(doc_id_range);
+                return;
+            }
+            ValueRange::Inclusive(range) => {
+                let Some(transformed_range) =
+                    transform_range_before_linear_transformation(&self.stats, range)
+                else {
+                    positions.clear();
+                    return;
+                };
+
+                self.bit_unpacker.get_ids_for_value_range(
+                    transformed_range,
+                    doc_id_range,
+                    &self.data,
+                    positions,
+                );
+            }
+            ValueRange::GreaterThan(threshold, _) => {
+                if threshold < self.stats.min_value {
+                    positions.extend(doc_id_range);
+                    return;
+                }
+                if threshold >= self.stats.max_value {
+                    return;
+                }
+                let raw_threshold = (threshold - self.stats.min_value) / self.stats.gcd.get();
+                let max_raw = (self.stats.max_value - self.stats.min_value) / self.stats.gcd.get();
+                let transformed_range = (raw_threshold + 1)..=max_raw;
+
+                self.bit_unpacker.get_ids_for_value_range(
+                    transformed_range,
+                    doc_id_range,
+                    &self.data,
+                    positions,
+                );
+            }
+            ValueRange::GreaterThanOrEqual(threshold, _) => {
+                if threshold <= self.stats.min_value {
+                    positions.extend(doc_id_range);
+                    return;
+                }
+                if threshold > self.stats.max_value {
+                    return;
+                }
+                let diff = threshold - self.stats.min_value;
+                let gcd = self.stats.gcd.get();
+                let raw_threshold = diff.div_ceil(gcd);
+                // We want raw >= raw_threshold.
+                let max_raw = (self.stats.max_value - self.stats.min_value) / self.stats.gcd.get();
+                let transformed_range = raw_threshold..=max_raw;
+
+                self.bit_unpacker.get_ids_for_value_range(
+                    transformed_range,
+                    doc_id_range,
+                    &self.data,
+                    positions,
+                );
+            }
+            ValueRange::LessThan(threshold, _) => {
+                if threshold > self.stats.max_value {
+                    positions.extend(doc_id_range);
+                    return;
+                }
+                if threshold <= self.stats.min_value {
+                    return;
+                }
+
+                let diff = threshold - self.stats.min_value;
+                let gcd = self.stats.gcd.get();
+                // We want raw < raw_threshold_limit,
+                // i.e. raw <= raw_threshold_limit - 1.
+                let raw_threshold_limit = if diff % gcd == 0 {
+                    diff / gcd
+                } else {
+                    diff / gcd + 1
+                };
+
+                if raw_threshold_limit == 0 {
+                    return;
+                }
+                let transformed_range = 0..=(raw_threshold_limit - 1);
+
+                self.bit_unpacker.get_ids_for_value_range(
+                    transformed_range,
+                    doc_id_range,
+                    &self.data,
+                    positions,
+                );
+            }
+            ValueRange::LessThanOrEqual(threshold, _) => {
+                if threshold >= self.stats.max_value {
+                    positions.extend(doc_id_range);
+                    return;
+                }
+                if threshold < self.stats.min_value {
+                    return;
+                }
+                let diff = threshold - self.stats.min_value;
+                let gcd = self.stats.gcd.get();
+                // We want raw <= raw_threshold.
+                let raw_threshold = diff / gcd;
+                let transformed_range = 0..=raw_threshold;
+
+                self.bit_unpacker.get_ids_for_value_range(
+                    transformed_range,
+                    doc_id_range,
+                    &self.data,
+                    positions,
+                );
+            }
+        }
    }
 }

@@ -105,7 +349,7 @@ impl ColumnCodecEstimator for BitpackedCodecEstimator {

    fn estimate(&self, stats: &ColumnStats) -> Option<u64> {
        let num_bits_per_value = num_bits(stats);
-        Some(stats.num_bytes() + (stats.num_rows as u64 * (num_bits_per_value as u64) + 7) / 8)
+        Some(stats.num_bytes() + (stats.num_rows as u64 * (num_bits_per_value as u64)).div_ceil(8))
    }

    fn serialize(
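The `GreaterThanOrEqual`, `LessThan` and `LessThanOrEqual` arms above translate a bound on the decoded value `min_value + raw * gcd` into an inclusive range of raw bit-packed codes. Below is a standalone sketch of that translation, convenient for checking it against brute force. `Stats`, `Bound` and `raw_range` are hypothetical stand-ins for the column statistics and `ValueRange`, not tantivy types. `None` plays the role of the early `return`, and `Some((0, max_raw))` plays the role of `positions.extend(doc_id_range)`.

```rust
/// Hypothetical stand-in for the column statistics: every value is `min + raw * gcd`.
#[derive(Clone, Copy)]
struct Stats {
    min: u64,
    max: u64,
    gcd: u64, // assumed non-zero, like `NonZeroU64` in the codec
}

/// Hypothetical stand-in for the one-sided `ValueRange` variants.
#[derive(Clone, Copy)]
enum Bound {
    GreaterThanOrEqual(u64),
    LessThan(u64),
    LessThanOrEqual(u64),
}

/// Maps a predicate on decoded values to an inclusive range of raw codes,
/// or `None` when no value in `[min, max]` can satisfy it.
fn raw_range(stats: Stats, bound: Bound) -> Option<(u64, u64)> {
    let max_raw = (stats.max - stats.min) / stats.gcd;
    match bound {
        Bound::GreaterThanOrEqual(t) => {
            if t > stats.max {
                return None;
            }
            if t <= stats.min {
                return Some((0, max_raw));
            }
            // Smallest raw with min + raw * gcd >= t is ceil((t - min) / gcd).
            Some(((t - stats.min).div_ceil(stats.gcd), max_raw))
        }
        Bound::LessThan(t) => {
            if t <= stats.min {
                return None;
            }
            if t > stats.max {
                return Some((0, max_raw));
            }
            // raw * gcd < diff  <=>  raw < ceil(diff / gcd); diff >= 1 here, so limit >= 1.
            let limit = (t - stats.min).div_ceil(stats.gcd);
            Some((0, limit - 1))
        }
        Bound::LessThanOrEqual(t) => {
            if t < stats.min {
                return None;
            }
            if t >= stats.max {
                return Some((0, max_raw));
            }
            // raw * gcd <= diff  <=>  raw <= floor(diff / gcd).
            Some((0, (t - stats.min) / stats.gcd))
        }
    }
}
```

One way to gain confidence in such a mapping is to enumerate every raw code of a small column, decode it, and compare against the predicate directly. A corner case worth including is `min_value = 0`, `max_value = u64::MAX`, `gcd = 3` with threshold `u64::MAX`: the manual `(diff + gcd - 1) / gcd` form overflows there, while `div_ceil` returns `u64::MAX / 3`.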
--- a/columnar/src/column_values/u64_based/line.rs
+++ b/columnar/src/column_values/u64_based/line.rs
@@ -8,7 +8,7 @@ use crate::column_values::ColumnValues;
 const MID_POINT: u64 = (1u64 << 32) - 1u64;

 /// `Line` describes a line function `y: ax + b` using integer
-/// arithmetics.
+/// arithmetic.
 ///
 /// The slope is in fact a decimal split into a 32 bit integer value,
 /// and a 32-bit decimal value.
@@ -94,7 +94,7 @@ impl Line {
        // `(i, ys[])`.
        //
        // The best intercept therefore has the form
-        // `y[i] - line.eval(i)` (using wrapping arithmetics).
+        // `y[i] - line.eval(i)` (using wrapping arithmetic).
        // In other words, the best intercept is one of the `y - Line::eval(ys[i])`
        // and our task is just to pick the one that minimizes our error.
        //
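The `Line` doc comment describes the slope as a fixed-point number with 32 integer bits and 32 fractional bits packed into a `u64`. Here is a hypothetical sketch of evaluating `y = a * x + b` in that representation. `FixedLine`, `from_ratio` and the truncating rounding are illustrative assumptions, not tantivy's `Line::eval`.

```rust
/// Hypothetical 32.32 fixed-point line `y = slope * x + intercept`.
#[derive(Clone, Copy, Debug)]
struct FixedLine {
    /// Slope scaled by 2^32: the high 32 bits are the integer part,
    /// the low 32 bits the fractional part.
    slope: u64,
    intercept: u64,
}

impl FixedLine {
    /// Builds a line with slope `numerator / denominator` (truncated to 32 fractional bits).
    fn from_ratio(numerator: u64, denominator: u64, intercept: u64) -> FixedLine {
        let slope = ((numerator as u128) << 32) / denominator as u128;
        assert!(slope <= u64::MAX as u128, "slope must be below 2^32");
        FixedLine { slope: slope as u64, intercept }
    }

    fn eval(&self, x: u32) -> u64 {
        // Multiply in u128 so the product (< 2^96) cannot overflow, then drop
        // the 32 fractional bits. The final addition wraps instead of panicking.
        let linear = ((self.slope as u128 * x as u128) >> 32) as u64;
        self.intercept.wrapping_add(linear)
    }
}
```

With a slope of 1.5 and an intercept of 100, `eval(3)` computes 4.5, truncates it to 4 and returns 104. `wrapping_add` lets a large intercept wrap past `u64::MAX`, in the spirit of the "wrapping arithmetic" mentioned above.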
--- a/columnar/src/column_values/u64_based/linear.rs
+++ b/columnar/src/column_values/u64_based/linear.rs
@@ -117,7 +117,7 @@ impl ColumnCodecEstimator for LinearCodecEstimator {
        Some(
            stats.num_bytes()
                + linear_params.num_bytes()
-                + (num_bits as u64 * stats.num_rows as u64 + 7) / 8,
+                + (num_bits as u64 * stats.num_rows as u64).div_ceil(8),
        )
    }

--- a/columnar/src/column_values/u64_based/mod.rs
+++ b/columnar/src/column_values/u64_based/mod.rs
@@ -52,7 +52,7 @@ pub trait ColumnCodecEstimator<T = u64>: 'static {
    ) -> io::Result<()>;
 }

-/// A column codec describes a colunm serialization format.
+/// A column codec describes a column serialization format.
 pub trait ColumnCodec<T: PartialOrd = u64> {
    /// Specialized `ColumnValues` type.
    type ColumnValues: ColumnValues<T> + 'static;
--- a/columnar/src/column_values/u64_based/tests.rs
+++ b/columnar/src/column_values/u64_based/tests.rs
@@ -131,7 +131,7 @@ pub(crate) fn create_and_validate<TColumnCodec: ColumnCodec>(
            .collect();
        let mut positions = Vec::new();
        reader.get_row_ids_for_value_range(
-            vals[test_rand_idx]..=vals[test_rand_idx],
+            crate::column::ValueRange::Inclusive(vals[test_rand_idx]..=vals[test_rand_idx]),
            0..vals.len() as u32,
            &mut positions,
        );
--- a/columnar/src/columnar/merge/mod.rs
+++ b/columnar/src/columnar/merge/mod.rs
@@ -367,7 +367,7 @@ fn is_empty_after_merge(
                    ColumnIndex::Empty { .. } => true,
                    ColumnIndex::Full => alive_bitset.len() == 0,
                    ColumnIndex::Optional(optional_index) => {
-                        for doc in optional_index.iter_docs() {
+                        for doc in optional_index.iter_non_null_docs() {
                            if alive_bitset.contains(doc) {
                                return false;
                            }
--- a/columnar/src/columnar/writer/column_operation.rs
+++ b/columnar/src/columnar/writer/column_operation.rs
@@ -244,7 +244,7 @@ impl SymbolValue for UnorderedId {

 fn compute_num_bytes_for_u64(val: u64) -> usize {
    let msb = (64u32 - val.leading_zeros()) as usize;
-    (msb + 7) / 8
+    msb.div_ceil(8)
 }

 fn encode_zig_zag(n: i64) -> u64 {
--- a/columnar/src/comparable_doc.rs
+++ b/columnar/src/comparable_doc.rs
@@ -0,0 +1,22 @@
+use serde::{Deserialize, Serialize};
+
+/// Contains a feature (field, score, etc.) of a document along with the document address.
+///
+/// Used only by TopNComputer, which implements the actual comparison via a `Comparator`.
+#[derive(Clone, Default, Eq, PartialEq, Serialize, Deserialize)]
+pub struct ComparableDoc<T, D> {
+    /// The feature of the document. In practice, this is
+    /// a type that can be compared with a `Comparator<T>`.
+    pub sort_key: T,
+    /// The document address. In practice, this is either a `DocId` or `DocAddress`.
+    pub doc: D,
+}
+
+impl<T: std::fmt::Debug, D: std::fmt::Debug> std::fmt::Debug for ComparableDoc<T, D> {
+    fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
+        f.debug_struct("ComparableDoc")
+            .field("sort_key", &self.sort_key)
+            .field("doc", &self.doc)
+            .finish()
+    }
+}
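`ComparableDoc` deliberately carries no ordering of its own; the ranking lives in a `Comparator`. Below is a minimal sketch of how a top-N computer might combine the two. It assumes a hypothetical `Comparator` trait with a single `compare` method, where `Greater` means "ranks higher", and a buffer that is trimmed back to `n` entries whenever it reaches `2 * n`. The struct is mirrored locally so the block compiles on its own. A full sort stands in for whatever selection strategy the real `TopNComputer` uses.

```rust
use std::cmp::Ordering;

/// Local mirror of `ComparableDoc` so the sketch is self-contained.
#[derive(Clone, Debug, PartialEq)]
struct ComparableDoc<T, D> {
    sort_key: T,
    doc: D,
}

/// Hypothetical comparator trait: `Ordering::Greater` means `lhs` ranks higher.
trait Comparator<T> {
    fn compare(&self, lhs: &T, rhs: &T) -> Ordering;
}

/// Higher keys rank first.
struct Descending;
impl Comparator<u64> for Descending {
    fn compare(&self, lhs: &u64, rhs: &u64) -> Ordering {
        lhs.cmp(rhs)
    }
}

/// Lower keys rank first.
struct Ascending;
impl Comparator<u64> for Ascending {
    fn compare(&self, lhs: &u64, rhs: &u64) -> Ordering {
        rhs.cmp(lhs)
    }
}

/// Returns the `n` best docs under `comparator`, best first.
fn top_n<C: Comparator<u64>>(
    comparator: &C,
    docs: impl IntoIterator<Item = ComparableDoc<u64, u32>>,
    n: usize,
) -> Vec<ComparableDoc<u64, u32>> {
    if n == 0 {
        return Vec::new();
    }
    // Best first: the comparator decides, ties go to the lower doc id.
    let rank = |a: &ComparableDoc<u64, u32>, b: &ComparableDoc<u64, u32>| {
        comparator
            .compare(&b.sort_key, &a.sort_key)
            .then_with(|| a.doc.cmp(&b.doc))
    };
    let mut buffer = Vec::with_capacity(2 * n);
    for doc in docs {
        buffer.push(doc);
        // Once the buffer holds 2n candidates, keep only the best n.
        if buffer.len() >= 2 * n {
            buffer.sort_by(&rank);
            buffer.truncate(n);
        }
    }
    buffer.sort_by(&rank);
    buffer.truncate(n);
    buffer
}
```

Breaking ties on the doc id keeps results deterministic when several documents share a sort key, and trimming at `2 * n` rather than `n` amortizes the sorting cost over many pushes.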
--- a/columnar/src/dynamic_column.rs
+++ b/columnar/src/dynamic_column.rs
@@ -3,7 +3,8 @@ use std::sync::Arc;
 use std::{fmt, io};

 use common::file_slice::FileSlice;
-use common::{ByteCount, DateTime, HasLen, OwnedBytes};
+use common::{ByteCount, DateTime, OwnedBytes};
+use serde::{Deserialize, Serialize};

 use crate::column::{BytesColumn, Column, StrColumn};
 use crate::column_values::{StrictlyMonotonicFn, monotonic_map_column};
@@ -317,10 +318,89 @@ impl DynamicColumnHandle {
    }

    pub fn num_bytes(&self) -> ByteCount {
-        self.file_slice.len().into()
+        self.file_slice.num_bytes()
+    }
+
+    /// Legacy alias for [`Self::space_usage`].
+    pub fn column_and_dictionary_num_bytes(&self) -> io::Result<ColumnSpaceUsage> {
+        self.space_usage()
+    }
+
+    /// Returns the space usage of the column, broken down into dictionary and column data
+    /// where applicable.
+    ///
+    /// For dictionary-encoded columns (strings and bytes), this splits the total footprint into
+    /// the dictionary and the remaining column data (index and values).
+    /// For all other column types, the dictionary size is `None` and the column size
+    /// equals the total number of bytes.
+    pub fn space_usage(&self) -> io::Result<ColumnSpaceUsage> {
+        let total_num_bytes = self.num_bytes();
+        let dynamic_column = self.open()?;
+        let dictionary_num_bytes = match &dynamic_column {
+            DynamicColumn::Bytes(bytes_column) => bytes_column.dictionary().num_bytes(),
+            DynamicColumn::Str(str_column) => str_column.dictionary().num_bytes(),
+            _ => {
+                return Ok(ColumnSpaceUsage::new(total_num_bytes, None));
+            }
+        };
+        assert!(dictionary_num_bytes <= total_num_bytes);
+        let column_num_bytes =
+            ByteCount::from(total_num_bytes.get_bytes() - dictionary_num_bytes.get_bytes());
+        Ok(ColumnSpaceUsage::new(
+            column_num_bytes,
+            Some(dictionary_num_bytes),
+        ))
    }

    pub fn column_type(&self) -> ColumnType {
        self.column_type
    }
 }
+
+/// Represents space usage of a column.
+///
+/// `column_num_bytes` tracks the column payload (index, values and footer).
+/// For dictionary-encoded columns, `dictionary_num_bytes` captures the dictionary footprint.
+/// [`ColumnSpaceUsage::total_num_bytes`] returns the sum of both parts.
+#[derive(Clone, Debug, Serialize, Deserialize)]
+pub struct ColumnSpaceUsage {
+    column_num_bytes: ByteCount,
+    dictionary_num_bytes: Option<ByteCount>,
+}
+
+impl ColumnSpaceUsage {
+    pub(crate) fn new(
+        column_num_bytes: ByteCount,
+        dictionary_num_bytes: Option<ByteCount>,
+    ) -> Self {
+        ColumnSpaceUsage {
+            column_num_bytes,
+            dictionary_num_bytes,
+        }
+    }
+
+    pub fn column_num_bytes(&self) -> ByteCount {
+        self.column_num_bytes
+    }
+
+    pub fn dictionary_num_bytes(&self) -> Option<ByteCount> {
+        self.dictionary_num_bytes
+    }
+
+    pub fn total_num_bytes(&self) -> ByteCount {
+        self.column_num_bytes + self.dictionary_num_bytes.unwrap_or_default()
+    }
+
+    /// Merge two space usage values by summing their components.
+    pub fn merge(&self, other: &ColumnSpaceUsage) -> ColumnSpaceUsage {
+        let dictionary_num_bytes = match (self.dictionary_num_bytes, other.dictionary_num_bytes) {
+            (Some(lhs), Some(rhs)) => Some(lhs + rhs),
+            (Some(val), None) | (None, Some(val)) => Some(val),
+            (None, None) => None,
+        };
+        ColumnSpaceUsage {
+            column_num_bytes: self.column_num_bytes + other.column_num_bytes,
+            dictionary_num_bytes,
+        }
+    }
+}
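`space_usage` splits the file-slice size into dictionary and column parts. `merge` adds two usages component-wise and keeps `dictionary_num_bytes` as `None` only when neither side has a dictionary. The sketch below shows both behaviours with a local mirror that uses plain `u64` in place of `ByteCount`, an assumption made to keep the block self-contained. `split` is a hypothetical helper mirroring the subtraction in `space_usage`.

```rust
/// Local mirror of `ColumnSpaceUsage`, using `u64` in place of `ByteCount`.
#[derive(Clone, Copy, Debug, PartialEq)]
struct SpaceUsage {
    column_num_bytes: u64,
    dictionary_num_bytes: Option<u64>,
}

impl SpaceUsage {
    fn total_num_bytes(&self) -> u64 {
        self.column_num_bytes + self.dictionary_num_bytes.unwrap_or_default()
    }

    /// Same semantics as `ColumnSpaceUsage::merge`: sum both parts, keeping
    /// `None` only if neither side has a dictionary.
    fn merge(&self, other: &SpaceUsage) -> SpaceUsage {
        let dictionary_num_bytes = match (self.dictionary_num_bytes, other.dictionary_num_bytes) {
            (Some(lhs), Some(rhs)) => Some(lhs + rhs),
            (Some(val), None) | (None, Some(val)) => Some(val),
            (None, None) => None,
        };
        SpaceUsage {
            column_num_bytes: self.column_num_bytes + other.column_num_bytes,
            dictionary_num_bytes,
        }
    }
}

/// Hypothetical helper: splits a dictionary-encoded column's total size the way
/// `space_usage` does (column bytes = total - dictionary bytes).
fn split(total: u64, dictionary: u64) -> SpaceUsage {
    assert!(dictionary <= total);
    SpaceUsage {
        column_num_bytes: total - dictionary,
        dictionary_num_bytes: Some(dictionary),
    }
}
```

For instance, merging a 100-byte string column, 40 bytes of which are dictionary, with a 30-byte numerical column yields 90 column bytes, `Some(40)` dictionary bytes and 130 bytes in total.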
--- a/columnar/src/lib.rs
+++ b/columnar/src/lib.rs
@@ -29,6 +29,7 @@ mod column;
 pub mod column_index;
 pub mod column_values;
 mod columnar;
+mod comparable_doc;
 mod dictionary;
 mod dynamic_column;
 mod iterable;
@@ -36,7 +37,7 @@ pub(crate) mod utils;
 mod value;

 pub use block_accessor::ColumnBlockAccessor;
-pub use column::{BytesColumn, Column, StrColumn};
+pub use column::{BytesColumn, Column, StrColumn, ValueRange};
 pub use column_index::ColumnIndex;
 pub use column_values::{
    ColumnValues, EmptyColumnValues, MonotonicallyMappableToU64, MonotonicallyMappableToU128,
@@ -45,10 +46,11 @@ pub use columnar::{
    CURRENT_VERSION, ColumnType, ColumnarReader, ColumnarWriter, HasAssociatedColumnType,
    MergeRowOrder, ShuffleMergeOrder, StackMergeOrder, Version, merge_columnar,
 };
+pub use comparable_doc::ComparableDoc;
 use sstable::VoidSSTable;
 pub use value::{NumericalType, NumericalValue};

-pub use self::dynamic_column::{DynamicColumn, DynamicColumnHandle};
+pub use self::dynamic_column::{ColumnSpaceUsage, DynamicColumn, DynamicColumnHandle};

 pub type RowId = u32;
 pub type DocId = u32;
--- a/columnar/src/value.rs
+++ b/columnar/src/value.rs
@@ -1,3 +1,5 @@
+use std::str::FromStr;
+
 use common::DateTime;

 use crate::InvalidData;
@@ -9,6 +11,23 @@ pub enum NumericalValue {
    F64(f64),
 }

+impl FromStr for NumericalValue {
+    type Err = ();
+
+    fn from_str(s: &str) -> Result<Self, Self::Err> {
+        if let Ok(val_i64) = s.parse::<i64>() {
+            return Ok(val_i64.into());
+        }
+        if let Ok(val_u64) = s.parse::<u64>() {
+            return Ok(val_u64.into());
+        }
+        if let Ok(val_f64) = s.parse::<f64>() {
+            return Ok(NumericalValue::from(val_f64).normalize());
+        }
+        Err(())
+    }
+}
+
 impl NumericalValue {
    pub fn numerical_type(&self) -> NumericalType {
        match self {
@@ -26,7 +45,7 @@ impl NumericalValue {
                if val <= i64::MAX as u64 {
                    NumericalValue::I64(val as i64)
                } else {
-                    NumericalValue::F64(val as f64)
+                    NumericalValue::U64(val)
                }
            }
            NumericalValue::I64(val) => NumericalValue::I64(val),
@@ -141,6 +160,7 @@ impl Coerce for DateTime {
 #[cfg(test)]
 mod tests {
    use super::NumericalType;
+    use crate::NumericalValue;

    #[test]
    fn test_numerical_type_code() {
@@ -153,4 +173,58 @@ mod tests {
        }
        assert_eq!(num_numerical_type, 3);
    }
+
+    #[test]
+    fn test_parse_numerical() {
+        assert_eq!(
+            "123".parse::<NumericalValue>().unwrap(),
+            NumericalValue::I64(123)
+        );
+        assert_eq!(
+            "18446744073709551615".parse::<NumericalValue>().unwrap(),
+            NumericalValue::U64(18446744073709551615u64)
+        );
+        assert_eq!(
+            "1.0".parse::<NumericalValue>().unwrap(),
+            NumericalValue::I64(1i64)
+        );
+        assert_eq!(
+            "1.1".parse::<NumericalValue>().unwrap(),
+            NumericalValue::F64(1.1f64)
+        );
+        assert_eq!(
+            "-1.0".parse::<NumericalValue>().unwrap(),
+            NumericalValue::I64(-1i64)
+        );
+    }
+
+    #[test]
+    fn test_normalize_numerical() {
+        assert_eq!(
+            NumericalValue::from(1u64).normalize(),
+            NumericalValue::I64(1i64),
+        );
+        let limit_val = i64::MAX as u64 + 1u64;
+        assert_eq!(
+            NumericalValue::from(limit_val).normalize(),
+            NumericalValue::U64(limit_val),
+        );
+        assert_eq!(
+            NumericalValue::from(-1i64).normalize(),
+            NumericalValue::I64(-1i64),
+        );
+        assert_eq!(
+            NumericalValue::from(-2.0f64).normalize(),
+            NumericalValue::I64(-2i64),
+        );
+        assert_eq!(
+            NumericalValue::from(-2.1f64).normalize(),
+            NumericalValue::F64(-2.1f64),
+        );
+        let large_float = 2.0f64.powf(70.0f64);
+        assert_eq!(
+            NumericalValue::from(large_float).normalize(),
+            NumericalValue::F64(large_float),
+        );
+    }
 }
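The new `FromStr` impl tries `i64`, then `u64`, then `f64`, and normalizes floats so that `"1.0"` comes back as `I64(1)`. Together with the `normalize` change, a `u64` above `i64::MAX` now stays `U64` instead of degrading to a lossy `F64`. The sketch below mirrors that parse order with a local `NumValue` enum. Its float branch of `normalize` (integral and within `i64` range becomes `I64`) is inferred from the tests above rather than copied from tantivy, so treat it as an assumption.

```rust
use std::str::FromStr;

#[derive(Clone, Copy, Debug, PartialEq)]
enum NumValue {
    I64(i64),
    U64(u64),
    F64(f64),
}

impl NumValue {
    /// Assumed normalization: prefer I64, keep U64 only above i64::MAX,
    /// and turn integral floats that fit in i64 into I64.
    fn normalize(self) -> NumValue {
        match self {
            NumValue::U64(val) if val <= i64::MAX as u64 => NumValue::I64(val as i64),
            NumValue::F64(val)
                if val.fract() == 0.0 && val >= i64::MIN as f64 && val < i64::MAX as f64 =>
            {
                NumValue::I64(val as i64)
            }
            other => other,
        }
    }
}

impl FromStr for NumValue {
    type Err = ();

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Order matters: anything that fits in i64 must be caught first.
        if let Ok(v) = s.parse::<i64>() {
            return Ok(NumValue::I64(v));
        }
        if let Ok(v) = s.parse::<u64>() {
            // Only reached for integers above i64::MAX.
            return Ok(NumValue::U64(v));
        }
        if let Ok(v) = s.parse::<f64>() {
            return Ok(NumValue::F64(v).normalize());
        }
        Err(())
    }
}
```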
--- a/common/Cargo.toml
+++ b/common/Cargo.toml
@@ -1,6 +1,6 @@
 [package]
 name = "tantivy-common"
-version = "0.9.0"
+version = "0.10.0"
 authors = ["Paul Masurel <paul@quickwit.io>", "Pascal Seitz <pascal@quickwit.io>"]
 license = "MIT"
 edition = "2024"
--- a/common/src/bitset.rs
+++ b/common/src/bitset.rs
@@ -183,7 +183,7 @@ pub struct BitSet {
 }

 fn num_buckets(max_val: u32) -> u32 {
-    (max_val + 63u32) / 64u32
+    max_val.div_ceil(64u32)
 }

 impl BitSet {
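Several hunks in this change (the bitpacked and linear estimators, `compute_num_bytes_for_u64`, and `num_buckets`) replace the manual round-up `(x + k - 1) / k` with `div_ceil`, the rewrite Clippy's `manual_div_ceil` lint suggests. The sketch below illustrates the equivalence. `num_bytes_for_bits` is a hypothetical helper in the style of the estimators. `manual_div_ceil` reproduces the old idiom with `checked_add`, so the one case where the two differ (overflow of `x + k - 1`) shows up as `None`.

```rust
/// Bytes needed to store `num_bits` bits, rounding up to whole bytes.
fn num_bytes_for_bits(num_bits: u64) -> u64 {
    num_bits.div_ceil(8)
}

/// The pre-refactor idiom, kept only for comparison. `checked_add` makes the
/// overflow case explicit instead of wrapping (release) or panicking (debug).
fn manual_div_ceil(x: u64, k: u64) -> Option<u64> {
    x.checked_add(k - 1).map(|sum| sum / k)
}
```

For example, `u64::MAX.div_ceil(8)` is `1 << 61`, whereas `u64::MAX + 7` cannot be represented in a `u64`.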
--- a/common/src/vint.rs
+++ b/common/src/vint.rs
@@ -28,7 +28,9 @@ impl BinarySerializable for VIntU128 {
        writer.write_all(&buffer)
    }

+    #[allow(clippy::unbuffered_bytes)]
    fn deserialize<R: Read>(reader: &mut R) -> io::Result<Self> {
+        #[allow(clippy::unbuffered_bytes)]
        let mut bytes = reader.bytes();
        let mut result = 0u128;
        let mut shift = 0u64;
@@ -195,7 +197,9 @@ impl BinarySerializable for VInt {
        writer.write_all(&buffer[0..num_bytes])
    }

+    #[allow(clippy::unbuffered_bytes)]
    fn deserialize<R: Read>(reader: &mut R) -> io::Result<Self> {
+        #[allow(clippy::unbuffered_bytes)]
        let mut bytes = reader.bytes();
        let mut result = 0u64;
        let mut shift = 0u64;
--- a/doc/assets/images/searchbenchmark.png
+++ b/doc/assets/images/searchbenchmark.png
--- a/examples/basic_search.rs
+++ b/examples/basic_search.rs
@@ -208,7 +208,7 @@ fn main() -> tantivy::Result<()> {
    // is the role of the `TopDocs` collector.

    // We can now perform our query.
-    let top_docs = searcher.search(&query, &TopDocs::with_limit(10))?;
+    let top_docs = searcher.search(&query, &TopDocs::with_limit(10).order_by_score())?;

    // The actual documents still need to be
    // retrieved from Tantivy's store.
@@ -226,7 +226,7 @@ fn main() -> tantivy::Result<()> {
    let query = query_parser.parse_query("title:sea^20 body:whale^70")?;

    let (_score, doc_address) = searcher
-        .search(&query, &TopDocs::with_limit(1))?
+        .search(&query, &TopDocs::with_limit(1).order_by_score())?
        .into_iter()
        .next()
        .unwrap();
--- a/examples/custom_tokenizer.rs
+++ b/examples/custom_tokenizer.rs
@@ -100,7 +100,7 @@ fn main() -> tantivy::Result<()> {
    // here we want to get a hit on the 'ken' in Frankenstein
    let query = query_parser.parse_query("ken")?;

-    let top_docs = searcher.search(&query, &TopDocs::with_limit(10))?;
+    let top_docs = searcher.search(&query, &TopDocs::with_limit(10).order_by_score())?;

    for (_, doc_address) in top_docs {
        let retrieved_doc: TantivyDocument = searcher.doc(doc_address)?;
--- a/examples/date_time_field.rs
+++ b/examples/date_time_field.rs
@@ -50,14 +50,14 @@ fn main() -> tantivy::Result<()> {
    {
        // Simple exact search on the date
        let query = query_parser.parse_query("occurred_at:\"2022-06-22T12:53:50.53Z\"")?;
-        let count_docs = searcher.search(&*query, &TopDocs::with_limit(5))?;
+        let count_docs = searcher.search(&*query, &TopDocs::with_limit(5).order_by_score())?;
        assert_eq!(count_docs.len(), 1);
    }
    {
        // Range query on the date field
        let query = query_parser
            .parse_query(r#"occurred_at:[2022-06-22T12:58:00Z TO 2022-06-23T00:00:00Z}"#)?;
-        let count_docs = searcher.search(&*query, &TopDocs::with_limit(4))?;
+        let count_docs = searcher.search(&*query, &TopDocs::with_limit(4).order_by_score())?;
        assert_eq!(count_docs.len(), 1);
        for (_score, doc_address) in count_docs {
            let retrieved_doc = searcher.doc::<TantivyDocument>(doc_address)?;
--- a/examples/deleting_updating_documents.rs
+++ b/examples/deleting_updating_documents.rs
@@ -28,7 +28,7 @@ fn extract_doc_given_isbn(
    // The second argument is here to tell we don't care about decoding positions,
    // or term frequencies.
    let term_query = TermQuery::new(isbn_term.clone(), IndexRecordOption::Basic);
-    let top_docs = searcher.search(&term_query, &TopDocs::with_limit(1))?;
+    let top_docs = searcher.search(&term_query, &TopDocs::with_limit(1).order_by_score())?;

    if let Some((_score, doc_address)) = top_docs.first() {
        let doc = searcher.doc(*doc_address)?;
--- a/examples/filter_aggregation.rs
+++ b/examples/filter_aggregation.rs
@@ -0,0 +1,212 @@
+// # Filter Aggregation Example
+//
+// This example demonstrates filter aggregations - creating buckets of documents
+// matching specific queries, with nested aggregations computed on each bucket.
+//
+// Filter aggregations are useful for computing metrics on different subsets of
+// your data in a single query, like "average price overall + average price for
+// electronics + count of in-stock items".
+
+use serde_json::json;
+use tantivy::aggregation::agg_req::Aggregations;
+use tantivy::aggregation::AggregationCollector;
+use tantivy::query::AllQuery;
+use tantivy::schema::{Schema, FAST, INDEXED, TEXT};
+use tantivy::{doc, Index};
+
+fn main() -> tantivy::Result<()> {
+    // Create a simple product schema
+    let mut schema_builder = Schema::builder();
+    schema_builder.add_text_field("category", TEXT | FAST);
+    schema_builder.add_text_field("brand", TEXT | FAST);
+    schema_builder.add_u64_field("price", FAST);
+    schema_builder.add_f64_field("rating", FAST);
+    schema_builder.add_bool_field("in_stock", FAST | INDEXED);
+    let schema = schema_builder.build();
+
+    // Create index and add sample products
+    let index = Index::create_in_ram(schema.clone());
+    let mut writer = index.writer(50_000_000)?;
+
+    writer.add_document(doc!(
+        schema.get_field("category")? => "electronics",
+        schema.get_field("brand")? => "apple",
+        schema.get_field("price")? => 999u64,
+        schema.get_field("rating")? => 4.5f64,
+        schema.get_field("in_stock")? => true
+    ))?;
+    writer.add_document(doc!(
+        schema.get_field("category")? => "electronics",
+        schema.get_field("brand")? => "samsung",
+        schema.get_field("price")? => 799u64,
+        schema.get_field("rating")? => 4.2f64,
+        schema.get_field("in_stock")? => true
+    ))?;
+    writer.add_document(doc!(
+        schema.get_field("category")? => "clothing",
+        schema.get_field("brand")? => "nike",
+        schema.get_field("price")? => 120u64,
+        schema.get_field("rating")? => 4.1f64,
+        schema.get_field("in_stock")? => false
+    ))?;
+    writer.add_document(doc!(
+        schema.get_field("category")? => "books",
+        schema.get_field("brand")? => "penguin",
+        schema.get_field("price")? => 25u64,
+        schema.get_field("rating")? => 4.8f64,
+        schema.get_field("in_stock")? => true
+    ))?;
+
+    writer.commit()?;
+
+    let reader = index.reader()?;
+    let searcher = reader.searcher();
+
+    // Example 1: Basic filter with metric aggregation
+    println!("=== Example 1: Electronics average price ===");
+    let agg_req = json!({
+        "electronics": {
+            "filter": "category:electronics",
+            "aggs": {
+                "avg_price": { "avg": { "field": "price" } }
+            }
+        }
+    });
+
+    let agg: Aggregations = serde_json::from_value(agg_req)?;
+    let collector = AggregationCollector::from_aggs(agg, Default::default());
+    let result = searcher.search(&AllQuery, &collector)?;
+
+    let expected = json!({
+        "electronics": {
+            "doc_count": 2,
+            "avg_price": { "value": 899.0 }
+        }
+    });
+    assert_eq!(serde_json::to_value(&result)?, expected);
+    println!("{}\n", serde_json::to_string_pretty(&result)?);
+
+    // Example 2: Multiple independent filters
+    println!("=== Example 2: Multiple filters in one query ===");
+    let agg_req = json!({
+        "electronics": {
+            "filter": "category:electronics",
+            "aggs": { "avg_price": { "avg": { "field": "price" } } }
+        },
+        "in_stock": {
+            "filter": "in_stock:true",
+            "aggs": { "count": { "value_count": { "field": "brand" } } }
+        },
+        "high_rated": {
+            "filter": "rating:[4.5 TO *]",
+            "aggs": { "count": { "value_count": { "field": "brand" } } }
+        }
+    });
+
+    let agg: Aggregations = serde_json::from_value(agg_req)?;
+    let collector = AggregationCollector::from_aggs(agg, Default::default());
+    let result = searcher.search(&AllQuery, &collector)?;
+
+    let expected = json!({
+        "electronics": {
+            "doc_count": 2,
+            "avg_price": { "value": 899.0 }
+        },
+        "in_stock": {
+            "doc_count": 3,
+            "count": { "value": 3.0 }
+        },
+        "high_rated": {
+            "doc_count": 2,
+            "count": { "value": 2.0 }
+        }
+    });
+    assert_eq!(serde_json::to_value(&result)?, expected);
+    println!("{}\n", serde_json::to_string_pretty(&result)?);
+
+    // Example 3: Nested filters - progressive refinement
+    println!("=== Example 3: Nested filters ===");
+    let agg_req = json!({
+        "in_stock": {
+            "filter": "in_stock:true",
+            "aggs": {
+                "electronics": {
+                    "filter": "category:electronics",
+                    "aggs": {
+                        "expensive": {
+                            "filter": "price:[800 TO *]",
+                            "aggs": {
+                                "avg_rating": { "avg": { "field": "rating" } }
+                            }
+                        }
+                    }
+                }
+            }
+        }
+    });
+
+    let agg: Aggregations = serde_json::from_value(agg_req)?;
+    let collector = AggregationCollector::from_aggs(agg, Default::default());
+    let result = searcher.search(&AllQuery, &collector)?;
+
+    let expected = json!({
+        "in_stock": {
+            "doc_count": 3,  // apple, samsung, penguin
+            "electronics": {
+                "doc_count": 2,  // apple, samsung
+                "expensive": {
+                    "doc_count": 1,  // only apple (999)
+                    "avg_rating": { "value": 4.5 }
+                }
+            }
+        }
+    });
+    assert_eq!(serde_json::to_value(&result)?, expected);
+    println!("{}\n", serde_json::to_string_pretty(&result)?);
+
+    // Example 4: Filter with sub-aggregation (terms)
+    println!("=== Example 4: Filter with terms sub-aggregation ===");
+    let agg_req = json!({
+        "electronics": {
+            "filter": "category:electronics",
+            "aggs": {
+                "by_brand": {
+                    "terms": { "field": "brand" },
+                    "aggs": {
+                        "avg_price": { "avg": { "field": "price" } }
+                    }
+                }
+            }
+        }
+    });
+
+    let agg: Aggregations = serde_json::from_value(agg_req)?;
+    let collector = AggregationCollector::from_aggs(agg, Default::default());
+    let result = searcher.search(&AllQuery, &collector)?;
+
+    let expected = json!({
+        "electronics": {
+            "doc_count": 2,
+            "by_brand": {
+                "buckets": [
+                    {
+                        "key": "samsung",
+                        "doc_count": 1,
+                        "avg_price": { "value": 799.0 }
+                    },
+                    {
+                        "key": "apple",
+                        "doc_count": 1,
+                        "avg_price": { "value": 999.0 }
+                    }
+                ],
+                "sum_other_doc_count": 0,
+                "doc_count_error_upper_bound": 0
+            }
+        }
+    });
+    assert_eq!(serde_json::to_value(&result)?, expected);
+    println!("{}", serde_json::to_string_pretty(&result)?);
+
+    Ok(())
+}
--- a/examples/fuzzy_search.rs
+++ b/examples/fuzzy_search.rs
@@ -85,7 +85,6 @@ fn main() -> tantivy::Result<()> {
    index_writer.add_document(doc!(
        title => "The Diary of a Young Girl",
    ))?;
-    index_writer.commit()?;

    // ### Committing
    //
@@ -146,7 +145,7 @@ fn main() -> tantivy::Result<()> {
        let query = FuzzyTermQuery::new(term, 2, true);

        let (top_docs, count) = searcher
-            .search(&query, &(TopDocs::with_limit(5), Count))
+            .search(&query, &(TopDocs::with_limit(5).order_by_score(), Count))
            .unwrap();
        assert_eq!(count, 3);
        assert_eq!(top_docs.len(), 3);
--- a/examples/ip_field.rs
+++ b/examples/ip_field.rs
@@ -69,25 +69,25 @@ fn main() -> tantivy::Result<()> {
    {
        // Inclusive range queries
        let query = query_parser.parse_query("ip:[192.168.0.80 TO 192.168.0.100]")?;
-        let count_docs = searcher.search(&*query, &TopDocs::with_limit(5))?;
+        let count_docs = searcher.search(&*query, &TopDocs::with_limit(5).order_by_score())?;
        assert_eq!(count_docs.len(), 1);
    }
    {
        // Exclusive range queries
        let query = query_parser.parse_query("ip:{192.168.0.80 TO 192.168.1.100]")?;
-        let count_docs = searcher.search(&*query, &TopDocs::with_limit(2))?;
+        let count_docs = searcher.search(&*query, &TopDocs::with_limit(2).order_by_score())?;
        assert_eq!(count_docs.len(), 0);
    }
    {
        // Find docs with IP addresses smaller equal 192.168.1.100
        let query = query_parser.parse_query("ip:[* TO 192.168.1.100]")?;
-        let count_docs = searcher.search(&*query, &TopDocs::with_limit(2))?;
+        let count_docs = searcher.search(&*query, &TopDocs::with_limit(2).order_by_score())?;
        assert_eq!(count_docs.len(), 2);
    }
    {
        // Find docs with IP addresses smaller than 192.168.1.100
        let query = query_parser.parse_query("ip:[* TO 192.168.1.100}")?;
-        let count_docs = searcher.search(&*query, &TopDocs::with_limit(2))?;
+        let count_docs = searcher.search(&*query, &TopDocs::with_limit(2).order_by_score())?;
        assert_eq!(count_docs.len(), 2);
    }

--- a/examples/json_field.rs
+++ b/examples/json_field.rs
@@ -59,12 +59,12 @@ fn main() -> tantivy::Result<()> {
    let query_parser = QueryParser::for_index(&index, vec![event_type, attributes]);
    {
        let query = query_parser.parse_query("target:submit-button")?;
-        let count_docs = searcher.search(&*query, &TopDocs::with_limit(2))?;
+        let count_docs = searcher.search(&*query, &TopDocs::with_limit(2).order_by_score())?;
        assert_eq!(count_docs.len(), 2);
    }
    {
        let query = query_parser.parse_query("target:submit")?;
-        let count_docs = searcher.search(&*query, &TopDocs::with_limit(2))?;
+        let count_docs = searcher.search(&*query, &TopDocs::with_limit(2).order_by_score())?;
        assert_eq!(count_docs.len(), 2);
    }
    {
@@ -74,33 +74,33 @@ fn main() -> tantivy::Result<()> {
    }
    {
        let query = query_parser.parse_query("click AND cart.product_id:133")?;
-        let hits = searcher.search(&*query, &TopDocs::with_limit(2))?;
+        let hits = searcher.search(&*query, &TopDocs::with_limit(2).order_by_score())?;
        assert_eq!(hits.len(), 1);
    }
    {
        // The sub-fields in the json field marked as default field still need to be explicitly
        // addressed
        let query = query_parser.parse_query("click AND 133")?;
-        let hits = searcher.search(&*query, &TopDocs::with_limit(2))?;
+        let hits = searcher.search(&*query, &TopDocs::with_limit(2).order_by_score())?;
        assert_eq!(hits.len(), 0);
    }
    {
        // Default json fields are ignored if they collide with the schema
        let query = query_parser.parse_query("event_type:holiday-sale")?;
-        let hits = searcher.search(&*query, &TopDocs::with_limit(2))?;
+        let hits = searcher.search(&*query, &TopDocs::with_limit(2).order_by_score())?;
        assert_eq!(hits.len(), 0);
    }
    // # Query via full attribute path
    {
        // This only searches in our schema's `event_type` field
        let query = query_parser.parse_query("event_type:click")?;
-        let hits = searcher.search(&*query, &TopDocs::with_limit(2))?;
+        let hits = searcher.search(&*query, &TopDocs::with_limit(2).order_by_score())?;
        assert_eq!(hits.len(), 2);
    }
    {
        // Default json fields can still be accessed by full path
        let query = query_parser.parse_query("attributes.event_type:holiday-sale")?;
-        let hits = searcher.search(&*query, &TopDocs::with_limit(2))?;
+        let hits = searcher.search(&*query, &TopDocs::with_limit(2).order_by_score())?;
        assert_eq!(hits.len(), 1);
    }
    Ok(())
--- a/examples/phrase_prefix_search.rs
+++ b/examples/phrase_prefix_search.rs
@@ -63,7 +63,7 @@ fn main() -> Result<()> {
    // but not "in the Gulf Stream".
    let query = query_parser.parse_query("\"in the su\"*")?;

-    let top_docs = searcher.search(&query, &TopDocs::with_limit(10))?;
+    let top_docs = searcher.search(&query, &TopDocs::with_limit(10).order_by_score())?;
    let mut titles = top_docs
        .into_iter()
        .map(|(_score, doc_address)| {
--- a/examples/pre_tokenized_text.rs
+++ b/examples/pre_tokenized_text.rs
@@ -107,7 +107,8 @@ fn main() -> tantivy::Result<()> {
        IndexRecordOption::Basic,
    );

-    let (top_docs, count) = searcher.search(&query, &(TopDocs::with_limit(2), Count))?;
+    let (top_docs, count) =
+        searcher.search(&query, &(TopDocs::with_limit(2).order_by_score(), Count))?;

    assert_eq!(count, 2);

@@ -128,7 +129,8 @@ fn main() -> tantivy::Result<()> {
        IndexRecordOption::Basic,
    );

-    let (_top_docs, count) = searcher.search(&query, &(TopDocs::with_limit(2), Count))?;
+    let (_top_docs, count) =
+        searcher.search(&query, &(TopDocs::with_limit(2).order_by_score(), Count))?;

    assert_eq!(count, 0);

--- a/examples/snippet.rs
+++ b/examples/snippet.rs
@@ -50,7 +50,7 @@ fn main() -> tantivy::Result<()> {
    let query_parser = QueryParser::for_index(&index, vec![title, body]);
    let query = query_parser.parse_query("sycamore spring")?;

-    let top_docs = searcher.search(&query, &TopDocs::with_limit(10))?;
+    let top_docs = searcher.search(&query, &TopDocs::with_limit(10).order_by_score())?;

    let snippet_generator = SnippetGenerator::create(&searcher, &*query, body)?;

--- a/examples/stop_words.rs
+++ b/examples/stop_words.rs
@@ -102,7 +102,7 @@ fn main() -> tantivy::Result<()> {
    // stop words are applied on the query as well.
    // The following will be equivalent to `title:frankenstein`
    let query = query_parser.parse_query("title:\"the Frankenstein\"")?;
-    let top_docs = searcher.search(&query, &TopDocs::with_limit(10))?;
+    let top_docs = searcher.search(&query, &TopDocs::with_limit(10).order_by_score())?;

    for (score, doc_address) in top_docs {
        let retrieved_doc: TantivyDocument = searcher.doc(doc_address)?;
--- a/examples/warmer.rs
+++ b/examples/warmer.rs
@@ -164,7 +164,7 @@ fn main() -> tantivy::Result<()> {
        move |doc_id: DocId| Reverse(price[doc_id as usize])
    };

-    let most_expensive_first = TopDocs::with_limit(10).custom_score(score_by_price);
+    let most_expensive_first = TopDocs::with_limit(10).order_by(score_by_price);

    let hits = searcher.search(&query, &most_expensive_first)?;
    assert_eq!(
--- a/query-grammar/Cargo.toml
+++ b/query-grammar/Cargo.toml
@@ -1,6 +1,6 @@
 [package]
 name = "tantivy-query-grammar"
-version = "0.24.0"
+version = "0.25.0"
 authors = ["Paul Masurel <paul.masurel@gmail.com>"]
 license = "MIT"
 categories = ["database-implementations", "data-structures"]
@@ -15,3 +15,5 @@ edition = "2024"
 nom = "7"
 serde = { version = "1.0.219", features = ["derive"] }
 serde_json = "1.0.140"
+ordered-float = "5.0.0"
+fnv = "1.0.7"
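The `fnv` dependency added here backs the `FnvHashSet` that `rewrite_ast` uses to drop repeated child clauses. For example, `test_deduplication` expects `a a` to collapse into a single clause. The sketch below shows the same `retain`-based deduplication. It uses `std::collections::HashSet` instead of `FnvHashSet` so it needs no external crate, and `dedup_keep_first` is a hypothetical name.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Hypothetical helper: removes duplicates in place, keeping the first
/// occurrence of each element and the original relative order.
/// `HashSet::insert` returns `false` for values already present, so
/// `retain` drops every later repeat.
fn dedup_keep_first<T: Clone + Eq + Hash>(items: &mut Vec<T>) {
    let mut seen = HashSet::new();
    items.retain(|item| seen.insert(item.clone()));
}
```

This only works once the clause type implements `Eq` and `Hash`, which is why the AST types further down in this diff gain those derives.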
--- a/query-grammar/src/infallible.rs
+++ b/query-grammar/src/infallible.rs
@@ -117,6 +117,22 @@ where F: nom::Parser<I, (O, ErrorList), Infallible> {
    }
 }

+pub(crate) fn terminated_infallible<I, O1, O2, F, G>(
+    mut first: F,
+    mut second: G,
+) -> impl FnMut(I) -> JResult<I, O1>
+where
+    F: nom::Parser<I, (O1, ErrorList), Infallible>,
+    G: nom::Parser<I, (O2, ErrorList), Infallible>,
+{
+    move |input: I| {
+        let (input, (o1, mut err)) = first.parse(input)?;
+        let (input, (_, mut err2)) = second.parse(input)?;
+        err.append(&mut err2);
+        Ok((input, (o1, err)))
+    }
+}
+
 pub(crate) fn delimited_infallible<I, O1, O2, O3, F, G, H>(
    mut first: F,
    mut second: G,
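`terminated_infallible` runs `first`, then runs `second` on the remaining input. It keeps only the output of `first` and appends the errors of `second` to those of `first`, so no lenient error is lost. The sketch below models that contract without nom. It assumes a hypothetical "infallible parser" type, a boxed closure returning `(rest, output, errors)`. `expect_char` is a hypothetical helper loosely modelled on `opt_i_err(char('/'), "missing delimiter /")`.

```rust
/// Hypothetical, nom-free model of an "infallible" parser: it always succeeds
/// and returns the remaining input, an output, and recoverable error messages.
type Infallible<'a, O> = Box<dyn Fn(&'a str) -> (&'a str, O, Vec<String>) + 'a>;

/// Runs `first`, then `second` on what `first` left over. Only the output of
/// `first` is kept, but the errors of both parsers are concatenated in order.
fn terminated<'a, O1: 'a, O2: 'a>(
    first: Infallible<'a, O1>,
    second: Infallible<'a, O2>,
) -> Infallible<'a, O1> {
    Box::new(move |input: &'a str| {
        let (rest, o1, mut errors) = first(input);
        let (rest, _o2, mut errors2) = second(rest);
        errors.append(&mut errors2);
        (rest, o1, errors)
    })
}

/// Consumes `expected` if present; otherwise records `message` and consumes nothing.
fn expect_char<'a>(expected: char, message: &'static str) -> Infallible<'a, Option<char>> {
    Box::new(move |input: &'a str| match input.strip_prefix(expected) {
        Some(rest) => (rest, Some(expected), Vec::new()),
        None => (input, None, vec![message.to_string()]),
    })
}

/// Consumes everything up to (not including) the next `/`; never reports errors.
fn until_slash<'a>() -> Infallible<'a, String> {
    Box::new(|input: &'a str| {
        let end = input.find('/').unwrap_or(input.len());
        (&input[end..], input[..end].to_string(), Vec::new())
    })
}
```

With this combinator, a missing closing delimiter does not abort parsing. It only adds a message to the error list, which is the behaviour `regex_infallible` relies on.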
--- a/query-grammar/src/lib.rs
+++ b/query-grammar/src/lib.rs
@@ -31,7 +31,17 @@ pub fn parse_query_lenient(query: &str) -> (UserInputAst, Vec<LenientError>) {

 #[cfg(test)]
 mod tests {
-    use crate::{parse_query, parse_query_lenient};
+    use crate::{UserInputAst, parse_query, parse_query_lenient};
+
+    #[test]
+    fn test_deduplication() {
+        let ast: UserInputAst = parse_query("a a").unwrap();
+        let json = serde_json::to_string(&ast).unwrap();
+        assert_eq!(
+            json,
+            r#"{"type":"bool","clauses":[[null,{"type":"literal","field_name":null,"phrase":"a","delimiter":"none","slop":0,"prefix":false}]]}"#
+        );
+    }

    #[test]
    fn test_parse_query_serialization() {
--- a/query-grammar/src/query_grammar.rs
+++ b/query-grammar/src/query_grammar.rs
@@ -1,6 +1,7 @@
 use std::borrow::Cow;
 use std::iter::once;

+use fnv::FnvHashSet;
 use nom::IResult;
 use nom::branch::alt;
 use nom::bytes::complete::tag;
@@ -68,7 +69,7 @@ fn interpret_escape(source: &str) -> String {

 /// Consume a word outside of any context.
 // TODO should support escape sequences
-fn word(inp: &str) -> IResult<&str, Cow<str>> {
+fn word(inp: &str) -> IResult<&str, Cow<'_, str>> {
    map_res(
        recognize(tuple((
            alt((
@@ -305,15 +306,14 @@ fn term_group_infallible(inp: &str) -> JResult<&str, UserInputAst> {
    let (inp, (field_name, _, _, _)) =
        tuple((field_name, multispace0, char('('), multispace0))(inp).expect("precondition failed");

-    let res = delimited_infallible(
+    delimited_infallible(
        nothing,
        map(ast_infallible, |(mut ast, errors)| {
            ast.set_default_field(field_name.to_string());
            (ast, errors)
        }),
        opt_i_err(char(')'), "expected ')'"),
-    )(inp);
-    res
+    )(inp)
 }

 fn exists(inp: &str) -> IResult<&str, UserInputLeaf> {
@@ -367,7 +367,10 @@ fn literal(inp: &str) -> IResult<&str, UserInputAst> {
    // something (a field name) got parsed before
    alt((
        map(
-            tuple((opt(field_name), alt((range, set, exists, term_or_phrase)))),
+            tuple((
+                opt(field_name),
+                alt((range, set, exists, regex, term_or_phrase)),
+            )),
            |(field_name, leaf): (Option<String>, UserInputLeaf)| leaf.set_field(field_name).into(),
        ),
        term_group,
@@ -389,6 +392,10 @@ fn literal_no_group_infallible(inp: &str) -> JResult<&str, Option<UserInputAst>>
                        value((), peek(one_of("{[><"))),
                        map(range_infallible, |(range, errs)| (Some(range), errs)),
                    ),
+                    (
+                        value((), peek(one_of("/"))),
+                        map(regex_infallible, |(regex, errs)| (Some(regex), errs)),
+                    ),
                ),
                delimited_infallible(space0_infallible, term_or_phrase_infallible, nothing),
            ),
@@ -689,6 +696,61 @@ fn set_infallible(mut inp: &str) -> JResult<&str, UserInputLeaf> {
    }
 }

+fn regex(inp: &str) -> IResult<&str, UserInputLeaf> {
+    map(
+        terminated(
+            delimited(
+                char('/'),
+                many1(alt((preceded(char('\\'), char('/')), none_of("/")))),
+                char('/'),
+            ),
+            peek(alt((multispace1, eof))),
+        ),
+        |elements| UserInputLeaf::Regex {
+            field: None,
+            pattern: elements.into_iter().collect::<String>(),
+        },
+    )(inp)
+}
+
+fn regex_infallible(inp: &str) -> JResult<&str, UserInputLeaf> {
+    match terminated_infallible(
+        delimited_infallible(
+            opt_i_err(char('/'), "missing delimiter /"),
+            opt_i(many1(alt((preceded(char('\\'), char('/')), none_of("/"))))),
+            opt_i_err(char('/'), "missing delimiter /"),
+        ),
+        opt_i_err(
+            peek(alt((multispace1, eof))),
+            "expected whitespace or end of input",
+        ),
+    )(inp)
+    {
+        Ok((rest, (elements_part, errors))) => {
+            let pattern = match elements_part {
+                Some(elements_part) => elements_part.into_iter().collect(),
+                None => String::new(),
+            };
+            let res = UserInputLeaf::Regex {
+                field: None,
+                pattern,
+            };
+            Ok((rest, (res, errors)))
+        }
+        Err(e) => {
+            let errs = vec![LenientErrorInternal {
+                pos: inp.len(),
+                message: e.to_string(),
+            }];
+            let res = UserInputLeaf::Regex {
+                field: None,
+                pattern: String::new(),
+            };
+            Ok((inp, (res, errs)))
+        }
+    }
+}
+
 fn negate(expr: UserInputAst) -> UserInputAst {
    expr.unary(Occur::MustNot)
 }
@@ -696,7 +758,17 @@ fn negate(expr: UserInputAst) -> UserInputAst {
 fn leaf(inp: &str) -> IResult<&str, UserInputAst> {
    alt((
        delimited(char('('), ast, char(')')),
-        map(char('*'), |_| UserInputAst::from(UserInputLeaf::All)),
+        map(
+            terminated(
+                char('*'),
+                peek(alt((
+                    value((), multispace1),
+                    value((), char(')')),
+                    value((), eof),
+                ))),
+            ),
+            |_| UserInputAst::from(UserInputLeaf::All),
+        ),
        map(preceded(tuple((tag("NOT"), multispace1)), leaf), negate),
        literal,
    ))(inp)
@@ -717,7 +789,17 @@ fn leaf_infallible(inp: &str) -> JResult<&str, Option<UserInputAst>> {
                ),
            ),
            (
-                value((), char('*')),
+                value(
+                    (),
+                    terminated(
+                        char('*'),
+                        peek(alt((
+                            value((), multispace1),
+                            value((), char(')')),
+                            value((), eof),
+                        ))),
+                    ),
+                ),
                map(nothing, |_| {
                    (Some(UserInputAst::from(UserInputLeaf::All)), Vec::new())
                }),
@@ -753,7 +835,7 @@ fn boosted_leaf(inp: &str) -> IResult<&str, UserInputAst> {
        tuple((leaf, fallible(boost))),
        |(leaf, boost_opt)| match boost_opt {
            Some(boost) if (boost - 1.0).abs() > f64::EPSILON => {
-                UserInputAst::Boost(Box::new(leaf), boost)
+                UserInputAst::Boost(Box::new(leaf), boost.into())
            }
            _ => leaf,
        },
@@ -765,7 +847,7 @@ fn boosted_leaf_infallible(inp: &str) -> JResult<&str, Option<UserInputAst>> {
        tuple_infallible((leaf_infallible, boost)),
        |((leaf, boost_opt), error)| match boost_opt {
            Some(boost) if (boost - 1.0).abs() > f64::EPSILON => (
-                leaf.map(|leaf| UserInputAst::Boost(Box::new(leaf), boost)),
+                leaf.map(|leaf| UserInputAst::Boost(Box::new(leaf), boost.into())),
                error,
            ),
            _ => (leaf, error),
@@ -1016,12 +1098,25 @@ pub fn parse_to_ast_lenient(query_str: &str) -> (UserInputAst, Vec<LenientError>
    (rewrite_ast(res), errors)
 }

-/// Removes unnecessary children clauses in AST
-///
-/// Motivated by [issue #1433](https://github.com/quickwit-oss/tantivy/issues/1433)
 fn rewrite_ast(mut input: UserInputAst) -> UserInputAst {
-    if let UserInputAst::Clause(terms) = &mut input {
-        for term in terms {
+    if let UserInputAst::Clause(sub_clauses) = &mut input {
+        // Recursively rewrite child clauses first.
+        let mut new_clauses = Vec::with_capacity(sub_clauses.len());
+        for (occur, clause) in sub_clauses.drain(..) {
+            let rewritten_clause = rewrite_ast(clause);
+            new_clauses.push((occur, rewritten_clause));
+        }
+        *sub_clauses = new_clauses;
+
+        // Remove duplicate child clauses, keeping the first occurrence,
+        // e.g. (+a +b) OR (+c +d) OR (+a +b) => (+a +b) OR (+c +d)
+        let mut seen = FnvHashSet::default();
+        sub_clauses.retain(|term| seen.insert(term.clone()));
+
+        // Remove unnecessary child clauses from the AST.
+        //
+        // Motivated by [issue #1433](https://github.com/quickwit-oss/tantivy/issues/1433)
+        for term in sub_clauses {
            rewrite_ast_clause(term);
        }
    }
@@ -1596,6 +1691,21 @@ mod test {
        test_parse_query_to_ast_helper("abc:a b", "(*\"abc\":a *b)");
        test_parse_query_to_ast_helper("abc:\"a b\"", "\"abc\":\"a b\"");
        test_parse_query_to_ast_helper("foo:[1 TO 5]", "\"foo\":[\"1\" TO \"5\"]");
+
+        // Terms starting with `*` are literals, not the match-all query
+        test_parse_query_to_ast_helper("foo:(*A)", "\"foo\":*A");
+        test_parse_query_to_ast_helper("*A", "*A");
+        test_parse_query_to_ast_helper("(*A)", "*A");
+        test_parse_query_to_ast_helper("foo:(A OR B)", "(?\"foo\":A ?\"foo\":B)");
+        test_parse_query_to_ast_helper("foo:(A* OR B*)", "(?\"foo\":A* ?\"foo\":B*)");
+        test_parse_query_to_ast_helper("foo:(*A OR *B)", "(?\"foo\":*A ?\"foo\":*B)");
+    }
+
+    #[test]
+    fn test_parse_query_all() {
+        test_parse_query_to_ast_helper("*", "*");
+        test_parse_query_to_ast_helper("(*)", "*");
+        test_parse_query_to_ast_helper("(* )", "*");
    }

    #[test]
@@ -1694,6 +1804,63 @@ mod test {
        test_is_parse_err(r#"!bc:def"#, "!bc:def");
    }

+    #[test]
+    fn test_regex_parser() {
+        let r = parse_to_ast(r#"a:/joh?n(ath[oa]n)/"#);
+        assert!(r.is_ok(), "Failed to parse custom query: {r:?}");
+        let (_, input) = r.unwrap();
+        match input {
+            UserInputAst::Leaf(leaf) => match leaf.as_ref() {
+                UserInputLeaf::Regex { field, pattern } => {
+                    assert_eq!(field, &Some("a".to_string()));
+                    assert_eq!(pattern, "joh?n(ath[oa]n)");
+                }
+                _ => panic!("Expected a regex leaf, got {leaf:?}"),
+            },
+            _ => panic!("Expected a leaf"),
+        }
+        let r = parse_to_ast(r#"a:/\\/cgi-bin\\/luci.*/"#);
+        assert!(r.is_ok(), "Failed to parse custom query: {r:?}");
+        let (_, input) = r.unwrap();
+        match input {
+            UserInputAst::Leaf(leaf) => match leaf.as_ref() {
+                UserInputLeaf::Regex { field, pattern } => {
+                    assert_eq!(field, &Some("a".to_string()));
+                    assert_eq!(pattern, "\\/cgi-bin\\/luci.*");
+                }
+                _ => panic!("Expected a regex leaf, got {leaf:?}"),
+            },
+            _ => panic!("Expected a leaf"),
+        }
+    }
+
+    #[test]
+    fn test_regex_parser_lenient() {
+        let literal = |query| literal_infallible(query).unwrap().1;
+
+        let (res, errs) = literal(r#"a:/joh?n(ath[oa]n)/"#);
+        let expected = UserInputLeaf::Regex {
+            field: Some("a".to_string()),
+            pattern: "joh?n(ath[oa]n)".to_string(),
+        }
+        .into();
+        assert_eq!(res.unwrap(), expected);
+        assert!(errs.is_empty(), "Expected no errors, got: {errs:?}");
+
+        let (res, errs) = literal("title:/joh?n(ath[oa]n)");
+        let expected = UserInputLeaf::Regex {
+            field: Some("title".to_string()),
+            pattern: "joh?n(ath[oa]n)".to_string(),
+        }
+        .into();
+        assert_eq!(res.unwrap(), expected);
+        assert_eq!(errs.len(), 1, "Expected 1 error, got: {errs:?}");
+        assert_eq!(
+            errs[0].message, "missing delimiter /",
+            "Unexpected error message",
+        );
+    }
+
    #[test]
    fn test_space_before_value() {
        test_parse_query_to_ast_helper("field : a", r#""field":a"#);
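The new `regex` parser accepts input of the form `/pattern/`. Inside the pattern, `\/` stands for a literal `/` and the backslash is dropped. Any other character, including a lone `\`, is kept verbatim. The closing `/` must be followed by whitespace or the end of input. The sketch below is a hand-written, nom-free version of those scanning rules, and `scan_regex` is a hypothetical helper. It only covers the `/.../` part (the optional field prefix is parsed separately). Unlike `regex_infallible`, it returns `None` instead of collecting lenient errors.

```rust
/// Hypothetical helper mirroring the `/pattern/` rules of the `regex` parser.
/// Returns the unescaped pattern and the remaining input, or `None` if the
/// delimiters are missing, the pattern is empty, or the closing `/` is not
/// followed by whitespace or the end of input.
fn scan_regex(input: &str) -> Option<(String, &str)> {
    // Byte offsets from `char_indices` are relative to the text after the opening '/'.
    let body = input.strip_prefix('/')?;
    let mut chars = body.char_indices().peekable();
    let mut pattern = String::new();
    while let Some((i, c)) = chars.next() {
        match c {
            // Escaped delimiter: drop the backslash, keep the slash.
            '\\' if matches!(chars.peek(), Some((_, '/'))) => {
                chars.next();
                pattern.push('/');
            }
            // Unescaped '/' closes the pattern.
            '/' => {
                let rest = &body[i + 1..];
                // Same character set as nom's `multispace1`.
                let boundary_ok = rest.is_empty()
                    || rest.starts_with(|ch: char| matches!(ch, ' ' | '\t' | '\r' | '\n'));
                return (!pattern.is_empty() && boundary_ok).then_some((pattern, rest));
            }
            other => pattern.push(other),
        }
    }
    None // no closing '/'
}
```

The input `\\/` mirrors the second case in `test_regex_parser`. The first backslash is not followed by `/`, so it is kept verbatim. The second backslash escapes the slash, which yields `\/` in the pattern.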
--- a/query-grammar/src/user_input_ast.rs
+++ b/query-grammar/src/user_input_ast.rs
@@ -5,7 +5,7 @@ use serde::Serialize;

 use crate::Occur;

-#[derive(PartialEq, Clone, Serialize)]
+#[derive(PartialEq, Eq, Hash, Clone, Serialize)]
 #[serde(tag = "type")]
 #[serde(rename_all = "snake_case")]
 pub enum UserInputLeaf {
@@ -23,6 +23,10 @@ pub enum UserInputLeaf {
    Exists {
        field: String,
    },
+    Regex {
+        field: Option<String>,
+        pattern: String,
+    },
 }

 impl UserInputLeaf {
@@ -46,6 +50,7 @@ impl UserInputLeaf {
            UserInputLeaf::Exists { field: _ } => UserInputLeaf::Exists {
                field: field.expect("Exist query without a field isn't allowed"),
            },
+            UserInputLeaf::Regex { field: _, pattern } => UserInputLeaf::Regex { field, pattern },
        }
    }

@@ -103,11 +108,19 @@ impl Debug for UserInputLeaf {
            UserInputLeaf::Exists { field } => {
                write!(formatter, "$exists(\"{field}\")")
            }
+            UserInputLeaf::Regex { field, pattern } => {
+                if let Some(field) = field {
+                    // TODO properly escape field (in case of \")
+                    write!(formatter, "\"{field}\":")?;
+                }
+                // TODO properly escape pattern (in case of \")
+                write!(formatter, "/{pattern}/")
+            }
        }
    }
 }

-#[derive(Copy, Clone, Eq, PartialEq, Debug, Serialize)]
+#[derive(Copy, Clone, Eq, PartialEq, Hash, Debug, Serialize)]
 #[serde(rename_all = "snake_case")]
 pub enum Delimiter {
    SingleQuotes,
@@ -115,7 +128,7 @@ pub enum Delimiter {
    None,
 }

-#[derive(PartialEq, Clone, Serialize)]
+#[derive(PartialEq, Eq, Hash, Clone, Serialize)]
 #[serde(rename_all = "snake_case")]
 pub struct UserInputLiteral {
    pub field_name: Option<String>,
@@ -154,7 +167,7 @@ impl fmt::Debug for UserInputLiteral {
    }
 }

-#[derive(PartialEq, Debug, Clone, Serialize)]
+#[derive(PartialEq, Eq, Hash, Debug, Clone, Serialize)]
 #[serde(tag = "type", content = "value")]
 #[serde(rename_all = "snake_case")]
 pub enum UserInputBound {
@@ -191,11 +204,11 @@ impl UserInputBound {
    }
 }

-#[derive(PartialEq, Clone, Serialize)]
+#[derive(PartialEq, Eq, Hash, Clone, Serialize)]
 #[serde(into = "UserInputAstSerde")]
 pub enum UserInputAst {
    Clause(Vec<(Option<Occur>, UserInputAst)>),
-    Boost(Box<UserInputAst>, f64),
+    Boost(Box<UserInputAst>, ordered_float::OrderedFloat<f64>),
    Leaf(Box<UserInputLeaf>),
 }

@@ -217,9 +230,10 @@ impl From<UserInputAst> for UserInputAstSerde {
    fn from(ast: UserInputAst) -> Self {
        match ast {
            UserInputAst::Clause(clause) => UserInputAstSerde::Bool { clauses: clause },
-            UserInputAst::Boost(underlying, boost) => {
-                UserInputAstSerde::Boost { underlying, boost }
-            }
+            UserInputAst::Boost(underlying, boost) => UserInputAstSerde::Boost {
+                underlying,
+                boost: boost.into_inner(),
+            },
            UserInputAst::Leaf(leaf) => UserInputAstSerde::Leaf(leaf),
        }
    }
@@ -378,7 +392,7 @@ mod tests {
    #[test]
    fn test_boost_serialization() {
        let inner_ast = UserInputAst::Leaf(Box::new(UserInputLeaf::All));
-        let boost_ast = UserInputAst::Boost(Box::new(inner_ast), 2.5);
+        let boost_ast = UserInputAst::Boost(Box::new(inner_ast), 2.5.into());
        let json = serde_json::to_string(&boost_ast).unwrap();
        assert_eq!(
            json,
@@ -405,7 +419,7 @@ mod tests {
                    }))),
                ),
            ])),
-            2.5,
+            2.5.into(),
        );
        let json = serde_json::to_string(&boost_ast).unwrap();
        assert_eq!(
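`UserInputAst` now derives `Eq` and `Hash` so that duplicate clauses can be collected in a hash set. However, `f64` implements neither trait, because `NaN != NaN`. That is presumably why the `Boost` payload switches to `ordered_float::OrderedFloat<f64>`. The sketch below uses a simplified, hypothetical stand-in (`HashableF64`, `Node`) to show the idea: pick one canonical bit pattern per equivalence class, then compare and hash that. It is not the `ordered_float` crate's implementation.

```rust
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for `OrderedFloat<f64>`: equality and hashing are
/// defined on a canonical bit pattern so that `Eq` and `Hash` agree.
#[derive(Clone, Copy, Debug)]
struct HashableF64(f64);

impl HashableF64 {
    /// All NaNs collapse to one pattern, and -0.0 to +0.0.
    fn canonical_bits(self) -> u64 {
        if self.0.is_nan() {
            f64::NAN.to_bits()
        } else if self.0 == 0.0 {
            0.0f64.to_bits()
        } else {
            self.0.to_bits()
        }
    }
}

impl PartialEq for HashableF64 {
    fn eq(&self, other: &Self) -> bool {
        self.canonical_bits() == other.canonical_bits()
    }
}

impl Eq for HashableF64 {}

impl Hash for HashableF64 {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.canonical_bits().hash(state);
    }
}

/// Mirrors the shape of `UserInputAst::Boost`: once the float is hashable,
/// the whole node can derive `Eq` and `Hash`.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Node {
    Term(String),
    Boost(Box<Node>, HashableF64),
}

fn count_distinct(nodes: &[Node]) -> usize {
    nodes.iter().collect::<HashSet<_>>().len()
}
```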
--- a/src/aggregation/README.md
+++ b/src/aggregation/README.md
@@ -20,17 +20,16 @@ Contains all metric aggregations, like average aggregation. Metric aggregations
 #### agg_req
 agg_req contains the users aggregation request. Deserialization from json is compatible with elasticsearch aggregation requests.

-#### agg_req_with_accessor
-agg_req_with_accessor contains the users aggregation request enriched with fast field accessors etc, which are
+#### agg_data
+agg_data contains the user's aggregation request enriched with fast field accessors etc., which are
 used during collection.

 #### segment_agg_result
 segment_agg_result contains the aggregation result tree, which is used for collection of a segment.
-The tree from agg_req_with_accessor is passed during collection.
+agg_data is passed during collection.

 #### intermediate_agg_result
 intermediate_agg_result contains the aggregation tree for merging with other trees.

 #### agg_result
 agg_result contains the final aggregation tree.
-
--- a/src/aggregation/accessor_helpers.rs
+++ b/src/aggregation/accessor_helpers.rs
@@ -0,0 +1,105 @@
+//! Helpers to open fast field columns and resolve missing values for aggregations.
+
+use std::io;
+
+use columnar::{Column, ColumnType};
+
+use crate::aggregation::{f64_to_fastfield_u64, Key};
+use crate::index::SegmentReader;
+
+/// Get the missing value in its internal u64 representation.
+///
+/// For text columns the sentinel is `column_max_value + 1`, one past the largest column value.
+/// For numerical data we convert the value into the representation
+/// we would get from the fast field when opening it with `u64_lenient_for_type`.
+///
+/// That way the value can be used exactly as if it came from the fast field.
+pub(crate) fn get_missing_val_as_u64_lenient(
+    column_type: ColumnType,
+    column_max_value: u64,
+    missing: &Key,
+    field_name: &str,
+) -> crate::Result<Option<u64>> {
+    let missing_val = match missing {
+        Key::Str(_) if column_type == ColumnType::Str => Some(column_max_value + 1),
+        // Allow fallback to number on text fields
+        Key::F64(_) if column_type == ColumnType::Str => Some(column_max_value + 1),
+        Key::U64(_) if column_type == ColumnType::Str => Some(column_max_value + 1),
+        Key::I64(_) if column_type == ColumnType::Str => Some(column_max_value + 1),
+        Key::F64(val) if column_type.numerical_type().is_some() => {
+            f64_to_fastfield_u64(*val, &column_type)
+        }
+        // NOTE: We may lose precision of the passed missing value by casting i64 and u64 to f64.
+        Key::I64(val) if column_type.numerical_type().is_some() => {
+            f64_to_fastfield_u64(*val as f64, &column_type)
+        }
+        Key::U64(val) if column_type.numerical_type().is_some() => {
+            f64_to_fastfield_u64(*val as f64, &column_type)
+        }
+        _ => {
+            return Err(crate::TantivyError::InvalidArgument(format!(
+                "Missing value {missing:?} for field {field_name} is not supported for column \
+                 type {column_type:?}"
+            )));
+        }
+    };
+    Ok(missing_val)
+}
+
+pub(crate) fn get_numeric_or_date_column_types() -> &'static [ColumnType] {
+    &[
+        ColumnType::F64,
+        ColumnType::U64,
+        ColumnType::I64,
+        ColumnType::DateTime,
+    ]
+}
+
+/// Get the fast field reader, or an empty column if the field has no matching column.
+pub(crate) fn get_ff_reader(
+    reader: &SegmentReader,
+    field_name: &str,
+    allowed_column_types: Option<&[ColumnType]>,
+) -> crate::Result<(columnar::Column<u64>, ColumnType)> {
+    let ff_fields = reader.fast_fields();
+    let ff_field_with_type = ff_fields
+        .u64_lenient_for_type(allowed_column_types, field_name)?
+        .unwrap_or_else(|| {
+            (
+                Column::build_empty_column(reader.num_docs()),
+                ColumnType::U64,
+            )
+        });
+    Ok(ff_field_with_type)
+}
+
+pub(crate) fn get_dynamic_columns(
+    reader: &SegmentReader,
+    field_name: &str,
+) -> crate::Result<Vec<columnar::DynamicColumn>> {
+    let ff_fields = reader.fast_fields().dynamic_column_handles(field_name)?;
+    let cols = ff_fields
+        .iter()
+        .map(|h| h.open())
+        .collect::<io::Result<_>>()?;
+    assert!(!ff_fields.is_empty(), "field {field_name} not found");
+    Ok(cols)
+}
+
+/// Get all fast field readers, or a single empty column as fallback.
+///
+/// Always returns at least one column.
+pub(crate) fn get_all_ff_reader_or_empty(
+    reader: &SegmentReader,
+    field_name: &str,
+    allowed_column_types: Option<&[ColumnType]>,
+    fallback_type: ColumnType,
+) -> crate::Result<Vec<(columnar::Column<u64>, ColumnType)>> {
+    let ff_fields = reader.fast_fields();
+    let mut ff_field_with_type =
+        ff_fields.u64_lenient_for_type_all(allowed_column_types, field_name)?;
+    if ff_field_with_type.is_empty() {
+        ff_field_with_type.push((Column::build_empty_column(reader.num_docs()), fallback_type));
+    }
+    Ok(ff_field_with_type)
+}
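For text columns, `get_missing_val_as_u64_lenient` maps any missing key to `column_max_value + 1`. That value lies one past the largest stored value, so documents without a value get their own bucket and cannot collide with a real one. The sketch below shows that idea on a plain slice. `count_with_missing_sentinel` is a hypothetical helper, not part of the aggregation code, and it assumes per-document values are already available as `Option<u64>`.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: counts documents per column value. Documents without
/// a value are counted under `column_max_value + 1`, which cannot collide with
/// any value actually stored in the column.
/// Returns the per-value counts and the sentinel that was used.
fn count_with_missing_sentinel(
    doc_values: &[Option<u64>],
    column_max_value: u64,
) -> (BTreeMap<u64, u64>, u64) {
    // `checked_add` guards the theoretical case of a column whose max is u64::MAX.
    let sentinel = column_max_value
        .checked_add(1)
        .expect("no room for a sentinel above column_max_value");
    let mut counts = BTreeMap::new();
    for value in doc_values {
        *counts.entry(value.unwrap_or(sentinel)).or_insert(0) += 1;
    }
    (counts, sentinel)
}
```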
--- a/src/aggregation/agg_data.rs
+++ b/src/aggregation/agg_data.rs
--- a/src/aggregation/agg_limits.rs
+++ b/src/aggregation/agg_limits.rs
@@ -35,6 +35,7 @@ pub struct AggregationLimitsGuard {
    /// Allocated memory with this guard.
    allocated_with_the_guard: u64,
 }
+
 impl Clone for AggregationLimitsGuard {
    fn clone(&self) -> Self {
        Self {
@@ -70,7 +71,7 @@ impl AggregationLimitsGuard {
    /// *memory_limit*
    /// memory_limit is defined in bytes.
    /// Aggregation fails when the estimated memory consumption of the aggregation is higher than
-    /// memory_limit.     
+    /// memory_limit.
    /// memory_limit will default to `DEFAULT_MEMORY_LIMIT` (500MB)
    ///
    /// *bucket_limit*
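The doc comment above states that an aggregation fails once its estimated memory consumption exceeds `memory_limit`, which defaults to 500MB. The sketch below shows such a check. It assumes a hypothetical `MemoryBudget` whose clones share one atomic counter. It is not the actual `AggregationLimitsGuard` implementation, whose fields are only partly visible in this diff.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical memory budget shared between clones: every clone adds to the
/// same counter, and an allocation that pushes the total over `limit` fails.
#[derive(Clone)]
struct MemoryBudget {
    used: Arc<AtomicU64>,
    limit: u64,
}

impl MemoryBudget {
    fn new(limit: u64) -> Self {
        Self { used: Arc::new(AtomicU64::new(0)), limit }
    }

    /// Records `bytes` and returns an error if the shared total now exceeds the limit.
    fn add(&self, bytes: u64) -> Result<(), String> {
        // `fetch_add` returns the previous value, so add `bytes` to get the new total.
        let total = self.used.fetch_add(bytes, Ordering::Relaxed) + bytes;
        if total > self.limit {
            return Err(format!(
                "estimated memory {total} bytes exceeds limit of {} bytes",
                self.limit
            ));
        }
        Ok(())
    }
}
```

Sharing the counter through an `Arc` means that memory allocated by sub-aggregations holding a clone counts against the same overall limit.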
--- a/src/aggregation/agg_req.rs
+++ b/src/aggregation/agg_req.rs
@@ -26,12 +26,14 @@
 //! let _agg_req: Aggregations = serde_json::from_str(elasticsearch_compatible_json_req).unwrap();
 //! ```

-use std::collections::{HashMap, HashSet};
+use std::collections::HashSet;

+use rustc_hash::FxHashMap;
 use serde::{Deserialize, Serialize};

 use super::bucket::{
-    DateHistogramAggregationReq, HistogramAggregation, RangeAggregation, TermsAggregation,
+    DateHistogramAggregationReq, FilterAggregation, HistogramAggregation, RangeAggregation,
+    TermsAggregation,
 };
 use super::metric::{
    AverageAggregation, CardinalityAggregationReq, CountAggregation, ExtendedStatsAggregation,
@@ -43,7 +45,7 @@ use super::metric::{
 /// defined names. It is also used in buckets aggregations to define sub-aggregations.
 ///
 /// The key is the user defined name of the aggregation.
-pub type Aggregations = HashMap<String, Aggregation>;
+pub type Aggregations = FxHashMap<String, Aggregation>;

 /// Aggregation request.
 ///
@@ -129,6 +131,9 @@ pub enum AggregationVariants {
    /// Put data into buckets of terms.
    #[serde(rename = "terms")]
    Terms(TermsAggregation),
+    /// Filter documents into a single bucket.
+    #[serde(rename = "filter")]
+    Filter(FilterAggregation),

    // Metric aggregation types
    /// Computes the average of the extracted values.
@@ -174,6 +179,7 @@ impl AggregationVariants {
            AggregationVariants::Range(range) => vec![range.field.as_str()],
            AggregationVariants::Histogram(histogram) => vec![histogram.field.as_str()],
            AggregationVariants::DateHistogram(histogram) => vec![histogram.field.as_str()],
+            AggregationVariants::Filter(filter) => filter.get_fast_field_names(),
            AggregationVariants::Average(avg) => vec![avg.field_name()],
            AggregationVariants::Count(count) => vec![count.field_name()],
            AggregationVariants::Max(max) => vec![max.field_name()],
@@ -208,13 +214,6 @@ impl AggregationVariants {
            _ => None,
        }
    }
-    pub(crate) fn as_top_hits(&self) -> Option<&TopHitsAggregationReq> {
-        match &self {
-            AggregationVariants::TopHits(top_hits) => Some(top_hits),
-            _ => None,
-        }
-    }
-
    pub(crate) fn as_percentile(&self) -> Option<&PercentilesAggregationReq> {
        match &self {
            AggregationVariants::Percentiles(percentile_req) => Some(percentile_req),
--- a/src/aggregation/agg_req_with_accessor.rs
+++ b/src/aggregation/agg_req_with_accessor.rs
@@ -1,471 +0,0 @@
-//! This will enhance the request tree with access to the fastfield and metadata.
-
-use std::collections::HashMap;
-use std::io;
-
-use columnar::{Column, ColumnBlockAccessor, ColumnType, DynamicColumn, StrColumn};
-
-use super::agg_req::{Aggregation, AggregationVariants, Aggregations};
-use super::bucket::{
-    DateHistogramAggregationReq, HistogramAggregation, RangeAggregation, TermsAggregation,
-};
-use super::metric::{
-    AverageAggregation, CardinalityAggregationReq, CountAggregation, ExtendedStatsAggregation,
-    MaxAggregation, MinAggregation, StatsAggregation, SumAggregation,
-};
-use super::segment_agg_result::AggregationLimitsGuard;
-use super::VecWithNames;
-use crate::aggregation::{f64_to_fastfield_u64, Key};
-use crate::index::SegmentReader;
-use crate::SegmentOrdinal;
-
-#[derive(Default)]
-pub(crate) struct AggregationsWithAccessor {
-    pub aggs: VecWithNames<AggregationWithAccessor>,
-}
-
-impl AggregationsWithAccessor {
-    fn from_data(aggs: VecWithNames<AggregationWithAccessor>) -> Self {
-        Self { aggs }
-    }
-
-    pub fn is_empty(&self) -> bool {
-        self.aggs.is_empty()
-    }
-}
-
-pub struct AggregationWithAccessor {
-    pub(crate) segment_ordinal: SegmentOrdinal,
-    /// In general there can be buckets without fast field access, e.g. buckets that are created
-    /// based on search terms. That is not that case currently, but eventually this needs to be
-    /// Option or moved.
-    pub(crate) accessor: Column<u64>,
-    /// Load insert u64 for missing use case
-    pub(crate) missing_value_for_accessor: Option<u64>,
-    pub(crate) str_dict_column: Option<StrColumn>,
-    pub(crate) field_type: ColumnType,
-    pub(crate) sub_aggregation: AggregationsWithAccessor,
-    pub(crate) limits: AggregationLimitsGuard,
-    pub(crate) column_block_accessor: ColumnBlockAccessor<u64>,
-    /// Used for missing term aggregation, which checks all columns for existence.
-    /// And also for `top_hits` aggregation, which may sort on multiple fields.
-    /// By convention the missing aggregation is chosen, when this property is set
-    /// (instead bein set in `agg`).
-    /// If this needs to used by other aggregations, we need to refactor this.
-    // NOTE: we can make all other aggregations use this instead of the `accessor` and `field_type`
-    // (making them obsolete) But will it have a performance impact?
-    pub(crate) accessors: Vec<(Column<u64>, ColumnType)>,
-    /// Map field names to all associated column accessors.
-    /// This field is used for `docvalue_fields`, which is currently only supported for `top_hits`.
-    pub(crate) value_accessors: HashMap<String, Vec<DynamicColumn>>,
-    pub(crate) agg: Aggregation,
-}
-
-impl AggregationWithAccessor {
-    /// May return multiple accessors if the aggregation is e.g. on mixed field types.
-    fn try_from_agg(
-        agg: &Aggregation,
-        sub_aggregation: &Aggregations,
-        reader: &SegmentReader,
-        segment_ordinal: SegmentOrdinal,
-        limits: AggregationLimitsGuard,
-    ) -> crate::Result<Vec<AggregationWithAccessor>> {
-        let mut agg = agg.clone();
-
-        let add_agg_with_accessor = |agg: &Aggregation,
-                                     accessor: Column<u64>,
-                                     column_type: ColumnType,
-                                     aggs: &mut Vec<AggregationWithAccessor>|
-         -> crate::Result<()> {
-            let res = AggregationWithAccessor {
-                segment_ordinal,
-                accessor,
-                accessors: Default::default(),
-                value_accessors: Default::default(),
-                field_type: column_type,
-                sub_aggregation: get_aggs_with_segment_accessor_and_validate(
-                    sub_aggregation,
-                    reader,
-                    segment_ordinal,
-                    &limits,
-                )?,
-                agg: agg.clone(),
-                limits: limits.clone(),
-                missing_value_for_accessor: None,
-                str_dict_column: None,
-                column_block_accessor: Default::default(),
-            };
-            aggs.push(res);
-            Ok(())
-        };
-
-        let add_agg_with_accessors = |agg: &Aggregation,
-                                      accessors: Vec<(Column<u64>, ColumnType)>,
-                                      aggs: &mut Vec<AggregationWithAccessor>,
-                                      value_accessors: HashMap<String, Vec<DynamicColumn>>|
-         -> crate::Result<()> {
-            let (accessor, field_type) = accessors.first().expect("at least one accessor");
-            let limits = limits.clone();
-            let res = AggregationWithAccessor {
-                segment_ordinal,
-                // TODO: We should do away with the `accessor` field altogether
-                accessor: accessor.clone(),
-                value_accessors,
-                field_type: *field_type,
-                accessors,
-                sub_aggregation: get_aggs_with_segment_accessor_and_validate(
-                    sub_aggregation,
-                    reader,
-                    segment_ordinal,
-                    &limits,
-                )?,
-                agg: agg.clone(),
-                limits,
-                missing_value_for_accessor: None,
-                str_dict_column: None,
-                column_block_accessor: Default::default(),
-            };
-            aggs.push(res);
-            Ok(())
-        };
-
-        let mut res: Vec<AggregationWithAccessor> = Vec::new();
-        use AggregationVariants::*;
-
-        match agg.agg {
-            Range(RangeAggregation {
-                field: ref field_name,
-                ..
-            }) => {
-                let (accessor, column_type) =
-                    get_ff_reader(reader, field_name, Some(get_numeric_or_date_column_types()))?;
-                add_agg_with_accessor(&agg, accessor, column_type, &mut res)?;
-            }
-            Histogram(HistogramAggregation {
-                field: ref field_name,
-                ..
-            }) => {
-                let (accessor, column_type) =
-                    get_ff_reader(reader, field_name, Some(get_numeric_or_date_column_types()))?;
-                add_agg_with_accessor(&agg, accessor, column_type, &mut res)?;
-            }
-            DateHistogram(DateHistogramAggregationReq {
-                field: ref field_name,
-                ..
-            }) => {
-                let (accessor, column_type) =
-                    // Only DateTime is supported for DateHistogram
-                    get_ff_reader(reader, field_name, Some(&[ColumnType::DateTime]))?;
-                add_agg_with_accessor(&agg, accessor, column_type, &mut res)?;
-            }
-            Terms(TermsAggregation {
-                field: ref field_name,
-                ref missing,
-                ..
-            })
-            | Cardinality(CardinalityAggregationReq {
-                field: ref field_name,
-                ref missing,
-                ..
-            }) => {
-                let str_dict_column = reader.fast_fields().str(field_name)?;
-                let allowed_column_types = [
-                    ColumnType::I64,
-                    ColumnType::U64,
-                    ColumnType::F64,
-                    ColumnType::Str,
-                    ColumnType::DateTime,
-                    ColumnType::Bool,
-                    ColumnType::IpAddr,
-                    // ColumnType::Bytes Unsupported
-                ];
-
-                // In case the column is empty we want the shim column to match the missing type
-                let fallback_type = missing
-                    .as_ref()
-                    .map(|missing| match missing {
-                        Key::Str(_) => ColumnType::Str,
-                        Key::F64(_) => ColumnType::F64,
-                        Key::I64(_) => ColumnType::I64,
-                        Key::U64(_) => ColumnType::U64,
-                    })
-                    .unwrap_or(ColumnType::U64);
-                let column_and_types = get_all_ff_reader_or_empty(
-                    reader,
-                    field_name,
-                    Some(&allowed_column_types),
-                    fallback_type,
-                )?;
-                let missing_and_more_than_one_col = column_and_types.len() > 1 && missing.is_some();
-                let text_on_non_text_col = column_and_types.len() == 1
-                    && column_and_types[0].1.numerical_type().is_some()
-                    && missing
-                        .as_ref()
-                        .map(|m| matches!(m, Key::Str(_)))
-                        .unwrap_or(false);
-
-                // Actually we could convert the text to a number and have the fast path, if it is
-                // provided in Rfc3339 format. But this use case is probably common
-                // enough to justify the effort.
-                let text_on_date_col = column_and_types.len() == 1
-                    && column_and_types[0].1 == ColumnType::DateTime
-                    && missing
-                        .as_ref()
-                        .map(|m| matches!(m, Key::Str(_)))
-                        .unwrap_or(false);
-
-                let use_special_missing_agg =
-                    missing_and_more_than_one_col || text_on_non_text_col || text_on_date_col;
-                if use_special_missing_agg {
-                    let column_and_types =
-                        get_all_ff_reader_or_empty(reader, field_name, None, fallback_type)?;
-
-                    let accessors = column_and_types
-                        .iter()
-                        .map(|c_t| (c_t.0.clone(), c_t.1))
-                        .collect();
-                    add_agg_with_accessors(&agg, accessors, &mut res, Default::default())?;
-                }
-
-                for (accessor, column_type) in column_and_types {
-                    let missing_value_term_agg = if use_special_missing_agg {
-                        None
-                    } else {
-                        missing.clone()
-                    };
-
-                    let missing_value_for_accessor =
-                        if let Some(missing) = missing_value_term_agg.as_ref() {
-                            get_missing_val_as_u64_lenient(
-                                column_type,
-                                missing,
-                                agg.agg.get_fast_field_names()[0],
-                            )?
-                        } else {
-                            None
-                        };
-
-                    let limits = limits.clone();
-                    let agg = AggregationWithAccessor {
-                        segment_ordinal,
-                        missing_value_for_accessor,
-                        accessor,
-                        accessors: Default::default(),
-                        value_accessors: Default::default(),
-                        field_type: column_type,
-                        sub_aggregation: get_aggs_with_segment_accessor_and_validate(
-                            sub_aggregation,
-                            reader,
-                            segment_ordinal,
-                            &limits,
-                        )?,
-                        agg: agg.clone(),
-                        str_dict_column: str_dict_column.clone(),
-                        limits,
-                        column_block_accessor: Default::default(),
-                    };
-                    res.push(agg);
-                }
-            }
-            Average(AverageAggregation {
-                field: ref field_name,
-                ..
-            })
-            | Max(MaxAggregation {
-                field: ref field_name,
-                ..
-            })
-            | Min(MinAggregation {
-                field: ref field_name,
-                ..
-            })
-            | Stats(StatsAggregation {
-                field: ref field_name,
-                ..
-            })
-            | ExtendedStats(ExtendedStatsAggregation {
-                field: ref field_name,
-                ..
-            })
-            | Sum(SumAggregation {
-                field: ref field_name,
-                ..
-            }) => {
-                let (accessor, column_type) =
-                    get_ff_reader(reader, field_name, Some(get_numeric_or_date_column_types()))?;
-                add_agg_with_accessor(&agg, accessor, column_type, &mut res)?;
-            }
-            Count(CountAggregation {
-                field: ref field_name,
-                ..
-            }) => {
-                let allowed_column_types = [
-                    ColumnType::I64,
-                    ColumnType::U64,
-                    ColumnType::F64,
-                    ColumnType::Str,
-                    ColumnType::DateTime,
-                    ColumnType::Bool,
-                    ColumnType::IpAddr,
-                    // ColumnType::Bytes Unsupported
-                ];
-                let (accessor, column_type) =
-                    get_ff_reader(reader, field_name, Some(&allowed_column_types))?;
-                add_agg_with_accessor(&agg, accessor, column_type, &mut res)?;
-            }
-            Percentiles(ref percentiles) => {
-                let (accessor, column_type) = get_ff_reader(
-                    reader,
-                    percentiles.field_name(),
-                    Some(get_numeric_or_date_column_types()),
-                )?;
-                add_agg_with_accessor(&agg, accessor, column_type, &mut res)?;
-            }
-            TopHits(ref mut top_hits) => {
-                top_hits.validate_and_resolve_field_names(reader.fast_fields().columnar())?;
-                let accessors: Vec<(Column<u64>, ColumnType)> = top_hits
-                    .field_names()
-                    .iter()
-                    .map(|field| {
-                        get_ff_reader(reader, field, Some(get_numeric_or_date_column_types()))
-                    })
-                    .collect::<crate::Result<_>>()?;
-
-                let value_accessors = top_hits
-                    .value_field_names()
-                    .iter()
-                    .map(|field_name| {
-                        Ok((
-                            field_name.to_string(),
-                            get_dynamic_columns(reader, field_name)?,
-                        ))
-                    })
-                    .collect::<crate::Result<_>>()?;
-
-                add_agg_with_accessors(&agg, accessors, &mut res, value_accessors)?;
-            }
-        };
-
-        Ok(res)
-    }
-}
-
-/// Get the missing value as internal u64 representation
-///
-/// For terms we use u64::MAX as sentinel value
-/// For numerical data we convert the value into the representation
-/// we would get from the fast field, when we open it as u64_lenient_for_type.
-///
-/// That way we can use it the same way as if it would come from the fastfield.
-fn get_missing_val_as_u64_lenient(
-    column_type: ColumnType,
-    missing: &Key,
-    field_name: &str,
-) -> crate::Result<Option<u64>> {
-    let missing_val = match missing {
-        Key::Str(_) if column_type == ColumnType::Str => Some(u64::MAX),
-        // Allow fallback to number on text fields
-        Key::F64(_) if column_type == ColumnType::Str => Some(u64::MAX),
-        Key::U64(_) if column_type == ColumnType::Str => Some(u64::MAX),
-        Key::I64(_) if column_type == ColumnType::Str => Some(u64::MAX),
-        Key::F64(val) if column_type.numerical_type().is_some() => {
-            f64_to_fastfield_u64(*val, &column_type)
-        }
-        // NOTE: We may loose precision of the passed missing value by casting i64 and u64 to f64.
-        Key::I64(val) if column_type.numerical_type().is_some() => {
-            f64_to_fastfield_u64(*val as f64, &column_type)
-        }
-        Key::U64(val) if column_type.numerical_type().is_some() => {
-            f64_to_fastfield_u64(*val as f64, &column_type)
-        }
-        _ => {
-            return Err(crate::TantivyError::InvalidArgument(format!(
-                "Missing value {missing:?} for field {field_name} is not supported for column \
-                 type {column_type:?}"
-            )));
-        }
-    };
-    Ok(missing_val)
-}
-
-fn get_numeric_or_date_column_types() -> &'static [ColumnType] {
-    &[
-        ColumnType::F64,
-        ColumnType::U64,
-        ColumnType::I64,
-        ColumnType::DateTime,
-    ]
-}
-
-pub(crate) fn get_aggs_with_segment_accessor_and_validate(
-    aggs: &Aggregations,
-    reader: &SegmentReader,
-    segment_ordinal: SegmentOrdinal,
-    limits: &AggregationLimitsGuard,
-) -> crate::Result<AggregationsWithAccessor> {
-    let mut aggss = Vec::new();
-    for (key, agg) in aggs.iter() {
-        let aggs = AggregationWithAccessor::try_from_agg(
-            agg,
-            agg.sub_aggregation(),
-            reader,
-            segment_ordinal,
-            limits.clone(),
-        )?;
-        for agg in aggs {
-            aggss.push((key.to_string(), agg));
-        }
-    }
-    Ok(AggregationsWithAccessor::from_data(
-        VecWithNames::from_entries(aggss),
-    ))
-}
-
-/// Get fast field reader or empty as default.
-fn get_ff_reader(
-    reader: &SegmentReader,
-    field_name: &str,
-    allowed_column_types: Option<&[ColumnType]>,
-) -> crate::Result<(columnar::Column<u64>, ColumnType)> {
-    let ff_fields = reader.fast_fields();
-    let ff_field_with_type = ff_fields
-        .u64_lenient_for_type(allowed_column_types, field_name)?
-        .unwrap_or_else(|| {
-            (
-                Column::build_empty_column(reader.num_docs()),
-                ColumnType::U64,
-            )
-        });
-    Ok(ff_field_with_type)
-}
-
-fn get_dynamic_columns(
-    reader: &SegmentReader,
-    field_name: &str,
-) -> crate::Result<Vec<columnar::DynamicColumn>> {
-    let ff_fields = reader.fast_fields().dynamic_column_handles(field_name)?;
-    let cols = ff_fields
-        .iter()
-        .map(|h| h.open())
-        .collect::<io::Result<_>>()?;
-    assert!(!ff_fields.is_empty(), "field {field_name} not found");
-    Ok(cols)
-}
-
-/// Get all fast field reader or empty as default.
-///
-/// Is guaranteed to return at least one column.
-fn get_all_ff_reader_or_empty(
-    reader: &SegmentReader,
-    field_name: &str,
-    allowed_column_types: Option<&[ColumnType]>,
-    fallback_type: ColumnType,
-) -> crate::Result<Vec<(columnar::Column<u64>, ColumnType)>> {
-    let ff_fields = reader.fast_fields();
-    let mut ff_field_with_type =
-        ff_fields.u64_lenient_for_type_all(allowed_column_types, field_name)?;
-    if ff_field_with_type.is_empty() {
-        ff_field_with_type.push((Column::build_empty_column(reader.num_docs()), fallback_type));
-    }
-    Ok(ff_field_with_type)
-}
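The removed `get_missing_val_as_u64_lenient` turns a numeric `missing` value into the same u64 representation the fast field would yield, and its NOTE warns that `Key::I64`/`Key::U64` values are cast through `f64` first. The standalone sketch below illustrates both points: an order-preserving f64/i64 to u64 mapping, and the precision lost when a large u64 passes through `f64`. The function names are hypothetical, and the bit manipulation is an assumption about the general technique, not a copy of tantivy's `f64_to_fastfield_u64`.

```rust
/// Mask for the sign bit of a 64-bit value.
const SIGN_BIT: u64 = 1 << 63;

/// Hypothetical helper: maps an f64 to a u64 so that numeric order is
/// preserved (for non-NaN inputs). Positive floats get their sign bit set;
/// negative floats have all bits inverted so "more negative" sorts lower.
fn f64_to_sortable_u64(val: f64) -> u64 {
    let bits = val.to_bits();
    if val.is_sign_positive() {
        bits ^ SIGN_BIT
    } else {
        !bits
    }
}

/// Hypothetical inverse of `f64_to_sortable_u64`.
fn sortable_u64_to_f64(val: u64) -> f64 {
    let bits = if (val & SIGN_BIT) != 0 {
        val ^ SIGN_BIT
    } else {
        !val
    };
    f64::from_bits(bits)
}

/// Hypothetical helper: maps an i64 to a u64 preserving order by flipping
/// the sign bit, so i64::MIN becomes 0 and i64::MAX becomes u64::MAX.
fn i64_to_sortable_u64(val: i64) -> u64 {
    (val as u64) ^ SIGN_BIT
}

/// Round-trips a u64 through f64, as a `*val as f64` cast would.
/// Values above 2^53 can no longer be represented exactly.
fn u64_via_f64(val: u64) -> u64 {
    val as f64 as u64
}
```

For example, `u64_via_f64((1 << 53) + 1)` yields `1 << 53`: the odd value is rounded to the nearest representable float, which is the kind of precision loss the NOTE refers to.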
--- a/src/aggregation/agg_result.rs
+++ b/src/aggregation/agg_result.rs
@@ -16,7 +16,7 @@ use super::{AggregationError, Key};
 use crate::TantivyError;

 #[derive(Clone, Default, Debug, PartialEq, Serialize, Deserialize)]
-/// The final aggegation result.
+/// The final aggregation result.
 pub struct AggregationResults(pub FxHashMap<String, AggregationResult>);

 impl AggregationResults {
@@ -156,6 +156,8 @@ pub enum BucketResult {
        /// The upper bound error for the doc count of each term.
        doc_count_error_upper_bound: Option<u64>,
    },
+    /// The filter result: a single bucket together with its sub-aggregation results.
+    Filter(FilterBucketResult),
 }

 impl BucketResult {
@@ -172,6 +174,11 @@ impl BucketResult {
                sum_other_doc_count: _,
                doc_count_error_upper_bound: _,
            } => buckets.iter().map(|bucket| bucket.get_bucket_count()).sum(),
+            BucketResult::Filter(filter_result) => {
+                // A filter is not a user-facing bucket, so it does not add to the bucket
+                // count; only its sub-aggregation buckets are counted.
+                filter_result.sub_aggregations.get_bucket_count()
+            }
        }
    }
 }
@@ -308,3 +315,25 @@ impl RangeBucketEntry {
        1 + self.sub_aggregation.get_bucket_count()
    }
 }
+
+/// This is the filter bucket result, which contains the document count and sub-aggregations.
+///
+/// # JSON Format
+/// ```json
+/// {
+///   "electronics_only": {
+///     "doc_count": 2,
+///     "avg_price": {
+///       "value": 150.0
+///     }
+///   }
+/// }
+/// ```
+#[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
+pub struct FilterBucketResult {
+    /// Number of documents in the filter bucket
+    pub doc_count: u64,
+    /// Sub-aggregation results
+    #[serde(flatten)]
+    pub sub_aggregations: AggregationResults,
+}
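The new `BucketResult::Filter` arm counts only the sub-aggregation buckets, because a filter is not a user-facing bucket, while a range entry counts as `1 + self.sub_aggregation.get_bucket_count()`. A minimal sketch of that counting rule, using simplified stand-in types (`BucketResultSketch` and `BucketEntrySketch` are hypothetical, not tantivy types):

```rust
/// One bucket entry (e.g. a range or terms bucket) with its sub-aggregation results.
struct BucketEntrySketch {
    sub_aggregations: Vec<BucketResultSketch>,
}

impl BucketEntrySketch {
    /// A bucket entry without sub-aggregations.
    fn leaf() -> Self {
        BucketEntrySketch {
            sub_aggregations: Vec::new(),
        }
    }
}

/// Simplified stand-in for a bucket aggregation result.
enum BucketResultSketch {
    /// Range/histogram/terms-style result: each entry counts as one bucket
    /// plus whatever buckets its sub-aggregations produce.
    Buckets(Vec<BucketEntrySketch>),
    /// Filter-style result: a single, non-user-facing bucket whose own
    /// existence is not counted; only its sub-aggregation buckets are.
    Filter {
        sub_aggregations: Vec<BucketResultSketch>,
    },
}

/// Total bucket count across a list of sibling sub-aggregation results.
fn sub_aggs_bucket_count(sub_aggs: &[BucketResultSketch]) -> u64 {
    sub_aggs.iter().map(BucketResultSketch::bucket_count).sum()
}

impl BucketResultSketch {
    fn bucket_count(&self) -> u64 {
        match self {
            BucketResultSketch::Buckets(entries) => entries
                .iter()
                .map(|entry| 1 + sub_aggs_bucket_count(&entry.sub_aggregations))
                .sum(),
            BucketResultSketch::Filter { sub_aggregations } => {
                sub_aggs_bucket_count(sub_aggregations)
            }
        }
    }
}
```

With this rule, a filter that wraps a two-bucket terms aggregation reports 2 buckets, and an empty filter reports 0.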
--- a/src/aggregation/agg_tests.rs
+++ b/src/aggregation/agg_tests.rs
@@ -5,7 +5,6 @@ use crate::aggregation::agg_result::AggregationResults;
 use crate::aggregation::buf_collector::DOC_BLOCK_SIZE;
 use crate::aggregation::collector::AggregationCollector;
 use crate::aggregation::intermediate_agg_result::IntermediateAggregationResults;
-use crate::aggregation::segment_agg_result::AggregationLimitsGuard;
 use crate::aggregation::tests::{get_test_index_2_segments, get_test_index_from_values_and_terms};
 use crate::aggregation::DistributedAggregationCollector;
 use crate::query::{AllQuery, TermQuery};
@@ -128,10 +127,8 @@ fn test_aggregation_flushing(
            .unwrap();

    let agg_res: AggregationResults = if use_distributed_collector {
-        let collector = DistributedAggregationCollector::from_aggs(
-            agg_req.clone(),
-            AggregationLimitsGuard::default(),
-        );
+        let collector =
+            DistributedAggregationCollector::from_aggs(agg_req.clone(), Default::default());

        let searcher = reader.searcher();
        let intermediate_agg_result = searcher.search(&AllQuery, &collector).unwrap();
--- a/src/aggregation/bucket/filter.rs
+++ b/src/aggregation/bucket/filter.rs
--- a/src/aggregation/bucket/histogram/histogram.rs
+++ b/src/aggregation/bucket/histogram/histogram.rs
@@ -1,25 +1,54 @@
 use std::cmp::Ordering;

+use columnar::{Column, ColumnBlockAccessor, ColumnType};
 use rustc_hash::FxHashMap;
 use serde::{Deserialize, Serialize};
 use tantivy_bitpacker::minmax;

+use crate::aggregation::agg_data::{
+    build_segment_agg_collectors, AggRefNode, AggregationsSegmentCtx,
+};
 use crate::aggregation::agg_limits::MemoryConsumption;
 use crate::aggregation::agg_req::Aggregations;
-use crate::aggregation::agg_req_with_accessor::{
-    AggregationWithAccessor, AggregationsWithAccessor,
-};
 use crate::aggregation::agg_result::BucketEntry;
 use crate::aggregation::intermediate_agg_result::{
    IntermediateAggregationResult, IntermediateAggregationResults, IntermediateBucketResult,
    IntermediateHistogramBucketEntry,
 };
-use crate::aggregation::segment_agg_result::{
-    build_segment_agg_collector, SegmentAggregationCollector,
-};
+use crate::aggregation::segment_agg_result::SegmentAggregationCollector;
 use crate::aggregation::*;
 use crate::TantivyError;

+/// Contains all information required by the SegmentHistogramCollector to perform the
+/// histogram or date_histogram aggregation on a segment.
+pub struct HistogramAggReqData {
+    /// The column accessor to access the fast field values.
+    pub accessor: Column<u64>,
+    /// The field type of the fast field.
+    pub field_type: ColumnType,
+    /// The column block accessor to access the fast field values.
+    pub column_block_accessor: ColumnBlockAccessor<u64>,
+    /// The name of the aggregation.
+    pub name: String,
+    /// The sub aggregation blueprint, used to create sub aggregations for each bucket.
+    /// Will be filled during initialization of the collector.
+    pub sub_aggregation_blueprint: Option<Box<dyn SegmentAggregationCollector>>,
+    /// The histogram aggregation request.
+    pub req: HistogramAggregation,
+    /// True if this is a date_histogram aggregation.
+    pub is_date_histogram: bool,
+    /// The bounds to limit the buckets to.
+    pub bounds: HistogramBounds,
+    /// The offset used to calculate the bucket position.
+    pub offset: f64,
+}
+impl HistogramAggReqData {
+    /// Estimate the memory consumption of this struct in bytes.
+    pub fn get_memory_consumption(&self) -> usize {
+        std::mem::size_of::<Self>()
+    }
+}
+
 /// Histogram is a bucket aggregation, where buckets are created dynamically for given `interval`.
 /// Each document value is rounded down to its bucket.
 ///
@@ -234,12 +263,12 @@ impl SegmentHistogramBucketEntry {
    pub(crate) fn into_intermediate_bucket_entry(
        self,
        sub_aggregation: Option<Box<dyn SegmentAggregationCollector>>,
-        agg_with_accessor: &AggregationsWithAccessor,
+        agg_data: &AggregationsSegmentCtx,
    ) -> crate::Result<IntermediateHistogramBucketEntry> {
        let mut sub_aggregation_res = IntermediateAggregationResults::default();
        if let Some(sub_aggregation) = sub_aggregation {
            sub_aggregation
-                .add_intermediate_aggregation_result(agg_with_accessor, &mut sub_aggregation_res)?;
+                .add_intermediate_aggregation_result(agg_data, &mut sub_aggregation_res)?;
        }
        Ok(IntermediateHistogramBucketEntry {
            key: self.key,
@@ -256,24 +285,20 @@ pub struct SegmentHistogramCollector {
    /// The buckets containing the aggregation data.
    buckets: FxHashMap<i64, SegmentHistogramBucketEntry>,
    sub_aggregations: FxHashMap<i64, Box<dyn SegmentAggregationCollector>>,
-    sub_aggregation_blueprint: Option<Box<dyn SegmentAggregationCollector>>,
-    column_type: ColumnType,
-    interval: f64,
-    offset: f64,
-    bounds: HistogramBounds,
    accessor_idx: usize,
 }

 impl SegmentAggregationCollector for SegmentHistogramCollector {
    fn add_intermediate_aggregation_result(
        self: Box<Self>,
-        agg_with_accessor: &AggregationsWithAccessor,
+        agg_data: &AggregationsSegmentCtx,
        results: &mut IntermediateAggregationResults,
    ) -> crate::Result<()> {
-        let name = agg_with_accessor.aggs.keys[self.accessor_idx].to_string();
-        let agg_with_accessor = &agg_with_accessor.aggs.values[self.accessor_idx];
-
-        let bucket = self.into_intermediate_bucket_result(agg_with_accessor)?;
+        let name = agg_data
+            .get_histogram_req_data(self.accessor_idx)
+            .name
+            .clone();
+        let bucket = self.into_intermediate_bucket_result(agg_data)?;
        results.push(name, IntermediateAggregationResult::Bucket(bucket))?;

        Ok(())
@@ -283,56 +308,52 @@ impl SegmentAggregationCollector for SegmentHistogramCollector {
    fn collect(
        &mut self,
        doc: crate::DocId,
-        agg_with_accessor: &mut AggregationsWithAccessor,
+        agg_data: &mut AggregationsSegmentCtx,
    ) -> crate::Result<()> {
-        self.collect_block(&[doc], agg_with_accessor)
+        self.collect_block(&[doc], agg_data)
    }

    #[inline]
    fn collect_block(
        &mut self,
        docs: &[crate::DocId],
-        agg_with_accessor: &mut AggregationsWithAccessor,
+        agg_data: &mut AggregationsSegmentCtx,
    ) -> crate::Result<()> {
-        let bucket_agg_accessor = &mut agg_with_accessor.aggs.values[self.accessor_idx];
-
+        let mut req = agg_data.take_histogram_req_data(self.accessor_idx);
        let mem_pre = self.get_memory_consumption();

-        let bounds = self.bounds;
-        let interval = self.interval;
-        let offset = self.offset;
-        let get_bucket_pos = |val| (get_bucket_pos_f64(val, interval, offset) as i64);
+        let bounds = req.bounds;
+        let interval = req.req.interval;
+        let offset = req.offset;
+        let get_bucket_pos = |val| get_bucket_pos_f64(val, interval, offset) as i64;

-        bucket_agg_accessor
+        req.column_block_accessor.fetch_block(docs, &req.accessor);
+        for (doc, val) in req
            .column_block_accessor
-            .fetch_block(docs, &bucket_agg_accessor.accessor);
-
-        for (doc, val) in bucket_agg_accessor
-            .column_block_accessor
-            .iter_docid_vals(docs, &bucket_agg_accessor.accessor)
+            .iter_docid_vals(docs, &req.accessor)
        {
-            let val = self.f64_from_fastfield_u64(val);
-
+            let val = f64_from_fastfield_u64(val, &req.field_type);
            let bucket_pos = get_bucket_pos(val);
-
            if bounds.contains(val) {
                let bucket = self.buckets.entry(bucket_pos).or_insert_with(|| {
                    let key = get_bucket_key_from_pos(bucket_pos as f64, interval, offset);
                    SegmentHistogramBucketEntry { key, doc_count: 0 }
                });
                bucket.doc_count += 1;
-                if let Some(sub_aggregation_blueprint) = self.sub_aggregation_blueprint.as_mut() {
+                if let Some(sub_aggregation_blueprint) = req.sub_aggregation_blueprint.as_ref() {
                    self.sub_aggregations
                        .entry(bucket_pos)
                        .or_insert_with(|| sub_aggregation_blueprint.clone())
-                        .collect(doc, &mut bucket_agg_accessor.sub_aggregation)?;
+                        .collect(doc, agg_data)?;
                }
            }
        }
+        agg_data.put_back_histogram_req_data(self.accessor_idx, req);

        let mem_delta = self.get_memory_consumption() - mem_pre;
        if mem_delta > 0 {
-            bucket_agg_accessor
+            agg_data
+                .context
                .limits
                .add_memory_consumed(mem_delta as u64)?;
        }
@@ -340,12 +361,9 @@ impl SegmentAggregationCollector for SegmentHistogramCollector {
        Ok(())
    }

-    fn flush(&mut self, agg_with_accessor: &mut AggregationsWithAccessor) -> crate::Result<()> {
-        let sub_aggregation_accessor =
-            &mut agg_with_accessor.aggs.values[self.accessor_idx].sub_aggregation;
-
+    fn flush(&mut self, agg_data: &mut AggregationsSegmentCtx) -> crate::Result<()> {
        for sub_aggregation in self.sub_aggregations.values_mut() {
-            sub_aggregation.flush(sub_aggregation_accessor)?;
+            sub_aggregation.flush(agg_data)?;
        }

        Ok(())
@@ -362,65 +380,58 @@ impl SegmentHistogramCollector {
    /// Converts the collector result into a intermediate bucket result.
    pub fn into_intermediate_bucket_result(
        self,
-        agg_with_accessor: &AggregationWithAccessor,
+        agg_data: &AggregationsSegmentCtx,
    ) -> crate::Result<IntermediateBucketResult> {
        let mut buckets = Vec::with_capacity(self.buckets.len());

        for (bucket_pos, bucket) in self.buckets {
            let bucket_res = bucket.into_intermediate_bucket_entry(
                self.sub_aggregations.get(&bucket_pos).cloned(),
-                &agg_with_accessor.sub_aggregation,
+                agg_data,
            );

            buckets.push(bucket_res?);
        }
        buckets.sort_unstable_by(|b1, b2| b1.key.total_cmp(&b2.key));

+        let is_date_agg = agg_data
+            .get_histogram_req_data(self.accessor_idx)
+            .field_type
+            == ColumnType::DateTime;
        Ok(IntermediateBucketResult::Histogram {
            buckets,
-            is_date_agg: self.column_type == ColumnType::DateTime,
+            is_date_agg,
        })
    }

    pub(crate) fn from_req_and_validate(
-        mut req: HistogramAggregation,
-        sub_aggregation: &mut AggregationsWithAccessor,
-        field_type: ColumnType,
-        accessor_idx: usize,
+        agg_data: &mut AggregationsSegmentCtx,
+        node: &AggRefNode,
    ) -> crate::Result<Self> {
-        req.validate()?;
-        if field_type == ColumnType::DateTime {
-            req.normalize_date_time();
-        }
-
-        let sub_aggregation_blueprint = if sub_aggregation.is_empty() {
-            None
+        let blueprint = if !node.children.is_empty() {
+            Some(build_segment_agg_collectors(agg_data, &node.children)?)
        } else {
-            let sub_aggregation = build_segment_agg_collector(sub_aggregation)?;
-            Some(sub_aggregation)
+            None
        };
-
-        let bounds = req.hard_bounds.unwrap_or(HistogramBounds {
+        let req_data = agg_data.get_histogram_req_data_mut(node.idx_in_req_data);
+        req_data.req.validate()?;
+        if req_data.field_type == ColumnType::DateTime && !req_data.is_date_histogram {
+            req_data.req.normalize_date_time();
+        }
+        req_data.bounds = req_data.req.hard_bounds.unwrap_or(HistogramBounds {
            min: f64::MIN,
            max: f64::MAX,
        });
+        req_data.offset = req_data.req.offset.unwrap_or(0.0);
+
+        req_data.sub_aggregation_blueprint = blueprint;

        Ok(Self {
            buckets: Default::default(),
-            column_type: field_type,
-            interval: req.interval,
-            offset: req.offset.unwrap_or(0.0),
-            bounds,
            sub_aggregations: Default::default(),
-            sub_aggregation_blueprint,
-            accessor_idx,
+            accessor_idx: node.idx_in_req_data,
        })
    }
-
-    #[inline]
-    fn f64_from_fastfield_u64(&self, val: u64) -> f64 {
-        f64_from_fastfield_u64(val, &self.column_type)
-    }
 }

 #[inline]
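After this change, the histogram collector reads `interval`, `offset`, and `bounds` from `HistogramAggReqData`, maps each value to an `i64` bucket position with `get_bucket_pos_f64(val, interval, offset)`, counts only values inside `bounds`, and sorts the resulting buckets by key with `total_cmp`. The std-only sketch below reproduces that flow starting from `f64` values, so it skips the `f64_from_fastfield_u64` conversion. The floor-based position formula, the key formula, and the inclusive bounds check are assumptions consistent with "each document value is rounded down to its bucket"; they are not copies of tantivy's helpers, and the function names are hypothetical.

```rust
use std::collections::HashMap;

/// Position of the bucket containing `val` (assumed: floor((val - offset) / interval)).
fn bucket_pos(val: f64, interval: f64, offset: f64) -> i64 {
    ((val - offset) / interval).floor() as i64
}

/// Lower bound ("key") of the bucket at `pos` (assumed: pos * interval + offset).
fn bucket_key(pos: i64, interval: f64, offset: f64) -> f64 {
    pos as f64 * interval + offset
}

/// Counts values per bucket, skipping values outside the inclusive hard
/// bounds [min, max]. Returns (key, doc_count) pairs sorted by key.
fn histogram(values: &[f64], interval: f64, offset: f64, min: f64, max: f64) -> Vec<(f64, u64)> {
    let mut counts: HashMap<i64, u64> = HashMap::new();
    for &val in values {
        if val < min || val > max {
            continue;
        }
        *counts.entry(bucket_pos(val, interval, offset)).or_insert(0) += 1;
    }
    let mut buckets: Vec<(f64, u64)> = counts
        .into_iter()
        .map(|(pos, count)| (bucket_key(pos, interval, offset), count))
        .collect();
    // Same ordering the collector applies before building the intermediate result.
    buckets.sort_unstable_by(|a, b| a.0.total_cmp(&b.0));
    buckets
}
```

With `interval = 5.0` and no bounds (`f64::MIN..=f64::MAX`, the default used when `hard_bounds` is unset), the values `[1.0, 2.5, 7.0, 12.0, -3.0]` fall into buckets keyed `-5.0`, `0.0` (two values), `5.0`, and `10.0`. Keying buckets by an integer position rather than by the float key keeps the hash map lookups exact.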
--- a/src/aggregation/bucket/mod.rs
+++ b/src/aggregation/bucket/mod.rs
@@ -22,6 +22,7 @@
 //! - [Range](RangeAggregation)
 //! - [Terms](TermsAggregation)

+mod filter;
 mod histogram;
 mod range;
 mod term_agg;
@@ -30,6 +31,7 @@ mod term_missing_agg;
 use std::collections::HashMap;
 use std::fmt;

+pub use filter::*;
 pub use histogram::*;
 pub use range::*;
 use serde::{de, Deserialize, Deserializer, Serialize, Serializer};
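The histogram `collect_block` above, like the range collector's in the next file, now follows a take/put-back pattern. `take_histogram_req_data` (or `take_range_req_data`) moves the request data out of `AggregationsSegmentCtx`, the loop forwards documents to sub-aggregations with `&mut agg_data`, and `put_back_*` restores the data afterwards. The range collector's comment gives the reason: avoiding borrow conflicts during sub-aggregation. Below is a minimal std-only sketch of the idea. It assumes the context stores request data as `Vec<Option<_>>`; that layout and every name in the block are hypothetical, not tantivy's internals.

```rust
/// Hypothetical per-aggregation request data (stand-in for `HistogramAggReqData`).
struct ReqData {
    name: String,
    docs_seen: u64,
}

/// Hypothetical segment context holding request data for every aggregation by index.
struct SegmentCtx {
    req_data: Vec<Option<ReqData>>,
    sub_agg_docs: u64,
}

impl SegmentCtx {
    /// Moves the request data out so the context can be borrowed mutably elsewhere.
    fn take_req_data(&mut self, idx: usize) -> ReqData {
        self.req_data[idx].take().expect("request data was already taken")
    }

    /// Restores request data previously removed with `take_req_data`.
    fn put_back_req_data(&mut self, idx: usize, data: ReqData) {
        self.req_data[idx] = Some(data);
    }
}

/// Stand-in for a sub-aggregation's `collect(doc, agg_data)`, which needs `&mut` context.
fn sub_agg_collect(ctx: &mut SegmentCtx, _doc: u32) {
    ctx.sub_agg_docs += 1;
}

fn collect_block(ctx: &mut SegmentCtx, idx: usize, docs: &[u32]) {
    // With `let req = &mut ctx.req_data[idx]` instead, the `sub_agg_collect(ctx, doc)`
    // call below would not compile, because `ctx` would already be mutably borrowed.
    let mut req = ctx.take_req_data(idx);
    for &doc in docs {
        req.docs_seen += 1;
        sub_agg_collect(ctx, doc);
    }
    ctx.put_back_req_data(idx, req);
}
```

One consequence of the pattern: an early `?` return between the two calls leaves the slot empty, so a later `take_req_data` for the same index would panic in this sketch.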
--- a/src/aggregation/bucket/range.rs
+++ b/src/aggregation/bucket/range.rs
@@ -1,20 +1,43 @@
 use std::fmt::Debug;
 use std::ops::Range;

+use columnar::{Column, ColumnBlockAccessor, ColumnType};
 use rustc_hash::FxHashMap;
 use serde::{Deserialize, Serialize};

-use crate::aggregation::agg_req_with_accessor::AggregationsWithAccessor;
+use crate::aggregation::agg_data::{
+    build_segment_agg_collectors, AggRefNode, AggregationsSegmentCtx,
+};
 use crate::aggregation::intermediate_agg_result::{
    IntermediateAggregationResult, IntermediateAggregationResults, IntermediateBucketResult,
    IntermediateRangeBucketEntry, IntermediateRangeBucketResult,
 };
-use crate::aggregation::segment_agg_result::{
-    build_segment_agg_collector, SegmentAggregationCollector,
-};
+use crate::aggregation::segment_agg_result::SegmentAggregationCollector;
 use crate::aggregation::*;
 use crate::TantivyError;

+/// Contains all information required by the SegmentRangeCollector to perform the
+/// range aggregation on a segment.
+pub struct RangeAggReqData {
+    /// The column accessor to access the fast field values.
+    pub accessor: Column<u64>,
+    /// The type of the fast field.
+    pub field_type: ColumnType,
+    /// The column block accessor to access the fast field values.
+    pub column_block_accessor: ColumnBlockAccessor<u64>,
+    /// The range aggregation request.
+    pub req: RangeAggregation,
+    /// The name of the aggregation.
+    pub name: String,
+}
+
+impl RangeAggReqData {
+    /// Estimate the memory consumption of this struct in bytes.
+    pub fn get_memory_consumption(&self) -> usize {
+        std::mem::size_of::<Self>()
+    }
+}
+
 /// Provide user-defined buckets to aggregate on.
 ///
 /// Two special buckets will automatically be created to cover the whole range of values.
@@ -161,12 +184,12 @@ impl Debug for SegmentRangeBucketEntry {
 impl SegmentRangeBucketEntry {
    pub(crate) fn into_intermediate_bucket_entry(
        self,
-        agg_with_accessor: &AggregationsWithAccessor,
+        agg_data: &AggregationsSegmentCtx,
    ) -> crate::Result<IntermediateRangeBucketEntry> {
        let mut sub_aggregation_res = IntermediateAggregationResults::default();
        if let Some(sub_aggregation) = self.sub_aggregation {
            sub_aggregation
-                .add_intermediate_aggregation_result(agg_with_accessor, &mut sub_aggregation_res)?
+                .add_intermediate_aggregation_result(agg_data, &mut sub_aggregation_res)?
        } else {
            Default::default()
        };
@@ -184,12 +207,14 @@ impl SegmentRangeBucketEntry {
 impl SegmentAggregationCollector for SegmentRangeCollector {
    fn add_intermediate_aggregation_result(
        self: Box<Self>,
-        agg_with_accessor: &AggregationsWithAccessor,
+        agg_data: &AggregationsSegmentCtx,
        results: &mut IntermediateAggregationResults,
    ) -> crate::Result<()> {
        let field_type = self.column_type;
-        let name = agg_with_accessor.aggs.keys[self.accessor_idx].to_string();
-        let sub_agg = &agg_with_accessor.aggs.values[self.accessor_idx].sub_aggregation;
+        let name = agg_data
+            .get_range_req_data(self.accessor_idx)
+            .name
+            .to_string();

        let buckets: FxHashMap<SerializedKey, IntermediateRangeBucketEntry> = self
            .buckets
@@ -199,7 +224,7 @@ impl SegmentAggregationCollector for SegmentRangeCollector {
                    range_to_string(&range_bucket.range, &field_type)?,
                    range_bucket
                        .bucket
-                        .into_intermediate_bucket_entry(sub_agg)?,
+                        .into_intermediate_bucket_entry(agg_data)?,
                ))
            })
            .collect::<crate::Result<_>>()?;
@@ -218,66 +243,70 @@ impl SegmentAggregationCollector for SegmentRangeCollector {
    fn collect(
        &mut self,
        doc: crate::DocId,
-        agg_with_accessor: &mut AggregationsWithAccessor,
+        agg_data: &mut AggregationsSegmentCtx,
    ) -> crate::Result<()> {
-        self.collect_block(&[doc], agg_with_accessor)
+        self.collect_block(&[doc], agg_data)
    }

    #[inline]
    fn collect_block(
        &mut self,
        docs: &[crate::DocId],
-        agg_with_accessor: &mut AggregationsWithAccessor,
+        agg_data: &mut AggregationsSegmentCtx,
    ) -> crate::Result<()> {
-        let bucket_agg_accessor = &mut agg_with_accessor.aggs.values[self.accessor_idx];
+        // Take request data to avoid borrow conflicts during sub-aggregation
+        let mut req = agg_data.take_range_req_data(self.accessor_idx);

-        bucket_agg_accessor
-            .column_block_accessor
-            .fetch_block(docs, &bucket_agg_accessor.accessor);
+        req.column_block_accessor.fetch_block(docs, &req.accessor);

-        for (doc, val) in bucket_agg_accessor
+        for (doc, val) in req
            .column_block_accessor
-            .iter_docid_vals(docs, &bucket_agg_accessor.accessor)
+            .iter_docid_vals(docs, &req.accessor)
        {
            let bucket_pos = self.get_bucket_pos(val);
-
            let bucket = &mut self.buckets[bucket_pos];
-
            bucket.bucket.doc_count += 1;
-            if let Some(sub_aggregation) = &mut bucket.bucket.sub_aggregation {
-                sub_aggregation.collect(doc, &mut bucket_agg_accessor.sub_aggregation)?;
+            if let Some(sub_agg) = bucket.bucket.sub_aggregation.as_mut() {
+                sub_agg.collect(doc, agg_data)?;
            }
        }

+        agg_data.put_back_range_req_data(self.accessor_idx, req);
+
        Ok(())
    }

-    fn flush(&mut self, agg_with_accessor: &mut AggregationsWithAccessor) -> crate::Result<()> {
-        let sub_aggregation_accessor =
-            &mut agg_with_accessor.aggs.values[self.accessor_idx].sub_aggregation;
-
+    fn flush(&mut self, agg_data: &mut AggregationsSegmentCtx) -> crate::Result<()> {
        for bucket in self.buckets.iter_mut() {
            if let Some(sub_agg) = bucket.bucket.sub_aggregation.as_mut() {
-                sub_agg.flush(sub_aggregation_accessor)?;
+                sub_agg.flush(agg_data)?;
            }
        }
-
        Ok(())
    }
 }

 impl SegmentRangeCollector {
    pub(crate) fn from_req_and_validate(
-        req: &RangeAggregation,
-        sub_aggregation: &mut AggregationsWithAccessor,
-        limits: &mut AggregationLimitsGuard,
-        field_type: ColumnType,
-        accessor_idx: usize,
+        req_data: &mut AggregationsSegmentCtx,
+        node: &AggRefNode,
    ) -> crate::Result<Self> {
+        let accessor_idx = node.idx_in_req_data;
+        let (field_type, ranges) = {
+            let req_view = req_data.get_range_req_data(node.idx_in_req_data);
+            (req_view.field_type, req_view.req.ranges.clone())
+        };
+
        // The range input on the request is f64.
        // We need to convert to u64 ranges, because we read the values as u64.
        // The mapping from the conversion is monotonic so ordering is preserved.
-        let buckets: Vec<_> = extend_validate_ranges(&req.ranges, &field_type)?
+        let sub_agg_prototype = if !node.children.is_empty() {
+            Some(build_segment_agg_collectors(req_data, &node.children)?)
+        } else {
+            None
+        };
+
+        let buckets: Vec<_> = extend_validate_ranges(&ranges, &field_type)?
            .iter()
            .map(|range| {
                let key = range
@@ -295,11 +324,7 @@ impl SegmentRangeCollector {
                } else {
                    Some(f64_from_fastfield_u64(range.range.start, &field_type))
                };
-                let sub_aggregation = if sub_aggregation.is_empty() {
-                    None
-                } else {
-                    Some(build_segment_agg_collector(sub_aggregation)?)
-                };
+                let sub_aggregation = sub_agg_prototype.clone();

                Ok(SegmentRangeAndBucketEntry {
                    range: range.range.clone(),
@@ -314,7 +339,7 @@ impl SegmentRangeCollector {
            })
            .collect::<crate::Result<_>>()?;

-        limits.add_memory_consumed(
+        req_data.context.limits.add_memory_consumed(
            buckets.len() as u64 * std::mem::size_of::<SegmentRangeAndBucketEntry>() as u64,
        )?;

@@ -467,15 +492,45 @@ mod tests {
            ranges,
            ..Default::default()
        };
+        // Build buckets as from_req_and_validate does, but without an AggregationsSegmentCtx
+        let buckets: Vec<_> = extend_validate_ranges(&req.ranges, &field_type)
+            .expect("unexpected error in extend_validate_ranges")
+            .iter()
+            .map(|range| {
+                let key = range
+                    .key
+                    .clone()
+                    .map(|key| Ok(Key::Str(key)))
+                    .unwrap_or_else(|| range_to_key(&range.range, &field_type))
+                    .expect("unexpected error in range_to_key");
+                let to = if range.range.end == u64::MAX {
+                    None
+                } else {
+                    Some(f64_from_fastfield_u64(range.range.end, &field_type))
+                };
+                let from = if range.range.start == u64::MIN {
+                    None
+                } else {
+                    Some(f64_from_fastfield_u64(range.range.start, &field_type))
+                };
+                SegmentRangeAndBucketEntry {
+                    range: range.range.clone(),
+                    bucket: SegmentRangeBucketEntry {
+                        doc_count: 0,
+                        sub_aggregation: None,
+                        key,
+                        from,
+                        to,
+                    },
+                }
+            })
+            .collect();

-        SegmentRangeCollector::from_req_and_validate(
-            &req,
-            &mut Default::default(),
-            &mut AggregationLimitsGuard::default(),
-            field_type,
-            0,
-        )
-        .expect("unexpected error")
+        SegmentRangeCollector {
+            buckets,
+            column_type: field_type,
+            accessor_idx: 0,
+        }
    }

    #[test]
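The `take_range_req_data` / `put_back_range_req_data` pair in `collect_block` works around Rust's borrow rules. The range request data lives inside `AggregationsSegmentCtx`, and the sub-aggregations also need `&mut AggregationsSegmentCtx`, so the data is moved out for the duration of the block and returned afterwards. Below is a minimal sketch of that pattern. It assumes the context stores per-aggregation data as `Vec<Option<_>>`; all type, field, and function names are hypothetical, not tantivy's.

```rust
/// Hypothetical stand-in for one aggregation's request data (not tantivy's real type).
#[derive(Debug, Default)]
struct RangeReqData {
    /// Scratch buffer reused across blocks, similar in role to a column block accessor.
    scratch: Vec<u64>,
    /// Number of blocks processed with this data.
    blocks_seen: usize,
}

/// Hypothetical stand-in for the per-segment context that owns all request data.
#[derive(Debug, Default)]
struct SegmentCtx {
    range_req_data: Vec<Option<RangeReqData>>,
    sub_agg_calls: usize,
}

impl SegmentCtx {
    /// Moves the request data out of its slot, leaving `None` behind.
    fn take_range_req_data(&mut self, idx: usize) -> RangeReqData {
        self.range_req_data[idx]
            .take()
            .expect("range request data was already taken")
    }

    /// Returns the request data to its slot.
    fn put_back_range_req_data(&mut self, idx: usize, data: RangeReqData) {
        debug_assert!(self.range_req_data[idx].is_none());
        self.range_req_data[idx] = Some(data);
    }
}

/// Stand-in for a sub-aggregation, which needs the whole context mutably.
fn collect_sub_agg(_val: u64, ctx: &mut SegmentCtx) {
    ctx.sub_agg_calls += 1;
}

fn collect_block(docs: &[u32], idx: usize, ctx: &mut SegmentCtx) {
    // `req` is owned here, so it no longer borrows `ctx`.
    let mut req = ctx.take_range_req_data(idx);
    req.scratch.clear();
    req.scratch.extend(docs.iter().map(|&doc| u64::from(doc)));
    for &val in &req.scratch {
        // Iterating the owned `req` while passing `ctx` mutably compiles.
        // Iterating `ctx.range_req_data[idx]` here instead would be a second
        // mutable borrow of `ctx` and would be rejected by the borrow checker.
        collect_sub_agg(val, ctx);
    }
    req.blocks_seen += 1;
    ctx.put_back_range_req_data(idx, req);
}
```

One trade-off: an early return via `?` between the take and the put-back (as `sub_agg.collect(doc, agg_data)?` can do) leaves the slot empty. That is only harmless if an error aborts collection for the whole segment.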
--- a/src/aggregation/bucket/term_agg.rs
+++ b/src/aggregation/bucket/term_agg.rs
--- a/src/aggregation/bucket/term_missing_agg.rs
+++ b/src/aggregation/bucket/term_missing_agg.rs
@@ -1,13 +1,39 @@
+use columnar::{Column, ColumnType};
 use rustc_hash::FxHashMap;

-use crate::aggregation::agg_req_with_accessor::AggregationsWithAccessor;
+use crate::aggregation::agg_data::{
+    build_segment_agg_collectors, AggRefNode, AggregationsSegmentCtx,
+};
+use crate::aggregation::bucket::term_agg::TermsAggregation;
 use crate::aggregation::intermediate_agg_result::{
    IntermediateAggregationResult, IntermediateAggregationResults, IntermediateBucketResult,
    IntermediateKey, IntermediateTermBucketEntry, IntermediateTermBucketResult,
 };
-use crate::aggregation::segment_agg_result::{
-    build_segment_agg_collector, SegmentAggregationCollector,
-};
+use crate::aggregation::segment_agg_result::SegmentAggregationCollector;
+
+/// Special aggregation to handle missing values for term aggregations.
+/// A document counts as missing only if none of the checked columns has a value for it.
+///
+/// This is needed when:
+/// - The field is backed by multiple columns (e.g. one per value type), so all must be checked
+/// - The field is not text and `missing` is provided as a string (we cannot use the numeric
+///   missing value optimization)
+#[derive(Default)]
+pub struct MissingTermAggReqData {
+    /// The accessors to check for existence of a value.
+    pub accessors: Vec<(Column<u64>, ColumnType)>,
+    /// The name of the aggregation.
+    pub name: String,
+    /// The original terms aggregation request.
+    pub req: TermsAggregation,
+}
+
+impl MissingTermAggReqData {
+    /// Estimate the memory consumption of this struct in bytes (shallow size; heap data excluded).
+    pub fn get_memory_consumption(&self) -> usize {
+        std::mem::size_of::<Self>()
+    }
+}

 /// The specialized missing term aggregation.
 #[derive(Default, Debug, Clone)]
@@ -18,12 +44,13 @@ pub struct TermMissingAgg {
 }
 impl TermMissingAgg {
    pub(crate) fn new(
-        accessor_idx: usize,
-        sub_aggregations: &mut AggregationsWithAccessor,
+        req_data: &mut AggregationsSegmentCtx,
+        node: &AggRefNode,
    ) -> crate::Result<Self> {
-        let has_sub_aggregations = !sub_aggregations.is_empty();
+        let has_sub_aggregations = !node.children.is_empty();
+        let accessor_idx = node.idx_in_req_data;
        let sub_agg = if has_sub_aggregations {
-            let sub_aggregation = build_segment_agg_collector(sub_aggregations)?;
+            let sub_aggregation = build_segment_agg_collectors(req_data, &node.children)?;
            Some(sub_aggregation)
        } else {
            None
@@ -40,16 +67,11 @@ impl TermMissingAgg {
 impl SegmentAggregationCollector for TermMissingAgg {
    fn add_intermediate_aggregation_result(
        self: Box<Self>,
-        agg_with_accessor: &AggregationsWithAccessor,
+        agg_data: &AggregationsSegmentCtx,
        results: &mut IntermediateAggregationResults,
    ) -> crate::Result<()> {
-        let name = agg_with_accessor.aggs.keys[self.accessor_idx].to_string();
-        let agg_with_accessor = &agg_with_accessor.aggs.values[self.accessor_idx];
-        let term_agg = agg_with_accessor
-            .agg
-            .agg
-            .as_term()
-            .expect("TermMissingAgg collector must be term agg req");
+        let req_data = agg_data.get_missing_term_req_data(self.accessor_idx);
+        let term_agg = &req_data.req;
        let missing = term_agg
            .missing
            .as_ref()
@@ -64,10 +86,7 @@ impl SegmentAggregationCollector for TermMissingAgg {
        };
        if let Some(sub_agg) = self.sub_agg {
            let mut res = IntermediateAggregationResults::default();
-            sub_agg.add_intermediate_aggregation_result(
-                &agg_with_accessor.sub_aggregation,
-                &mut res,
-            )?;
+            sub_agg.add_intermediate_aggregation_result(agg_data, &mut res)?;
            missing_entry.sub_aggregation = res;
        }
        entries.insert(missing.into(), missing_entry);
@@ -80,7 +99,10 @@ impl SegmentAggregationCollector for TermMissingAgg {
            },
        };

-        results.push(name, IntermediateAggregationResult::Bucket(bucket))?;
+        results.push(
+            req_data.name.to_string(),
+            IntermediateAggregationResult::Bucket(bucket),
+        )?;

        Ok(())
    }
@@ -88,17 +110,17 @@ impl SegmentAggregationCollector for TermMissingAgg {
    fn collect(
        &mut self,
        doc: crate::DocId,
-        agg_with_accessor: &mut AggregationsWithAccessor,
+        agg_data: &mut AggregationsSegmentCtx,
    ) -> crate::Result<()> {
-        let agg = &mut agg_with_accessor.aggs.values[self.accessor_idx];
-        let has_value = agg
+        let req_data = agg_data.get_missing_term_req_data(self.accessor_idx);
+        let has_value = req_data
            .accessors
            .iter()
            .any(|(acc, _)| acc.index.has_value(doc));
        if !has_value {
            self.missing_count += 1;
            if let Some(sub_agg) = self.sub_agg.as_mut() {
-                sub_agg.collect(doc, &mut agg.sub_aggregation)?;
+                sub_agg.collect(doc, agg_data)?;
            }
        }
        Ok(())
@@ -107,10 +129,10 @@ impl SegmentAggregationCollector for TermMissingAgg {
    fn collect_block(
        &mut self,
        docs: &[crate::DocId],
-        agg_with_accessor: &mut AggregationsWithAccessor,
+        agg_data: &mut AggregationsSegmentCtx,
    ) -> crate::Result<()> {
        for doc in docs {
-            self.collect(*doc, agg_with_accessor)?;
+            self.collect(*doc, agg_data)?;
        }
        Ok(())
    }
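`TermMissingAgg::collect` treats a document as missing only when none of the accessors in `MissingTermAggReqData::accessors` has a value for it. The sketch below reproduces that rule with a hypothetical `ColumnIndex` type standing in for the index of tantivy's `Column<u64>`.

```rust
/// Hypothetical stand-in for one column's existence index; the diff calls
/// `acc.index.has_value(doc)` on each real accessor instead.
struct ColumnIndex {
    has_value_for_doc: Vec<bool>,
}

impl ColumnIndex {
    /// Docs beyond the end of the column are treated as having no value.
    fn has_value(&self, doc: u32) -> bool {
        self.has_value_for_doc.get(doc as usize).copied().unwrap_or(false)
    }
}

/// Counts the docs in `docs` that have no value in any of `columns`.
fn count_missing(docs: &[u32], columns: &[ColumnIndex]) -> u64 {
    docs.iter()
        .filter(|&&doc| !columns.iter().any(|col| col.has_value(doc)))
        .count() as u64
}
```

With no columns at all, every document counts as missing, which matches `Iterator::any` returning `false` on an empty iterator.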
--- a/src/aggregation/buf_collector.rs
+++ b/src/aggregation/buf_collector.rs
@@ -1,9 +1,14 @@
-use super::agg_req_with_accessor::AggregationsWithAccessor;
 use super::intermediate_agg_result::IntermediateAggregationResults;
 use super::segment_agg_result::SegmentAggregationCollector;
+use crate::aggregation::agg_data::AggregationsSegmentCtx;
 use crate::DocId;

+#[cfg(test)]
 pub(crate) const DOC_BLOCK_SIZE: usize = 64;
+
+#[cfg(not(test))]
+pub(crate) const DOC_BLOCK_SIZE: usize = 256;
+
 pub(crate) type DocBlock = [DocId; DOC_BLOCK_SIZE];

 /// BufAggregationCollector buffers documents before calling collect_block().
@@ -15,7 +20,7 @@ pub(crate) struct BufAggregationCollector {
 }

 impl std::fmt::Debug for BufAggregationCollector {
-    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+    fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
        f.debug_struct("SegmentAggregationResultsCollector")
            .field("staged_docs", &&self.staged_docs[..self.num_staged_docs])
            .field("num_staged_docs", &self.num_staged_docs)
@@ -37,23 +42,23 @@ impl SegmentAggregationCollector for BufAggregationCollector {
    #[inline]
    fn add_intermediate_aggregation_result(
        self: Box<Self>,
-        agg_with_accessor: &AggregationsWithAccessor,
+        agg_data: &AggregationsSegmentCtx,
        results: &mut IntermediateAggregationResults,
    ) -> crate::Result<()> {
-        Box::new(self.collector).add_intermediate_aggregation_result(agg_with_accessor, results)
+        Box::new(self.collector).add_intermediate_aggregation_result(agg_data, results)
    }

    #[inline]
    fn collect(
        &mut self,
        doc: crate::DocId,
-        agg_with_accessor: &mut AggregationsWithAccessor,
+        agg_data: &mut AggregationsSegmentCtx,
    ) -> crate::Result<()> {
        self.staged_docs[self.num_staged_docs] = doc;
        self.num_staged_docs += 1;
        if self.num_staged_docs == self.staged_docs.len() {
            self.collector
-                .collect_block(&self.staged_docs[..self.num_staged_docs], agg_with_accessor)?;
+                .collect_block(&self.staged_docs[..self.num_staged_docs], agg_data)?;
            self.num_staged_docs = 0;
        }
        Ok(())
@@ -63,20 +68,19 @@ impl SegmentAggregationCollector for BufAggregationCollector {
    fn collect_block(
        &mut self,
        docs: &[crate::DocId],
-        agg_with_accessor: &mut AggregationsWithAccessor,
+        agg_data: &mut AggregationsSegmentCtx,
    ) -> crate::Result<()> {
-        self.collector.collect_block(docs, agg_with_accessor)?;
-
+        self.collector.collect_block(docs, agg_data)?;
        Ok(())
    }

    #[inline]
-    fn flush(&mut self, agg_with_accessor: &mut AggregationsWithAccessor) -> crate::Result<()> {
+    fn flush(&mut self, agg_data: &mut AggregationsSegmentCtx) -> crate::Result<()> {
        self.collector
-            .collect_block(&self.staged_docs[..self.num_staged_docs], agg_with_accessor)?;
+            .collect_block(&self.staged_docs[..self.num_staged_docs], agg_data)?;
        self.num_staged_docs = 0;

-        self.collector.flush(agg_with_accessor)?;
+        self.collector.flush(agg_data)?;

        Ok(())
    }
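`BufAggregationCollector` stages doc ids in a fixed-size array and forwards them to `collect_block` once `DOC_BLOCK_SIZE` docs have accumulated; `flush` forwards whatever remains. Below is a simplified sketch. It assumes a hypothetical `BlockCollector` trait in place of `SegmentAggregationCollector` and uses a block size of 4 (instead of the diff's 64 in tests and 256 otherwise) so the example stays small.

```rust
/// Small block size for illustration only.
const DOC_BLOCK_SIZE: usize = 4;

/// Hypothetical trait standing in for `SegmentAggregationCollector::collect_block`.
trait BlockCollector {
    fn collect_block(&mut self, docs: &[u32]);
}

/// Records every block it receives, so the batching is observable.
struct RecordingCollector {
    blocks: Vec<Vec<u32>>,
}

impl BlockCollector for RecordingCollector {
    fn collect_block(&mut self, docs: &[u32]) {
        self.blocks.push(docs.to_vec());
    }
}

struct BufCollector<C: BlockCollector> {
    staged_docs: [u32; DOC_BLOCK_SIZE],
    num_staged_docs: usize,
    inner: C,
}

impl<C: BlockCollector> BufCollector<C> {
    fn new(inner: C) -> Self {
        Self { staged_docs: [0; DOC_BLOCK_SIZE], num_staged_docs: 0, inner }
    }

    /// Stages one doc and forwards a full block when the buffer is full.
    fn collect(&mut self, doc: u32) {
        self.staged_docs[self.num_staged_docs] = doc;
        self.num_staged_docs += 1;
        if self.num_staged_docs == DOC_BLOCK_SIZE {
            self.inner.collect_block(&self.staged_docs);
            self.num_staged_docs = 0;
        }
    }

    /// Forwards the partially filled last block, if any.
    fn flush(&mut self) {
        if self.num_staged_docs > 0 {
            self.inner.collect_block(&self.staged_docs[..self.num_staged_docs]);
            self.num_staged_docs = 0;
        }
    }
}
```

Unlike the diff, this sketch skips the `collect_block` call in `flush` when no docs are staged.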
--- a/src/aggregation/collector.rs
+++ b/src/aggregation/collector.rs
@@ -1,12 +1,12 @@
 use super::agg_req::Aggregations;
-use super::agg_req_with_accessor::AggregationsWithAccessor;
 use super::agg_result::AggregationResults;
 use super::buf_collector::BufAggregationCollector;
 use super::intermediate_agg_result::IntermediateAggregationResults;
-use super::segment_agg_result::{
-    build_segment_agg_collector, AggregationLimitsGuard, SegmentAggregationCollector,
+use super::segment_agg_result::SegmentAggregationCollector;
+use super::AggContextParams;
+use crate::aggregation::agg_data::{
+    build_aggregations_data_from_req, build_segment_agg_collectors_root, AggregationsSegmentCtx,
 };
-use crate::aggregation::agg_req_with_accessor::get_aggs_with_segment_accessor_and_validate;
 use crate::collector::{Collector, SegmentCollector};
 use crate::index::SegmentReader;
 use crate::{DocId, SegmentOrdinal, TantivyError};
@@ -22,7 +22,7 @@ pub const DEFAULT_MEMORY_LIMIT: u64 = 500_000_000;
 /// The collector collects all aggregations by the underlying aggregation request.
 pub struct AggregationCollector {
    agg: Aggregations,
-    limits: AggregationLimitsGuard,
+    context: AggContextParams,
 }

 impl AggregationCollector {
@@ -30,8 +30,8 @@ impl AggregationCollector {
    ///
    /// Aggregation fails when the limits in `AggregationLimits` is exceeded. (memory limit and
    /// bucket limit)
-    pub fn from_aggs(agg: Aggregations, limits: AggregationLimitsGuard) -> Self {
-        Self { agg, limits }
+    pub fn from_aggs(agg: Aggregations, context: AggContextParams) -> Self {
+        Self { agg, context }
    }
 }

@@ -45,7 +45,7 @@ impl AggregationCollector {
 /// into the final `AggregationResults` via the `into_final_result()` method.
 pub struct DistributedAggregationCollector {
    agg: Aggregations,
-    limits: AggregationLimitsGuard,
+    context: AggContextParams,
 }

 impl DistributedAggregationCollector {
@@ -53,8 +53,8 @@ impl DistributedAggregationCollector {
    ///
    /// Aggregation fails when the limits in `AggregationLimits` is exceeded. (memory limit and
    /// bucket limit)
-    pub fn from_aggs(agg: Aggregations, limits: AggregationLimitsGuard) -> Self {
-        Self { agg, limits }
+    pub fn from_aggs(agg: Aggregations, context: AggContextParams) -> Self {
+        Self { agg, context }
    }
 }

@@ -72,7 +72,7 @@ impl Collector for DistributedAggregationCollector {
            &self.agg,
            reader,
            segment_local_id,
-            &self.limits,
+            &self.context,
        )
    }

@@ -102,7 +102,7 @@ impl Collector for AggregationCollector {
            &self.agg,
            reader,
            segment_local_id,
-            &self.limits,
+            &self.context,
        )
    }

@@ -115,7 +115,7 @@ impl Collector for AggregationCollector {
        segment_fruits: Vec<<Self::Child as SegmentCollector>::Fruit>,
    ) -> crate::Result<Self::Fruit> {
        let res = merge_fruits(segment_fruits)?;
-        res.into_final_result(self.agg.clone(), self.limits.clone())
+        res.into_final_result(self.agg.clone(), self.context.limits.clone())
    }
 }

@@ -135,7 +135,7 @@ fn merge_fruits(

 /// `AggregationSegmentCollector` does the aggregation collection on a segment.
 pub struct AggregationSegmentCollector {
-    aggs_with_accessor: AggregationsWithAccessor,
+    aggs_with_accessor: AggregationsSegmentCtx,
    agg_collector: BufAggregationCollector,
    error: Option<TantivyError>,
 }
@@ -147,14 +147,15 @@ impl AggregationSegmentCollector {
        agg: &Aggregations,
        reader: &SegmentReader,
        segment_ordinal: SegmentOrdinal,
-        limits: &AggregationLimitsGuard,
+        context: &AggContextParams,
    ) -> crate::Result<Self> {
-        let mut aggs_with_accessor =
-            get_aggs_with_segment_accessor_and_validate(agg, reader, segment_ordinal, limits)?;
+        let mut agg_data =
+            build_aggregations_data_from_req(agg, reader, segment_ordinal, context.clone())?;
        let result =
-            BufAggregationCollector::new(build_segment_agg_collector(&mut aggs_with_accessor)?);
+            BufAggregationCollector::new(build_segment_agg_collectors_root(&mut agg_data)?);
+
        Ok(AggregationSegmentCollector {
-            aggs_with_accessor,
+            aggs_with_accessor: agg_data,
            agg_collector: result,
            error: None,
        })
--- a/src/aggregation/intermediate_agg_result.rs
+++ b/src/aggregation/intermediate_agg_result.rs
@@ -24,7 +24,9 @@ use super::metric::{
 };
 use super::segment_agg_result::AggregationLimitsGuard;
 use super::{format_date, AggregationError, Key, SerializedKey};
-use crate::aggregation::agg_result::{AggregationResults, BucketEntries, BucketEntry};
+use crate::aggregation::agg_result::{
+    AggregationResults, BucketEntries, BucketEntry, FilterBucketResult,
+};
 use crate::aggregation::bucket::TermsAggregationInternal;
 use crate::aggregation::metric::CardinalityCollector;
 use crate::TantivyError;
@@ -179,12 +181,17 @@ impl IntermediateAggregationResults {
    }

    /// Merge another intermediate aggregation result into this result.
-    ///
-    /// The order of the values need to be the same on both results. This is ensured when the same
-    /// (key values) are present on the underlying `VecWithNames` struct.
-    pub fn merge_fruits(&mut self, other: IntermediateAggregationResults) -> crate::Result<()> {
-        for (left, right) in self.aggs_res.values_mut().zip(other.aggs_res.into_values()) {
-            left.merge_fruits(right)?;
+    pub fn merge_fruits(&mut self, mut other: IntermediateAggregationResults) -> crate::Result<()> {
+        for (key, left) in self.aggs_res.iter_mut() {
+            if let Some(right) = other.aggs_res.remove(key) {
+                left.merge_fruits(right)?;
+            }
+        }
+        // Move remainder of other aggs_res into self.
+        // Note: Currently we don't expect this to happen, as we create empty intermediate results
+        // via [IntermediateAggregationResults::empty_from_req].
+        for (key, value) in other.aggs_res {
+            self.aggs_res.insert(key, value);
        }
        Ok(())
    }
@@ -241,11 +248,16 @@ pub(crate) fn empty_from_req(req: &Aggregation) -> IntermediateAggregationResult
        Cardinality(_) => IntermediateAggregationResult::Metric(
            IntermediateMetricResult::Cardinality(CardinalityCollector::default()),
        ),
+        Filter(_) => IntermediateAggregationResult::Bucket(IntermediateBucketResult::Filter {
+            doc_count: 0,
+            sub_aggregations: IntermediateAggregationResults::default(),
+        }),
    }
 }

 /// An aggregation is either a bucket or a metric.
 #[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
+#[allow(clippy::large_enum_variant)]
 pub enum IntermediateAggregationResult {
    /// Bucket variant
    Bucket(IntermediateBucketResult),
@@ -426,6 +438,13 @@ pub enum IntermediateBucketResult {
        /// The term buckets
        buckets: IntermediateTermBucketResult,
    },
+    /// Filter aggregation - a single bucket with sub-aggregations
+    Filter {
+        /// Document count in the filter bucket
+        doc_count: u64,
+        /// Sub-aggregation results
+        sub_aggregations: IntermediateAggregationResults,
+    },
 }

 impl IntermediateBucketResult {
@@ -509,6 +528,18 @@ impl IntermediateBucketResult {
                req.sub_aggregation(),
                limits,
            ),
+            IntermediateBucketResult::Filter {
+                doc_count,
+                sub_aggregations,
+            } => {
+                // Convert sub-aggregation results to final format
+                let final_sub_aggregations = sub_aggregations
+                    .into_final_result(req.sub_aggregation().clone(), limits.clone())?;
+                Ok(BucketResult::Filter(FilterBucketResult {
+                    doc_count,
+                    sub_aggregations: final_sub_aggregations,
+                }))
+            }
        }
    }

@@ -562,6 +593,19 @@ impl IntermediateBucketResult {

                *buckets_left = buckets?;
            }
+            (
+                IntermediateBucketResult::Filter {
+                    doc_count: doc_count_left,
+                    sub_aggregations: sub_aggs_left,
+                },
+                IntermediateBucketResult::Filter {
+                    doc_count: doc_count_right,
+                    sub_aggregations: sub_aggs_right,
+                },
+            ) => {
+                *doc_count_left += doc_count_right;
+                sub_aggs_left.merge_fruits(sub_aggs_right)?;
+            }
            (IntermediateBucketResult::Range(_), _) => {
                panic!("try merge on different types")
            }
@@ -571,6 +615,9 @@ impl IntermediateBucketResult {
            (IntermediateBucketResult::Terms { .. }, _) => {
                panic!("try merge on different types")
            }
+            (IntermediateBucketResult::Filter { .. }, _) => {
+                panic!("try merge on different types")
+            }
        }
        Ok(())
    }
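The new `merge_fruits` matches entries by aggregation name instead of zipping the two maps' values in iteration order, and the new `Filter` variant merges by summing `doc_count` and recursing into its sub-aggregations. Below is a trimmed-down sketch with hypothetical `AggResult`/`AggResults` types; std `HashMap` is used for simplicity and is not necessarily the map type tantivy uses.

```rust
use std::collections::HashMap;

/// Hypothetical, trimmed-down result type (not tantivy's real one).
#[derive(Debug, Clone, PartialEq)]
enum AggResult {
    /// A metric that merges by addition, e.g. a count.
    Count(u64),
    /// A single bucket with nested results, like the new `Filter` variant.
    Filter { doc_count: u64, sub_aggregations: AggResults },
}

#[derive(Debug, Clone, Default, PartialEq)]
struct AggResults {
    aggs_res: HashMap<String, AggResult>,
}

impl AggResult {
    fn merge(&mut self, other: AggResult) {
        match (self, other) {
            (AggResult::Count(left), AggResult::Count(right)) => *left += right,
            (
                AggResult::Filter { doc_count, sub_aggregations },
                AggResult::Filter { doc_count: doc_count_right, sub_aggregations: sub_right },
            ) => {
                *doc_count += doc_count_right;
                sub_aggregations.merge(sub_right);
            }
            (left, right) => panic!("try merge on different types: {left:?} vs {right:?}"),
        }
    }
}

impl AggResults {
    /// Merges by name, so the iteration order of either map does not matter.
    fn merge(&mut self, mut other: AggResults) {
        for (name, left) in self.aggs_res.iter_mut() {
            if let Some(right) = other.aggs_res.remove(name) {
                left.merge(right);
            }
        }
        // Entries only present in `other` are moved over unchanged.
        self.aggs_res.extend(other.aggs_res);
    }
}
```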
--- a/src/aggregation/metric/cardinality.rs
+++ b/src/aggregation/metric/cardinality.rs
@@ -2,15 +2,13 @@ use std::collections::hash_map::DefaultHasher;
 use std::hash::{BuildHasher, Hasher};

 use columnar::column_values::CompactSpaceU64Accessor;
-use columnar::Dictionary;
+use columnar::{Column, ColumnBlockAccessor, ColumnType, Dictionary, StrColumn};
 use common::f64_to_u64;
 use hyperloglogplus::{HyperLogLog, HyperLogLogPlus};
 use rustc_hash::FxHashSet;
 use serde::{Deserialize, Serialize};

-use crate::aggregation::agg_req_with_accessor::{
-    AggregationWithAccessor, AggregationsWithAccessor,
-};
+use crate::aggregation::agg_data::AggregationsSegmentCtx;
 use crate::aggregation::intermediate_agg_result::{
    IntermediateAggregationResult, IntermediateAggregationResults, IntermediateMetricResult,
 };
@@ -97,6 +95,32 @@ pub struct CardinalityAggregationReq {
    pub missing: Option<Key>,
 }

+/// Contains all information required by the SegmentCardinalityCollector to perform the
+/// cardinality aggregation on a segment.
+pub struct CardinalityAggReqData {
+    /// The column accessor to access the fast field values.
+    pub accessor: Column<u64>,
+    /// The column type of the field.
+    pub column_type: ColumnType,
+    /// The string dictionary column if the field is of type string.
+    pub str_dict_column: Option<StrColumn>,
+    /// The missing value normalized to the internal u64 representation of the field type.
+    pub missing_value_for_accessor: Option<u64>,
+    /// The column block accessor to access the fast field values.
+    pub(crate) column_block_accessor: ColumnBlockAccessor<u64>,
+    /// The name of the aggregation.
+    pub name: String,
+    /// The aggregation request.
+    pub req: CardinalityAggregationReq,
+}
+
+impl CardinalityAggReqData {
+    /// Estimate the memory consumption of this struct in bytes.
+    pub fn get_memory_consumption(&self) -> usize {
+        std::mem::size_of::<Self>()
+    }
+}
+
 impl CardinalityAggregationReq {
    /// Creates a new [`CardinalityAggregationReq`] instance from a field name.
    pub fn from_field_name(field_name: String) -> Self {
@@ -115,47 +139,44 @@ impl CardinalityAggregationReq {
 pub(crate) struct SegmentCardinalityCollector {
    cardinality: CardinalityCollector,
    entries: FxHashSet<u64>,
-    column_type: ColumnType,
    accessor_idx: usize,
-    missing: Option<Key>,
 }

 impl SegmentCardinalityCollector {
-    pub fn from_req(column_type: ColumnType, accessor_idx: usize, missing: &Option<Key>) -> Self {
+    pub fn from_req(column_type: ColumnType, accessor_idx: usize) -> Self {
        Self {
            cardinality: CardinalityCollector::new(column_type as u8),
            entries: Default::default(),
-            column_type,
            accessor_idx,
-            missing: missing.clone(),
        }
    }

    fn fetch_block_with_field(
        &mut self,
        docs: &[crate::DocId],
-        agg_accessor: &mut AggregationWithAccessor,
+        agg_data: &mut CardinalityAggReqData,
    ) {
-        if let Some(missing) = agg_accessor.missing_value_for_accessor {
-            agg_accessor.column_block_accessor.fetch_block_with_missing(
+        if let Some(missing) = agg_data.missing_value_for_accessor {
+            agg_data.column_block_accessor.fetch_block_with_missing(
                docs,
-                &agg_accessor.accessor,
+                &agg_data.accessor,
                missing,
            );
        } else {
-            agg_accessor
+            agg_data
                .column_block_accessor
-                .fetch_block(docs, &agg_accessor.accessor);
+                .fetch_block(docs, &agg_data.accessor);
        }
    }

    fn into_intermediate_metric_result(
        mut self,
-        agg_with_accessor: &AggregationWithAccessor,
+        agg_data: &AggregationsSegmentCtx,
    ) -> crate::Result<IntermediateMetricResult> {
-        if self.column_type == ColumnType::Str {
+        let req_data = &agg_data.get_cardinality_req_data(self.accessor_idx);
+        if req_data.column_type == ColumnType::Str {
            let fallback_dict = Dictionary::empty();
-            let dict = agg_with_accessor
+            let dict = req_data
                .str_dict_column
                .as_ref()
                .map(|el| el.dictionary())
@@ -180,10 +201,10 @@ impl SegmentCardinalityCollector {
            })?;
            if has_missing {
                // Replace missing with the actual value provided
-                let missing_key = self
-                    .missing
-                    .as_ref()
-                    .expect("Found sentinel value u64::MAX for term_ord but `missing` is not set");
+                let missing_key =
+                    req_data.req.missing.as_ref().expect(
+                        "Found sentinel value u64::MAX for term_ord but `missing` is not set",
+                    );
                match missing_key {
                    Key::Str(missing) => {
                        self.cardinality.sketch.insert_any(&missing);
@@ -209,13 +230,13 @@ impl SegmentCardinalityCollector {
 impl SegmentAggregationCollector for SegmentCardinalityCollector {
    fn add_intermediate_aggregation_result(
        self: Box<Self>,
-        agg_with_accessor: &AggregationsWithAccessor,
+        agg_data: &AggregationsSegmentCtx,
        results: &mut IntermediateAggregationResults,
    ) -> crate::Result<()> {
-        let name = agg_with_accessor.aggs.keys[self.accessor_idx].to_string();
-        let agg_with_accessor = &agg_with_accessor.aggs.values[self.accessor_idx];
+        let req_data = &agg_data.get_cardinality_req_data(self.accessor_idx);
+        let name = req_data.name.to_string();

-        let intermediate_result = self.into_intermediate_metric_result(agg_with_accessor)?;
+        let intermediate_result = self.into_intermediate_metric_result(agg_data)?;
        results.push(
            name,
            IntermediateAggregationResult::Metric(intermediate_result),
@@ -227,26 +248,26 @@ impl SegmentAggregationCollector for SegmentCardinalityCollector {
    fn collect(
        &mut self,
        doc: crate::DocId,
-        agg_with_accessor: &mut AggregationsWithAccessor,
+        agg_data: &mut AggregationsSegmentCtx,
    ) -> crate::Result<()> {
-        self.collect_block(&[doc], agg_with_accessor)
+        self.collect_block(&[doc], agg_data)
    }

    fn collect_block(
        &mut self,
        docs: &[crate::DocId],
-        agg_with_accessor: &mut AggregationsWithAccessor,
+        agg_data: &mut AggregationsSegmentCtx,
    ) -> crate::Result<()> {
-        let bucket_agg_accessor = &mut agg_with_accessor.aggs.values[self.accessor_idx];
-        self.fetch_block_with_field(docs, bucket_agg_accessor);
+        let req_data = agg_data.get_cardinality_req_data_mut(self.accessor_idx);
+        self.fetch_block_with_field(docs, req_data);

-        let col_block_accessor = &bucket_agg_accessor.column_block_accessor;
-        if self.column_type == ColumnType::Str {
+        let col_block_accessor = &req_data.column_block_accessor;
+        if req_data.column_type == ColumnType::Str {
            for term_ord in col_block_accessor.iter_vals() {
                self.entries.insert(term_ord);
            }
-        } else if self.column_type == ColumnType::IpAddr {
-            let compact_space_accessor = bucket_agg_accessor
+        } else if req_data.column_type == ColumnType::IpAddr {
+            let compact_space_accessor = req_data
                .accessor
                .values
                .clone()
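For `ColumnType::Str`, the visible hunks show `collect_block` inserting term ordinals into `self.entries`, while the dictionary (`str_dict_column`) and the `missing` key are consulted only in `into_intermediate_metric_result`, with `u64::MAX` acting as the sentinel ordinal for the missing value. Below is a sketch of that deferred resolution. It assumes a slice of `&str` as the dictionary and uses an exact `HashSet` where tantivy feeds a `HyperLogLogPlus` sketch; the type and method names are hypothetical.

```rust
use std::collections::{BTreeSet, HashSet};

/// Hypothetical per-segment state for a string cardinality count.
#[derive(Default)]
struct SegmentStrCardinality {
    /// Distinct term ordinals seen in this segment.
    term_ords: BTreeSet<u64>,
}

impl SegmentStrCardinality {
    /// Records the term ordinals of one block; duplicates collapse in the set.
    fn collect_block(&mut self, term_ords: &[u64]) {
        self.term_ords.extend(term_ords.iter().copied());
    }

    /// Resolves each distinct ordinal to its term once, at the end of the segment.
    /// `dictionary[ord]` stands in for a real term dictionary lookup, and
    /// `u64::MAX` is the sentinel for documents filled with the `missing` value.
    fn into_distinct_terms(self, dictionary: &[&str], missing: Option<&str>) -> HashSet<String> {
        self.term_ords
            .into_iter()
            .map(|ord| {
                if ord == u64::MAX {
                    missing
                        .expect("found sentinel u64::MAX but `missing` is not set")
                        .to_string()
                } else {
                    dictionary[ord as usize].to_string()
                }
            })
            .collect()
    }
}
```

Deferring the lookup means each distinct term is resolved once per segment, however many documents contain it.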
--- a/src/aggregation/metric/extended_stats.rs
+++ b/src/aggregation/metric/extended_stats.rs
@@ -4,12 +4,11 @@ use std::mem;
 use serde::{Deserialize, Serialize};

 use super::*;
-use crate::aggregation::agg_req_with_accessor::{
-    AggregationWithAccessor, AggregationsWithAccessor,
-};
+use crate::aggregation::agg_data::AggregationsSegmentCtx;
 use crate::aggregation::intermediate_agg_result::{
    IntermediateAggregationResult, IntermediateAggregationResults, IntermediateMetricResult,
 };
+use crate::aggregation::metric::MetricAggReqData;
 use crate::aggregation::segment_agg_result::SegmentAggregationCollector;
 use crate::aggregation::*;
 use crate::{DocId, TantivyError};
@@ -63,7 +62,7 @@ impl ExtendedStatsAggregation {

 /// Extended stats contains a collection of statistics
 /// they extends stats adding variance, standard deviation
-/// and bound informations
+/// and bound information
 #[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
 pub struct ExtendedStats {
    /// The number of documents.
@@ -348,20 +347,20 @@ impl SegmentExtendedStatsCollector {
    pub(crate) fn collect_block_with_field(
        &mut self,
        docs: &[DocId],
-        agg_accessor: &mut AggregationWithAccessor,
+        req_data: &mut MetricAggReqData,
    ) {
        if let Some(missing) = self.missing.as_ref() {
-            agg_accessor.column_block_accessor.fetch_block_with_missing(
+            req_data.column_block_accessor.fetch_block_with_missing(
                docs,
-                &agg_accessor.accessor,
+                &req_data.accessor,
                *missing,
            );
        } else {
-            agg_accessor
+            req_data
                .column_block_accessor
-                .fetch_block(docs, &agg_accessor.accessor);
+                .fetch_block(docs, &req_data.accessor);
        }
-        for val in agg_accessor.column_block_accessor.iter_vals() {
+        for val in req_data.column_block_accessor.iter_vals() {
            let val1 = f64_from_fastfield_u64(val, &self.field_type);
            self.extended_stats.collect(val1);
        }
@@ -372,10 +371,10 @@ impl SegmentAggregationCollector for SegmentExtendedStatsCollector {
    #[inline]
    fn add_intermediate_aggregation_result(
        self: Box<Self>,
-        agg_with_accessor: &AggregationsWithAccessor,
+        agg_data: &AggregationsSegmentCtx,
        results: &mut IntermediateAggregationResults,
    ) -> crate::Result<()> {
-        let name = agg_with_accessor.aggs.keys[self.accessor_idx].to_string();
+        let name = agg_data.get_metric_req_data(self.accessor_idx).name.clone();
        results.push(
            name,
            IntermediateAggregationResult::Metric(IntermediateMetricResult::ExtendedStats(
@@ -390,12 +389,12 @@ impl SegmentAggregationCollector for SegmentExtendedStatsCollector {
    fn collect(
        &mut self,
        doc: crate::DocId,
-        agg_with_accessor: &mut AggregationsWithAccessor,
+        agg_data: &mut AggregationsSegmentCtx,
    ) -> crate::Result<()> {
-        let field = &agg_with_accessor.aggs.values[self.accessor_idx].accessor;
+        let req_data = agg_data.get_metric_req_data(self.accessor_idx);
        if let Some(missing) = self.missing {
            let mut has_val = false;
-            for val in field.values_for_doc(doc) {
+            for val in req_data.accessor.values_for_doc(doc) {
                let val1 = f64_from_fastfield_u64(val, &self.field_type);
                self.extended_stats.collect(val1);
                has_val = true;
@@ -405,7 +404,7 @@ impl SegmentAggregationCollector for SegmentExtendedStatsCollector {
                    .collect(f64_from_fastfield_u64(missing, &self.field_type));
            }
        } else {
-            for val in field.values_for_doc(doc) {
+            for val in req_data.accessor.values_for_doc(doc) {
                let val1 = f64_from_fastfield_u64(val, &self.field_type);
                self.extended_stats.collect(val1);
            }
@@ -418,10 +417,10 @@ impl SegmentAggregationCollector for SegmentExtendedStatsCollector {
    fn collect_block(
        &mut self,
        docs: &[crate::DocId],
-        agg_with_accessor: &mut AggregationsWithAccessor,
+        agg_data: &mut AggregationsSegmentCtx,
    ) -> crate::Result<()> {
-        let field = &mut agg_with_accessor.aggs.values[self.accessor_idx];
-        self.collect_block_with_field(docs, field);
+        let req_data = agg_data.get_metric_req_data_mut(self.accessor_idx);
+        self.collect_block_with_field(docs, req_data);
        Ok(())
    }
 }
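After this refactor a collector no longer owns its column accessor or block buffer. It keeps an `accessor_idx` and borrows the matching entry from `AggregationsSegmentCtx`. It uses the `_mut` getter in `collect_block` because the column block accessor is a scratch buffer that gets refilled for every block. The sketch below reproduces that ownership split with hypothetical types (`SegmentCtx`, `ReqData`, `SumCollector`). It does not reproduce tantivy's API. Because the context and the collector are separate values, `collect_block` can hold `&mut self` and `&mut SegmentCtx` at the same time.

```rust
/// Hypothetical per-request data, loosely modeled on `MetricAggReqData`.
struct ReqData {
    name: String,
    /// Values per doc id; stands in for the fast-field `Column`.
    column: Vec<Vec<f64>>,
    /// Reusable scratch buffer; stands in for `column_block_accessor`.
    block: Vec<f64>,
}

/// Hypothetical segment context that owns all request data.
struct SegmentCtx {
    metric_req_data: Vec<ReqData>,
}

impl SegmentCtx {
    fn get_metric_req_data(&self, idx: usize) -> &ReqData {
        &self.metric_req_data[idx]
    }

    fn get_metric_req_data_mut(&mut self, idx: usize) -> &mut ReqData {
        &mut self.metric_req_data[idx]
    }
}

/// The collector keeps only its index into the context plus its own state.
struct SumCollector {
    accessor_idx: usize,
    sum: f64,
}

impl SumCollector {
    fn collect_block(&mut self, docs: &[usize], ctx: &mut SegmentCtx) {
        let req = ctx.get_metric_req_data_mut(self.accessor_idx);
        // Refill the shared scratch buffer instead of allocating one per block.
        req.block.clear();
        for &doc in docs {
            req.block.extend_from_slice(&req.column[doc]);
        }
        self.sum += req.block.iter().sum::<f64>();
    }

    fn into_result(self, ctx: &SegmentCtx) -> (String, f64) {
        // The name now lives in the request data, not in the collector.
        let name = ctx.get_metric_req_data(self.accessor_idx).name.clone();
        (name, self.sum)
    }
}
```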
--- a/src/aggregation/metric/mod.rs
+++ b/src/aggregation/metric/mod.rs
@@ -31,6 +31,7 @@ use std::collections::HashMap;

 pub use average::*;
 pub use cardinality::*;
+use columnar::{Column, ColumnBlockAccessor, ColumnType};
 pub use count::*;
 pub use extended_stats::*;
 pub use max::*;
@@ -44,6 +45,35 @@ pub use top_hits::*;

 use crate::schema::OwnedValue;

+/// Contains all information required by metric aggregations like avg, min, max, sum, stats,
+/// extended_stats, count, percentiles.
+#[repr(C)]
+pub struct MetricAggReqData {
+    /// True if the field is of number or date type.
+    pub is_number_or_date_type: bool,
+    /// The type of the field.
+    pub field_type: ColumnType,
+    /// The missing value normalized to the internal u64 representation of the field type.
+    pub missing_u64: Option<u64>,
+    /// The column block accessor to access the fast field values.
+    pub column_block_accessor: ColumnBlockAccessor<u64>,
+    /// The column accessor to access the fast field values.
+    pub accessor: Column<u64>,
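+    /// The stats type being collected; selects the variant built for the intermediate result.
+    pub collecting_for: StatsType,
+    /// The missing value as given in the request (see `missing_u64` for the u64 form).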
+    pub missing: Option<f64>,
+    /// The name of the aggregation.
+    pub name: String,
+}
+
+impl MetricAggReqData {
+    /// Estimate the memory consumption of this struct in bytes.
+    pub fn get_memory_consumption(&self) -> usize {
+        std::mem::size_of::<Self>()
+    }
+}
+
 /// Single-metric aggregations use this common result structure.
 ///
 /// Main reason to wrap it in value is to match elasticsearch output structure.
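`MetricAggReqData::get_memory_consumption` returns `std::mem::size_of::<Self>()`. That is only the inline size of the struct. Heap allocations the struct owns, such as the bytes behind `name`, are not counted. A minimal sketch of the difference, using a hypothetical `ReqDataLike` struct:

```rust
use std::mem::size_of;

/// Hypothetical struct with an owned heap allocation, like `MetricAggReqData::name`.
struct ReqDataLike {
    missing: Option<f64>,
    name: String,
}

impl ReqDataLike {
    /// Shallow estimate: the inline size only, which is what `size_of::<Self>()` reports.
    fn shallow_memory(&self) -> usize {
        size_of::<Self>()
    }

    /// Also counts the bytes the `String` has reserved on the heap.
    fn deep_memory(&self) -> usize {
        size_of::<Self>() + self.name.capacity()
    }
}
```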
--- a/src/aggregation/metric/percentiles.rs
+++ b/src/aggregation/metric/percentiles.rs
@@ -3,12 +3,11 @@ use std::fmt::Debug;
 use serde::{Deserialize, Serialize};

 use super::*;
-use crate::aggregation::agg_req_with_accessor::{
-    AggregationWithAccessor, AggregationsWithAccessor,
-};
+use crate::aggregation::agg_data::AggregationsSegmentCtx;
 use crate::aggregation::intermediate_agg_result::{
    IntermediateAggregationResult, IntermediateAggregationResults, IntermediateMetricResult,
 };
+use crate::aggregation::metric::MetricAggReqData;
 use crate::aggregation::segment_agg_result::SegmentAggregationCollector;
 use crate::aggregation::*;
 use crate::{DocId, TantivyError};
@@ -112,7 +111,8 @@ impl PercentilesAggregationReq {
        &self.field
    }

-    fn validate(&self) -> crate::Result<()> {
+    /// Validates the request parameters.
+    pub fn validate(&self) -> crate::Result<()> {
        if let Some(percents) = self.percents.as_ref() {
            let all_in_range = percents
                .iter()
@@ -133,10 +133,8 @@ impl PercentilesAggregationReq {

 #[derive(Clone, Debug, PartialEq)]
 pub(crate) struct SegmentPercentilesCollector {
-    field_type: ColumnType,
    pub(crate) percentiles: PercentilesCollector,
    pub(crate) accessor_idx: usize,
-    missing: Option<u64>,
 }

 #[derive(Clone, Serialize, Deserialize)]
@@ -231,43 +229,32 @@ impl PercentilesCollector {
 }

 impl SegmentPercentilesCollector {
-    pub fn from_req_and_validate(
-        req: &PercentilesAggregationReq,
-        field_type: ColumnType,
-        accessor_idx: usize,
-    ) -> crate::Result<Self> {
-        req.validate()?;
-        let missing = req
-            .missing
-            .and_then(|val| f64_to_fastfield_u64(val, &field_type));
-
+    pub fn from_req_and_validate(accessor_idx: usize) -> crate::Result<Self> {
        Ok(Self {
-            field_type,
            percentiles: PercentilesCollector::new(),
            accessor_idx,
-            missing,
        })
    }
    #[inline]
    pub(crate) fn collect_block_with_field(
        &mut self,
        docs: &[DocId],
-        agg_accessor: &mut AggregationWithAccessor,
+        req_data: &mut MetricAggReqData,
    ) {
-        if let Some(missing) = self.missing.as_ref() {
-            agg_accessor.column_block_accessor.fetch_block_with_missing(
+        if let Some(missing) = req_data.missing_u64.as_ref() {
+            req_data.column_block_accessor.fetch_block_with_missing(
                docs,
-                &agg_accessor.accessor,
+                &req_data.accessor,
                *missing,
            );
        } else {
-            agg_accessor
+            req_data
                .column_block_accessor
-                .fetch_block(docs, &agg_accessor.accessor);
+                .fetch_block(docs, &req_data.accessor);
        }

-        for val in agg_accessor.column_block_accessor.iter_vals() {
-            let val1 = f64_from_fastfield_u64(val, &self.field_type);
+        for val in req_data.column_block_accessor.iter_vals() {
+            let val1 = f64_from_fastfield_u64(val, &req_data.field_type);
            self.percentiles.collect(val1);
        }
    }
@@ -277,10 +264,10 @@ impl SegmentAggregationCollector for SegmentPercentilesCollector {
    #[inline]
    fn add_intermediate_aggregation_result(
        self: Box<Self>,
-        agg_with_accessor: &AggregationsWithAccessor,
+        agg_data: &AggregationsSegmentCtx,
        results: &mut IntermediateAggregationResults,
    ) -> crate::Result<()> {
-        let name = agg_with_accessor.aggs.keys[self.accessor_idx].to_string();
+        let name = agg_data.get_metric_req_data(self.accessor_idx).name.clone();
        let intermediate_metric_result = IntermediateMetricResult::Percentiles(self.percentiles);

        results.push(
@@ -295,24 +282,24 @@ impl SegmentAggregationCollector for SegmentPercentilesCollector {
    fn collect(
        &mut self,
        doc: crate::DocId,
-        agg_with_accessor: &mut AggregationsWithAccessor,
+        agg_data: &mut AggregationsSegmentCtx,
    ) -> crate::Result<()> {
-        let field = &agg_with_accessor.aggs.values[self.accessor_idx].accessor;
+        let req_data = agg_data.get_metric_req_data(self.accessor_idx);

-        if let Some(missing) = self.missing {
+        if let Some(missing) = req_data.missing_u64 {
            let mut has_val = false;
-            for val in field.values_for_doc(doc) {
-                let val1 = f64_from_fastfield_u64(val, &self.field_type);
+            for val in req_data.accessor.values_for_doc(doc) {
+                let val1 = f64_from_fastfield_u64(val, &req_data.field_type);
                self.percentiles.collect(val1);
                has_val = true;
            }
            if !has_val {
                self.percentiles
-                    .collect(f64_from_fastfield_u64(missing, &self.field_type));
+                    .collect(f64_from_fastfield_u64(missing, &req_data.field_type));
            }
        } else {
-            for val in field.values_for_doc(doc) {
-                let val1 = f64_from_fastfield_u64(val, &self.field_type);
+            for val in req_data.accessor.values_for_doc(doc) {
+                let val1 = f64_from_fastfield_u64(val, &req_data.field_type);
                self.percentiles.collect(val1);
            }
        }
@@ -324,10 +311,10 @@ impl SegmentAggregationCollector for SegmentPercentilesCollector {
    fn collect_block(
        &mut self,
        docs: &[crate::DocId],
-        agg_with_accessor: &mut AggregationsWithAccessor,
+        agg_data: &mut AggregationsSegmentCtx,
    ) -> crate::Result<()> {
-        let field = &mut agg_with_accessor.aggs.values[self.accessor_idx];
-        self.collect_block_with_field(docs, field);
+        let req_data = agg_data.get_metric_req_data_mut(self.accessor_idx);
+        self.collect_block_with_field(docs, req_data);
        Ok(())
    }
 }
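With this change the percentiles collector reads the missing value from `req_data.missing_u64`. That value is already in the field's u64 representation, presumably converted once when the request data is built rather than in each collector's constructor. The sketch below mirrors the per-document fallback in `collect`: a document's values are collected, and only if it has none is the missing value collected instead. The real `f64_to_fastfield_u64` / `f64_from_fastfield_u64` also handle i64, u64 and date columns. Here a hypothetical f64-only mapping stands in for them, using the common sign-bit trick to keep u64 order consistent with f64 order.

```rust
const SIGN_BIT: u64 = 1 << 63;

/// Hypothetical order-preserving f64 -> u64 mapping (f64 columns only).
fn f64_to_u64(val: f64) -> u64 {
    let bits = val.to_bits();
    if val.is_sign_positive() {
        bits ^ SIGN_BIT
    } else {
        !bits
    }
}

/// Inverse of `f64_to_u64`.
fn u64_to_f64(val: u64) -> f64 {
    f64::from_bits(if val & SIGN_BIT != 0 { val ^ SIGN_BIT } else { !val })
}

/// Per-document logic from `collect`: collect the doc's values; if it has none
/// and a missing value is configured, collect the missing value instead.
fn collect_doc(doc_values: &[u64], missing_u64: Option<u64>, out: &mut Vec<f64>) {
    let mut has_val = false;
    for &val in doc_values {
        out.push(u64_to_f64(val));
        has_val = true;
    }
    if !has_val {
        if let Some(missing) = missing_u64 {
            out.push(u64_to_f64(missing));
        }
    }
}
```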
--- a/src/aggregation/metric/stats.rs
+++ b/src/aggregation/metric/stats.rs
@@ -3,12 +3,11 @@ use std::fmt::Debug;
 use serde::{Deserialize, Serialize};

 use super::*;
-use crate::aggregation::agg_req_with_accessor::{
-    AggregationWithAccessor, AggregationsWithAccessor,
-};
+use crate::aggregation::agg_data::AggregationsSegmentCtx;
 use crate::aggregation::intermediate_agg_result::{
    IntermediateAggregationResult, IntermediateAggregationResults, IntermediateMetricResult,
 };
+use crate::aggregation::metric::MetricAggReqData;
 use crate::aggregation::segment_agg_result::SegmentAggregationCollector;
 use crate::aggregation::*;
 use crate::{DocId, TantivyError};
@@ -166,74 +165,65 @@ impl IntermediateStats {
    }
 }

-#[derive(Clone, Debug, PartialEq)]
-pub(crate) enum SegmentStatsType {
+/// The type of stats aggregation to perform.
+/// Note that not all stats types are supported in the stats aggregation.
+#[derive(Clone, Copy, Debug)]
+pub enum StatsType {
+    /// The average of the values.
    Average,
+    /// The count of the values.
    Count,
+    /// The maximum value.
    Max,
+    /// The minimum value.
    Min,
+    /// The stats (count, sum, min, max, avg) of the values.
    Stats,
+    /// The extended stats: stats plus sum_of_squares, variance, std_deviation and bounds.
+    ExtendedStats(Option<f64>), // sigma
+    /// The sum of the values.
    Sum,
+    /// The percentiles of the values.
+    Percentiles,
 }

-#[derive(Clone, Debug, PartialEq)]
+#[derive(Clone, Debug)]
 pub(crate) struct SegmentStatsCollector {
-    missing: Option<u64>,
-    field_type: ColumnType,
-    pub(crate) collecting_for: SegmentStatsType,
    pub(crate) stats: IntermediateStats,
    pub(crate) accessor_idx: usize,
-    val_cache: Vec<u64>,
 }

 impl SegmentStatsCollector {
-    pub fn from_req(
-        field_type: ColumnType,
-        collecting_for: SegmentStatsType,
-        accessor_idx: usize,
-        missing: Option<f64>,
-    ) -> Self {
-        let missing = missing.and_then(|val| f64_to_fastfield_u64(val, &field_type));
+    pub fn from_req(accessor_idx: usize) -> Self {
        Self {
-            field_type,
-            collecting_for,
            stats: IntermediateStats::default(),
            accessor_idx,
-            missing,
-            val_cache: Default::default(),
        }
    }
    #[inline]
    pub(crate) fn collect_block_with_field(
        &mut self,
        docs: &[DocId],
-        agg_accessor: &mut AggregationWithAccessor,
+        req_data: &mut MetricAggReqData,
    ) {
-        if let Some(missing) = self.missing.as_ref() {
-            agg_accessor.column_block_accessor.fetch_block_with_missing(
+        if let Some(missing) = req_data.missing_u64.as_ref() {
+            req_data.column_block_accessor.fetch_block_with_missing(
                docs,
-                &agg_accessor.accessor,
+                &req_data.accessor,
                *missing,
            );
        } else {
-            agg_accessor
+            req_data
                .column_block_accessor
-                .fetch_block(docs, &agg_accessor.accessor);
+                .fetch_block(docs, &req_data.accessor);
        }
-        if [
-            ColumnType::I64,
-            ColumnType::U64,
-            ColumnType::F64,
-            ColumnType::DateTime,
-        ]
-        .contains(&self.field_type)
-        {
-            for val in agg_accessor.column_block_accessor.iter_vals() {
-                let val1 = f64_from_fastfield_u64(val, &self.field_type);
+        if req_data.is_number_or_date_type {
+            for val in req_data.column_block_accessor.iter_vals() {
+                let val1 = f64_from_fastfield_u64(val, &req_data.field_type);
                self.stats.collect(val1);
            }
        } else {
-            for _val in agg_accessor.column_block_accessor.iter_vals() {
+            for _val in req_data.column_block_accessor.iter_vals() {
                // we ignore the value and simply record that we got something
                self.stats.collect(0.0);
            }
@@ -245,27 +235,28 @@ impl SegmentAggregationCollector for SegmentStatsCollector {
    #[inline]
    fn add_intermediate_aggregation_result(
        self: Box<Self>,
-        agg_with_accessor: &AggregationsWithAccessor,
+        agg_data: &AggregationsSegmentCtx,
        results: &mut IntermediateAggregationResults,
    ) -> crate::Result<()> {
-        let name = agg_with_accessor.aggs.keys[self.accessor_idx].to_string();
+        let req = agg_data.get_metric_req_data(self.accessor_idx);
+        let name = req.name.clone();

-        let intermediate_metric_result = match self.collecting_for {
-            SegmentStatsType::Average => {
+        let intermediate_metric_result = match req.collecting_for {
+            StatsType::Average => {
                IntermediateMetricResult::Average(IntermediateAverage::from_collector(*self))
            }
-            SegmentStatsType::Count => {
+            StatsType::Count => {
                IntermediateMetricResult::Count(IntermediateCount::from_collector(*self))
            }
-            SegmentStatsType::Max => {
-                IntermediateMetricResult::Max(IntermediateMax::from_collector(*self))
-            }
-            SegmentStatsType::Min => {
-                IntermediateMetricResult::Min(IntermediateMin::from_collector(*self))
-            }
-            SegmentStatsType::Stats => IntermediateMetricResult::Stats(self.stats),
-            SegmentStatsType::Sum => {
-                IntermediateMetricResult::Sum(IntermediateSum::from_collector(*self))
+            StatsType::Max => IntermediateMetricResult::Max(IntermediateMax::from_collector(*self)),
+            StatsType::Min => IntermediateMetricResult::Min(IntermediateMin::from_collector(*self)),
+            StatsType::Stats => IntermediateMetricResult::Stats(self.stats),
+            StatsType::Sum => IntermediateMetricResult::Sum(IntermediateSum::from_collector(*self)),
+            _ => {
+                return Err(TantivyError::InvalidArgument(format!(
+                    "Unsupported stats type for stats aggregation: {:?}",
+                    req.collecting_for
+                )))
            }
        };

@@ -281,23 +272,23 @@ impl SegmentAggregationCollector for SegmentStatsCollector {
    fn collect(
        &mut self,
        doc: crate::DocId,
-        agg_with_accessor: &mut AggregationsWithAccessor,
+        agg_data: &mut AggregationsSegmentCtx,
    ) -> crate::Result<()> {
-        let field = &agg_with_accessor.aggs.values[self.accessor_idx].accessor;
-        if let Some(missing) = self.missing {
+        let req_data = agg_data.get_metric_req_data(self.accessor_idx);
+        if let Some(missing) = req_data.missing_u64 {
            let mut has_val = false;
-            for val in field.values_for_doc(doc) {
-                let val1 = f64_from_fastfield_u64(val, &self.field_type);
+            for val in req_data.accessor.values_for_doc(doc) {
+                let val1 = f64_from_fastfield_u64(val, &req_data.field_type);
                self.stats.collect(val1);
                has_val = true;
            }
            if !has_val {
                self.stats
-                    .collect(f64_from_fastfield_u64(missing, &self.field_type));
+                    .collect(f64_from_fastfield_u64(missing, &req_data.field_type));
            }
        } else {
-            for val in field.values_for_doc(doc) {
-                let val1 = f64_from_fastfield_u64(val, &self.field_type);
+            for val in req_data.accessor.values_for_doc(doc) {
+                let val1 = f64_from_fastfield_u64(val, &req_data.field_type);
                self.stats.collect(val1);
            }
        }
@@ -309,10 +300,10 @@ impl SegmentAggregationCollector for SegmentStatsCollector {
    fn collect_block(
        &mut self,
        docs: &[crate::DocId],
-        agg_with_accessor: &mut AggregationsWithAccessor,
+        agg_data: &mut AggregationsSegmentCtx,
    ) -> crate::Result<()> {
-        let field = &mut agg_with_accessor.aggs.values[self.accessor_idx];
-        self.collect_block_with_field(docs, field);
+        let req_data = agg_data.get_metric_req_data_mut(self.accessor_idx);
+        self.collect_block_with_field(docs, req_data);
        Ok(())
    }
 }
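`StatsType` is now shared by several metric collectors, so the stats collector picks its intermediate result variant from `req.collecting_for`. It returns an error for types it does not produce, namely `ExtendedStats` and `Percentiles`. A self-contained sketch of that dispatch, with hypothetical `Stats` and `MetricResult` types standing in for `IntermediateStats` and `IntermediateMetricResult`:

```rust
/// Local copy of the `StatsType` variants shown in the diff.
#[derive(Clone, Copy, Debug)]
enum StatsType {
    Average,
    Count,
    Max,
    Min,
    Stats,
    ExtendedStats(Option<f64>),
    Sum,
    Percentiles,
}

/// Hypothetical accumulated stats for one segment.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Stats {
    count: u64,
    sum: f64,
    min: f64,
    max: f64,
}

impl Stats {
    fn new() -> Self {
        Stats { count: 0, sum: 0.0, min: f64::INFINITY, max: f64::NEG_INFINITY }
    }

    fn collect(&mut self, v: f64) {
        self.count += 1;
        self.sum += v;
        self.min = self.min.min(v);
        self.max = self.max.max(v);
    }
}

/// Hypothetical result type; `None` means no value was collected.
#[derive(Debug, PartialEq)]
enum MetricResult {
    Average(Option<f64>),
    Count(u64),
    Max(Option<f64>),
    Min(Option<f64>),
    Stats(Stats),
    Sum(f64),
}

/// Picks the result variant from the request's `StatsType` and rejects the
/// types this collector does not produce.
fn to_result(stats: Stats, kind: StatsType) -> Result<MetricResult, String> {
    let non_empty = |v: f64| if stats.count == 0 { None } else { Some(v) };
    match kind {
        StatsType::Average => Ok(MetricResult::Average(non_empty(stats.sum / stats.count as f64))),
        StatsType::Count => Ok(MetricResult::Count(stats.count)),
        StatsType::Max => Ok(MetricResult::Max(non_empty(stats.max))),
        StatsType::Min => Ok(MetricResult::Min(non_empty(stats.min))),
        StatsType::Stats => Ok(MetricResult::Stats(stats)),
        StatsType::Sum => Ok(MetricResult::Sum(stats.sum)),
        StatsType::ExtendedStats(_) | StatsType::Percentiles => Err(format!(
            "Unsupported stats type for stats aggregation: {:?}",
            kind
        )),
    }
}
```

The diff uses a `_ =>` catch-all for the unsupported arm. Listing `ExtendedStats(_) | Percentiles` explicitly, as above, is one alternative. With that form, adding a new `StatsType` variant produces a compile error here instead of a runtime `InvalidArgument`.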
--- a/src/aggregation/metric/top_hits.rs
+++ b/src/aggregation/metric/top_hits.rs
@@ -1,7 +1,8 @@
+use std::cmp::Ordering;
 use std::collections::HashMap;
 use std::net::Ipv6Addr;

-use columnar::{Column, ColumnType, ColumnarReader, DynamicColumn};
+use columnar::{Column, ColumnType, ColumnarReader, DynamicColumn, ValueRange};
 use common::json_path_writer::JSON_PATH_SEGMENT_SEP_STR;
 use common::DateTime;
 use regex::Regex;
@@ -9,15 +10,41 @@ use serde::ser::SerializeMap;
 use serde::{Deserialize, Deserializer, Serialize, Serializer};

 use super::{TopHitsMetricResult, TopHitsVecEntry};
+use crate::aggregation::agg_data::AggregationsSegmentCtx;
 use crate::aggregation::bucket::Order;
 use crate::aggregation::intermediate_agg_result::{
    IntermediateAggregationResult, IntermediateMetricResult,
 };
 use crate::aggregation::segment_agg_result::SegmentAggregationCollector;
 use crate::aggregation::AggregationError;
+use crate::collector::sort_key::{Comparator, ReverseComparator};
 use crate::collector::TopNComputer;
 use crate::schema::OwnedValue;
 use crate::{DocAddress, DocId, SegmentOrdinal};
+
+/// Contains all information required by the TopHitsSegmentCollector to perform the
+/// top_hits aggregation on a segment.
+#[derive(Default)]
+pub struct TopHitsAggReqData {
+    /// The accessors to access the fast field values.
+    pub accessors: Vec<(Column<u64>, ColumnType)>,
+    /// The accessors to access the fast field values for retrieving document fields.
+    pub value_accessors: HashMap<String, Vec<DynamicColumn>>,
+    /// The ordinal of the segment this request data is for.
+    pub segment_ordinal: SegmentOrdinal,
+    /// The name of the aggregation.
+    pub name: String,
+    /// The top_hits aggregation request.
+    pub req: TopHitsAggregationReq,
+}
+
+impl TopHitsAggReqData {
+    /// Estimate the memory consumption of this struct in bytes.
+    pub fn get_memory_consumption(&self) -> usize {
+        std::mem::size_of::<Self>()
+    }
+}

 /// # Top Hits
 ///
@@ -357,7 +384,7 @@ impl From<FastFieldValue> for OwnedValue {

 /// Holds a fast field value in its u64 representation, and the order in which it should be sorted.
 #[derive(Clone, Serialize, Deserialize, Debug)]
-struct DocValueAndOrder {
+pub(crate) struct DocValueAndOrder {
    /// A fast field value in its u64 representation.
    value: Option<u64>,
    /// Sort order for the value
@@ -429,11 +456,42 @@ impl PartialEq for DocSortValuesAndFields {

 impl Eq for DocSortValuesAndFields {}

+impl Comparator<DocSortValuesAndFields> for ReverseComparator {
+    #[inline(always)]
+    fn compare(&self, lhs: &DocSortValuesAndFields, rhs: &DocSortValuesAndFields) -> Ordering {
+        rhs.cmp(lhs)
+    }
+
+    fn threshold_to_valuerange(
+        &self,
+        threshold: DocSortValuesAndFields,
+    ) -> ValueRange<DocSortValuesAndFields> {
+        ValueRange::LessThan(threshold, true)
+    }
+}
+
+#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
+pub(crate) struct TopHitsSegmentSortKey(pub Vec<DocValueAndOrder>);
+
+impl Comparator<TopHitsSegmentSortKey> for ReverseComparator {
+    #[inline(always)]
+    fn compare(&self, lhs: &TopHitsSegmentSortKey, rhs: &TopHitsSegmentSortKey) -> Ordering {
+        rhs.cmp(lhs)
+    }
+
+    fn threshold_to_valuerange(
+        &self,
+        threshold: TopHitsSegmentSortKey,
+    ) -> ValueRange<TopHitsSegmentSortKey> {
+        ValueRange::LessThan(threshold, true)
+    }
+}
+
 /// The TopHitsCollector used for collecting over segments and merging results.
 #[derive(Clone, Serialize, Deserialize, Debug)]
 pub struct TopHitsTopNComputer {
    req: TopHitsAggregationReq,
-    top_n: TopNComputer<DocSortValuesAndFields, DocAddress, false>,
+    top_n: TopNComputer<DocSortValuesAndFields, DocAddress, ReverseComparator>,
 }

 impl std::cmp::PartialEq for TopHitsTopNComputer {
@@ -457,7 +515,7 @@ impl TopHitsTopNComputer {

    pub(crate) fn merge_fruits(&mut self, other_fruit: Self) -> crate::Result<()> {
        for doc in other_fruit.top_n.into_vec() {
-            self.collect(doc.feature, doc.doc);
+            self.collect(doc.sort_key, doc.doc);
        }
        Ok(())
    }
@@ -469,9 +527,9 @@ impl TopHitsTopNComputer {
            .into_sorted_vec()
            .into_iter()
            .map(|doc| TopHitsVecEntry {
-                sort: doc.feature.sorts.iter().map(|f| f.value).collect(),
+                sort: doc.sort_key.sorts.iter().map(|f| f.value).collect(),
                doc_value_fields: doc
-                    .feature
+                    .sort_key
                    .doc_value_fields
                    .into_iter()
                    .map(|(k, v)| (k, v.into()))
@@ -492,7 +550,7 @@ impl TopHitsTopNComputer {
 pub(crate) struct TopHitsSegmentCollector {
    segment_ordinal: SegmentOrdinal,
    accessor_idx: usize,
-    top_n: TopNComputer<Vec<DocValueAndOrder>, DocAddress, false>,
+    top_n: TopNComputer<TopHitsSegmentSortKey, DocAddress, ReverseComparator>,
 }

 impl TopHitsSegmentCollector {
@@ -513,13 +571,15 @@ impl TopHitsSegmentCollector {
        req: &TopHitsAggregationReq,
    ) -> TopHitsTopNComputer {
        let mut top_hits_computer = TopHitsTopNComputer::new(req);
+        // `self.top_n` is keyed by `TopHitsSegmentSortKey`; `.0` below unwraps it back
+        // into the `Vec<DocValueAndOrder>` stored in `DocSortValuesAndFields::sorts`.
        let top_results = self.top_n.into_vec();

        for res in top_results {
            let doc_value_fields = req.get_document_field_data(value_accessors, res.doc.doc_id);
            top_hits_computer.collect(
                DocSortValuesAndFields {
-                    sorts: res.feature,
+                    sorts: res.sort_key.0,
                    doc_value_fields,
                },
                res.doc,
@@ -553,7 +613,7 @@ impl TopHitsSegmentCollector {
            .collect();

        self.top_n.push(
-            sorts,
+            TopHitsSegmentSortKey(sorts),
            DocAddress {
                segment_ord: self.segment_ordinal,
                doc_id,
@@ -566,23 +626,18 @@ impl TopHitsSegmentCollector {
 impl SegmentAggregationCollector for TopHitsSegmentCollector {
    fn add_intermediate_aggregation_result(
        self: Box<Self>,
-        agg_with_accessor: &crate::aggregation::agg_req_with_accessor::AggregationsWithAccessor,
+        agg_data: &AggregationsSegmentCtx,
        results: &mut crate::aggregation::intermediate_agg_result::IntermediateAggregationResults,
    ) -> crate::Result<()> {
-        let name = agg_with_accessor.aggs.keys[self.accessor_idx].to_string();
+        let req_data = agg_data.get_top_hits_req_data(self.accessor_idx);

-        let value_accessors = &agg_with_accessor.aggs.values[self.accessor_idx].value_accessors;
-        let tophits_req = &agg_with_accessor.aggs.values[self.accessor_idx]
-            .agg
-            .agg
-            .as_top_hits()
-            .expect("aggregation request must be of type top hits");
+        let value_accessors = &req_data.value_accessors;

        let intermediate_result = IntermediateMetricResult::TopHits(
-            self.into_top_hits_collector(value_accessors, tophits_req),
+            self.into_top_hits_collector(value_accessors, &req_data.req),
        );
        results.push(
-            name,
+            req_data.name.to_string(),
            IntermediateAggregationResult::Metric(intermediate_result),
        )
    }
@@ -591,32 +646,22 @@ impl SegmentAggregationCollector for TopHitsSegmentCollector {
    fn collect(
        &mut self,
        doc_id: crate::DocId,
-        agg_with_accessor: &mut crate::aggregation::agg_req_with_accessor::AggregationsWithAccessor,
+        agg_data: &mut AggregationsSegmentCtx,
    ) -> crate::Result<()> {
-        let tophits_req = &agg_with_accessor.aggs.values[self.accessor_idx]
-            .agg
-            .agg
-            .as_top_hits()
-            .expect("aggregation request must be of type top hits");
-        let accessors = &agg_with_accessor.aggs.values[self.accessor_idx].accessors;
-        self.collect_with(doc_id, tophits_req, accessors)?;
+        let req_data = agg_data.get_top_hits_req_data(self.accessor_idx);
+        self.collect_with(doc_id, &req_data.req, &req_data.accessors)?;
        Ok(())
    }

    fn collect_block(
        &mut self,
        docs: &[crate::DocId],
-        agg_with_accessor: &mut crate::aggregation::agg_req_with_accessor::AggregationsWithAccessor,
+        agg_data: &mut AggregationsSegmentCtx,
    ) -> crate::Result<()> {
-        let tophits_req = &agg_with_accessor.aggs.values[self.accessor_idx]
-            .agg
-            .agg
-            .as_top_hits()
-            .expect("aggregation request must be of type top hits");
-        let accessors = &agg_with_accessor.aggs.values[self.accessor_idx].accessors;
+        let req_data = agg_data.get_top_hits_req_data(self.accessor_idx);
        // TODO: Consider getting fields with the column block accessor.
        for doc in docs {
-            self.collect_with(*doc, tophits_req, accessors)?;
+            self.collect_with(*doc, &req_data.req, &req_data.accessors)?;
        }
        Ok(())
    }
@@ -635,6 +680,7 @@ mod tests {
    use crate::aggregation::bucket::tests::get_test_index_from_docs;
    use crate::aggregation::tests::get_test_index_from_values;
    use crate::aggregation::AggregationCollector;
+    use crate::collector::sort_key::ReverseComparator;
    use crate::collector::ComparableDoc;
    use crate::query::AllQuery;
    use crate::schema::OwnedValue;
@@ -650,7 +696,7 @@ mod tests {

    fn collector_with_capacity(capacity: usize) -> super::TopHitsTopNComputer {
        super::TopHitsTopNComputer {
-            top_n: super::TopNComputer::new(capacity),
+            top_n: super::TopNComputer::new_with_comparator(capacity, ReverseComparator),
            req: Default::default(),
        }
    }
@@ -764,12 +810,12 @@ mod tests {
    #[test]
    fn test_top_hits_collector_single_feature() -> crate::Result<()> {
        let docs = vec![
-            ComparableDoc::<_, _, false> {
+            ComparableDoc::<_, _> {
                doc: crate::DocAddress {
                    segment_ord: 0,
                    doc_id: 0,
                },
-                feature: DocSortValuesAndFields {
+                sort_key: DocSortValuesAndFields {
                    sorts: vec![DocValueAndOrder {
                        value: Some(1),
                        order: Order::Asc,
@@ -782,7 +828,7 @@ mod tests {
                    segment_ord: 0,
                    doc_id: 2,
                },
-                feature: DocSortValuesAndFields {
+                sort_key: DocSortValuesAndFields {
                    sorts: vec![DocValueAndOrder {
                        value: Some(3),
                        order: Order::Asc,
@@ -795,7 +841,7 @@ mod tests {
                    segment_ord: 0,
                    doc_id: 1,
                },
-                feature: DocSortValuesAndFields {
+                sort_key: DocSortValuesAndFields {
                    sorts: vec![DocValueAndOrder {
                        value: Some(5),
                        order: Order::Asc,
@@ -807,7 +853,7 @@ mod tests {

        let mut collector = collector_with_capacity(3);
        for doc in docs.clone() {
-            collector.collect(doc.feature, doc.doc);
+            collector.collect(doc.sort_key, doc.doc);
        }

        let res = collector.into_final_result();
@@ -817,15 +863,15 @@ mod tests {
            super::TopHitsMetricResult {
                hits: vec![
                    super::TopHitsVecEntry {
-                        sort: vec![docs[0].feature.sorts[0].value],
+                        sort: vec![docs[0].sort_key.sorts[0].value],
                        doc_value_fields: Default::default(),
                    },
                    super::TopHitsVecEntry {
-                        sort: vec![docs[1].feature.sorts[0].value],
+                        sort: vec![docs[1].sort_key.sorts[0].value],
                        doc_value_fields: Default::default(),
                    },
                    super::TopHitsVecEntry {
-                        sort: vec![docs[2].feature.sorts[0].value],
+                        sort: vec![docs[2].sort_key.sorts[0].value],
                        doc_value_fields: Default::default(),
                    },
                ]
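The test above now builds its `TopNComputer` with an explicit comparator (`new_with_comparator(capacity, ReverseComparator)`) instead of encoding the sort direction as a const generic on `ComparableDoc`. The sketch below shows the general idea of a comparator-parameterized top-N buffer. All names (`RankComparator`, `Descending`, `Ascending`, `TopN`) are hypothetical. It assumes a scratch buffer of `2 * top_n` entries that is cut back to `top_n` with `select_nth_unstable_by` whenever it fills up. It also omits the threshold-based early rejection a production implementation would likely add, and it makes no claim about the exact semantics of tantivy's `ReverseComparator`.

```rust
use std::cmp::Ordering;

/// Hypothetical comparator: `Ordering::Greater` means `a` ranks ahead of `b`.
pub trait RankComparator<K> {
    fn compare(&self, a: &K, b: &K) -> Ordering;
}

/// Larger keys rank first.
pub struct Descending;
impl<K: Ord> RankComparator<K> for Descending {
    fn compare(&self, a: &K, b: &K) -> Ordering {
        a.cmp(b)
    }
}

/// Smaller keys rank first (natural order, reversed).
pub struct Ascending;
impl<K: Ord> RankComparator<K> for Ascending {
    fn compare(&self, a: &K, b: &K) -> Ordering {
        b.cmp(a)
    }
}

/// Keeps the `top_n` best (key, doc) pairs using a 2 * top_n scratch buffer.
pub struct TopN<K, D, C> {
    top_n: usize,
    comparator: C,
    buffer: Vec<(K, D)>,
}

impl<K, D, C: RankComparator<K>> TopN<K, D, C> {
    pub fn new_with_comparator(top_n: usize, comparator: C) -> Self {
        TopN { top_n, comparator, buffer: Vec::with_capacity(top_n.max(1) * 2) }
    }

    pub fn push(&mut self, key: K, doc: D) {
        if self.top_n == 0 {
            return;
        }
        // Amortize the selection cost: only truncate once the buffer is full.
        if self.buffer.len() == self.top_n * 2 {
            self.truncate_top_n();
        }
        self.buffer.push((key, doc));
    }

    /// Moves the best `top_n` entries to the front (unordered) and drops the rest.
    fn truncate_top_n(&mut self) {
        let cmp = &self.comparator;
        // Arguments swapped so that better entries sort first.
        self.buffer.select_nth_unstable_by(self.top_n - 1, |a, b| cmp.compare(&b.0, &a.0));
        self.buffer.truncate(self.top_n);
    }

    /// Returns at most `top_n` entries, best first.
    pub fn into_sorted_vec(mut self) -> Vec<(K, D)> {
        let cmp = &self.comparator;
        self.buffer.sort_unstable_by(|a, b| cmp.compare(&b.0, &a.0));
        self.buffer.truncate(self.top_n);
        self.buffer
    }
}
```

With distinct keys, `Descending` keeps the largest keys and `Ascending` keeps the smallest. Ties are broken arbitrarily here, which a real implementation would need to pin down (for example by doc address).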
--- a/src/aggregation/mod.rs
+++ b/src/aggregation/mod.rs
@@ -127,9 +127,10 @@
 //! [`AggregationResults`](agg_result::AggregationResults) via the
 //! [`into_final_result`](intermediate_agg_result::IntermediateAggregationResults::into_final_result) method.

+mod accessor_helpers;
+mod agg_data;
 mod agg_limits;
 pub mod agg_req;
-mod agg_req_with_accessor;
 pub mod agg_result;
 pub mod bucket;
 mod buf_collector;
@@ -140,7 +141,6 @@ pub mod intermediate_agg_result;
 pub mod metric;

 mod segment_agg_result;
-use std::collections::HashMap;
 use std::fmt::Display;

 #[cfg(test)]
@@ -160,6 +160,28 @@ use itertools::Itertools;
 use serde::de::{self, Visitor};
 use serde::{Deserialize, Deserializer, Serialize};

+use crate::tokenizer::TokenizerManager;
+
+/// Context parameters for aggregation execution.
+///
+/// This struct holds shared resources needed during aggregation execution:
+/// - `limits`: Memory and bucket limits for the aggregation
+/// - `tokenizers`: TokenizerManager for parsing query strings in filter aggregations
+#[derive(Clone, Default)]
+pub struct AggContextParams {
+    /// Aggregation limits (memory and bucket count)
+    pub limits: AggregationLimitsGuard,
+    /// Tokenizer manager for query string parsing
+    pub tokenizers: TokenizerManager,
+}
+
+impl AggContextParams {
+    /// Create new aggregation context parameters
+    pub fn new(limits: AggregationLimitsGuard, tokenizers: TokenizerManager) -> Self {
+        Self { limits, tokenizers }
+    }
+}
+
 fn parse_str_into_f64<E: de::Error>(value: &str) -> Result<f64, E> {
    let parsed = value
        .parse::<f64>()
@@ -257,80 +279,6 @@ where D: Deserializer<'de> {
    deserializer.deserialize_any(StringOrFloatVisitor)
 }

-/// Represents an associative array `(key => values)` in a very efficient manner.
-#[derive(PartialEq, Serialize, Deserialize)]
-pub(crate) struct VecWithNames<T> {
-    pub(crate) values: Vec<T>,
-    keys: Vec<String>,
-}
-
-impl<T: Clone> Clone for VecWithNames<T> {
-    fn clone(&self) -> Self {
-        Self {
-            values: self.values.clone(),
-            keys: self.keys.clone(),
-        }
-    }
-}
-
-impl<T> Default for VecWithNames<T> {
-    fn default() -> Self {
-        Self {
-            values: Default::default(),
-            keys: Default::default(),
-        }
-    }
-}
-
-impl<T: std::fmt::Debug> std::fmt::Debug for VecWithNames<T> {
-    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
-        f.debug_map().entries(self.iter()).finish()
-    }
-}
-
-impl<T> From<HashMap<String, T>> for VecWithNames<T> {
-    fn from(map: HashMap<String, T>) -> Self {
-        VecWithNames::from_entries(map.into_iter().collect_vec())
-    }
-}
-
-impl<T> VecWithNames<T> {
-    fn from_entries(mut entries: Vec<(String, T)>) -> Self {
-        // Sort to ensure order of elements match across multiple instances
-        entries.sort_by(|left, right| left.0.cmp(&right.0));
-        let mut data = Vec::with_capacity(entries.len());
-        let mut data_names = Vec::with_capacity(entries.len());
-        for entry in entries {
-            data_names.push(entry.0);
-            data.push(entry.1);
-        }
-        VecWithNames {
-            values: data,
-            keys: data_names,
-        }
-    }
-    fn iter(&self) -> impl Iterator<Item = (&str, &T)> + '_ {
-        self.keys().zip(self.values.iter())
-    }
-    fn keys(&self) -> impl Iterator<Item = &str> + '_ {
-        self.keys.iter().map(|key| key.as_str())
-    }
-    fn values_mut(&mut self) -> impl Iterator<Item = &mut T> + '_ {
-        self.values.iter_mut()
-    }
-    fn is_empty(&self) -> bool {
-        self.keys.is_empty()
-    }
-    fn len(&self) -> usize {
-        self.keys.len()
-    }
-    fn get(&self, name: &str) -> Option<&T> {
-        self.keys()
-            .position(|key| key == name)
-            .map(|pos| &self.values[pos])
-    }
-}
-
 /// The serialized key is used in a `HashMap`.
 pub type SerializedKey = String;

@@ -464,7 +412,10 @@ mod tests {
        query: Option<(&str, &str)>,
        limits: AggregationLimitsGuard,
    ) -> crate::Result<Value> {
-        let collector = AggregationCollector::from_aggs(agg_req, limits);
+        let collector = AggregationCollector::from_aggs(
+            agg_req,
+            AggContextParams::new(limits, index.tokenizers().clone()),
+        );

        let reader = index.reader()?;
        let searcher = reader.searcher();
--- a/src/aggregation/segment_agg_result.rs
+++ b/src/aggregation/segment_agg_result.rs
@@ -6,48 +6,38 @@
 use std::fmt::Debug;

 pub(crate) use super::agg_limits::AggregationLimitsGuard;
-use super::agg_req::AggregationVariants;
-use super::agg_req_with_accessor::{AggregationWithAccessor, AggregationsWithAccessor};
-use super::bucket::{SegmentHistogramCollector, SegmentRangeCollector, SegmentTermCollector};
 use super::intermediate_agg_result::IntermediateAggregationResults;
-use super::metric::{
-    AverageAggregation, CountAggregation, ExtendedStatsAggregation, MaxAggregation, MinAggregation,
-    SegmentPercentilesCollector, SegmentStatsCollector, SegmentStatsType, StatsAggregation,
-    SumAggregation,
-};
-use crate::aggregation::bucket::TermMissingAgg;
-use crate::aggregation::metric::{
-    CardinalityAggregationReq, SegmentCardinalityCollector, SegmentExtendedStatsCollector,
-    TopHitsSegmentCollector,
-};
+use crate::aggregation::agg_data::AggregationsSegmentCtx;

-pub(crate) trait SegmentAggregationCollector: CollectorClone + Debug {
+/// A SegmentAggregationCollector is used to collect aggregation results.
+pub trait SegmentAggregationCollector: CollectorClone + Debug {
    fn add_intermediate_aggregation_result(
        self: Box<Self>,
-        agg_with_accessor: &AggregationsWithAccessor,
+        agg_data: &AggregationsSegmentCtx,
        results: &mut IntermediateAggregationResults,
    ) -> crate::Result<()>;

    fn collect(
        &mut self,
        doc: crate::DocId,
-        agg_with_accessor: &mut AggregationsWithAccessor,
+        agg_data: &mut AggregationsSegmentCtx,
    ) -> crate::Result<()>;

    fn collect_block(
        &mut self,
        docs: &[crate::DocId],
-        agg_with_accessor: &mut AggregationsWithAccessor,
+        agg_data: &mut AggregationsSegmentCtx,
    ) -> crate::Result<()>;

    /// Finalize method. Some Aggregator collect blocks of docs before calling `collect_block`.
    /// This method ensures those staged docs will be collected.
-    fn flush(&mut self, _agg_with_accessor: &mut AggregationsWithAccessor) -> crate::Result<()> {
+    fn flush(&mut self, _agg_data: &mut AggregationsSegmentCtx) -> crate::Result<()> {
        Ok(())
    }
 }

-pub(crate) trait CollectorClone {
+/// A helper trait to enable cloning of `Box<dyn SegmentAggregationCollector>`.
+pub trait CollectorClone {
    fn clone_box(&self) -> Box<dyn SegmentAggregationCollector>;
 }

@@ -65,119 +55,6 @@ impl Clone for Box<dyn SegmentAggregationCollector> {
    }
 }

-pub(crate) fn build_segment_agg_collector(
-    req: &mut AggregationsWithAccessor,
-) -> crate::Result<Box<dyn SegmentAggregationCollector>> {
-    // Single collector special case
-    if req.aggs.len() == 1 {
-        let req = &mut req.aggs.values[0];
-        let accessor_idx = 0;
-        return build_single_agg_segment_collector(req, accessor_idx);
-    }
-
-    let agg = GenericSegmentAggregationResultsCollector::from_req_and_validate(req)?;
-    Ok(Box::new(agg))
-}
-
-pub(crate) fn build_single_agg_segment_collector(
-    req: &mut AggregationWithAccessor,
-    accessor_idx: usize,
-) -> crate::Result<Box<dyn SegmentAggregationCollector>> {
-    use AggregationVariants::*;
-    match &req.agg.agg {
-        Terms(terms_req) => {
-            if req.accessors.is_empty() {
-                Ok(Box::new(SegmentTermCollector::from_req_and_validate(
-                    terms_req,
-                    &mut req.sub_aggregation,
-                    req.field_type,
-                    accessor_idx,
-                )?))
-            } else {
-                Ok(Box::new(TermMissingAgg::new(
-                    accessor_idx,
-                    &mut req.sub_aggregation,
-                )?))
-            }
-        }
-        Range(range_req) => Ok(Box::new(SegmentRangeCollector::from_req_and_validate(
-            range_req,
-            &mut req.sub_aggregation,
-            &mut req.limits,
-            req.field_type,
-            accessor_idx,
-        )?)),
-        Histogram(histogram) => Ok(Box::new(SegmentHistogramCollector::from_req_and_validate(
-            histogram.clone(),
-            &mut req.sub_aggregation,
-            req.field_type,
-            accessor_idx,
-        )?)),
-        DateHistogram(histogram) => Ok(Box::new(SegmentHistogramCollector::from_req_and_validate(
-            histogram.to_histogram_req()?,
-            &mut req.sub_aggregation,
-            req.field_type,
-            accessor_idx,
-        )?)),
-        Average(AverageAggregation { missing, .. }) => {
-            Ok(Box::new(SegmentStatsCollector::from_req(
-                req.field_type,
-                SegmentStatsType::Average,
-                accessor_idx,
-                *missing,
-            )))
-        }
-        Count(CountAggregation { missing, .. }) => Ok(Box::new(SegmentStatsCollector::from_req(
-            req.field_type,
-            SegmentStatsType::Count,
-            accessor_idx,
-            *missing,
-        ))),
-        Max(MaxAggregation { missing, .. }) => Ok(Box::new(SegmentStatsCollector::from_req(
-            req.field_type,
-            SegmentStatsType::Max,
-            accessor_idx,
-            *missing,
-        ))),
-        Min(MinAggregation { missing, .. }) => Ok(Box::new(SegmentStatsCollector::from_req(
-            req.field_type,
-            SegmentStatsType::Min,
-            accessor_idx,
-            *missing,
-        ))),
-        Stats(StatsAggregation { missing, .. }) => Ok(Box::new(SegmentStatsCollector::from_req(
-            req.field_type,
-            SegmentStatsType::Stats,
-            accessor_idx,
-            *missing,
-        ))),
-        ExtendedStats(ExtendedStatsAggregation { missing, sigma, .. }) => Ok(Box::new(
-            SegmentExtendedStatsCollector::from_req(req.field_type, *sigma, accessor_idx, *missing),
-        )),
-        Sum(SumAggregation { missing, .. }) => Ok(Box::new(SegmentStatsCollector::from_req(
-            req.field_type,
-            SegmentStatsType::Sum,
-            accessor_idx,
-            *missing,
-        ))),
-        Percentiles(percentiles_req) => Ok(Box::new(
-            SegmentPercentilesCollector::from_req_and_validate(
-                percentiles_req,
-                req.field_type,
-                accessor_idx,
-            )?,
-        )),
-        TopHits(top_hits_req) => Ok(Box::new(TopHitsSegmentCollector::from_req(
-            top_hits_req,
-            accessor_idx,
-            req.segment_ordinal,
-        ))),
-        Cardinality(CardinalityAggregationReq { missing, .. }) => Ok(Box::new(
-            SegmentCardinalityCollector::from_req(req.field_type, accessor_idx, missing),
-        )),
-    }
-}
-
 #[derive(Clone, Default)]
 /// The GenericSegmentAggregationResultsCollector is the generic version of the collector, which
 /// can handle arbitrary complexity of  sub-aggregations. Ideally we never have to pick this one
@@ -197,11 +74,11 @@ impl Debug for GenericSegmentAggregationResultsCollector {
 impl SegmentAggregationCollector for GenericSegmentAggregationResultsCollector {
    fn add_intermediate_aggregation_result(
        self: Box<Self>,
-        agg_with_accessor: &AggregationsWithAccessor,
+        agg_data: &AggregationsSegmentCtx,
        results: &mut IntermediateAggregationResults,
    ) -> crate::Result<()> {
        for agg in self.aggs {
-            agg.add_intermediate_aggregation_result(agg_with_accessor, results)?;
+            agg.add_intermediate_aggregation_result(agg_data, results)?;
        }

        Ok(())
@@ -210,9 +87,9 @@ impl SegmentAggregationCollector for GenericSegmentAggregationResultsCollector {
    fn collect(
        &mut self,
        doc: crate::DocId,
-        agg_with_accessor: &mut AggregationsWithAccessor,
+        agg_data: &mut AggregationsSegmentCtx,
    ) -> crate::Result<()> {
-        self.collect_block(&[doc], agg_with_accessor)?;
+        self.collect_block(&[doc], agg_data)?;

        Ok(())
    }
@@ -220,32 +97,19 @@ impl SegmentAggregationCollector for GenericSegmentAggregationResultsCollector {
    fn collect_block(
        &mut self,
        docs: &[crate::DocId],
-        agg_with_accessor: &mut AggregationsWithAccessor,
+        agg_data: &mut AggregationsSegmentCtx,
    ) -> crate::Result<()> {
        for collector in &mut self.aggs {
-            collector.collect_block(docs, agg_with_accessor)?;
+            collector.collect_block(docs, agg_data)?;
        }

        Ok(())
    }

-    fn flush(&mut self, agg_with_accessor: &mut AggregationsWithAccessor) -> crate::Result<()> {
+    fn flush(&mut self, agg_data: &mut AggregationsSegmentCtx) -> crate::Result<()> {
        for collector in &mut self.aggs {
-            collector.flush(agg_with_accessor)?;
+            collector.flush(agg_data)?;
        }
        Ok(())
    }
 }
-
-impl GenericSegmentAggregationResultsCollector {
-    pub(crate) fn from_req_and_validate(req: &mut AggregationsWithAccessor) -> crate::Result<Self> {
-        let aggs = req
-            .aggs
-            .values_mut()
-            .enumerate()
-            .map(|(accessor_idx, req)| build_single_agg_segment_collector(req, accessor_idx))
-            .collect::<crate::Result<Vec<Box<dyn SegmentAggregationCollector>>>>()?;
-
-        Ok(GenericSegmentAggregationResultsCollector { aggs })
-    }
-}
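This file makes `SegmentAggregationCollector` and `CollectorClone` public. `GenericSegmentAggregationResultsCollector` simply fans every `collect_block`/`flush` call out to its boxed children. The diff shows `clone_box` and `impl Clone for Box<dyn SegmentAggregationCollector>` but not how concrete collectors implement `clone_box`, so the blanket impl below is an assumption. The sketch is a simplified, hypothetical model (`SegmentAggCollector`, `GenericCollector`, `CountCollector`), with the `AggregationsSegmentCtx` argument replaced by a plain `Vec<String>` log. It shows why the helper trait exists: `Clone` cannot be a supertrait of a trait used as `dyn`, so cloning goes through an object-safe `clone_box` instead.

```rust
use std::fmt::Debug;

/// Simplified stand-in for the segment collector trait; the context is a log here.
pub trait SegmentAggCollector: CollectorClone + Debug {
    fn collect_block(&mut self, docs: &[u32], ctx: &mut Vec<String>);
    /// Default no-op flush, mirroring the trait's default method in the diff.
    fn flush(&mut self, _ctx: &mut Vec<String>) {}
}

/// Object-safe cloning helper for boxed collectors.
pub trait CollectorClone {
    fn clone_box(&self) -> Box<dyn SegmentAggCollector>;
}

// Assumption: every concrete collector is `Clone`, so one blanket impl suffices.
impl<T: 'static + SegmentAggCollector + Clone> CollectorClone for T {
    fn clone_box(&self) -> Box<dyn SegmentAggCollector> {
        Box::new(self.clone())
    }
}

impl Clone for Box<dyn SegmentAggCollector> {
    fn clone(&self) -> Self {
        // Dispatch explicitly on the trait object to avoid recursing into this impl.
        (**self).clone_box()
    }
}

/// Fans every call out to its children, like the generic collector in the diff.
#[derive(Clone, Debug, Default)]
pub struct GenericCollector {
    pub aggs: Vec<Box<dyn SegmentAggCollector>>,
}

impl SegmentAggCollector for GenericCollector {
    fn collect_block(&mut self, docs: &[u32], ctx: &mut Vec<String>) {
        for agg in &mut self.aggs {
            agg.collect_block(docs, ctx);
        }
    }

    fn flush(&mut self, ctx: &mut Vec<String>) {
        for agg in &mut self.aggs {
            agg.flush(ctx);
        }
    }
}

/// Toy leaf collector: counts docs and reports the count on flush.
#[derive(Clone, Debug)]
pub struct CountCollector {
    pub name: &'static str,
    pub count: usize,
}

impl SegmentAggCollector for CountCollector {
    fn collect_block(&mut self, docs: &[u32], _ctx: &mut Vec<String>) {
        self.count += docs.len();
    }

    fn flush(&mut self, ctx: &mut Vec<String>) {
        ctx.push(format!("{}={}", self.name, self.count));
    }
}
```

Cloning a `GenericCollector` deep-clones each boxed child through `clone_box`, so the clone and the original accumulate independently afterwards.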
--- a/src/collector/custom_score_top_collector.rs
+++ b/src/collector/custom_score_top_collector.rs
@@ -1,121 +0,0 @@
-use crate::collector::top_collector::{TopCollector, TopSegmentCollector};
-use crate::collector::{Collector, SegmentCollector};
-use crate::{DocAddress, DocId, Score, SegmentReader};
-
-pub(crate) struct CustomScoreTopCollector<TCustomScorer, TScore = Score> {
-    custom_scorer: TCustomScorer,
-    collector: TopCollector<TScore>,
-}
-
-impl<TCustomScorer, TScore> CustomScoreTopCollector<TCustomScorer, TScore>
-where TScore: Clone + PartialOrd
-{
-    pub(crate) fn new(
-        custom_scorer: TCustomScorer,
-        collector: TopCollector<TScore>,
-    ) -> CustomScoreTopCollector<TCustomScorer, TScore> {
-        CustomScoreTopCollector {
-            custom_scorer,
-            collector,
-        }
-    }
-}
-
-/// A custom segment scorer makes it possible to define any kind of score
-/// for a given document belonging to a specific segment.
-///
-/// It is the segment local version of the [`CustomScorer`].
-pub trait CustomSegmentScorer<TScore>: 'static {
-    /// Computes the score of a specific `doc`.
-    fn score(&mut self, doc: DocId) -> TScore;
-}
-
-/// `CustomScorer` makes it possible to define any kind of score.
-///
-/// The `CustomerScorer` itself does not make much of the computation itself.
-/// Instead, it helps constructing `Self::Child` instances that will compute
-/// the score at a segment scale.
-pub trait CustomScorer<TScore>: Sync {
-    /// Type of the associated [`CustomSegmentScorer`].
-    type Child: CustomSegmentScorer<TScore>;
-    /// Builds a child scorer for a specific segment. The child scorer is associated with
-    /// a specific segment.
-    fn segment_scorer(&self, segment_reader: &SegmentReader) -> crate::Result<Self::Child>;
-}
-
-impl<TCustomScorer, TScore> Collector for CustomScoreTopCollector<TCustomScorer, TScore>
-where
-    TCustomScorer: CustomScorer<TScore> + Send + Sync,
-    TScore: 'static + PartialOrd + Clone + Send + Sync,
-{
-    type Fruit = Vec<(TScore, DocAddress)>;
-
-    type Child = CustomScoreTopSegmentCollector<TCustomScorer::Child, TScore>;
-
-    fn for_segment(
-        &self,
-        segment_local_id: u32,
-        segment_reader: &SegmentReader,
-    ) -> crate::Result<Self::Child> {
-        let segment_collector = self.collector.for_segment(segment_local_id, segment_reader);
-        let segment_scorer = self.custom_scorer.segment_scorer(segment_reader)?;
-        Ok(CustomScoreTopSegmentCollector {
-            segment_collector,
-            segment_scorer,
-        })
-    }
-
-    fn requires_scoring(&self) -> bool {
-        false
-    }
-
-    fn merge_fruits(&self, segment_fruits: Vec<Self::Fruit>) -> crate::Result<Self::Fruit> {
-        self.collector.merge_fruits(segment_fruits)
-    }
-}
-
-pub struct CustomScoreTopSegmentCollector<T, TScore>
-where
-    TScore: 'static + PartialOrd + Clone + Send + Sync + Sized,
-    T: CustomSegmentScorer<TScore>,
-{
-    segment_collector: TopSegmentCollector<TScore>,
-    segment_scorer: T,
-}
-
-impl<T, TScore> SegmentCollector for CustomScoreTopSegmentCollector<T, TScore>
-where
-    TScore: 'static + PartialOrd + Clone + Send + Sync,
-    T: 'static + CustomSegmentScorer<TScore>,
-{
-    type Fruit = Vec<(TScore, DocAddress)>;
-
-    fn collect(&mut self, doc: DocId, _score: Score) {
-        let score = self.segment_scorer.score(doc);
-        self.segment_collector.collect(doc, score);
-    }
-
-    fn harvest(self) -> Vec<(TScore, DocAddress)> {
-        self.segment_collector.harvest()
-    }
-}
-
-impl<F, TScore, T> CustomScorer<TScore> for F
-where
-    F: 'static + Send + Sync + Fn(&SegmentReader) -> T,
-    T: CustomSegmentScorer<TScore>,
-{
-    type Child = T;
-
-    fn segment_scorer(&self, segment_reader: &SegmentReader) -> crate::Result<Self::Child> {
-        Ok((self)(segment_reader))
-    }
-}
-
-impl<F, TScore> CustomSegmentScorer<TScore> for F
-where F: 'static + FnMut(DocId) -> TScore
-{
-    fn score(&mut self, doc: DocId) -> TScore {
-        (self)(doc)
-    }
-}
--- a/src/collector/facet_collector.rs
+++ b/src/collector/facet_collector.rs
@@ -484,7 +484,6 @@ impl FacetCounts {
 #[cfg(test)]
 mod tests {
    use std::collections::BTreeSet;
-    use std::iter;

    use columnar::Dictionary;
    use rand::distributions::Uniform;
--- a/src/collector/filter_collector_wrapper.rs
+++ b/src/collector/filter_collector_wrapper.rs
@@ -12,6 +12,7 @@ use std::marker::PhantomData;
 use columnar::{BytesColumn, Column, DynamicColumn, HasAssociatedColumnType};

 use crate::collector::{Collector, SegmentCollector};
+use crate::schema::Schema;
 use crate::{DocId, Score, SegmentReader};

 /// The `FilterCollector` filters docs using a fast field value and a predicate.
@@ -49,13 +50,13 @@ use crate::{DocId, Score, SegmentReader};
 ///
 /// let query_parser = QueryParser::for_index(&index, vec![title]);
 /// let query = query_parser.parse_query("diary")?;
-/// let no_filter_collector = FilterCollector::new("price".to_string(), |value: u64| value > 20_120u64, TopDocs::with_limit(2));
+/// let no_filter_collector = FilterCollector::new("price".to_string(), |value: u64| value > 20_120u64, TopDocs::with_limit(2).order_by_score());
 /// let top_docs = searcher.search(&query, &no_filter_collector)?;
 ///
 /// assert_eq!(top_docs.len(), 1);
 /// assert_eq!(top_docs[0].1, DocAddress::new(0, 1));
 ///
-/// let filter_all_collector: FilterCollector<_, _, u64> = FilterCollector::new("price".to_string(), |value| value < 5u64, TopDocs::with_limit(2));
+/// let filter_all_collector: FilterCollector<_, _, u64> = FilterCollector::new("price".to_string(), |value| value < 5u64, TopDocs::with_limit(2).order_by_score());
 /// let filtered_top_docs = searcher.search(&query, &filter_all_collector)?;
 ///
 /// assert_eq!(filtered_top_docs.len(), 0);
@@ -104,6 +105,11 @@ where

    type Child = FilterSegmentCollector<TCollector::Child, TPredicate, TPredicateValue>;

+    fn check_schema(&self, schema: &Schema) -> crate::Result<()> {
+        self.collector.check_schema(schema)?;
+        Ok(())
+    }
+
    fn for_segment(
        &self,
        segment_local_id: u32,
@@ -120,6 +126,7 @@ where
            segment_collector,
            predicate: self.predicate.clone(),
            t_predicate_value: PhantomData,
+            filtered_docs: Vec::with_capacity(crate::COLLECT_BLOCK_BUFFER_LEN),
        })
    }

@@ -140,6 +147,7 @@ pub struct FilterSegmentCollector<TSegmentCollector, TPredicate, TPredicateValue
    segment_collector: TSegmentCollector,
    predicate: TPredicate,
    t_predicate_value: PhantomData<TPredicateValue>,
+    filtered_docs: Vec<DocId>,
 }

 impl<TSegmentCollector, TPredicate, TPredicateValue>
@@ -176,6 +184,20 @@ where
        }
    }

+    fn collect_block(&mut self, docs: &[DocId]) {
+        self.filtered_docs.clear();
+        for &doc in docs {
+            // TODO: `accept_document` could be further optimized to do batch lookups of column
+            // values for single-valued columns.
+            if self.accept_document(doc) {
+                self.filtered_docs.push(doc);
+            }
+        }
+        if !self.filtered_docs.is_empty() {
+            self.segment_collector.collect_block(&self.filtered_docs);
+        }
+    }
+
    fn harvest(self) -> TSegmentCollector::Fruit {
        self.segment_collector.harvest()
    }
@@ -218,7 +240,7 @@ where
 ///
 /// let query_parser = QueryParser::for_index(&index, vec![title]);
 /// let query = query_parser.parse_query("diary")?;
-/// let filter_collector = BytesFilterCollector::new("barcode".to_string(), |bytes: &[u8]| bytes.starts_with(b"01"), TopDocs::with_limit(2));
+/// let filter_collector = BytesFilterCollector::new("barcode".to_string(), |bytes: &[u8]| bytes.starts_with(b"01"), TopDocs::with_limit(2).order_by_score());
 /// let top_docs = searcher.search(&query, &filter_collector)?;
 ///
 /// assert_eq!(top_docs.len(), 1);
@@ -258,6 +280,10 @@ where

    type Child = BytesFilterSegmentCollector<TCollector::Child, TPredicate>;

+    fn check_schema(&self, schema: &Schema) -> crate::Result<()> {
+        self.collector.check_schema(schema)
+    }
+
    fn for_segment(
        &self,
        segment_local_id: u32,
@@ -274,6 +300,7 @@ where
            segment_collector,
            predicate: self.predicate.clone(),
            buffer: Vec::new(),
+            filtered_docs: Vec::with_capacity(crate::COLLECT_BLOCK_BUFFER_LEN),
        })
    }

@@ -296,6 +323,7 @@ where TPredicate: 'static
    segment_collector: TSegmentCollector,
    predicate: TPredicate,
    buffer: Vec<u8>,
+    filtered_docs: Vec<DocId>,
 }

 impl<TSegmentCollector, TPredicate> BytesFilterSegmentCollector<TSegmentCollector, TPredicate>
@@ -334,6 +362,20 @@ where
        }
    }

+    fn collect_block(&mut self, docs: &[DocId]) {
+        self.filtered_docs.clear();
+        for &doc in docs {
+            // TODO: `accept_document` could be further optimized to do batch lookups of column
+            // values for single-valued columns.
+            if self.accept_document(doc) {
+                self.filtered_docs.push(doc);
+            }
+        }
+        if !self.filtered_docs.is_empty() {
+            self.segment_collector.collect_block(&self.filtered_docs);
+        }
+    }
+
    fn harvest(self) -> TSegmentCollector::Fruit {
        self.segment_collector.harvest()
    }
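Both filter segment collectors now override `collect_block`. They run the predicate over the incoming block, gather the survivors in a reusable `filtered_docs` scratch vector, and forward them as a single batch, skipping the call entirely when nothing survives. Without this override the inner collector would presumably only ever see docs one at a time. The sketch below reproduces that pattern with a hypothetical minimal trait (`SegmentCollector` with `collect`/`collect_block` over `u32` doc ids), a recording inner collector, and an assumed block capacity of 64 standing in for `crate::COLLECT_BLOCK_BUFFER_LEN`.

```rust
/// Assumed scratch capacity, standing in for `crate::COLLECT_BLOCK_BUFFER_LEN`.
const BLOCK_LEN: usize = 64;

/// Hypothetical minimal segment collector with a per-doc and a batched entry point.
pub trait SegmentCollector {
    fn collect(&mut self, doc: u32);
    /// Default: forward each doc of the block individually.
    fn collect_block(&mut self, docs: &[u32]) {
        for &doc in docs {
            self.collect(doc);
        }
    }
}

/// Inner collector that records the docs and the number of blocks it received.
#[derive(Default)]
pub struct Recorder {
    pub docs: Vec<u32>,
    pub blocks: usize,
}

impl SegmentCollector for Recorder {
    fn collect(&mut self, doc: u32) {
        self.docs.push(doc);
    }
    fn collect_block(&mut self, docs: &[u32]) {
        self.blocks += 1;
        self.docs.extend_from_slice(docs);
    }
}

/// Filters docs with a predicate and forwards the survivors of each block in one batch.
pub struct FilterSegmentCollector<C, P> {
    pub inner: C,
    pub predicate: P,
    filtered_docs: Vec<u32>, // scratch buffer reused across blocks
}

impl<C, P> FilterSegmentCollector<C, P> {
    pub fn new(inner: C, predicate: P) -> Self {
        FilterSegmentCollector { inner, predicate, filtered_docs: Vec::with_capacity(BLOCK_LEN) }
    }
}

impl<C: SegmentCollector, P: Fn(u32) -> bool> SegmentCollector for FilterSegmentCollector<C, P> {
    fn collect(&mut self, doc: u32) {
        if (self.predicate)(doc) {
            self.inner.collect(doc);
        }
    }

    fn collect_block(&mut self, docs: &[u32]) {
        self.filtered_docs.clear();
        for &doc in docs {
            if (self.predicate)(doc) {
                self.filtered_docs.push(doc);
            }
        }
        // Skip the inner call entirely when no doc in the block passed the filter.
        if !self.filtered_docs.is_empty() {
            self.inner.collect_block(&self.filtered_docs);
        }
    }
}
```

Reusing one vector avoids an allocation per block. The trade-off is that the predicate is still evaluated per doc, which is what the `TODO` about batch column lookups in the diff points at.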
--- a/src/collector/mod.rs
+++ b/src/collector/mod.rs
@@ -57,7 +57,7 @@
 //! #     let query_parser = QueryParser::for_index(&index, vec![title]);
 //! #     let query = query_parser.parse_query("diary")?;
 //! let (doc_count, top_docs): (usize, Vec<(Score, DocAddress)>) =
-//! searcher.search(&query, &(Count, TopDocs::with_limit(2)))?;
+//! searcher.search(&query, &(Count, TopDocs::with_limit(2).order_by_score()))?;
 //! #     Ok(())
 //! # }
 //! ```
@@ -83,28 +83,28 @@

 use downcast_rs::impl_downcast;

+use crate::schema::Schema;
 use crate::{DocId, Score, SegmentOrdinal, SegmentReader};

 mod count_collector;
 pub use self::count_collector::Count;

+/// Sort keys and the sort key computers used to order collected documents.
+pub mod sort_key;
+
 mod histogram_collector;
 pub use histogram_collector::HistogramCollector;

 mod multi_collector;
+pub use columnar::ComparableDoc;
+
 pub use self::multi_collector::{FruitHandle, MultiCollector, MultiFruit};

-mod top_collector;
-
 mod top_score_collector;
-pub use self::top_collector::ComparableDoc;
 pub use self::top_score_collector::{TopDocs, TopNComputer};

-mod custom_score_top_collector;
-pub use self::custom_score_top_collector::{CustomScorer, CustomSegmentScorer};
-
-mod tweak_score_top_collector;
-pub use self::tweak_score_top_collector::{ScoreSegmentTweaker, ScoreTweaker};
+mod sort_key_top_collector;
+pub use self::sort_key::{SegmentSortKeyComputer, SortKeyComputer};
 mod facet_collector;
 pub use self::facet_collector::{FacetCollector, FacetCounts};
 use crate::query::Weight;
@@ -145,6 +145,11 @@ pub trait Collector: Sync + Send {
    /// Type of the `SegmentCollector` associated with this collector.
    type Child: SegmentCollector;

+    /// Returns an error if the schema is not compatible with the collector.
+    fn check_schema(&self, _schema: &Schema) -> crate::Result<()> {
+        Ok(())
+    }
+
    /// `set_segment` is called before beginning to enumerate
    /// on this segment.
    fn for_segment(
@@ -170,41 +175,50 @@ pub trait Collector: Sync + Send {
        segment_ord: u32,
        reader: &SegmentReader,
    ) -> crate::Result<<Self::Child as SegmentCollector>::Fruit> {
+        let with_scoring = self.requires_scoring();
        let mut segment_collector = self.for_segment(segment_ord, reader)?;
-
-        match (reader.alive_bitset(), self.requires_scoring()) {
-            (Some(alive_bitset), true) => {
-                weight.for_each(reader, &mut |doc, score| {
-                    if alive_bitset.is_alive(doc) {
-                        segment_collector.collect(doc, score);
-                    }
-                })?;
-            }
-            (Some(alive_bitset), false) => {
-                weight.for_each_no_score(reader, &mut |docs| {
-                    for doc in docs.iter().cloned() {
-                        if alive_bitset.is_alive(doc) {
-                            segment_collector.collect(doc, 0.0);
-                        }
-                    }
-                })?;
-            }
-            (None, true) => {
-                weight.for_each(reader, &mut |doc, score| {
-                    segment_collector.collect(doc, score);
-                })?;
-            }
-            (None, false) => {
-                weight.for_each_no_score(reader, &mut |docs| {
-                    segment_collector.collect_block(docs);
-                })?;
-            }
-        }
-
+        default_collect_segment_impl(&mut segment_collector, weight, reader, with_scoring)?;
        Ok(segment_collector.harvest())
    }
 }

+pub(crate) fn default_collect_segment_impl<TSegmentCollector: SegmentCollector>(
+    segment_collector: &mut TSegmentCollector,
+    weight: &dyn Weight,
+    reader: &SegmentReader,
+    with_scoring: bool,
+) -> crate::Result<()> {
+    match (reader.alive_bitset(), with_scoring) {
+        (Some(alive_bitset), true) => {
+            weight.for_each(reader, &mut |doc, score| {
+                if alive_bitset.is_alive(doc) {
+                    segment_collector.collect(doc, score);
+                }
+            })?;
+        }
+        (Some(alive_bitset), false) => {
+            weight.for_each_no_score(reader, &mut |docs| {
+                for doc in docs.iter().cloned() {
+                    if alive_bitset.is_alive(doc) {
+                        segment_collector.collect(doc, 0.0);
+                    }
+                }
+            })?;
+        }
+        (None, true) => {
+            weight.for_each(reader, &mut |doc, score| {
+                segment_collector.collect(doc, score);
+            })?;
+        }
+        (None, false) => {
+            weight.for_each_no_score(reader, &mut |docs| {
+                segment_collector.collect_block(docs);
+            })?;
+        }
+    }
+    Ok(())
+}
+
 impl<TSegmentCollector: SegmentCollector> SegmentCollector for Option<TSegmentCollector> {
    type Fruit = Option<TSegmentCollector::Fruit>;

@@ -214,6 +228,12 @@ impl<TSegmentCollector: SegmentCollector> SegmentCollector for Option<TSegmentCo
        }
    }

+    fn collect_block(&mut self, docs: &[DocId]) {
+        if let Some(segment_collector) = self {
+            segment_collector.collect_block(docs);
+        }
+    }
+
    fn harvest(self) -> Self::Fruit {
        self.map(|segment_collector| segment_collector.harvest())
    }
@@ -224,6 +244,13 @@ impl<TCollector: Collector> Collector for Option<TCollector> {

    type Child = Option<<TCollector as Collector>::Child>;

+    fn check_schema(&self, schema: &Schema) -> crate::Result<()> {
+        if let Some(underlying_collector) = self {
+            underlying_collector.check_schema(schema)?;
+        }
+        Ok(())
+    }
+
    fn for_segment(
        &self,
        segment_local_id: SegmentOrdinal,
@@ -299,6 +326,12 @@ where
    type Fruit = (Left::Fruit, Right::Fruit);
    type Child = (Left::Child, Right::Child);

+    fn check_schema(&self, schema: &Schema) -> crate::Result<()> {
+        self.0.check_schema(schema)?;
+        self.1.check_schema(schema)?;
+        Ok(())
+    }
+
    fn for_segment(
        &self,
        segment_local_id: u32,
@@ -342,6 +375,11 @@ where
        self.1.collect(doc, score);
    }

+    fn collect_block(&mut self, docs: &[DocId]) {
+        self.0.collect_block(docs);
+        self.1.collect_block(docs);
+    }
+
    fn harvest(self) -> <Self as SegmentCollector>::Fruit {
        (self.0.harvest(), self.1.harvest())
    }
@@ -358,6 +396,13 @@ where
    type Fruit = (One::Fruit, Two::Fruit, Three::Fruit);
    type Child = (One::Child, Two::Child, Three::Child);

+    fn check_schema(&self, schema: &Schema) -> crate::Result<()> {
+        self.0.check_schema(schema)?;
+        self.1.check_schema(schema)?;
+        self.2.check_schema(schema)?;
+        Ok(())
+    }
+
    fn for_segment(
        &self,
        segment_local_id: u32,
@@ -407,6 +452,12 @@ where
        self.2.collect(doc, score);
    }

+    fn collect_block(&mut self, docs: &[DocId]) {
+        self.0.collect_block(docs);
+        self.1.collect_block(docs);
+        self.2.collect_block(docs);
+    }
+
    fn harvest(self) -> <Self as SegmentCollector>::Fruit {
        (self.0.harvest(), self.1.harvest(), self.2.harvest())
    }
@@ -424,6 +475,14 @@ where
    type Fruit = (One::Fruit, Two::Fruit, Three::Fruit, Four::Fruit);
    type Child = (One::Child, Two::Child, Three::Child, Four::Child);

+    fn check_schema(&self, schema: &Schema) -> crate::Result<()> {
+        self.0.check_schema(schema)?;
+        self.1.check_schema(schema)?;
+        self.2.check_schema(schema)?;
+        self.3.check_schema(schema)?;
+        Ok(())
+    }
+
    fn for_segment(
        &self,
        segment_local_id: u32,
@@ -482,6 +541,13 @@ where
        self.3.collect(doc, score);
    }

+    fn collect_block(&mut self, docs: &[DocId]) {
+        self.0.collect_block(docs);
+        self.1.collect_block(docs);
+        self.2.collect_block(docs);
+        self.3.collect_block(docs);
+    }
+
    fn harvest(self) -> <Self as SegmentCollector>::Fruit {
        (
            self.0.harvest(),
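The tuple `SegmentCollector` impls above forward `collect_block` to every member, just as they already do for `collect`. Below is a minimal sketch of that fan-out using a simplified, hypothetical trait. `BlockCollector`, `Count` and `MaxDoc` are illustrative names, not tantivy types. The sketch also assumes a default `collect_block` that loops over `collect`, which may differ from the real trait's default.

```rust
// Hypothetical, simplified stand-in for a segment collector; not tantivy's API.
type DocId = u32;

trait BlockCollector {
    fn collect(&mut self, doc: DocId);

    // Assumed default: collect a block one document at a time.
    fn collect_block(&mut self, docs: &[DocId]) {
        for &doc in docs {
            self.collect(doc);
        }
    }
}

// Counts documents; overrides `collect_block` to handle a whole block at once.
struct Count {
    n: u64,
}

impl BlockCollector for Count {
    fn collect(&mut self, _doc: DocId) {
        self.n += 1;
    }

    fn collect_block(&mut self, docs: &[DocId]) {
        self.n += docs.len() as u64;
    }
}

// Tracks the largest doc id seen; relies on the default `collect_block`.
struct MaxDoc {
    max: Option<DocId>,
}

impl BlockCollector for MaxDoc {
    fn collect(&mut self, doc: DocId) {
        self.max = Some(self.max.map_or(doc, |m| m.max(doc)));
    }
}

// Mirrors the tuple impls in the diff: every call is forwarded to each member.
impl<A: BlockCollector, B: BlockCollector> BlockCollector for (A, B) {
    fn collect(&mut self, doc: DocId) {
        self.0.collect(doc);
        self.1.collect(doc);
    }

    fn collect_block(&mut self, docs: &[DocId]) {
        self.0.collect_block(docs);
        self.1.collect_block(docs);
    }
}
```

`Count` overrides `collect_block` and `MaxDoc` inherits the per-document default. This shows that each tuple member can pick its own block strategy while the tuple simply forwards.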
--- a/src/collector/multi_collector.rs
+++ b/src/collector/multi_collector.rs
@@ -3,6 +3,7 @@ use std::ops::Deref;

 use super::{Collector, SegmentCollector};
 use crate::collector::Fruit;
+use crate::schema::Schema;
 use crate::{DocId, Score, SegmentOrdinal, SegmentReader, TantivyError};

 /// MultiFruit keeps Fruits from every nested Collector
@@ -16,6 +17,10 @@ impl<TCollector: Collector> Collector for CollectorWrapper<TCollector> {
    type Fruit = Box<dyn Fruit>;
    type Child = Box<dyn BoxableSegmentCollector>;

+    fn check_schema(&self, schema: &Schema) -> crate::Result<()> {
+        self.0.check_schema(schema)
+    }
+
    fn for_segment(
        &self,
        segment_local_id: u32,
@@ -147,7 +152,7 @@ impl<TFruit: Fruit> FruitHandle<TFruit> {
 /// let searcher = reader.searcher();
 ///
 /// let mut collectors = MultiCollector::new();
-/// let top_docs_handle = collectors.add_collector(TopDocs::with_limit(2));
+/// let top_docs_handle = collectors.add_collector(TopDocs::with_limit(2).order_by_score());
 /// let count_handle = collectors.add_collector(Count);
 /// let query_parser = QueryParser::for_index(&index, vec![title]);
 /// let query = query_parser.parse_query("diary").unwrap();
@@ -194,6 +199,13 @@ impl Collector for MultiCollector<'_> {
    type Fruit = MultiFruit;
    type Child = MultiCollectorChild;

+    fn check_schema(&self, schema: &Schema) -> crate::Result<()> {
+        for collector in &self.collector_wrappers {
+            collector.check_schema(schema)?;
+        }
+        Ok(())
+    }
+
    fn for_segment(
        &self,
        segment_local_id: SegmentOrdinal,
@@ -250,6 +262,12 @@ impl SegmentCollector for MultiCollectorChild {
        }
    }

+    fn collect_block(&mut self, docs: &[DocId]) {
+        for child in &mut self.children {
+            child.collect_block(docs);
+        }
+    }
+
    fn harvest(self) -> MultiFruit {
        MultiFruit {
            sub_fruits: self
@@ -293,7 +311,7 @@ mod tests {
        let query = TermQuery::new(term, IndexRecordOption::Basic);

        let mut collectors = MultiCollector::new();
-        let topdocs_handler = collectors.add_collector(TopDocs::with_limit(2));
+        let topdocs_handler = collectors.add_collector(TopDocs::with_limit(2).order_by_score());
        let count_handler = collectors.add_collector(Count);
        let mut multifruits = searcher.search(&query, &collectors).unwrap();

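`MultiCollector::check_schema` and the tuple impls validate each child collector in turn, returning the first error via `?`. The following is a minimal sketch of that aggregation. It assumes a toy schema that is just a set of field names. `Schema`, `CheckSchema`, `NeedsField` and `check_all` are hypothetical and only mimic the shape of the real methods.

```rust
use std::collections::HashSet;

// Hypothetical stand-in: a "schema" is just a set of field names.
struct Schema {
    fields: HashSet<&'static str>,
}

trait CheckSchema {
    fn check_schema(&self, schema: &Schema) -> Result<(), String>;
}

// A collector that needs one field to exist (e.g. a fast-field sort).
struct NeedsField(&'static str);

impl CheckSchema for NeedsField {
    fn check_schema(&self, schema: &Schema) -> Result<(), String> {
        if schema.fields.contains(self.0) {
            Ok(())
        } else {
            Err(format!("unknown field `{}`", self.0))
        }
    }
}

// Like `MultiCollector::check_schema` in the diff: validate each child and
// stop at the first error.
fn check_all(collectors: &[Box<dyn CheckSchema>], schema: &Schema) -> Result<(), String> {
    for collector in collectors {
        collector.check_schema(schema)?;
    }
    Ok(())
}
```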
--- a/src/collector/sort_key/mod.rs
+++ b/src/collector/sort_key/mod.rs
@@ -0,0 +1,694 @@
+mod order;
+mod sort_by_erased_type;
+mod sort_by_score;
+mod sort_by_static_fast_value;
+mod sort_by_string;
+mod sort_key_computer;
+
+pub use order::*;
+pub use sort_by_erased_type::SortByErasedType;
+pub use sort_by_score::SortBySimilarityScore;
+pub use sort_by_static_fast_value::SortByStaticFastValue;
+pub use sort_by_string::SortByString;
+pub use sort_key_computer::{SegmentSortKeyComputer, SortKeyComputer};
+
+#[cfg(test)]
+pub(crate) mod tests {
+
+    // By spec, regardless of whether ascending or descending order was requested, ties on the
+    // sort key are broken by ascending doc id/doc address.
+    pub(crate) fn sort_hits<TSortKey: Ord, D: Ord>(
+        hits: &mut [ComparableDoc<TSortKey, D>],
+        order: Order,
+    ) {
+        if order.is_asc() {
+            hits.sort_by(|l, r| l.sort_key.cmp(&r.sort_key).then(l.doc.cmp(&r.doc)));
+        } else {
+            hits.sort_by(|l, r| {
+                l.sort_key
+                    .cmp(&r.sort_key)
+                    .reverse() // This is descending
+                    .then(l.doc.cmp(&r.doc))
+            });
+        }
+    }
+
+    use std::collections::HashMap;
+    use std::ops::Range;
+
+    use crate::collector::sort_key::{
+        SortByErasedType, SortBySimilarityScore, SortByStaticFastValue, SortByString,
+    };
+    use crate::collector::top_score_collector::compare_for_top_k;
+    use crate::collector::{ComparableDoc, DocSetCollector, TopDocs};
+    use crate::indexer::NoMergePolicy;
+    use crate::query::{AllQuery, QueryParser};
+    use crate::schema::{OwnedValue, Schema, FAST, TEXT};
+    use crate::{DocAddress, Document, Index, Order, Score, Searcher};
+
+    fn make_index() -> crate::Result<Index> {
+        let mut schema_builder = Schema::builder();
+        let id = schema_builder.add_u64_field("id", FAST);
+        let city = schema_builder.add_text_field("city", TEXT | FAST);
+        let catchphrase = schema_builder.add_text_field("catchphrase", TEXT);
+        let altitude = schema_builder.add_f64_field("altitude", FAST);
+        let schema = schema_builder.build();
+        let index = Index::create_in_ram(schema);
+
+        fn create_segment(index: &Index, docs: Vec<impl Document>) -> crate::Result<()> {
+            let mut index_writer = index.writer_for_tests()?;
+            index_writer.set_merge_policy(Box::new(NoMergePolicy));
+            for doc in docs {
+                index_writer.add_document(doc)?;
+            }
+            index_writer.commit()?;
+            Ok(())
+        }
+
+        create_segment(
+            &index,
+            vec![
+                doc!(
+                    id => 0_u64,
+                    city => "austin",
+                    catchphrase => "Hills, Barbeque, Glow",
+                    altitude => 149.0,
+                ),
+                doc!(
+                    id => 1_u64,
+                    city => "greenville",
+                    catchphrase => "Grow, Glow, Glow",
+                    altitude => 27.0,
+                ),
+            ],
+        )?;
+        create_segment(
+            &index,
+            vec![doc!(
+                id => 2_u64,
+                city => "tokyo",
+                catchphrase => "Glow, Glow, Glow",
+                altitude => 40.0,
+            )],
+        )?;
+        create_segment(
+            &index,
+            vec![doc!(
+                id => 3_u64,
+                catchphrase => "No, No, No",
+                altitude => 0.0,
+            )],
+        )?;
+        Ok(index)
+    }
+
+    // NOTE: You cannot determine the SegmentIds that will be generated for Segments
+    // ahead of time, so DocAddresses must be mapped back to a unique id for each Searcher.
+    fn id_mapping(searcher: &Searcher) -> HashMap<DocAddress, u64> {
+        searcher
+            .search(&AllQuery, &DocSetCollector)
+            .unwrap()
+            .into_iter()
+            .map(|doc_address| {
+                let column = searcher.segment_readers()[doc_address.segment_ord as usize]
+                    .fast_fields()
+                    .u64("id")
+                    .unwrap();
+                (doc_address, column.first(doc_address.doc_id).unwrap())
+            })
+            .collect()
+    }
+
+    #[test]
+    fn test_order_by_string() -> crate::Result<()> {
+        let index = make_index()?;
+
+        #[track_caller]
+        fn assert_query(
+            index: &Index,
+            order: Order,
+            doc_range: Range<usize>,
+            expected: Vec<(Option<String>, u64)>,
+        ) -> crate::Result<()> {
+            let searcher = index.reader()?.searcher();
+            let ids = id_mapping(&searcher);
+
+            // Sort on the typed `city` string fast field.
+            let top_collector = TopDocs::for_doc_range(doc_range)
+                .order_by((SortByString::for_field("city"), order));
+            let actual = searcher
+                .search(&AllQuery, &top_collector)?
+                .into_iter()
+                .map(|(sort_key_opt, doc)| (sort_key_opt, ids[&doc]))
+                .collect::<Vec<_>>();
+            assert_eq!(actual, expected);
+            Ok(())
+        }
+
+        assert_query(
+            &index,
+            Order::Asc,
+            0..4,
+            vec![
+                (Some("austin".to_owned()), 0),
+                (Some("greenville".to_owned()), 1),
+                (Some("tokyo".to_owned()), 2),
+                (None, 3),
+            ],
+        )?;
+
+        assert_query(
+            &index,
+            Order::Asc,
+            0..3,
+            vec![
+                (Some("austin".to_owned()), 0),
+                (Some("greenville".to_owned()), 1),
+                (Some("tokyo".to_owned()), 2),
+            ],
+        )?;
+
+        assert_query(
+            &index,
+            Order::Asc,
+            0..2,
+            vec![
+                (Some("austin".to_owned()), 0),
+                (Some("greenville".to_owned()), 1),
+            ],
+        )?;
+
+        assert_query(
+            &index,
+            Order::Asc,
+            0..1,
+            vec![(Some("austin".to_string()), 0)],
+        )?;
+
+        assert_query(
+            &index,
+            Order::Asc,
+            1..3,
+            vec![
+                (Some("greenville".to_owned()), 1),
+                (Some("tokyo".to_owned()), 2),
+            ],
+        )?;
+
+        assert_query(
+            &index,
+            Order::Desc,
+            0..4,
+            vec![
+                (Some("tokyo".to_owned()), 2),
+                (Some("greenville".to_owned()), 1),
+                (Some("austin".to_owned()), 0),
+                (None, 3),
+            ],
+        )?;
+
+        assert_query(
+            &index,
+            Order::Desc,
+            1..3,
+            vec![
+                (Some("greenville".to_owned()), 1),
+                (Some("austin".to_owned()), 0),
+            ],
+        )?;
+
+        assert_query(
+            &index,
+            Order::Desc,
+            0..1,
+            vec![(Some("tokyo".to_owned()), 2)],
+        )?;
+
+        Ok(())
+    }
+
+    #[test]
+    fn test_order_by_f64() -> crate::Result<()> {
+        let index = make_index()?;
+
+        fn assert_query(
+            index: &Index,
+            order: Order,
+            expected: Vec<(Option<f64>, u64)>,
+        ) -> crate::Result<()> {
+            let searcher = index.reader()?.searcher();
+            let ids = id_mapping(&searcher);
+
+            // Sort on the typed `altitude` f64 fast field.
+            let top_collector = TopDocs::with_limit(3)
+                .order_by((SortByStaticFastValue::<f64>::for_field("altitude"), order));
+            let actual = searcher
+                .search(&AllQuery, &top_collector)?
+                .into_iter()
+                .map(|(altitude_opt, doc)| (altitude_opt, ids[&doc]))
+                .collect::<Vec<_>>();
+            assert_eq!(actual, expected);
+
+            Ok(())
+        }
+
+        assert_query(
+            &index,
+            Order::Asc,
+            vec![(Some(0.0), 3), (Some(27.0), 1), (Some(40.0), 2)],
+        )?;
+
+        assert_query(
+            &index,
+            Order::Desc,
+            vec![(Some(149.0), 0), (Some(40.0), 2), (Some(27.0), 1)],
+        )?;
+
+        Ok(())
+    }
+
+    #[test]
+    fn test_order_by_score() -> crate::Result<()> {
+        let index = make_index()?;
+
+        fn query(index: &Index, order: Order) -> crate::Result<Vec<(Score, u64)>> {
+            let searcher = index.reader()?.searcher();
+            let ids = id_mapping(&searcher);
+
+            let top_collector = TopDocs::with_limit(4).order_by((SortBySimilarityScore, order));
+            let field = index.schema().get_field("catchphrase").unwrap();
+            let query_parser = QueryParser::for_index(index, vec![field]);
+            let text_query = query_parser.parse_query("glow")?;
+
+            Ok(searcher
+                .search(&text_query, &top_collector)?
+                .into_iter()
+                .map(|(score, doc)| (score, ids[&doc]))
+                .collect())
+        }
+
+        assert_eq!(
+            &query(&index, Order::Desc)?,
+            &[(0.5604893, 2), (0.4904281, 1), (0.35667497, 0),]
+        );
+
+        assert_eq!(
+            &query(&index, Order::Asc)?,
+            &[(0.35667497, 0), (0.4904281, 1), (0.5604893, 2),]
+        );
+
+        Ok(())
+    }
+
+    #[test]
+    fn test_order_by_score_then_string() -> crate::Result<()> {
+        let index = make_index()?;
+
+        type SortKey = (Score, Option<String>);
+
+        fn query(
+            index: &Index,
+            score_order: Order,
+            city_order: Order,
+        ) -> crate::Result<Vec<(SortKey, u64)>> {
+            let searcher = index.reader()?.searcher();
+            let ids = id_mapping(&searcher);
+
+            let top_collector = TopDocs::with_limit(4).order_by((
+                (SortBySimilarityScore, score_order),
+                (SortByString::for_field("city"), city_order),
+            ));
+            let results: Vec<((Score, Option<String>), DocAddress)> =
+                searcher.search(&AllQuery, &top_collector)?;
+            Ok(results.into_iter().map(|(f, doc)| (f, ids[&doc])).collect())
+        }
+
+        assert_eq!(
+            &query(&index, Order::Asc, Order::Asc)?,
+            &[
+                ((1.0, Some("austin".to_owned())), 0),
+                ((1.0, Some("greenville".to_owned())), 1),
+                ((1.0, Some("tokyo".to_owned())), 2),
+                ((1.0, None), 3),
+            ]
+        );
+
+        assert_eq!(
+            &query(&index, Order::Asc, Order::Desc)?,
+            &[
+                ((1.0, Some("tokyo".to_owned())), 2),
+                ((1.0, Some("greenville".to_owned())), 1),
+                ((1.0, Some("austin".to_owned())), 0),
+                ((1.0, None), 3),
+            ]
+        );
+        Ok(())
+    }
+
+    #[test]
+    fn test_order_by_score_then_owned_value() -> crate::Result<()> {
+        let index = make_index()?;
+
+        type SortKey = (Score, OwnedValue);
+
+        fn query(
+            index: &Index,
+            score_order: Order,
+            city_order: Order,
+        ) -> crate::Result<Vec<(SortKey, u64)>> {
+            let searcher = index.reader()?.searcher();
+            let ids = id_mapping(&searcher);
+
+            let top_collector = TopDocs::with_limit(4).order_by::<(Score, OwnedValue)>((
+                (SortBySimilarityScore, score_order),
+                (SortByErasedType::for_field("city"), city_order),
+            ));
+            let results: Vec<((Score, OwnedValue), DocAddress)> =
+                searcher.search(&AllQuery, &top_collector)?;
+            Ok(results.into_iter().map(|(f, doc)| (f, ids[&doc])).collect())
+        }
+
+        assert_eq!(
+            &query(&index, Order::Asc, Order::Asc)?,
+            &[
+                ((1.0, OwnedValue::Str("austin".to_owned())), 0),
+                ((1.0, OwnedValue::Str("greenville".to_owned())), 1),
+                ((1.0, OwnedValue::Str("tokyo".to_owned())), 2),
+                ((1.0, OwnedValue::Null), 3),
+            ]
+        );
+
+        assert_eq!(
+            &query(&index, Order::Asc, Order::Desc)?,
+            &[
+                ((1.0, OwnedValue::Str("tokyo".to_owned())), 2),
+                ((1.0, OwnedValue::Str("greenville".to_owned())), 1),
+                ((1.0, OwnedValue::Str("austin".to_owned())), 0),
+                ((1.0, OwnedValue::Null), 3),
+            ]
+        );
+        Ok(())
+    }
+
+    #[test]
+    fn test_order_by_compound_fast_fields() -> crate::Result<()> {
+        let index = make_index()?;
+
+        type CompoundSortKey = (Option<String>, Option<f64>);
+
+        fn assert_query(
+            index: &Index,
+            city_order: Order,
+            altitude_order: Order,
+            expected: Vec<(CompoundSortKey, u64)>,
+        ) -> crate::Result<()> {
+            let searcher = index.reader()?.searcher();
+            let ids = id_mapping(&searcher);
+
+            let top_collector = TopDocs::with_limit(4).order_by((
+                (SortByString::for_field("city"), city_order),
+                (
+                    SortByStaticFastValue::<f64>::for_field("altitude"),
+                    altitude_order,
+                ),
+            ));
+            let actual = searcher
+                .search(&AllQuery, &top_collector)?
+                .into_iter()
+                .map(|(key, doc)| (key, ids[&doc]))
+                .collect::<Vec<_>>();
+            assert_eq!(actual, expected);
+            Ok(())
+        }
+
+        assert_query(
+            &index,
+            Order::Asc,
+            Order::Desc,
+            vec![
+                ((Some("austin".to_owned()), Some(149.0)), 0),
+                ((Some("greenville".to_owned()), Some(27.0)), 1),
+                ((Some("tokyo".to_owned()), Some(40.0)), 2),
+                ((None, Some(0.0)), 3),
+            ],
+        )?;
+
+        Ok(())
+    }
+
+    use proptest::prelude::*;
+
+    proptest! {
+    #[test]
+    fn test_order_by_string_prop(
+          order in prop_oneof!(Just(Order::Desc), Just(Order::Asc)),
+          limit in 1..64_usize,
+          offset in 0..64_usize,
+          segments_terms in
+            proptest::collection::vec(
+                proptest::collection::vec(0..32_u8, 1..32_usize),
+                0..8_usize,
+            )
+        ) {
+            let mut schema_builder = Schema::builder();
+            let city = schema_builder.add_text_field("city", TEXT | FAST);
+            let schema = schema_builder.build();
+            let index = Index::create_in_ram(schema);
+            let mut index_writer = index.writer_for_tests()?;
+
+            // A Vec<Vec<u8>>, where the outer Vec represents segments, and the inner Vec
+            // represents terms.
+            for segment_terms in segments_terms.into_iter() {
+                for term in segment_terms.into_iter() {
+                    let term = format!("{term:0>3}");
+                    index_writer.add_document(doc!(
+                        city => term,
+                    ))?;
+                }
+                index_writer.commit()?;
+            }
+
+            let searcher = index.reader()?.searcher();
+            let top_n_results = searcher.search(&AllQuery, &TopDocs::with_limit(limit)
+                .and_offset(offset)
+                .order_by_string_fast_field("city", order))?;
+            let all_results = searcher.search(&AllQuery, &DocSetCollector)?.into_iter().map(|doc_address| {
+                // Get the term for this address.
+                let column = searcher.segment_readers()[doc_address.segment_ord as usize].fast_fields().str("city").unwrap().unwrap();
+                let value = column.term_ords(doc_address.doc_id).next().map(|term_ord| {
+                    let mut city = Vec::new();
+                    column.dictionary().ord_to_term(term_ord, &mut city).unwrap();
+                    String::try_from(city).unwrap()
+                });
+                (value, doc_address)
+            });
+
+            // Using the TopDocs collector should always be equivalent to sorting, skipping the
+            // offset, and then taking the limit.
+            let sorted_docs: Vec<_> = {
+                let mut comparable_docs: Vec<ComparableDoc<_, _>> =
+                    all_results.into_iter().map(|(sort_key, doc)| ComparableDoc { sort_key, doc}).collect();
+                sort_hits(&mut comparable_docs, order);
+                comparable_docs.into_iter().map(|cd| (cd.sort_key, cd.doc)).collect()
+            };
+            let expected_docs = sorted_docs.into_iter().skip(offset).take(limit).collect::<Vec<_>>();
+            prop_assert_eq!(
+                expected_docs,
+                top_n_results
+            );
+        }
+    }
+
+    proptest! {
+    #[test]
+    fn test_order_by_compound_prop(
+        city_order in prop_oneof!(Just(Order::Desc), Just(Order::Asc)),
+        altitude_order in prop_oneof!(Just(Order::Desc), Just(Order::Asc)),
+        limit in 1..20_usize,
+        offset in 0..20_usize,
+        segments_data in proptest::collection::vec(
+            proptest::collection::vec(
+                (proptest::option::of("[a-c]"), proptest::option::of(0..50u64)),
+                1..10_usize // segment size
+            ),
+            1..4_usize // num segments
+        )
+    ) {
+        use crate::collector::sort_key::ComparatorEnum;
+        use crate::TantivyDocument;
+
+        let mut schema_builder = Schema::builder();
+        let city = schema_builder.add_text_field("city", TEXT | FAST);
+        let altitude = schema_builder.add_u64_field("altitude", FAST);
+        let schema = schema_builder.build();
+        let index = Index::create_in_ram(schema);
+        let mut index_writer = index.writer_for_tests().unwrap();
+
+        for segment_data in segments_data.into_iter() {
+            for (city_val, altitude_val) in segment_data.into_iter() {
+                let mut doc = TantivyDocument::default();
+                if let Some(c) = city_val {
+                    doc.add_text(city, c);
+                }
+                if let Some(a) = altitude_val {
+                    doc.add_u64(altitude, a);
+                }
+                index_writer.add_document(doc).unwrap();
+            }
+            index_writer.commit().unwrap();
+        }
+
+        let searcher = index.reader().unwrap().searcher();
+
+        let top_collector = TopDocs::with_limit(limit)
+            .and_offset(offset)
+            .order_by((
+                (SortByString::for_field("city"), city_order),
+                (
+                    SortByStaticFastValue::<u64>::for_field("altitude"),
+                    altitude_order,
+                ),
+            ));
+
+        let actual_results = searcher.search(&AllQuery, &top_collector).unwrap();
+        let actual_doc_ids: Vec<DocAddress> =
+            actual_results.into_iter().map(|(_, doc)| doc).collect();
+
+        // Verification logic
+        let all_docs_collector = DocSetCollector;
+        let all_docs = searcher.search(&AllQuery, &all_docs_collector).unwrap();
+
+        let docs_with_keys: Vec<((Option<String>, Option<u64>), DocAddress)> = all_docs
+            .into_iter()
+            .map(|doc_addr| {
+                let reader = searcher.segment_reader(doc_addr.segment_ord);
+
+                let city_val = if let Some(col) = reader.fast_fields().str("city").unwrap() {
+                    let ord = col.ords().first(doc_addr.doc_id);
+                    if let Some(ord) = ord {
+                        let mut out = Vec::new();
+                        col.dictionary().ord_to_term(ord, &mut out).unwrap();
+                        String::from_utf8(out).ok()
+                    } else {
+                        None
+                    }
+                } else {
+                    None
+                };
+
+                let alt_val = if let Some((col, _)) = reader.fast_fields().u64_lenient("altitude").unwrap() {
+                    col.first(doc_addr.doc_id)
+                } else {
+                    None
+                };
+
+                ((city_val, alt_val), doc_addr)
+            })
+            .collect();
+
+        let city_comparator = ComparatorEnum::from(city_order);
+        let alt_comparator = ComparatorEnum::from(altitude_order);
+        let comparator = (city_comparator, alt_comparator);
+
+        let mut comparable_docs: Vec<ComparableDoc<_, _>> = docs_with_keys
+            .into_iter()
+            .map(|(sort_key, doc)| ComparableDoc { sort_key, doc })
+            .collect();
+
+        comparable_docs.sort_by(|l, r| compare_for_top_k(&comparator, l, r));
+
+        let expected_results = comparable_docs
+            .into_iter()
+            .skip(offset)
+            .take(limit)
+            .collect::<Vec<_>>();
+
+        let expected_doc_ids: Vec<DocAddress> =
+            expected_results.into_iter().map(|cd| cd.doc).collect();
+
+        prop_assert_eq!(actual_doc_ids, expected_doc_ids);
+    }
+    }
+
+    proptest! {
+    #[test]
+    fn test_order_by_u64_prop(
+        order in prop_oneof!(Just(Order::Desc), Just(Order::Asc)),
+        limit in 1..20_usize,
+        offset in 0..20_usize,
+        segments_data in proptest::collection::vec(
+            proptest::collection::vec(
+                proptest::option::of(0..100u64),
+                1..1000_usize // segment size
+            ),
+            1..4_usize // num segments
+        )
+    ) {
+        use crate::collector::sort_key::ComparatorEnum;
+        use crate::TantivyDocument;
+
+        let mut schema_builder = Schema::builder();
+        let field = schema_builder.add_u64_field("field", FAST);
+        let schema = schema_builder.build();
+        let index = Index::create_in_ram(schema);
+        let mut index_writer = index.writer_for_tests().unwrap();
+
+        for segment_data in segments_data.into_iter() {
+            for val in segment_data.into_iter() {
+                let mut doc = TantivyDocument::default();
+                if let Some(v) = val {
+                    doc.add_u64(field, v);
+                }
+                index_writer.add_document(doc).unwrap();
+            }
+            index_writer.commit().unwrap();
+        }
+
+        let searcher = index.reader().unwrap().searcher();
+
+        let top_collector = TopDocs::with_limit(limit)
+            .and_offset(offset)
+            .order_by((SortByStaticFastValue::<u64>::for_field("field"), order));
+
+        let actual_results = searcher.search(&AllQuery, &top_collector).unwrap();
+        let actual_doc_ids: Vec<DocAddress> =
+            actual_results.into_iter().map(|(_, doc)| doc).collect();
+
+        // Verification logic
+        let all_docs_collector = DocSetCollector;
+        let all_docs = searcher.search(&AllQuery, &all_docs_collector).unwrap();
+
+        let docs_with_keys: Vec<(Option<u64>, DocAddress)> = all_docs
+            .into_iter()
+            .map(|doc_addr| {
+                let reader = searcher.segment_reader(doc_addr.segment_ord);
+                let val = if let Some((col, _)) = reader.fast_fields().u64_lenient("field").unwrap() {
+                    col.first(doc_addr.doc_id)
+                } else {
+                    None
+                };
+                (val, doc_addr)
+            })
+            .collect();
+
+        let comparator = ComparatorEnum::from(order);
+        let mut comparable_docs: Vec<ComparableDoc<_, _>> = docs_with_keys
+            .into_iter()
+            .map(|(sort_key, doc)| ComparableDoc { sort_key, doc })
+            .collect();
+
+        comparable_docs.sort_by(|l, r| compare_for_top_k(&comparator, l, r));
+
+        let expected_results = comparable_docs
+            .into_iter()
+            .skip(offset)
+            .take(limit)
+            .collect::<Vec<_>>();
+
+        let expected_doc_ids: Vec<DocAddress> =
+            expected_results.into_iter().map(|cd| cd.doc).collect();
+
+        prop_assert_eq!(actual_doc_ids, expected_doc_ids);
+    }
+    }
+}
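The property tests above check `TopDocs` against a reference model. That model sorts every hit, skips `offset`, and takes `limit`, breaking ties by ascending doc id. The sketch below reproduces that oracle on plain `(key, doc)` pairs and checks a bounded buffer-and-truncate top-n against it. `Order`, `rank`, `oracle` and `buffered_top_n` are hypothetical names. The buffer strategy is an assumption, loosely in the spirit of a top-n computer, and is not tantivy's implementation. `None` keys are deliberately left out.

```rust
use std::cmp::Ordering;

// Hypothetical local mirror of tantivy's `Order`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Order {
    Asc,
    Desc,
}

// Rank hits: by key in the requested direction, ties by ascending doc id.
// `Ordering::Less` means "ranks earlier".
fn rank(order: Order, a: (u64, u32), b: (u64, u32)) -> Ordering {
    let by_key = match order {
        Order::Asc => a.0.cmp(&b.0),
        Order::Desc => b.0.cmp(&a.0),
    };
    by_key.then(a.1.cmp(&b.1))
}

// Oracle: full sort, skip `offset`, take `limit`.
fn oracle(hits: &[(u64, u32)], order: Order, offset: usize, limit: usize) -> Vec<(u64, u32)> {
    let mut sorted = hits.to_vec();
    sorted.sort_by(|a, b| rank(order, *a, *b));
    sorted.into_iter().skip(offset).take(limit).collect()
}

// Bounded top-n: buffer up to 2n hits, then keep only the n best and remember
// the worst kept hit as a threshold that later hits must beat.
fn buffered_top_n(hits: &[(u64, u32)], order: Order, offset: usize, limit: usize) -> Vec<(u64, u32)> {
    let n = offset + limit;
    if n == 0 {
        return Vec::new();
    }
    let mut buf: Vec<(u64, u32)> = Vec::with_capacity(2 * n);
    let mut threshold: Option<(u64, u32)> = None;
    for &hit in hits {
        // A hit that does not rank strictly before the threshold cannot make the top n.
        if let Some(t) = threshold {
            if rank(order, hit, t) != Ordering::Less {
                continue;
            }
        }
        buf.push(hit);
        if buf.len() == 2 * n {
            // Partition so the n best occupy buf[..n]; buf[n - 1] is the worst of them.
            buf.select_nth_unstable_by(n - 1, |a, b| rank(order, *a, *b));
            buf.truncate(n);
            threshold = Some(buf[n - 1]);
        }
    }
    buf.sort_by(|a, b| rank(order, *a, *b));
    buf.into_iter().skip(offset).take(limit).collect()
}
```

Comparing `buffered_top_n` with `oracle` over many inputs has the same shape as the proptests, which compare the real collector with a full sort.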
--- a/src/collector/sort_key/order.rs
+++ b/src/collector/sort_key/order.rs
@@ -0,0 +1,768 @@
+use std::cmp::Ordering;
+
+use columnar::{ComparableDoc, MonotonicallyMappableToU64, ValueRange};
+use serde::{Deserialize, Serialize};
+
+use crate::collector::{SegmentSortKeyComputer, SortKeyComputer};
+use crate::schema::{OwnedValue, Schema};
+use crate::{DocId, Order, Score};
+
+fn compare_owned_value<const NULLS_FIRST: bool>(lhs: &OwnedValue, rhs: &OwnedValue) -> Ordering {
+    match (lhs, rhs) {
+        (OwnedValue::Null, OwnedValue::Null) => Ordering::Equal,
+        (OwnedValue::Null, _) => {
+            if NULLS_FIRST {
+                Ordering::Less
+            } else {
+                Ordering::Greater
+            }
+        }
+        (_, OwnedValue::Null) => {
+            if NULLS_FIRST {
+                Ordering::Greater
+            } else {
+                Ordering::Less
+            }
+        }
+        (OwnedValue::Str(a), OwnedValue::Str(b)) => a.cmp(b),
+        (OwnedValue::PreTokStr(a), OwnedValue::PreTokStr(b)) => a.cmp(b),
+        (OwnedValue::U64(a), OwnedValue::U64(b)) => a.cmp(b),
+        (OwnedValue::I64(a), OwnedValue::I64(b)) => a.cmp(b),
+        (OwnedValue::F64(a), OwnedValue::F64(b)) => a.to_u64().cmp(&b.to_u64()),
+        (OwnedValue::Bool(a), OwnedValue::Bool(b)) => a.cmp(b),
+        (OwnedValue::Date(a), OwnedValue::Date(b)) => a.cmp(b),
+        (OwnedValue::Facet(a), OwnedValue::Facet(b)) => a.cmp(b),
+        (OwnedValue::Bytes(a), OwnedValue::Bytes(b)) => a.cmp(b),
+        (OwnedValue::IpAddr(a), OwnedValue::IpAddr(b)) => a.cmp(b),
+        (OwnedValue::U64(a), OwnedValue::I64(b)) => {
+            if *b < 0 {
+                Ordering::Greater
+            } else {
+                a.cmp(&(*b as u64))
+            }
+        }
+        (OwnedValue::I64(a), OwnedValue::U64(b)) => {
+            if *a < 0 {
+                Ordering::Less
+            } else {
+                (*a as u64).cmp(b)
+            }
+        }
+        (OwnedValue::U64(a), OwnedValue::F64(b)) => (*a as f64).to_u64().cmp(&b.to_u64()),
+        (OwnedValue::F64(a), OwnedValue::U64(b)) => a.to_u64().cmp(&(*b as f64).to_u64()),
+        (OwnedValue::I64(a), OwnedValue::F64(b)) => (*a as f64).to_u64().cmp(&b.to_u64()),
+        (OwnedValue::F64(a), OwnedValue::I64(b)) => a.to_u64().cmp(&(*b as f64).to_u64()),
+        (a, b) => {
+            let ord = a.discriminant_value().cmp(&b.discriminant_value());
+            // If the discriminant is equal, it's because a new type was added, but hasn't been
+            // included in this `match` statement.
+            assert!(
+                ord != Ordering::Equal,
+                "Unimplemented comparison for type of {a:?}, {b:?}"
+            );
+            ord
+        }
+    }
+}
+
+/// Comparator trait defining how sort keys, and therefore documents, are ordered.
+pub trait Comparator<T>: Send + Sync + std::fmt::Debug + Default {
+    /// Return the order between two values.
+    fn compare(&self, lhs: &T, rhs: &T) -> Ordering;
+
+    /// Return a `ValueRange` matching values `v` with `self.compare(&v, &threshold) == Greater`.
+    fn threshold_to_valuerange(&self, threshold: T) -> ValueRange<T>;
+}
+
+/// Compare values naturally (e.g. 1 < 2).
+///
+/// When used with `TopDocs`, which reverses the order, this results in a
+/// "Descending" sort (Greatest values first).
+///
+/// `None` (or Null for `OwnedValue`) values are considered to be smaller than any other value,
+/// and will therefore appear last in a descending sort (e.g. `[Some(20), Some(10), None]`).
+#[derive(Debug, Copy, Clone, Default, Serialize, Deserialize)]
+pub struct NaturalComparator;
+
+impl<T: PartialOrd> Comparator<T> for NaturalComparator {
+    #[inline(always)]
+    fn compare(&self, lhs: &T, rhs: &T) -> Ordering {
+        lhs.partial_cmp(rhs).unwrap_or(Ordering::Equal)
+    }
+
+    fn threshold_to_valuerange(&self, threshold: T) -> ValueRange<T> {
+        ValueRange::GreaterThan(threshold, false)
+    }
+}
+
+/// A (partial) implementation of comparison for OwnedValue.
+///
+/// Intended for columns of homogeneous types. Null compares with any value, mixed numeric
+/// types compare by value, other mismatches order by discriminant, unhandled types panic.
+impl Comparator<OwnedValue> for NaturalComparator {
+    #[inline(always)]
+    fn compare(&self, lhs: &OwnedValue, rhs: &OwnedValue) -> Ordering {
+        compare_owned_value::</* NULLS_FIRST= */ true>(lhs, rhs)
+    }
+
+    fn threshold_to_valuerange(&self, threshold: OwnedValue) -> ValueRange<OwnedValue> {
+        ValueRange::GreaterThan(threshold, false)
+    }
+}
+
+/// Compare values in reverse (e.g. 2 < 1).
+///
+/// When used with `TopDocs`, which reverses the order, this results in an
+/// "Ascending" sort (Smallest values first).
+///
+/// `None` is considered smaller than `Some` in the underlying comparator, but because the
+/// comparison is reversed, `None` is effectively treated as the lowest value in the resulting
+/// Ascending sort (e.g. `[None, Some(10), Some(20)]`).
+///
+/// Using `ReverseComparator` does not produce the exact reverse of the `NaturalComparator` order:
+/// when sort keys tie, documents are always ordered by ascending `DocId`/`DocAddress` in TopN
+/// results, regardless of the direction of the sort key.
+#[derive(Debug, Copy, Clone, Default, Serialize, Deserialize)]
+pub struct ReverseComparator;
+
+macro_rules! impl_reverse_comparator_primitive {
+    ($($t:ty),*) => {
+        $(
+            impl Comparator<$t> for ReverseComparator {
+                #[inline(always)]
+                fn compare(&self, lhs: &$t, rhs: &$t) -> Ordering {
+                    NaturalComparator.compare(rhs, lhs)
+                }
+
+                fn threshold_to_valuerange(&self, threshold: $t) -> ValueRange<$t> {
+                    ValueRange::LessThan(threshold, true)
+                }
+            }
+        )*
+    }
+}
+
+impl_reverse_comparator_primitive!(
+    bool,
+    u8,
+    u16,
+    u32,
+    u64,
+    u128,
+    usize,
+    i8,
+    i16,
+    i32,
+    i64,
+    i128,
+    isize,
+    f32,
+    f64,
+    String,
+    crate::DateTime,
+    Vec<u8>,
+    crate::schema::Facet
+);
+
+impl<T: PartialOrd + Send + Sync + std::fmt::Debug + Clone + 'static> Comparator<Option<T>>
+    for ReverseComparator
+{
+    #[inline(always)]
+    fn compare(&self, lhs: &Option<T>, rhs: &Option<T>) -> Ordering {
+        NaturalComparator.compare(rhs, lhs)
+    }
+
+    fn threshold_to_valuerange(&self, threshold: Option<T>) -> ValueRange<Option<T>> {
+        let is_some = threshold.is_some();
+        ValueRange::LessThan(threshold, is_some)
+    }
+}
+
+impl Comparator<OwnedValue> for ReverseComparator {
+    #[inline(always)]
+    fn compare(&self, lhs: &OwnedValue, rhs: &OwnedValue) -> Ordering {
+        NaturalComparator.compare(rhs, lhs)
+    }
+
+    fn threshold_to_valuerange(&self, threshold: OwnedValue) -> ValueRange<OwnedValue> {
+        let is_not_null = !matches!(threshold, OwnedValue::Null);
+        ValueRange::LessThan(threshold, is_not_null)
+    }
+}
+
+/// Compare values in reverse, but treating `None` as lower than `Some`.
+///
+/// When used with `TopDocs`, which reverses the order, this results in an
+/// "Ascending" sort (Smallest values first), but with `None` values appearing last
+/// (e.g. `[Some(10), Some(20), None]`).
+///
+/// This is usually the desired behavior when sorting by a field in ascending order.
+/// For instance, in an e-commerce website, if sorting by price ascending,
+/// the cheapest items would appear first, and items without a price would appear last.
+#[derive(Debug, Copy, Clone, Default)]
+pub struct ReverseNoneIsLowerComparator;
+
+impl<T> Comparator<Option<T>> for ReverseNoneIsLowerComparator
+where ReverseComparator: Comparator<T>
+{
+    #[inline(always)]
+    fn compare(&self, lhs_opt: &Option<T>, rhs_opt: &Option<T>) -> Ordering {
+        match (lhs_opt, rhs_opt) {
+            (None, None) => Ordering::Equal,
+            (None, Some(_)) => Ordering::Less,
+            (Some(_), None) => Ordering::Greater,
+            (Some(lhs), Some(rhs)) => ReverseComparator.compare(lhs, rhs),
+        }
+    }
+
+    fn threshold_to_valuerange(&self, threshold: Option<T>) -> ValueRange<Option<T>> {
+        if threshold.is_some() {
+            ValueRange::LessThan(threshold, false)
+        } else {
+            ValueRange::GreaterThan(threshold, false)
+        }
+    }
+}
+
+impl Comparator<u32> for ReverseNoneIsLowerComparator {
+    #[inline(always)]
+    fn compare(&self, lhs: &u32, rhs: &u32) -> Ordering {
+        ReverseComparator.compare(lhs, rhs)
+    }
+
+    fn threshold_to_valuerange(&self, threshold: u32) -> ValueRange<u32> {
+        ValueRange::LessThan(threshold, false)
+    }
+}
+
+impl Comparator<u64> for ReverseNoneIsLowerComparator {
+    #[inline(always)]
+    fn compare(&self, lhs: &u64, rhs: &u64) -> Ordering {
+        ReverseComparator.compare(lhs, rhs)
+    }
+
+    fn threshold_to_valuerange(&self, threshold: u64) -> ValueRange<u64> {
+        ValueRange::LessThan(threshold, false)
+    }
+}
+
+impl Comparator<f64> for ReverseNoneIsLowerComparator {
+    #[inline(always)]
+    fn compare(&self, lhs: &f64, rhs: &f64) -> Ordering {
+        ReverseComparator.compare(lhs, rhs)
+    }
+
+    fn threshold_to_valuerange(&self, threshold: f64) -> ValueRange<f64> {
+        ValueRange::LessThan(threshold, false)
+    }
+}
+
+impl Comparator<f32> for ReverseNoneIsLowerComparator {
+    #[inline(always)]
+    fn compare(&self, lhs: &f32, rhs: &f32) -> Ordering {
+        ReverseComparator.compare(lhs, rhs)
+    }
+
+    fn threshold_to_valuerange(&self, threshold: f32) -> ValueRange<f32> {
+        ValueRange::LessThan(threshold, false)
+    }
+}
+
+impl Comparator<i64> for ReverseNoneIsLowerComparator {
+    #[inline(always)]
+    fn compare(&self, lhs: &i64, rhs: &i64) -> Ordering {
+        ReverseComparator.compare(lhs, rhs)
+    }
+
+    fn threshold_to_valuerange(&self, threshold: i64) -> ValueRange<i64> {
+        ValueRange::LessThan(threshold, false)
+    }
+}
+
+impl Comparator<String> for ReverseNoneIsLowerComparator {
+    #[inline(always)]
+    fn compare(&self, lhs: &String, rhs: &String) -> Ordering {
+        ReverseComparator.compare(lhs, rhs)
+    }
+
+    fn threshold_to_valuerange(&self, threshold: String) -> ValueRange<String> {
+        ValueRange::LessThan(threshold, false)
+    }
+}
+
+impl Comparator<OwnedValue> for ReverseNoneIsLowerComparator {
+    #[inline(always)]
+    fn compare(&self, lhs: &OwnedValue, rhs: &OwnedValue) -> Ordering {
+        compare_owned_value::</* NULLS_FIRST= */ false>(rhs, lhs)
+    }
+
+    fn threshold_to_valuerange(&self, threshold: OwnedValue) -> ValueRange<OwnedValue> {
+        ValueRange::LessThan(threshold, false)
+    }
+}
+
+/// Compare values naturally, but treating `None` as higher than `Some`.
+///
+/// When used with `TopDocs`, which reverses the order, this results in a
+/// "Descending" sort (Greatest values first), but with `None` values appearing first
+/// (e.g. `[None, Some(20), Some(10)]`).
+#[derive(Debug, Copy, Clone, Default, Serialize, Deserialize)]
+pub struct NaturalNoneIsHigherComparator;
+
+impl<T> Comparator<Option<T>> for NaturalNoneIsHigherComparator
+where NaturalComparator: Comparator<T>
+{
+    #[inline(always)]
+    fn compare(&self, lhs_opt: &Option<T>, rhs_opt: &Option<T>) -> Ordering {
+        match (lhs_opt, rhs_opt) {
+            (None, None) => Ordering::Equal,
+            (None, Some(_)) => Ordering::Greater,
+            (Some(_), None) => Ordering::Less,
+            (Some(lhs), Some(rhs)) => NaturalComparator.compare(lhs, rhs),
+        }
+    }
+
+    fn threshold_to_valuerange(&self, threshold: Option<T>) -> ValueRange<Option<T>> {
+        if threshold.is_some() {
+            // `None` ranks above every `Some`, so it is always greater than the threshold.
+            ValueRange::GreaterThan(threshold, true)
+        } else {
+            ValueRange::LessThan(threshold, false)
+        }
+    }
+}
+
+impl Comparator<u32> for NaturalNoneIsHigherComparator {
+    #[inline(always)]
+    fn compare(&self, lhs: &u32, rhs: &u32) -> Ordering {
+        NaturalComparator.compare(lhs, rhs)
+    }
+
+    fn threshold_to_valuerange(&self, threshold: u32) -> ValueRange<u32> {
+        ValueRange::GreaterThan(threshold, true)
+    }
+}
+
+impl Comparator<u64> for NaturalNoneIsHigherComparator {
+    #[inline(always)]
+    fn compare(&self, lhs: &u64, rhs: &u64) -> Ordering {
+        NaturalComparator.compare(lhs, rhs)
+    }
+
+    fn threshold_to_valuerange(&self, threshold: u64) -> ValueRange<u64> {
+        ValueRange::GreaterThan(threshold, true)
+    }
+}
+
+impl Comparator<f64> for NaturalNoneIsHigherComparator {
+    #[inline(always)]
+    fn compare(&self, lhs: &f64, rhs: &f64) -> Ordering {
+        NaturalComparator.compare(lhs, rhs)
+    }
+
+    fn threshold_to_valuerange(&self, threshold: f64) -> ValueRange<f64> {
+        ValueRange::GreaterThan(threshold, true)
+    }
+}
+
+impl Comparator<f32> for NaturalNoneIsHigherComparator {
+    #[inline(always)]
+    fn compare(&self, lhs: &f32, rhs: &f32) -> Ordering {
+        NaturalComparator.compare(lhs, rhs)
+    }
+
+    fn threshold_to_valuerange(&self, threshold: f32) -> ValueRange<f32> {
+        ValueRange::GreaterThan(threshold, true)
+    }
+}
+
+impl Comparator<i64> for NaturalNoneIsHigherComparator {
+    #[inline(always)]
+    fn compare(&self, lhs: &i64, rhs: &i64) -> Ordering {
+        NaturalComparator.compare(lhs, rhs)
+    }
+
+    fn threshold_to_valuerange(&self, threshold: i64) -> ValueRange<i64> {
+        ValueRange::GreaterThan(threshold, true)
+    }
+}
+
+impl Comparator<String> for NaturalNoneIsHigherComparator {
+    #[inline(always)]
+    fn compare(&self, lhs: &String, rhs: &String) -> Ordering {
+        NaturalComparator.compare(lhs, rhs)
+    }
+
+    fn threshold_to_valuerange(&self, threshold: String) -> ValueRange<String> {
+        ValueRange::GreaterThan(threshold, true)
+    }
+}
+
+impl Comparator<OwnedValue> for NaturalNoneIsHigherComparator {
+    #[inline(always)]
+    fn compare(&self, lhs: &OwnedValue, rhs: &OwnedValue) -> Ordering {
+        compare_owned_value::</* NULLS_FIRST= */ false>(lhs, rhs)
+    }
+
+    fn threshold_to_valuerange(&self, threshold: OwnedValue) -> ValueRange<OwnedValue> {
+        ValueRange::GreaterThan(threshold, true)
+    }
+}
+
+/// An enum representing the different sort orders.
+#[derive(Debug, Clone, Copy, Eq, PartialEq, Default)]
+pub enum ComparatorEnum {
+    /// Natural order (See [NaturalComparator])
+    #[default]
+    Natural,
+    /// Reverse order (See [ReverseComparator])
+    Reverse,
+    /// Reverse order but treating None as the lowest value. (See [ReverseNoneIsLowerComparator])
+    ReverseNoneLower,
+    /// Natural order but treating None as the highest value. (See [NaturalNoneIsHigherComparator])
+    NaturalNoneHigher,
+}
+
+impl From<Order> for ComparatorEnum {
+    fn from(order: Order) -> Self {
+        match order {
+            Order::Asc => ComparatorEnum::ReverseNoneLower,
+            Order::Desc => ComparatorEnum::Natural,
+        }
+    }
+}
+
+impl<T> Comparator<T> for ComparatorEnum
+where
+    ReverseNoneIsLowerComparator: Comparator<T>,
+    NaturalComparator: Comparator<T>,
+    ReverseComparator: Comparator<T>,
+    NaturalNoneIsHigherComparator: Comparator<T>,
+{
+    #[inline(always)]
+    fn compare(&self, lhs: &T, rhs: &T) -> Ordering {
+        match self {
+            ComparatorEnum::Natural => NaturalComparator.compare(lhs, rhs),
+            ComparatorEnum::Reverse => ReverseComparator.compare(lhs, rhs),
+            ComparatorEnum::ReverseNoneLower => ReverseNoneIsLowerComparator.compare(lhs, rhs),
+            ComparatorEnum::NaturalNoneHigher => NaturalNoneIsHigherComparator.compare(lhs, rhs),
+        }
+    }
+
+    fn threshold_to_valuerange(&self, threshold: T) -> ValueRange<T> {
+        match self {
+            ComparatorEnum::Natural => NaturalComparator.threshold_to_valuerange(threshold),
+            ComparatorEnum::Reverse => ReverseComparator.threshold_to_valuerange(threshold),
+            ComparatorEnum::ReverseNoneLower => {
+                ReverseNoneIsLowerComparator.threshold_to_valuerange(threshold)
+            }
+            ComparatorEnum::NaturalNoneHigher => {
+                NaturalNoneIsHigherComparator.threshold_to_valuerange(threshold)
+            }
+        }
+    }
+}
+
+impl<Head, Tail, LeftComparator, RightComparator> Comparator<(Head, Tail)>
+    for (LeftComparator, RightComparator)
+where
+    LeftComparator: Comparator<Head>,
+    RightComparator: Comparator<Tail>,
+{
+    #[inline(always)]
+    fn compare(&self, lhs: &(Head, Tail), rhs: &(Head, Tail)) -> Ordering {
+        self.0
+            .compare(&lhs.0, &rhs.0)
+            .then_with(|| self.1.compare(&lhs.1, &rhs.1))
+    }
+
+    fn threshold_to_valuerange(&self, threshold: (Head, Tail)) -> ValueRange<(Head, Tail)> {
+        ValueRange::GreaterThan(threshold, false)
+    }
+}
+
+impl<Type1, Type2, Type3, Comparator1, Comparator2, Comparator3> Comparator<(Type1, (Type2, Type3))>
+    for (Comparator1, Comparator2, Comparator3)
+where
+    Comparator1: Comparator<Type1>,
+    Comparator2: Comparator<Type2>,
+    Comparator3: Comparator<Type3>,
+{
+    #[inline(always)]
+    fn compare(&self, lhs: &(Type1, (Type2, Type3)), rhs: &(Type1, (Type2, Type3))) -> Ordering {
+        self.0
+            .compare(&lhs.0, &rhs.0)
+            .then_with(|| self.1.compare(&lhs.1 .0, &rhs.1 .0))
+            .then_with(|| self.2.compare(&lhs.1 .1, &rhs.1 .1))
+    }
+
+    fn threshold_to_valuerange(
+        &self,
+        threshold: (Type1, (Type2, Type3)),
+    ) -> ValueRange<(Type1, (Type2, Type3))> {
+        ValueRange::GreaterThan(threshold, false)
+    }
+}
+
+impl<Type1, Type2, Type3, Comparator1, Comparator2, Comparator3> Comparator<(Type1, Type2, Type3)>
+    for (Comparator1, Comparator2, Comparator3)
+where
+    Comparator1: Comparator<Type1>,
+    Comparator2: Comparator<Type2>,
+    Comparator3: Comparator<Type3>,
+{
+    #[inline(always)]
+    fn compare(&self, lhs: &(Type1, Type2, Type3), rhs: &(Type1, Type2, Type3)) -> Ordering {
+        self.0
+            .compare(&lhs.0, &rhs.0)
+            .then_with(|| self.1.compare(&lhs.1, &rhs.1))
+            .then_with(|| self.2.compare(&lhs.2, &rhs.2))
+    }
+
+    fn threshold_to_valuerange(
+        &self,
+        threshold: (Type1, Type2, Type3),
+    ) -> ValueRange<(Type1, Type2, Type3)> {
+        ValueRange::GreaterThan(threshold, false)
+    }
+}
+
+impl<Type1, Type2, Type3, Type4, Comparator1, Comparator2, Comparator3, Comparator4>
+    Comparator<(Type1, (Type2, (Type3, Type4)))>
+    for (Comparator1, Comparator2, Comparator3, Comparator4)
+where
+    Comparator1: Comparator<Type1>,
+    Comparator2: Comparator<Type2>,
+    Comparator3: Comparator<Type3>,
+    Comparator4: Comparator<Type4>,
+{
+    #[inline(always)]
+    fn compare(
+        &self,
+        lhs: &(Type1, (Type2, (Type3, Type4))),
+        rhs: &(Type1, (Type2, (Type3, Type4))),
+    ) -> Ordering {
+        self.0
+            .compare(&lhs.0, &rhs.0)
+            .then_with(|| self.1.compare(&lhs.1 .0, &rhs.1 .0))
+            .then_with(|| self.2.compare(&lhs.1 .1 .0, &rhs.1 .1 .0))
+            .then_with(|| self.3.compare(&lhs.1 .1 .1, &rhs.1 .1 .1))
+    }
+
+    fn threshold_to_valuerange(
+        &self,
+        threshold: (Type1, (Type2, (Type3, Type4))),
+    ) -> ValueRange<(Type1, (Type2, (Type3, Type4)))> {
+        ValueRange::GreaterThan(threshold, false)
+    }
+}
+
+impl<Type1, Type2, Type3, Type4, Comparator1, Comparator2, Comparator3, Comparator4>
+    Comparator<(Type1, Type2, Type3, Type4)>
+    for (Comparator1, Comparator2, Comparator3, Comparator4)
+where
+    Comparator1: Comparator<Type1>,
+    Comparator2: Comparator<Type2>,
+    Comparator3: Comparator<Type3>,
+    Comparator4: Comparator<Type4>,
+{
+    #[inline(always)]
+    fn compare(
+        &self,
+        lhs: &(Type1, Type2, Type3, Type4),
+        rhs: &(Type1, Type2, Type3, Type4),
+    ) -> Ordering {
+        self.0
+            .compare(&lhs.0, &rhs.0)
+            .then_with(|| self.1.compare(&lhs.1, &rhs.1))
+            .then_with(|| self.2.compare(&lhs.2, &rhs.2))
+            .then_with(|| self.3.compare(&lhs.3, &rhs.3))
+    }
+
+    fn threshold_to_valuerange(
+        &self,
+        threshold: (Type1, Type2, Type3, Type4),
+    ) -> ValueRange<(Type1, Type2, Type3, Type4)> {
+        ValueRange::GreaterThan(threshold, false)
+    }
+}
+
+impl<TSortKeyComputer> SortKeyComputer for (TSortKeyComputer, ComparatorEnum)
+where
+    TSortKeyComputer: SortKeyComputer,
+    ComparatorEnum: Comparator<TSortKeyComputer::SortKey>,
+    ComparatorEnum: Comparator<
+        <<TSortKeyComputer as SortKeyComputer>::Child as SegmentSortKeyComputer>::SegmentSortKey,
+    >,
+{
+    type SortKey = TSortKeyComputer::SortKey;
+
+    type Child = SegmentSortKeyComputerWithComparator<TSortKeyComputer::Child, Self::Comparator>;
+
+    type Comparator = ComparatorEnum;
+
+    fn check_schema(&self, schema: &Schema) -> crate::Result<()> {
+        self.0.check_schema(schema)
+    }
+
+    fn requires_scoring(&self) -> bool {
+        self.0.requires_scoring()
+    }
+
+    fn comparator(&self) -> Self::Comparator {
+        self.1
+    }
+
+    fn segment_sort_key_computer(
+        &self,
+        segment_reader: &crate::SegmentReader,
+    ) -> crate::Result<Self::Child> {
+        let child = self.0.segment_sort_key_computer(segment_reader)?;
+        Ok(SegmentSortKeyComputerWithComparator {
+            segment_sort_key_computer: child,
+            comparator: self.comparator(),
+        })
+    }
+}
+
+impl<TSortKeyComputer> SortKeyComputer for (TSortKeyComputer, Order)
+where
+    TSortKeyComputer: SortKeyComputer,
+    ComparatorEnum: Comparator<TSortKeyComputer::SortKey>,
+    ComparatorEnum: Comparator<
+        <<TSortKeyComputer as SortKeyComputer>::Child as SegmentSortKeyComputer>::SegmentSortKey,
+    >,
+{
+    type SortKey = TSortKeyComputer::SortKey;
+
+    type Child = SegmentSortKeyComputerWithComparator<TSortKeyComputer::Child, Self::Comparator>;
+
+    type Comparator = ComparatorEnum;
+
+    fn check_schema(&self, schema: &Schema) -> crate::Result<()> {
+        self.0.check_schema(schema)
+    }
+
+    fn requires_scoring(&self) -> bool {
+        self.0.requires_scoring()
+    }
+
+    fn comparator(&self) -> Self::Comparator {
+        self.1.into()
+    }
+
+    fn segment_sort_key_computer(
+        &self,
+        segment_reader: &crate::SegmentReader,
+    ) -> crate::Result<Self::Child> {
+        let child = self.0.segment_sort_key_computer(segment_reader)?;
+        Ok(SegmentSortKeyComputerWithComparator {
+            segment_sort_key_computer: child,
+            comparator: self.comparator(),
+        })
+    }
+}
+
+/// A segment sort key computer with a custom ordering.
+pub struct SegmentSortKeyComputerWithComparator<TSegmentSortKeyComputer, TComparator> {
+    segment_sort_key_computer: TSegmentSortKeyComputer,
+    comparator: TComparator,
+}
+
+impl<TSegmentSortKeyComputer, TSegmentSortKey, TComparator> SegmentSortKeyComputer
+    for SegmentSortKeyComputerWithComparator<TSegmentSortKeyComputer, TComparator>
+where
+    TSegmentSortKeyComputer: SegmentSortKeyComputer<SegmentSortKey = TSegmentSortKey>,
+    TSegmentSortKey: Clone + 'static + Sync + Send,
+    TComparator: Comparator<TSegmentSortKey> + Clone + 'static + Sync + Send,
+{
+    type SortKey = TSegmentSortKeyComputer::SortKey;
+    type SegmentSortKey = TSegmentSortKey;
+    type SegmentComparator = TComparator;
+    type Buffer = TSegmentSortKeyComputer::Buffer;
+
+    fn segment_comparator(&self) -> Self::SegmentComparator {
+        self.comparator.clone()
+    }
+
+    fn segment_sort_key(&mut self, doc: DocId, score: Score) -> Self::SegmentSortKey {
+        self.segment_sort_key_computer.segment_sort_key(doc, score)
+    }
+
+    fn segment_sort_keys(
+        &mut self,
+        input_docs: &[DocId],
+        output: &mut Vec<ComparableDoc<Self::SegmentSortKey, DocId>>,
+        buffer: &mut Self::Buffer,
+        filter: ValueRange<Self::SegmentSortKey>,
+    ) {
+        self.segment_sort_key_computer
+            .segment_sort_keys(input_docs, output, buffer, filter)
+    }
+
+    #[inline(always)]
+    fn compare_segment_sort_key(
+        &self,
+        left: &Self::SegmentSortKey,
+        right: &Self::SegmentSortKey,
+    ) -> Ordering {
+        self.comparator.compare(left, right)
+    }
+
+    fn convert_segment_sort_key(&self, sort_key: Self::SegmentSortKey) -> Self::SortKey {
+        self.segment_sort_key_computer
+            .convert_segment_sort_key(sort_key)
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use crate::schema::OwnedValue;
+
+    #[test]
+    fn test_natural_none_is_higher() {
+        let comp = NaturalNoneIsHigherComparator;
+        let null = None;
+        let v1 = Some(1_u64);
+        let v2 = Some(2_u64);
+
+        // NaturalNoneIsHigherComparator logic:
+        // 1. Delegates to NaturalComparator for non-nulls.
+        // NaturalComparator compare(2, 1) -> 2.cmp(1) -> Greater.
+        assert_eq!(comp.compare(&v2, &v1), Ordering::Greater);
+
+        // 2. Treats None (Null) as Greater than any value.
+        // compare(None, Some(2)) should be Greater.
+        assert_eq!(comp.compare(&null, &v2), Ordering::Greater);
+
+        // compare(Some(1), None) should be Less.
+        assert_eq!(comp.compare(&v1, &null), Ordering::Less);
+
+        // compare(None, None) should be Equal.
+        assert_eq!(comp.compare(&null, &null), Ordering::Equal);
+    }
+
+    #[test]
+    fn test_mixed_ownedvalue_compare() {
+        let u = OwnedValue::U64(10);
+        let i = OwnedValue::I64(10);
+        let f = OwnedValue::F64(10.0);
+
+        let nc = NaturalComparator;
+        assert_eq!(nc.compare(&u, &i), Ordering::Equal);
+        assert_eq!(nc.compare(&u, &f), Ordering::Equal);
+        assert_eq!(nc.compare(&i, &f), Ordering::Equal);
+
+        let u2 = OwnedValue::U64(11);
+        assert_eq!(nc.compare(&u2, &f), Ordering::Greater);
+
+        let s = OwnedValue::Str("a".to_string());
+        // Str < U64
+        assert_eq!(nc.compare(&s, &u), Ordering::Less);
+        // Str < I64
+        assert_eq!(nc.compare(&s, &i), Ordering::Less);
+        // Str < F64
+        assert_eq!(nc.compare(&s, &f), Ordering::Less);
+    }
+}
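The commit message mentions a property test for Comparator/ValueRange consistency. The invariant such a test presumably checks can be sketched in isolation: for every threshold `t` and value `v`, `threshold_to_valuerange(t)` should match `v` exactly when `compare(v, t) == Greater`. The standalone sketch below checks this exhaustively over a small `Option<u64>` domain instead of using a randomized property-testing crate, so it has no dependencies. It does not call tantivy: `ValueRange` is a simplified, hypothetical mirror of `columnar::ValueRange` in which the `bool` is assumed to mean "also match `None`", and `Cmp`, `find_counterexample` and `top_order` are illustrative names whose `Option` rules are copied from the four comparators in this diff.

```rust
use std::cmp::Ordering;

// Hypothetical, simplified mirror of `columnar::ValueRange` for `Option<u64>` keys.
// Assumption: the `bool` means "also match `None`", and `Some` values are compared to the
// threshold with `Option`'s own order (`None < Some(_)`).
#[derive(Debug, Clone, Copy)]
enum ValueRange {
    GreaterThan(Option<u64>, bool),
    LessThan(Option<u64>, bool),
}

impl ValueRange {
    fn contains(self, value: Option<u64>) -> bool {
        match (self, value) {
            (ValueRange::GreaterThan(_, match_none), None)
            | (ValueRange::LessThan(_, match_none), None) => match_none,
            (ValueRange::GreaterThan(threshold, _), some) => some > threshold,
            (ValueRange::LessThan(threshold, _), some) => some < threshold,
        }
    }
}

// The four comparators from the diff, restricted to `Option<u64>`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Cmp {
    Natural,
    Reverse,
    ReverseNoneIsLower,
    NaturalNoneIsHigher,
}

impl Cmp {
    fn compare(self, lhs: Option<u64>, rhs: Option<u64>) -> Ordering {
        match (self, lhs, rhs) {
            (Cmp::Natural, _, _) => lhs.cmp(&rhs), // None < Some
            (Cmp::Reverse, _, _) => rhs.cmp(&lhs), // so None > Some here
            (_, None, None) => Ordering::Equal,
            (Cmp::ReverseNoneIsLower, None, Some(_)) => Ordering::Less,
            (Cmp::ReverseNoneIsLower, Some(_), None) => Ordering::Greater,
            (Cmp::ReverseNoneIsLower, Some(l), Some(r)) => r.cmp(&l),
            (Cmp::NaturalNoneIsHigher, None, Some(_)) => Ordering::Greater,
            (Cmp::NaturalNoneIsHigher, Some(_), None) => Ordering::Less,
            (Cmp::NaturalNoneIsHigher, Some(l), Some(r)) => l.cmp(&r),
        }
    }

    fn threshold_to_valuerange(self, t: Option<u64>) -> ValueRange {
        match self {
            Cmp::Natural => ValueRange::GreaterThan(t, false),
            Cmp::Reverse => ValueRange::LessThan(t, t.is_some()),
            Cmp::ReverseNoneIsLower if t.is_some() => ValueRange::LessThan(t, false),
            Cmp::ReverseNoneIsLower => ValueRange::GreaterThan(t, false),
            Cmp::NaturalNoneIsHigher if t.is_some() => ValueRange::GreaterThan(t, true),
            Cmp::NaturalNoneIsHigher => ValueRange::LessThan(t, false),
        }
    }
}

const ALL: [Cmp; 4] =
    [Cmp::Natural, Cmp::Reverse, Cmp::ReverseNoneIsLower, Cmp::NaturalNoneIsHigher];

/// Exhaustively checks, over a small domain, that `threshold_to_valuerange(t)` matches `v`
/// exactly when `compare(v, t) == Greater`. Returns the first counterexample, if any.
fn find_counterexample() -> Option<(Cmp, Option<u64>, Option<u64>)> {
    let domain = [None, Some(0), Some(1), Some(2), Some(3)];
    for cmp in ALL {
        for t in domain {
            let range = cmp.threshold_to_valuerange(t);
            for v in domain {
                let expected = cmp.compare(v, t) == Ordering::Greater;
                if range.contains(v) != expected {
                    return Some((cmp, t, v));
                }
            }
        }
    }
    None
}

/// Orders values greatest-first under `cmp`, mimicking how `TopDocs` is described above.
fn top_order(cmp: Cmp, mut values: Vec<Option<u64>>) -> Vec<Option<u64>> {
    values.sort_by(|l, r| cmp.compare(*r, *l));
    values
}

fn main() {
    assert_eq!(find_counterexample(), None);
    let input = vec![Some(10), None, Some(20)];
    // These match the examples given in each comparator's doc comment.
    assert_eq!(top_order(Cmp::Natural, input.clone()), [Some(20), Some(10), None]);
    assert_eq!(top_order(Cmp::Reverse, input.clone()), [None, Some(10), Some(20)]);
    assert_eq!(top_order(Cmp::ReverseNoneIsLower, input.clone()), [Some(10), Some(20), None]);
    assert_eq!(top_order(Cmp::NaturalNoneIsHigher, input), [None, Some(20), Some(10)]);
    println!("ok");
}
```

As a sanity check of the harness itself: changing `t.is_some()` to `false` in the `Reverse` arm makes `find_counterexample()` return `Some((Cmp::Reverse, Some(0), None))`, because the reversed comparator ranks `None` above every `Some`, so the range must also match `None`.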
--- a/src/collector/sort_key/sort_by_erased_type.rs
+++ b/src/collector/sort_key/sort_by_erased_type.rs
@@ -0,0 +1,410 @@
+use columnar::{ColumnType, MonotonicallyMappableToU64, ValueRange};
+
+use crate::collector::sort_key::sort_by_score::SortBySimilarityScoreSegmentComputer;
+use crate::collector::sort_key::{
+    NaturalComparator, SortBySimilarityScore, SortByStaticFastValue, SortByString,
+};
+use crate::collector::{ComparableDoc, SegmentSortKeyComputer, SortKeyComputer};
+use crate::fastfield::FastFieldNotAvailableError;
+use crate::schema::OwnedValue;
+use crate::{DateTime, DocId, Score};
+
+/// Sort by the type-erased (`OwnedValue`) representation of either a fast field or the score.
+///
+/// Using the OwnedValue representation allows for type erasure, and can be useful when sort orders
+/// are not known until runtime. But it comes with a performance cost: wherever possible, prefer to
+/// use a `SortKeyComputer` implementation whose sort key type is known at compile time.
+#[derive(Debug, Clone)]
+pub enum SortByErasedType {
+    /// Sort by a fast field
+    Field(String),
+    /// Sort by score
+    Score,
+}
+
+impl SortByErasedType {
+    /// Creates a new sort key computer which will sort by the given fast field column, with type
+    /// erasure.
+    pub fn for_field(column_name: impl ToString) -> Self {
+        Self::Field(column_name.to_string())
+    }
+
+    /// Creates a new sort key computer which will sort by score, with type erasure.
+    pub fn for_score() -> Self {
+        Self::Score
+    }
+}
+
+trait ErasedSegmentSortKeyComputer: Send + Sync {
+    fn segment_sort_key(&mut self, doc: DocId, score: Score) -> Option<u64>;
+    fn segment_sort_keys(
+        &mut self,
+        input_docs: &[DocId],
+        output: &mut Vec<ComparableDoc<Option<u64>, DocId>>,
+        filter: ValueRange<Option<u64>>,
+    );
+    fn convert_segment_sort_key(&self, sort_key: Option<u64>) -> OwnedValue;
+}
+
+struct ErasedSegmentSortKeyComputerWrapper<C, F>
+where
+    C: SegmentSortKeyComputer<SegmentSortKey = Option<u64>> + Send + Sync,
+    F: Fn(C::SortKey) -> OwnedValue + Send + Sync + 'static,
+{
+    inner: C,
+    converter: F,
+    buffer: C::Buffer,
+}
+
+impl<C, F> ErasedSegmentSortKeyComputer for ErasedSegmentSortKeyComputerWrapper<C, F>
+where
+    C: SegmentSortKeyComputer<SegmentSortKey = Option<u64>> + Send + Sync,
+    F: Fn(C::SortKey) -> OwnedValue + Send + Sync + 'static,
+{
+    fn segment_sort_key(&mut self, doc: DocId, score: Score) -> Option<u64> {
+        self.inner.segment_sort_key(doc, score)
+    }
+
+    fn segment_sort_keys(
+        &mut self,
+        input_docs: &[DocId],
+        output: &mut Vec<ComparableDoc<Option<u64>, DocId>>,
+        filter: ValueRange<Option<u64>>,
+    ) {
+        self.inner
+            .segment_sort_keys(input_docs, output, &mut self.buffer, filter)
+    }
+
+    fn convert_segment_sort_key(&self, sort_key: Option<u64>) -> OwnedValue {
+        let val = self.inner.convert_segment_sort_key(sort_key);
+        (self.converter)(val)
+    }
+}
+
+struct ScoreSegmentSortKeyComputer {
+    segment_computer: SortBySimilarityScoreSegmentComputer,
+}
+
+impl ErasedSegmentSortKeyComputer for ScoreSegmentSortKeyComputer {
+    fn segment_sort_key(&mut self, doc: DocId, score: Score) -> Option<u64> {
+        let score_value: f64 = self.segment_computer.segment_sort_key(doc, score).into();
+        Some(score_value.to_u64())
+    }
+
+    fn segment_sort_keys(
+        &mut self,
+        _input_docs: &[DocId],
+        _output: &mut Vec<ComparableDoc<Option<u64>, DocId>>,
+        _filter: ValueRange<Option<u64>>,
+    ) {
+        unimplemented!("Batch computation not supported for score sorting")
+    }
+
+    fn convert_segment_sort_key(&self, sort_key: Option<u64>) -> OwnedValue {
+        let score_value: u64 = sort_key.expect("This implementation always produces a score.");
+        OwnedValue::F64(f64::from_u64(score_value))
+    }
+}
+
+impl SortKeyComputer for SortByErasedType {
+    type SortKey = OwnedValue;
+    type Child = ErasedColumnSegmentSortKeyComputer;
+    type Comparator = NaturalComparator;
+
+    fn requires_scoring(&self) -> bool {
+        matches!(self, Self::Score)
+    }
+
+    fn segment_sort_key_computer(
+        &self,
+        segment_reader: &crate::SegmentReader,
+    ) -> crate::Result<Self::Child> {
+        let inner: Box<dyn ErasedSegmentSortKeyComputer> = match self {
+            Self::Field(column_name) => {
+                let fast_fields = segment_reader.fast_fields();
+                // TODO: We currently double-open the column to avoid relying on the implementation
+                // details of `SortByString` or `SortByStaticFastValue`. Once
+                // https://github.com/quickwit-oss/tantivy/issues/2776 is resolved, we should
+                // consider directly constructing the appropriate `SegmentSortKeyComputer` type for
+                // the column that we open here.
+                let (_column, column_type) =
+                    fast_fields.u64_lenient(column_name)?.ok_or_else(|| {
+                        FastFieldNotAvailableError {
+                            field_name: column_name.to_owned(),
+                        }
+                    })?;
+
+                match column_type {
+                    ColumnType::Str => {
+                        let computer = SortByString::for_field(column_name);
+                        let inner = computer.segment_sort_key_computer(segment_reader)?;
+                        Box::new(ErasedSegmentSortKeyComputerWrapper {
+                            inner,
+                            converter: |val: Option<String>| {
+                                val.map(OwnedValue::Str).unwrap_or(OwnedValue::Null)
+                            },
+                            buffer: Default::default(),
+                        })
+                    }
+                    ColumnType::U64 => {
+                        let computer = SortByStaticFastValue::<u64>::for_field(column_name);
+                        let inner = computer.segment_sort_key_computer(segment_reader)?;
+                        Box::new(ErasedSegmentSortKeyComputerWrapper {
+                            inner,
+                            converter: |val: Option<u64>| {
+                                val.map(OwnedValue::U64).unwrap_or(OwnedValue::Null)
+                            },
+                            buffer: Default::default(),
+                        })
+                    }
+                    ColumnType::I64 => {
+                        let computer = SortByStaticFastValue::<i64>::for_field(column_name);
+                        let inner = computer.segment_sort_key_computer(segment_reader)?;
+                        Box::new(ErasedSegmentSortKeyComputerWrapper {
+                            inner,
+                            converter: |val: Option<i64>| {
+                                val.map(OwnedValue::I64).unwrap_or(OwnedValue::Null)
+                            },
+                            buffer: Default::default(),
+                        })
+                    }
+                    ColumnType::F64 => {
+                        let computer = SortByStaticFastValue::<f64>::for_field(column_name);
+                        let inner = computer.segment_sort_key_computer(segment_reader)?;
+                        Box::new(ErasedSegmentSortKeyComputerWrapper {
+                            inner,
+                            converter: |val: Option<f64>| {
+                                val.map(OwnedValue::F64).unwrap_or(OwnedValue::Null)
+                            },
+                            buffer: Default::default(),
+                        })
+                    }
+                    ColumnType::Bool => {
+                        let computer = SortByStaticFastValue::<bool>::for_field(column_name);
+                        let inner = computer.segment_sort_key_computer(segment_reader)?;
+                        Box::new(ErasedSegmentSortKeyComputerWrapper {
+                            inner,
+                            converter: |val: Option<bool>| {
+                                val.map(OwnedValue::Bool).unwrap_or(OwnedValue::Null)
+                            },
+                            buffer: Default::default(),
+                        })
+                    }
+                    ColumnType::DateTime => {
+                        let computer = SortByStaticFastValue::<DateTime>::for_field(column_name);
+                        let inner = computer.segment_sort_key_computer(segment_reader)?;
+                        Box::new(ErasedSegmentSortKeyComputerWrapper {
+                            inner,
+                            converter: |val: Option<DateTime>| {
+                                val.map(OwnedValue::Date).unwrap_or(OwnedValue::Null)
+                            },
+                            buffer: Default::default(),
+                        })
+                    }
+                    column_type => {
+                        return Err(crate::TantivyError::SchemaError(format!(
+                            "Field `{}` is of type {column_type:?}, which is not supported for \
+                             sorting by owned value yet.",
+                            column_name
+                        )))
+                    }
+                }
+            }
+            Self::Score => Box::new(ScoreSegmentSortKeyComputer {
+                segment_computer: SortBySimilarityScore
+                    .segment_sort_key_computer(segment_reader)?,
+            }),
+        };
+        Ok(ErasedColumnSegmentSortKeyComputer { inner })
+    }
+}
+
+pub struct ErasedColumnSegmentSortKeyComputer {
+    inner: Box<dyn ErasedSegmentSortKeyComputer>,
+}
+
+impl SegmentSortKeyComputer for ErasedColumnSegmentSortKeyComputer {
+    type SortKey = OwnedValue;
+    type SegmentSortKey = Option<u64>;
+    type SegmentComparator = NaturalComparator;
+    type Buffer = ();
+
+    #[inline(always)]
+    fn segment_sort_key(&mut self, doc: DocId, score: Score) -> Option<u64> {
+        self.inner.segment_sort_key(doc, score)
+    }
+
+    fn segment_sort_keys(
+        &mut self,
+        input_docs: &[DocId],
+        output: &mut Vec<ComparableDoc<Self::SegmentSortKey, DocId>>,
+        _buffer: &mut Self::Buffer,
+        filter: ValueRange<Self::SegmentSortKey>,
+    ) {
+        self.inner.segment_sort_keys(input_docs, output, filter)
+    }
+
+    fn convert_segment_sort_key(&self, segment_sort_key: Self::SegmentSortKey) -> OwnedValue {
+        self.inner.convert_segment_sort_key(segment_sort_key)
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use crate::collector::sort_key::{ComparatorEnum, SortByErasedType};
+    use crate::collector::TopDocs;
+    use crate::query::AllQuery;
+    use crate::schema::{OwnedValue, Schema, FAST, TEXT};
+    use crate::Index;
+
+    #[test]
+    fn test_sort_by_owned_u64() {
+        let mut schema_builder = Schema::builder();
+        let id_field = schema_builder.add_u64_field("id", FAST);
+        let schema = schema_builder.build();
+        let index = Index::create_in_ram(schema);
+        let mut writer = index.writer_for_tests().unwrap();
+        writer.add_document(doc!(id_field => 10u64)).unwrap();
+        writer.add_document(doc!(id_field => 2u64)).unwrap();
+        writer.add_document(doc!()).unwrap();
+        writer.commit().unwrap();
+
+        let reader = index.reader().unwrap();
+        let searcher = reader.searcher();
+
+        let collector = TopDocs::with_limit(10)
+            .order_by((SortByErasedType::for_field("id"), ComparatorEnum::Natural));
+        let top_docs = searcher.search(&AllQuery, &collector).unwrap();
+
+        let values: Vec<OwnedValue> = top_docs.into_iter().map(|(key, _)| key).collect();
+
+        assert_eq!(
+            values,
+            vec![OwnedValue::U64(10), OwnedValue::U64(2), OwnedValue::Null]
+        );
+
+        let collector = TopDocs::with_limit(10).order_by((
+            SortByErasedType::for_field("id"),
+            ComparatorEnum::ReverseNoneLower,
+        ));
+        let top_docs = searcher.search(&AllQuery, &collector).unwrap();
+
+        let values: Vec<OwnedValue> = top_docs.into_iter().map(|(key, _)| key).collect();
+
+        assert_eq!(
+            values,
+            vec![OwnedValue::U64(2), OwnedValue::U64(10), OwnedValue::Null]
+        );
+    }
+
+    #[test]
+    fn test_sort_by_owned_string() {
+        let mut schema_builder = Schema::builder();
+        let city_field = schema_builder.add_text_field("city", FAST | TEXT);
+        let schema = schema_builder.build();
+        let index = Index::create_in_ram(schema);
+        let mut writer = index.writer_for_tests().unwrap();
+        writer.add_document(doc!(city_field => "tokyo")).unwrap();
+        writer.add_document(doc!(city_field => "austin")).unwrap();
+        writer.add_document(doc!()).unwrap();
+        writer.commit().unwrap();
+
+        let reader = index.reader().unwrap();
+        let searcher = reader.searcher();
+
+        let collector = TopDocs::with_limit(10).order_by((
+            SortByErasedType::for_field("city"),
+            ComparatorEnum::ReverseNoneLower,
+        ));
+        let top_docs = searcher.search(&AllQuery, &collector).unwrap();
+
+        let values: Vec<OwnedValue> = top_docs.into_iter().map(|(key, _)| key).collect();
+
+        assert_eq!(
+            values,
+            vec![
+                OwnedValue::Str("austin".to_string()),
+                OwnedValue::Str("tokyo".to_string()),
+                OwnedValue::Null
+            ]
+        );
+    }
+
+    #[test]
+    fn test_sort_by_owned_reverse() {
+        let mut schema_builder = Schema::builder();
+        let id_field = schema_builder.add_u64_field("id", FAST);
+        let schema = schema_builder.build();
+        let index = Index::create_in_ram(schema);
+        let mut writer = index.writer_for_tests().unwrap();
+        writer.add_document(doc!(id_field => 10u64)).unwrap();
+        writer.add_document(doc!(id_field => 2u64)).unwrap();
+        writer.add_document(doc!()).unwrap();
+        writer.commit().unwrap();
+
+        let reader = index.reader().unwrap();
+        let searcher = reader.searcher();
+
+        let collector = TopDocs::with_limit(10)
+            .order_by((SortByErasedType::for_field("id"), ComparatorEnum::Reverse));
+        let top_docs = searcher.search(&AllQuery, &collector).unwrap();
+
+        let values: Vec<OwnedValue> = top_docs.into_iter().map(|(key, _)| key).collect();
+
+        assert_eq!(
+            values,
+            vec![OwnedValue::Null, OwnedValue::U64(2), OwnedValue::U64(10)]
+        );
+    }
+
+    #[test]
+    fn test_sort_by_owned_score() {
+        let mut schema_builder = Schema::builder();
+        let body_field = schema_builder.add_text_field("body", TEXT);
+        let schema = schema_builder.build();
+        let index = Index::create_in_ram(schema);
+        let mut writer = index.writer_for_tests().unwrap();
+        writer.add_document(doc!(body_field => "a a")).unwrap();
+        writer.add_document(doc!(body_field => "a")).unwrap();
+        writer.commit().unwrap();
+
+        let reader = index.reader().unwrap();
+        let searcher = reader.searcher();
+        let query_parser = crate::query::QueryParser::for_index(&index, vec![body_field]);
+        let query = query_parser.parse_query("a").unwrap();
+
+        // Sort by score descending (Natural)
+        let collector = TopDocs::with_limit(10)
+            .order_by((SortByErasedType::for_score(), ComparatorEnum::Natural));
+        let top_docs = searcher.search(&query, &collector).unwrap();
+
+        let values: Vec<f64> = top_docs
+            .into_iter()
+            .map(|(key, _)| match key {
+                OwnedValue::F64(val) => val,
+                _ => panic!("Wrong type {key:?}"),
+            })
+            .collect();
+
+        assert_eq!(values.len(), 2);
+        assert!(values[0] > values[1]);
+
+        // Sort by score ascending (ReverseNoneLower)
+        let collector = TopDocs::with_limit(10).order_by((
+            SortByErasedType::for_score(),
+            ComparatorEnum::ReverseNoneLower,
+        ));
+        let top_docs = searcher.search(&query, &collector).unwrap();
+
+        let values: Vec<f64> = top_docs
+            .into_iter()
+            .map(|(key, _)| match key {
+                OwnedValue::F64(val) => val,
+                _ => panic!("Wrong type {key:?}"),
+            })
+            .collect();
+
+        assert_eq!(values.len(), 2);
+        assert!(values[0] < values[1]);
+    }
+}
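The tests above fix how the comparator variants rank `Option` keys. `Natural` yields `10, 2, Null`. `Reverse` yields `Null, 2, 10`. `ReverseNoneLower` yields `2, 10, Null`. The sketch below reproduces those three orderings with plain `std::cmp::Ordering` functions. The names `natural`, `reverse`, `reverse_none_lower`, and `rank` are hypothetical and are not tantivy's `ComparatorEnum` API. Tie-breaking by `DocId` is ignored.

```rust
use std::cmp::Ordering;

type Key = Option<u64>;

/// Hypothetical comparator: `Greater` means `a` ranks ahead of `b`.
/// `Option`'s derived `Ord` already places `None` below every `Some(_)`.
fn natural(a: &Key, b: &Key) -> Ordering {
    a.cmp(b)
}

/// Full reversal: smaller values rank higher, and `None` now ranks highest.
fn reverse(a: &Key, b: &Key) -> Ordering {
    b.cmp(a)
}

/// Reverse the present values, but keep `None` as the lowest-ranked key.
fn reverse_none_lower(a: &Key, b: &Key) -> Ordering {
    match (a, b) {
        (None, None) => Ordering::Equal,
        (None, Some(_)) => Ordering::Less,
        (Some(_), None) => Ordering::Greater,
        (Some(x), Some(y)) => y.cmp(x),
    }
}

/// Returns the keys best-first, i.e. in the order a top-K collector emits them.
fn rank(mut keys: Vec<Key>, cmp: fn(&Key, &Key) -> Ordering) -> Vec<Key> {
    // Sorting by `cmp(b, a)` puts the highest-ranked key first.
    keys.sort_by(|a, b| cmp(b, a));
    keys
}
```

`Reverse` and `ReverseNoneLower` differ only in where the document without a value lands. That is why the test suite checks them separately.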
--- a/src/collector/sort_key/sort_by_score.rs
+++ b/src/collector/sort_key/sort_by_score.rs
@@ -0,0 +1,92 @@
+use columnar::ValueRange;
+
+use crate::collector::sort_key::NaturalComparator;
+use crate::collector::{ComparableDoc, SegmentSortKeyComputer, SortKeyComputer, TopNComputer};
+use crate::{DocAddress, DocId, Score};
+
+/// Sort by similarity score.
+#[derive(Clone, Debug, Copy)]
+pub struct SortBySimilarityScore;
+
+impl SortKeyComputer for SortBySimilarityScore {
+    type SortKey = Score;
+
+    type Child = SortBySimilarityScoreSegmentComputer;
+
+    type Comparator = NaturalComparator;
+
+    fn requires_scoring(&self) -> bool {
+        true
+    }
+
+    fn segment_sort_key_computer(
+        &self,
+        _segment_reader: &crate::SegmentReader,
+    ) -> crate::Result<Self::Child> {
+        Ok(SortBySimilarityScoreSegmentComputer)
+    }
+
+    // Sorting by score is special in that it allows for the Block-WAND optimization.
+    fn collect_segment_top_k(
+        &self,
+        k: usize,
+        weight: &dyn crate::query::Weight,
+        reader: &crate::SegmentReader,
+        segment_ord: u32,
+    ) -> crate::Result<Vec<(Self::SortKey, DocAddress)>> {
+        let mut top_n: TopNComputer<Score, DocId, Self::Comparator> =
+            TopNComputer::new_with_comparator(k, self.comparator());
+
+        if let Some(alive_bitset) = reader.alive_bitset() {
+            let mut threshold = Score::MIN;
+            top_n.threshold = Some(threshold);
+            weight.for_each_pruning(Score::MIN, reader, &mut |doc, score| {
+                if alive_bitset.is_deleted(doc) {
+                    return threshold;
+                }
+                top_n.push(score, doc);
+                threshold = top_n.threshold.unwrap_or(Score::MIN);
+                threshold
+            })?;
+        } else {
+            weight.for_each_pruning(Score::MIN, reader, &mut |doc, score| {
+                top_n.push(score, doc);
+                top_n.threshold.unwrap_or(Score::MIN)
+            })?;
+        }
+
+        Ok(top_n
+            .into_vec()
+            .into_iter()
+            .map(|cid| (cid.sort_key, DocAddress::new(segment_ord, cid.doc)))
+            .collect())
+    }
+}
+
+pub struct SortBySimilarityScoreSegmentComputer;
+
+impl SegmentSortKeyComputer for SortBySimilarityScoreSegmentComputer {
+    type SortKey = Score;
+    type SegmentSortKey = Score;
+    type SegmentComparator = NaturalComparator;
+    type Buffer = ();
+
+    #[inline(always)]
+    fn segment_sort_key(&mut self, _doc: DocId, score: Score) -> Score {
+        score
+    }
+
+    fn segment_sort_keys(
+        &mut self,
+        _input_docs: &[DocId],
+        _output: &mut Vec<ComparableDoc<Self::SegmentSortKey, DocId>>,
+        _buffer: &mut Self::Buffer,
+        _filter: ValueRange<Self::SegmentSortKey>,
+    ) {
+        unimplemented!("Batch computation not supported for score sorting")
+    }
+
+    fn convert_segment_sort_key(&self, score: Score) -> Score {
+        score
+    }
+}
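`collect_segment_top_k` drives scoring through `Weight::for_each_pruning`. The callback receives candidate documents and returns the updated threshold. Deleted documents return the previous threshold unchanged. The sketch below models that feedback loop under stated assumptions:

- The `TopK` buffer and the `for_each_pruning` driver are hypothetical simplifications. A sorted `Vec` stands in for `TopNComputer`, and a slice of precomputed `(doc, score)` pairs stands in for a scorer.
- The driver is assumed to invoke the callback only for scores strictly above the current threshold.

```rust
/// Hypothetical top-K buffer: (score, doc) pairs kept sorted best-first.
struct TopK {
    k: usize,
    hits: Vec<(f32, u32)>,
}

impl TopK {
    fn new(k: usize) -> Self {
        TopK { k, hits: Vec::with_capacity(k + 1) }
    }

    fn push(&mut self, score: f32, doc: u32) {
        // Insert after existing equal scores, so earlier docs win ties.
        let pos = self.hits.partition_point(|&(s, _)| s >= score);
        self.hits.insert(pos, (score, doc));
        self.hits.truncate(self.k);
    }

    /// Once the buffer is full, the K-th best score is the bar to beat.
    fn threshold(&self) -> Option<f32> {
        if self.hits.len() == self.k {
            self.hits.last().map(|&(s, _)| s)
        } else {
            None
        }
    }
}

/// Hypothetical pruning driver: only docs beating `threshold` reach the
/// callback, and the callback's return value becomes the new threshold.
fn for_each_pruning(
    docs: &[(u32, f32)],
    mut threshold: f32,
    callback: &mut dyn FnMut(u32, f32) -> f32,
) {
    for &(doc, score) in docs {
        if score > threshold {
            threshold = callback(doc, score);
        }
    }
}

fn collect_top_k(docs: &[(u32, f32)], deleted: &[u32], k: usize) -> Vec<(f32, u32)> {
    let mut top = TopK::new(k);
    let mut threshold = f32::MIN;
    for_each_pruning(docs, f32::MIN, &mut |doc, score| {
        if deleted.contains(&doc) {
            // Deleted docs are skipped and leave the threshold unchanged.
            return threshold;
        }
        top.push(score, doc);
        threshold = top.threshold().unwrap_or(f32::MIN);
        threshold
    });
    top.hits
}
```

Returning the previous threshold for a deleted document keeps the pruning progress made so far. Resetting it to `f32::MIN` would instead weaken pruning for the remaining documents.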
--- a/src/collector/sort_key/sort_by_static_fast_value.rs
+++ b/src/collector/sort_key/sort_by_static_fast_value.rs
@@ -0,0 +1,194 @@
+use std::marker::PhantomData;
+
+use columnar::{Column, ValueRange};
+
+use crate::collector::sort_key::sort_key_computer::convert_optional_u64_range_to_u64_range;
+use crate::collector::sort_key::NaturalComparator;
+use crate::collector::{ComparableDoc, SegmentSortKeyComputer, SortKeyComputer};
+use crate::fastfield::{FastFieldNotAvailableError, FastValue};
+use crate::{DocId, Score, SegmentReader};
+
+/// Sorts by a fast value (u64, i64, f64, bool, DateTime).
+///
+/// The field must appear explicitly in the schema with the matching type, and must be
+/// declared as a fast field.
+///
+/// If the field is multivalued, only the first value is considered.
+///
+/// Documents that do not have this value are still considered.
+/// Their sort key will simply be `None`.
+#[derive(Debug, Clone)]
+pub struct SortByStaticFastValue<T: FastValue> {
+    field: String,
+    typ: PhantomData<T>,
+}
+
+impl<T: FastValue> SortByStaticFastValue<T> {
+    /// Creates a new `SortByStaticFastValue` instance for the given field.
+    pub fn for_field(column_name: impl ToString) -> SortByStaticFastValue<T> {
+        Self {
+            field: column_name.to_string(),
+            typ: PhantomData,
+        }
+    }
+}
+
+impl<T: FastValue> SortKeyComputer for SortByStaticFastValue<T> {
+    type Child = SortByFastValueSegmentSortKeyComputer<T>;
+    type SortKey = Option<T>;
+    type Comparator = NaturalComparator;
+
+    fn check_schema(&self, schema: &crate::schema::Schema) -> crate::Result<()> {
+        // At the segment sort key computer level, we rely on the u64 representation.
+        // The mapping is monotonic, so it is sufficient to compute our top-K docs.
+        let field = schema.get_field(&self.field)?;
+        let field_entry = schema.get_field_entry(field);
+        if !field_entry.is_fast() {
+            return Err(crate::TantivyError::SchemaError(format!(
+                "Field `{}` is not a fast field.",
+                self.field,
+            )));
+        }
+        let schema_type = field_entry.field_type().value_type();
+        if schema_type != T::to_type() {
+            return Err(crate::TantivyError::SchemaError(format!(
+                "Field `{}` is of type {schema_type:?}, not of the type {:?}.",
+                &self.field,
+                T::to_type()
+            )));
+        }
+        Ok(())
+    }
+
+    fn segment_sort_key_computer(
+        &self,
+        segment_reader: &SegmentReader,
+    ) -> crate::Result<Self::Child> {
+        let sort_column_opt = segment_reader.fast_fields().u64_lenient(&self.field)?;
+        let (sort_column, _sort_column_type) =
+            sort_column_opt.ok_or_else(|| FastFieldNotAvailableError {
+                field_name: self.field.clone(),
+            })?;
+        Ok(SortByFastValueSegmentSortKeyComputer {
+            sort_column,
+            typ: PhantomData,
+        })
+    }
+}
+
+pub struct SortByFastValueSegmentSortKeyComputer<T> {
+    sort_column: Column<u64>,
+    typ: PhantomData<T>,
+}
+
+impl<T: FastValue> SegmentSortKeyComputer for SortByFastValueSegmentSortKeyComputer<T> {
+    type SortKey = Option<T>;
+    type SegmentSortKey = Option<u64>;
+    type SegmentComparator = NaturalComparator;
+    type Buffer = ();
+
+    #[inline(always)]
+    fn segment_sort_key(&mut self, doc: DocId, _score: Score) -> Self::SegmentSortKey {
+        self.sort_column.first(doc)
+    }
+
+    fn segment_sort_keys(
+        &mut self,
+        input_docs: &[DocId],
+        output: &mut Vec<ComparableDoc<Self::SegmentSortKey, DocId>>,
+        _buffer: &mut Self::Buffer,
+        filter: ValueRange<Self::SegmentSortKey>,
+    ) {
+        let u64_filter = convert_optional_u64_range_to_u64_range(filter);
+        self.sort_column
+            .first_vals_in_value_range(input_docs, output, u64_filter);
+    }
+
+    fn convert_segment_sort_key(&self, sort_key: Self::SegmentSortKey) -> Self::SortKey {
+        sort_key.map(T::from_u64)
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use crate::schema::{Schema, FAST};
+    use crate::Index;
+
+    #[test]
+    fn test_sort_by_fast_value_batch() {
+        let mut schema_builder = Schema::builder();
+        let field_col = schema_builder.add_u64_field("field", FAST);
+        let schema = schema_builder.build();
+        let index = Index::create_in_ram(schema);
+        let mut index_writer = index.writer_for_tests().unwrap();
+
+        index_writer
+            .add_document(crate::doc!(field_col => 10u64))
+            .unwrap();
+        index_writer
+            .add_document(crate::doc!(field_col => 20u64))
+            .unwrap();
+        index_writer.add_document(crate::doc!()).unwrap();
+        index_writer.commit().unwrap();
+
+        let reader = index.reader().unwrap();
+        let searcher = reader.searcher();
+        let segment_reader = searcher.segment_reader(0);
+
+        let sorter = SortByStaticFastValue::<u64>::for_field("field");
+        let mut computer = sorter.segment_sort_key_computer(segment_reader).unwrap();
+
+        let mut docs = vec![0, 1, 2];
+        let mut output = Vec::new();
+        let mut buffer = ();
+        computer.segment_sort_keys(&mut docs, &mut output, &mut buffer, ValueRange::All);
+
+        assert_eq!(
+            output.iter().map(|c| c.sort_key).collect::<Vec<_>>(),
+            &[Some(10), Some(20), None]
+        );
+        assert_eq!(output.iter().map(|c| c.doc).collect::<Vec<_>>(), &[0, 1, 2]);
+    }
+
+    #[test]
+    fn test_sort_by_fast_value_batch_with_filter() {
+        let mut schema_builder = Schema::builder();
+        let field_col = schema_builder.add_u64_field("field", FAST);
+        let schema = schema_builder.build();
+        let index = Index::create_in_ram(schema);
+        let mut index_writer = index.writer_for_tests().unwrap();
+
+        index_writer
+            .add_document(crate::doc!(field_col => 10u64))
+            .unwrap();
+        index_writer
+            .add_document(crate::doc!(field_col => 20u64))
+            .unwrap();
+        index_writer.add_document(crate::doc!()).unwrap();
+        index_writer.commit().unwrap();
+
+        let reader = index.reader().unwrap();
+        let searcher = reader.searcher();
+        let segment_reader = searcher.segment_reader(0);
+
+        let sorter = SortByStaticFastValue::<u64>::for_field("field");
+        let mut computer = sorter.segment_sort_key_computer(segment_reader).unwrap();
+
+        let mut docs = vec![0, 1, 2];
+        let mut output = Vec::new();
+        let mut buffer = ();
+        computer.segment_sort_keys(
+            &mut docs,
+            &mut output,
+            &mut buffer,
+            ValueRange::GreaterThan(Some(15u64), false /* inclusive */),
+        );
+
+        assert_eq!(
+            output.iter().map(|c| c.sort_key).collect::<Vec<_>>(),
+            &[Some(20)]
+        );
+        assert_eq!(output.iter().map(|c| c.doc).collect::<Vec<_>>(), &[1]);
+    }
+}
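The comment in `check_schema` relies on the `u64` representation being monotonic. Under that property, ranking and range-filtering on `u64` give the same result as on the original type. Below is a minimal sketch of the usual order-preserving encodings:

- For `i64`, flip the sign bit.
- For a non-negative `f64`, flip the sign bit.
- For a negative `f64`, invert all bits.

The function names are hypothetical. We assume tantivy's `FastValue::to_u64` follows the same idea but do not reproduce its code here. NaN is out of scope.

```rust
const SIGN_BIT: u64 = 1 << 63;

/// Order-preserving i64 -> u64: flipping the sign bit moves negatives below positives.
fn i64_to_u64(val: i64) -> u64 {
    (val as u64) ^ SIGN_BIT
}

fn u64_to_i64(val: u64) -> i64 {
    (val ^ SIGN_BIT) as i64
}

/// Order-preserving (non-NaN) f64 -> u64.
fn f64_to_u64(val: f64) -> u64 {
    let bits = val.to_bits();
    if val.is_sign_positive() {
        // Non-negative floats: set the sign bit so they sort above all negatives.
        bits ^ SIGN_BIT
    } else {
        // Negative floats: invert every bit so larger magnitudes sort lower.
        !bits
    }
}

fn u64_to_f64(val: u64) -> f64 {
    if val & SIGN_BIT != 0 {
        f64::from_bits(val ^ SIGN_BIT)
    } else {
        f64::from_bits(!val)
    }
}

/// True if the encoding is strictly increasing over an ascending input slice.
fn is_monotonic<T: Copy>(sorted: &[T], encode: fn(T) -> u64) -> bool {
    sorted.windows(2).all(|w| encode(w[0]) < encode(w[1]))
}
```

Because the encoding is monotonic, a bound expressed on the original type becomes the same comparison on the encoded `u64` values. This is what lets the batch path hand a `u64` range straight to `first_vals_in_value_range`.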
--- a/src/collector/sort_key/sort_by_string.rs
+++ b/src/collector/sort_key/sort_by_string.rs
@@ -0,0 +1,185 @@
+use columnar::{StrColumn, ValueRange};
+
+use crate::collector::sort_key::sort_key_computer::{
+    convert_optional_u64_range_to_u64_range, range_contains_none,
+};
+use crate::collector::sort_key::NaturalComparator;
+use crate::collector::{ComparableDoc, SegmentSortKeyComputer, SortKeyComputer};
+use crate::termdict::TermOrdinal;
+use crate::{DocId, Score};
+
+/// Sort by the first value of a string column.
+///
+/// The column can be dynamic (coming from a JSON field)
+/// or static (explicitly defined in the schema).
+///
+/// If the field is multivalued, only the first value is considered.
+///
+/// Documents that do not have this value are still considered.
+/// Their sort key will simply be `None`.
+#[derive(Debug, Clone)]
+pub struct SortByString {
+    column_name: String,
+}
+
+impl SortByString {
+    /// Creates a new `SortByString` sort key computer for the given column.
+    pub fn for_field(column_name: impl ToString) -> Self {
+        SortByString {
+            column_name: column_name.to_string(),
+        }
+    }
+}
+
+impl SortKeyComputer for SortByString {
+    type SortKey = Option<String>;
+    type Child = ByStringColumnSegmentSortKeyComputer;
+    type Comparator = NaturalComparator;
+
+    fn segment_sort_key_computer(
+        &self,
+        segment_reader: &crate::SegmentReader,
+    ) -> crate::Result<Self::Child> {
+        let str_column_opt = segment_reader.fast_fields().str(&self.column_name)?;
+        Ok(ByStringColumnSegmentSortKeyComputer { str_column_opt })
+    }
+}
+
+pub struct ByStringColumnSegmentSortKeyComputer {
+    str_column_opt: Option<StrColumn>,
+}
+
+impl SegmentSortKeyComputer for ByStringColumnSegmentSortKeyComputer {
+    type SortKey = Option<String>;
+    type SegmentSortKey = Option<TermOrdinal>;
+    type SegmentComparator = NaturalComparator;
+    type Buffer = ();
+
+    #[inline(always)]
+    fn segment_sort_key(&mut self, doc: DocId, _score: Score) -> Option<TermOrdinal> {
+        let str_column = self.str_column_opt.as_ref()?;
+        str_column.ords().first(doc)
+    }
+
+    fn segment_sort_keys(
+        &mut self,
+        input_docs: &[DocId],
+        output: &mut Vec<ComparableDoc<Self::SegmentSortKey, DocId>>,
+        _buffer: &mut Self::Buffer,
+        filter: ValueRange<Self::SegmentSortKey>,
+    ) {
+        if let Some(str_column) = &self.str_column_opt {
+            let u64_filter = convert_optional_u64_range_to_u64_range(filter);
+            str_column
+                .ords()
+                .first_vals_in_value_range(input_docs, output, u64_filter);
+        } else if range_contains_none(&filter) {
+            for &doc in input_docs {
+                output.push(ComparableDoc {
+                    doc,
+                    sort_key: None,
+                });
+            }
+        }
+    }
+
+    fn convert_segment_sort_key(&self, term_ord_opt: Option<TermOrdinal>) -> Option<String> {
+        // TODO: Individual lookups to the dictionary like this are very likely to repeatedly
+        // decompress the same blocks. See https://github.com/quickwit-oss/tantivy/issues/2776
+        let term_ord = term_ord_opt?;
+        let str_column = self.str_column_opt.as_ref()?;
+        let mut bytes = Vec::new();
+        str_column
+            .dictionary()
+            .ord_to_term(term_ord, &mut bytes)
+            .ok()?;
+        String::try_from(bytes).ok()
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use crate::schema::{Schema, FAST, TEXT};
+    use crate::Index;
+
+    #[test]
+    fn test_sort_by_string_batch() {
+        let mut schema_builder = Schema::builder();
+        let field_col = schema_builder.add_text_field("field", FAST | TEXT);
+        let schema = schema_builder.build();
+        let index = Index::create_in_ram(schema);
+        let mut index_writer = index.writer_for_tests().unwrap();
+
+        index_writer
+            .add_document(crate::doc!(field_col => "a"))
+            .unwrap();
+        index_writer
+            .add_document(crate::doc!(field_col => "c"))
+            .unwrap();
+        index_writer.add_document(crate::doc!()).unwrap();
+        index_writer.commit().unwrap();
+
+        let reader = index.reader().unwrap();
+        let searcher = reader.searcher();
+        let segment_reader = searcher.segment_reader(0);
+
+        let sorter = SortByString::for_field("field");
+        let mut computer = sorter.segment_sort_key_computer(segment_reader).unwrap();
+
+        let mut docs = vec![0, 1, 2];
+        let mut output = Vec::new();
+        let mut buffer = ();
+        computer.segment_sort_keys(&mut docs, &mut output, &mut buffer, ValueRange::All);
+
+        assert_eq!(
+            output.iter().map(|c| c.sort_key).collect::<Vec<_>>(),
+            &[Some(0), Some(1), None]
+        );
+        assert_eq!(output.iter().map(|c| c.doc).collect::<Vec<_>>(), &[0, 1, 2]);
+    }
+
+    #[test]
+    fn test_sort_by_string_batch_with_filter() {
+        let mut schema_builder = Schema::builder();
+        let field_col = schema_builder.add_text_field("field", FAST | TEXT);
+        let schema = schema_builder.build();
+        let index = Index::create_in_ram(schema);
+        let mut index_writer = index.writer_for_tests().unwrap();
+
+        index_writer
+            .add_document(crate::doc!(field_col => "a"))
+            .unwrap();
+        index_writer
+            .add_document(crate::doc!(field_col => "c"))
+            .unwrap();
+        index_writer.add_document(crate::doc!()).unwrap();
+        index_writer.commit().unwrap();
+
+        let reader = index.reader().unwrap();
+        let searcher = reader.searcher();
+        let segment_reader = searcher.segment_reader(0);
+
+        let sorter = SortByString::for_field("field");
+        let mut computer = sorter.segment_sort_key_computer(segment_reader).unwrap();
+
+        let mut docs = vec![0, 1, 2];
+        let mut output = Vec::new();
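+        // Term ordinals follow lexicographic order: "a" is ord 0 and "c" is ord 1.
+        // Keeping ordinals strictly greater than 0 (i.e. strings after "a") retains only
+        // "c"; the document without a value is dropped as well.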
+        let mut buffer = ();
+        computer.segment_sort_keys(
+            &mut docs,
+            &mut output,
+            &mut buffer,
+            ValueRange::GreaterThan(Some(0), false /* inclusive */),
+        );
+
+        assert_eq!(
+            output.iter().map(|c| c.sort_key).collect::<Vec<_>>(),
+            &[Some(1)]
+        );
+        assert_eq!(output.iter().map(|c| c.doc).collect::<Vec<_>>(), &[1]);
+    }
+}
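`SortByString` compares term ordinals rather than strings. It resolves the winning ordinals back to text only in `convert_segment_sort_key`. This works because a segment's term dictionary stores terms in sorted order, so ordinal order matches lexicographic order.

The sketch below illustrates the idea with a hypothetical `Dictionary` built from a sorted, deduplicated `Vec<String>`. It is not tantivy's `StrColumn` or term dictionary API. It assumes ascending order with documents lacking a value ranked last, as in `test_sort_by_owned_string`.

```rust
use std::collections::BTreeSet;

/// Hypothetical per-segment dictionary: terms sorted and deduplicated, ordinal = index.
struct Dictionary {
    terms: Vec<String>,
}

impl Dictionary {
    fn new(values: &[&str]) -> Self {
        // A BTreeSet iterates in sorted order, so ordinals follow string order.
        let sorted: BTreeSet<&str> = values.iter().copied().collect();
        Dictionary {
            terms: sorted.into_iter().map(String::from).collect(),
        }
    }

    fn term_to_ord(&self, term: &str) -> Option<u64> {
        self.terms
            .binary_search_by(|t| t.as_str().cmp(term))
            .ok()
            .map(|idx| idx as u64)
    }

    fn ord_to_term(&self, ord: u64) -> Option<&str> {
        self.terms.get(ord as usize).map(String::as_str)
    }
}

/// Sorts (doc, first value) pairs ascending by string, comparing only ordinals.
/// Documents without a value sort last; strings are resolved only at the end.
fn sort_by_string(dict: &Dictionary, docs: &[(u32, Option<&str>)]) -> Vec<(u32, Option<String>)> {
    let mut keyed: Vec<(Option<u64>, u32)> = docs
        .iter()
        .map(|&(doc, value)| (value.and_then(|v| dict.term_to_ord(v)), doc))
        .collect();
    // `is_none()` is false for present values, so they come first; ties break on doc id.
    keyed.sort_by_key(|&(ord, doc)| (ord.is_none(), ord, doc));
    keyed
        .into_iter()
        .map(|(ord, doc)| (doc, ord.and_then(|o| dict.ord_to_term(o)).map(String::from)))
        .collect()
}
```

Deferring the ordinal-to-string lookup to the final results keeps dictionary access off the hot path. However, as the TODO in `convert_segment_sort_key` notes, individual lookups can still decompress the same blocks repeatedly.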
Author	SHA1	Message	Date
Stu Hood	db9e35e7ee	Property test for `Comparator`/`ValueRange` consistency, and fixes.	2026-01-04 19:19:08 -08:00
Stu Hood	7f39d5eab9	`test_order_by_u64_prop`	2026-01-04 15:23:30 -08:00
Stu Hood	af53ffe5df	Use a `Buffer` generic scratch buffer parameter on `TopNComputer` and push directly from `ColumnValues` into a `TopNComputer` buffer in some cases.	2026-01-04 15:23:28 -08:00
Stu Hood	041c6f01a3	Convert test_order_by_compound_filtering_with_none to a proptest.	2026-01-04 15:16:05 -08:00
Stu Hood	9615eb73b8	Implement `collect_block` for lazy scorers using `SegmentSortKeyComputer::segment_sort_keys`.	2026-01-04 15:16:00 -08:00
Paul Masurel	77505c3d03	Making stemming optional. (#2791 ) Fixed code and CI to run on no default features. Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>	2026-01-02 12:40:42 +01:00
PSeitz	735c588f4f	fix union performance regression (#2790 ) * add inlines * fix union performance regression Remove unwrap from hotpath generates better assembly. closes #2788	2026-01-02 12:06:51 +01:00
PSeitz	242a1531bf	fix flaky test (#2784 ) Signed-off-by: Pascal Seitz <pascal.seitz@gmail.com>	2026-01-02 11:30:51 +01:00
trinity-1686a	6443b63177	document 1bit hole and some queries supporting running with just fastfield (#2779 ) * add small doc on some queries using fast field when not indexed * document 1 unused bit in skiplist	2026-01-02 10:32:37 +01:00
Stu Hood	4987495ee4	Add an erased `SortKeyComputer` to sort on types which are not known until runtime (#2770 ) * Remove PartialOrd bound on compared values. * Fix declared `SortKey` type of `impl<..> SortKeyComputer for (HeadSortKeyComputer, TailSortKeyComputer)` * Add a SortByOwnedValue implementation to provide a type-erased column. * Add support for comparing mismatched `OwnedValue` types. * Support JSON columns. * Refer to https://github.com/quickwit-oss/tantivy/issues/2776 * Rename to `SortByErasedType`. * Comment on transitivity. Co-authored-by: Paul Masurel <paul@quickwit.io> * Fix clippy warnings in new code. --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2026-01-02 10:28:47 +01:00
Paul Masurel	b11605f045	Addressing clippy comments (#2789 ) Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>	2025-12-31 18:02:00 +01:00
ChangRui-Ryan	75d7989cc6	add benchmark for boolean query with range sub query (#2787 )	2025-12-31 12:00:53 +01:00
PSeitz	923f0508f2	seek_exact + cost based intersection (#2538 ) * seek_exact + cost based intersection Adds `seek_exact` and `cost` to `DocSet` for a more efficient intersection. Unlike `seek`, `seek_exact` does not require the DocSet to advance to the next hit, if the target does not exist. `cost` allows to address the different DocSet types and their cost model and is used to determine the DocSet that drives the intersection. E.g. fast field range queries may do a full scan. Phrase queries load the positions to check if a we have a hit. They both have a higher cost than their size_hint would suggest. Improves `size_hint` estimation for intersection and union, by having a estimation based on random distribution with a co-location factor. Refactor range query benchmark. Closes #2531 Future Work Implement `seek_exact` for BufferedUnionScorer and RangeDocSet (fast field range queries) Evaluate replacing `seek` with `seek_exact` to reduce code complexity * Apply suggestions from code review Co-authored-by: Paul Masurel <paul@quickwit.io> * add API contract verfication * impl seek_exact on union * rename seek_exact * add mixed AND OR test, fix buffered_union * Add a proptest of BooleanQuery. (#2690) * fix build * Increase the document count. * fix merge conflict * fix debug assert * Fix compilation errors after rebase - Remove duplicate proptest_boolean_query module - Remove duplicate cost() method implementations - Fix TopDocs API usage (add .order_by_score()) - Remove duplicate imports - Remove unused variable assignments --------- Co-authored-by: Paul Masurel <paul@quickwit.io> Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com> Co-authored-by: Stu Hood <stuhood@gmail.com>	2025-12-30 14:43:25 +01:00
ChangRui-Ryan	e0b62e00ac	optimize RangeDocSet for non-overlapping query ranges (#2783 )	2025-12-29 16:55:28 +01:00
Stu Hood	ce97beb86f	Add support for natural-order-with-none-highest in `TopDocs::order_by` (#2780 ) * Add `ComparatorEnum::NaturalNoneHigher`. * Fix comments.	2025-12-23 09:22:20 +01:00
Stu Hood	c0f21a45ae	Use a strict comparison in TopNComputer (#2777 ) * Remove `(Partial)Ord` from `ComparableDoc`, and unify comparison between `TopNComputer` and `Comparator`. * Doc cleanups. * Require Ord for `ComparableDoc`. * Semantics are actually _ascending_ DocId order. * Adjust docs again for ascending DocId order. * minor change --------- Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>	2025-12-18 12:13:23 +01:00
Moe	73657dff77	fix: fixed integer overflow in ExpUnrolledLinkedList for large datasets (#2735 ) * Fixed the overflow issue. * Fixed lint issues. * Applied PR fixes. * Fixed a lint issue.	2025-12-16 22:57:12 +01:00
Moe	e3c9be1f92	fix: boolean query incorrectly dropping documents when AllScorer is present (#2760 ) * Fixed the range issue. * Fixed the second all scorer issue * Improved docs + tests * Improved code. * Fixed lint issues. * Improved tests + logic based on PR comments. * Fixed lint issues. * Increase the document count. * Improved the prop-tests * Expand the index size, and remove unused parameter. --------- Co-authored-by: Stu Hood <stuhood@gmail.com>	2025-12-16 22:52:02 +01:00
Ming	ba61ed6ef3	fix: vint buffer can overflow (#2778 ) * fix vint overflow * comment	2025-12-16 22:50:41 +01:00
trinity-1686a	d0e1600135	fix bug with minimum_should_match and AllScorer (#2774 )	2025-12-14 10:10:45 +01:00
PSeitz-dd	e9020d17d4	fix coverage (#2769 )	2025-12-11 11:35:58 +01:00
PSeitz-dd	5ba0031f7d	move rand_distr to dev_dep (#2772 )	2025-12-11 18:23:50 +08:00
Philippe Noël	22dde8f9ae	chore: Make some delete-related functions public (#46 ) (#2766 ) Co-authored-by: Ming <ming.ying.nyc@gmail.com>	2025-12-11 01:22:15 +01:00
Philippe Noël	14cc24614e	Make DeleteMeta pub (#2765 ) Co-authored-by: Ming Ying <ming.ying.nyc@gmail.com>	2025-12-11 00:11:03 +01:00
Philippe Noël	8a1079b2dc	expose AddOperation and with_max_doc (#7 ) (#2762 ) Co-authored-by: Ming <ming.ying.nyc@gmail.com>	2025-12-11 00:10:42 +01:00
Philippe Noël	794ff1ffc9	chore: Make `Language` hashable (#79 ) (#2763 ) Co-authored-by: Ming <ming.ying.nyc@gmail.com>	2025-12-10 15:38:43 +01:00
PSeitz-dd	c6912ce89a	Handle JSON fields and columnar in space_usage (#2761 ) return field names in space_usage instead of `Field` more detailed info for columns	2025-12-10 20:33:33 +08:00
PSeitz	618e3bd11b	Term and IndexingTerm cleanup (#2750 ) * refactor term * add deprecated functions --------- Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>	2025-12-05 09:48:40 +08:00
PSeitz	b2f99c6217	add term->histogram benchmark (#2758 ) * add term->histogram benchmark * add more term aggs --------- Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>	2025-12-04 02:29:37 +01:00
PSeitz	76de5bab6f	fix unsafe warnings (#2757 )	2025-12-03 20:15:21 +08:00
rustmailer	b7eb31162b	docs: add usage example to README (#2743 )	2025-12-02 21:56:57 +01:00
Paul Masurel	63c66005db	Lazy scorers (#2726 ) * Refactoring of the score tweaker into `SortKeyComputer`s to unlock two features. - Allow lazy evaluation of score. As soon as we identified that a doc won't reach the topK threshold, we can stop the evaluation. - Allow for a different segment level score, segment level score and their conversion. This PR breaks public API, but fixing code is straightforward. * Bumping tantivy version --------- Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>	2025-12-01 15:38:57 +01:00
Paul Masurel	7d513a44c5	Added some benchmark for top K by a fast field (#2754 ) Also removed query parsing from the bench code. Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>	2025-12-01 14:58:29 +01:00
Stu Hood	ca87fcd454	Implement `collect_block` for `Collector`s which wrap other `Collector`s (#2727 ) * Implement `collect_block` for tuple Collectors, and for MultiCollector. * Two more.	2025-12-01 12:26:29 +01:00
Ang	08a92675dc	Fix typos again (#2753 ) Found via `codespell -S benches,stopwords.rs -L womens,parth,abd,childs,ond,ser,ue,mot,hel,atleast,pris,claus,allo`	2025-12-01 12:15:41 +01:00
Raphaël Cohen	f7f4b354d6	fix: Handle phrase prefixed with star (#2751 ) Signed-off-by: Darkheir <raphael.cohen@sekoia.io>	2025-12-01 11:43:25 +01:00
Paul Masurel	25d44fcec8	Revert "remove unused columnar api (#2742 )" (#2748 ) * Revert "remove unused columnar api (#2742)" This reverts commit `8725594d47`. * Clippy comment + removing fill_vals --------- Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>	2025-11-26 17:44:02 +01:00
PSeitz-dd	842fe9295f	split Term in Term and IndexingTerm (#2744) * split Term in Term and IndexingTerm * add append_json_path to JsonTermSerializer	2025-11-26 16:48:59 +01:00
Paul Masurel	f88b7200b2	Optimization when posting list are saturated. (#2745) * Optimization when posting list are saturated. If a posting list doc freq is the segment reader's max_doc, and if scoring does not matter, we can replace it by a AllScorer. In turn, in a boolean query, we can dismiss all scorers and empty scorers, to accelerate the request. * Added range query optimization * CR comment * CR comments * CR comment --------- Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>	2025-11-26 15:50:57 +01:00
PSeitz-dd	8725594d47	remove unused columnar api (#2742)	2025-11-21 18:07:25 +01:00
PSeitz	43a784671a	clippy (#2741) Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>	2025-11-21 18:07:03 +01:00
Paul Masurel	c363bbd23d	Optimize term aggregation with low cardinality + some refactoring (#2740) This introduce an optimization of top level term aggregation on field with a low cardinality. We then use a Vec as the underlying map. In addition, we buffer subaggregations. --------- Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com> Co-authored-by: Paul Masurel <paul@quickwit.io>	2025-11-21 14:46:29 +01:00
Moe	70e591e230	feat: added filter aggregation (#2711) * Initial impl * Added `Filter` impl in `build_single_agg_segment_collector_with_reader` + Added tests * Added `Filter(FilterBucketResult)` + Made tests work. * Fixed type issues. * Fixed a test. * 8a7a73a: Pass `segment_reader` * Added more tests. * Improved parsing + tests * refactoring * Added more tests. * refactoring: moved parsing code under QueryParser * Use Tantivy syntax instead of ES * Added a sanity check test. * Simplified impl + tests * Added back tests in a more maintable way * nitz. * nitz * implemented very simple fast-path * improved a comment * implemented fast field support * Used `BoundsRange` * Improved fast field impl + tests * Simplified execution. * Fixed exports + nitz * Improved the tests to check to the expected result. * Improved test by checking the whole result JSON * Removed brittle perf checks. * Added efficiency verification tests. * Added one more efficiency check test. * Improved the efficiency tests. * Removed unnecessary parsing code + added direct Query obj * Fixed tests. * Improved tests * Fixed code structure * Fixed lint issues * nitz. * nitz * nitz. * nitz. * nitz. * Added an example * Fixed PR comments. * Applied PR comments + nitz * nitz. * Improved the code. * Fixed a perf issue. * Added batch processing. * Made the example more interesting * Fixed bucket count * Renamed Direct to CustomQuery * Fixed lint issues. * No need for scorer to be an `Option` * nitz * Used BitSet * Added an optimization for AllQuery * Fixed merge issues. * Fixed lint issues. * Added benchmark for FILTER * Removed the Option wrapper. * nitz. * Applied PR comments. * Fixed the AllQuery optimization * Applied PR comments. * feat: used `erased_serde` to allow filter query to be serialized * further improved a comment * Added back tests. * removed an unused method * removed an unused method * Added documentation * nitz. * Added query builder. * Fixed a comment. * Applied PR comments. * Fixed doctest issues. * Added ser/de * Removed bench in test * Fixed a lint issue.	2025-11-18 20:54:31 +01:00
Arthur	5277367cb0	remove duplicated call to `index_writer.commit()` in example (#2732)	2025-11-12 14:52:44 +01:00
Paul Masurel	8b02bff9b8	Removing obsolete benchmark screenshot (#2730) Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>	2025-11-05 09:55:13 +01:00
PSeitz	60225bdd45	cleanup (#2724) Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>	2025-10-23 10:23:34 +02:00
PSeitz	938bfec8b7	use FxHashMap for Aggregations Request (#2722) Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>	2025-10-21 15:59:18 +02:00
PSeitz	dabcaa5809	fix merge intermediate aggregation results (#2719) Previously the merging relied on the order of the results, which is invalid since https://github.com/quickwit-oss/tantivy/pull/2035. This bug is only hit in specific scenarios, when the aggregation collectors are built in a different order on different segments. Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>	2025-10-17 12:41:31 +02:00
PSeitz	d410a3b0c0	Add Filtering for Term Aggregations (#2717) * Add Filtering for Term Aggregations Closes #2702 * add AggregationsSegmentCtx memory consumption --------- Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>	2025-10-15 17:39:53 +02:00
Remi	fc93391d0e	Minor clarifications on the AggregationsWithAccessor refacto (#2716)	2025-10-14 19:59:33 +02:00
PSeitz	f8e79271ab	Replace AggregationsWithAccessor (#2715) * add nested histogram-termagg benchmark * Replace AggregationsWithAccessor with AggData With AggregationsWithAccessor pre-computation and caching was done on the collector level. If you have 10000 sub collectors (e.g. a term aggregation with sub aggregations) this is very inefficient. `AggData` instead moves the data from the collector to a node which reflects the cardinality of the request tree instead of the cardinality of the segment collector. It also moves the global struct shared with all aggregations in to aggregation specific structs. So each aggregation has its own space to store cached data and aggregation specific information. This also breaks up the dependency to the elastic search aggregation structure somewhat. Due to lifetime issues, we move the agg request specific object out of `AggData` during the collection and move it back at the end (for now). That's some unnecessary work, which costs CPU. This allows better caching and will also pave the way for another potential optimization, by separating the collector and its storage. Currently we allocate a new collector for each sub aggregation bucket (for nested aggregations), but ideally we would have just one collector instance. * renames * move request data to agg request files --------- Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>	2025-10-14 09:22:11 +02:00
PSeitz	33835b6a01	Add DocSet::cost() (#2707) * query: add DocSet cost hint and use it for intersection ordering - Add DocSet::cost() - Use cost() instead of size_hint() to order scorers in intersect_scorers This isolates cost-related changes without the new seek APIs from PR #2538 * add comments --------- Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>	2025-10-13 16:25:49 +02:00
PSeitz	270ca5123c	refactor postings (#2709) rename shallow_seek to seek_block remove full_block from public postings API This is as preparation to optionally handle Bitsets in the postings	2025-10-08 16:55:25 +02:00
Mustafa S. Moiz	714366d3b9	docs: correct grammar (#2704) Correct phrasing for a single line in the docs (`one documents` -> `a document`).	2025-10-08 16:47:09 +02:00
PSeitz-dd	40659d4d07	improve naming in buffered_union (#2705)	2025-09-24 10:58:46 +02:00
PSeitz	e1e131a804	add and/or queries benchmark (#2701)	2025-09-22 16:32:49 +02:00
PSeitz-dd	70da310b2d	perf: deduplicate queries (#2698) * deduplicate queries Deduplicate queries in the UserInputAst after parsing queries * add return type	2025-09-22 12:16:58 +02:00
PSeitz	85010b589a	clippy (#2700) * clippy * clippy * clippy * clippy + fmt --------- Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>	2025-09-19 18:04:25 +02:00
PSeitz-dd	2340dca628	fix compiler warnings (#2699) * fix compiler warnings * fix import	2025-09-19 15:55:04 +02:00
Remi	71a26d5b24	Fix CI with rust 1.90 (#2696) * Empty commit * Fix dead code lint error	2025-09-18 23:06:33 +02:00
PSeitz-dd	203751f2fe	Optimize ExistsQuery for a high number of dynamic columns (#2694) * Optimize ExistsQuery for a high number of dynamic columns The previous algorithm checked _each_ doc in _each_ column for existence. This causes huge cost on JSON fields with e.g. 100k columns. Compute a bitset instead if we have more than one column. add `iter_docs` to the multivalued_index * add benchmark subfields=1 exists_json_union Memory: 89.3 KB (+2.01%) Avg: 0.4865ms (-26.03%) Median: 0.4865ms (-26.03%) [0.4865ms .. 0.4865ms] subfields=2 exists_json_union Memory: 68.1 KB Avg: 1.7048ms (-0.46%) Median: 1.7048ms (-0.46%) [1.7048ms .. 1.7048ms] subfields=3 exists_json_union Memory: 61.8 KB Avg: 2.0742ms (-2.22%) Median: 2.0742ms (-2.22%) [2.0742ms .. 2.0742ms] subfields=4 exists_json_union Memory: 119.8 KB (+103.44%) Avg: 3.9500ms (+42.62%) Median: 3.9500ms (+42.62%) [3.9500ms .. 3.9500ms] subfields=5 exists_json_union Memory: 120.4 KB (+107.65%) Avg: 3.9610ms (+20.65%) Median: 3.9610ms (+20.65%) [3.9610ms .. 3.9610ms] subfields=6 exists_json_union Memory: 120.6 KB (+107.49%) Avg: 3.8903ms (+3.11%) Median: 3.8903ms (+3.11%) [3.8903ms .. 3.8903ms] subfields=7 exists_json_union Memory: 120.9 KB (+106.93%) Avg: 3.6220ms (-16.22%) Median: 3.6220ms (-16.22%) [3.6220ms .. 3.6220ms] subfields=8 exists_json_union Memory: 121.3 KB (+106.23%) Avg: 4.0981ms (-15.97%) Median: 4.0981ms (-15.97%) [4.0981ms .. 4.0981ms] subfields=16 exists_json_union Memory: 123.1 KB (+103.09%) Avg: 4.3483ms (-92.26%) Median: 4.3483ms (-92.26%) [4.3483ms .. 4.3483ms] subfields=256 exists_json_union Memory: 204.6 KB (+19.85%) Avg: 3.8874ms (-99.01%) Median: 3.8874ms (-99.01%) [3.8874ms .. 3.8874ms] subfields=4096 exists_json_union Memory: 2.0 MB Avg: 3.5571ms (-99.90%) Median: 3.5571ms (-99.90%) [3.5571ms .. 3.5571ms] subfields=65536 exists_json_union Memory: 28.3 MB Avg: 14.4417ms (-99.97%) Median: 14.4417ms (-99.97%) [14.4417ms .. 14.4417ms] subfields=262144 exists_json_union Memory: 113.3 MB Avg: 66.2860ms (-99.95%) Median: 66.2860ms (-99.95%) [66.2860ms .. 66.2860ms] * rename methods	2025-09-16 18:21:03 +02:00
PSeitz-dd	7963b0b4aa	Add fast field fallback for term query if not indexed (#2693) * Add fast field fallback for term query if not indexed * only fallback without scores	2025-09-12 14:58:21 +02:00
Paul Masurel	d5eefca11d	Merge pull request #2692 from quickwit-oss/paul.masurel/coerce-floats-too-in-search-too This PR changes the logic used on the ingestion of floats.	2025-09-10 09:46:54 +02:00
Paul Masurel	5d6c8de23e	Align search float search logic to the columnar coercion rules It applies the same logic on floats as for u64 or i64. In all case, the idea is (for the inverted index) to coerce number to their canonical representation, before indexing and before searching. That way a document with the float 1.0 will be searchable when the user searches for 1. Note that contrary to the columnar, we do not attempt to coerce all of the terms associated to a given json path to a single numerical type. We simply rely on this "point-wise" canonicalization.	2025-09-09 19:28:17 +02:00
PSeitz	a06365f39f	Update CHANGELOG.md for bugfixes (#2674) * Update CHANGELOG.md * Update CHANGELOG.md	2025-09-04 11:51:00 +02:00
Raphaël Cohen	f4b374110f	feat: Regex query grammar (#2677) * feat: Regex query grammar * feat: Disable regexes by default * chore: Apply formatting	2025-09-03 10:07:04 +02:00
PSeitz-dd	c37af9c1ff	update release instructions (#2687)	2025-08-22 07:57:48 +08:00
PSeitz	33794a114c	chore: Release (#2686) Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>	2025-08-20 18:29:37 +08:00