pub method on Term

allow Searcher to be constructed without index
add to_json->Value method
2026-05-05 10:50:39 +00:00 · 2026-02-24 13:31:51 +01:00 · 2026-02-17 17:56:33 +01:00 · 2026-02-17 13:21:23 +01:00 · 2026-02-16 17:33:49 +01:00 · 2026-02-16 10:32:32 +01:00
250 changed files with 14929 additions and 6497 deletions
--- a/.claude/skills/rationalize-deps/SKILL.md
+++ b/.claude/skills/rationalize-deps/SKILL.md
@@ -0,0 +1,125 @@
+---
+name: rationalize-deps
+description: Analyze Cargo.toml dependencies and attempt to remove unused features to reduce compile times and binary size
+---
+
+# Rationalize Dependencies
+
+This skill analyzes Cargo.toml dependencies to identify and remove unused features.
+
+## Overview
+
+Many crates enable features by default that may not be needed. This skill:
+1. Identifies dependencies with default features enabled
+2. Tests if `default-features = false` works
+3. Identifies which specific features are actually needed
+4. Verifies compilation after changes
+
+## Step 1: Identify the target
+
+Ask the user which crate(s) to analyze:
+- A specific crate name (e.g., "tokio", "serde")
+- A specific workspace member (e.g., "quickwit-search")
+- "all" to scan the entire workspace
+
+## Step 2: Analyze current dependencies
+
+For the workspace Cargo.toml (`quickwit/Cargo.toml`), list dependencies that:
+- Do NOT have `default-features = false`
+- Have default features that might be unnecessary
+
+Run: `cargo tree -p <crate> -f "{p} {f}" --edges features` to see what features are actually used.
+
+## Step 3: For each candidate dependency
+
+### 3a: Check the crate's default features
+
+Look up the crate on crates.io or check its Cargo.toml to understand:
+- What features are enabled by default
+- What each feature provides
+
+Use: `cargo metadata --format-version=1 | jq '.packages[] | select(.name == "<crate>") | .features'`
+
+### 3b: Try disabling default features
+
+Modify the dependency in `quickwit/Cargo.toml`:
+
+From:
+```toml
+some-crate = { version = "1.0" }
+```
+
+To:
+```toml
+some-crate = { version = "1.0", default-features = false }
+```
+
+### 3c: Run cargo check
+
+Run: `cargo check --workspace` (or target specific packages for faster feedback)
+
+If compilation fails:
+1. Read the error messages to identify which features are needed
+2. Add only the required features explicitly:
+   ```toml
+   some-crate = { version = "1.0", default-features = false, features = ["needed-feature"] }
+   ```
+3. Re-run cargo check
+
+### 3d: Binary search for minimal features
+
+If there are many default features, use binary search:
+1. Start with no features
+2. If it fails, add half the default features
+3. Continue until you find the minimal set
+
+## Step 4: Document findings
+
+For each dependency analyzed, report:
+- Original configuration
+- New configuration (if changed)
+- Features that were removed
+- Any features that are required
+
+## Step 5: Verify full build
+
+After all changes, run:
+```bash
+cargo check --workspace --all-targets
+cargo test --workspace --no-run
+```
+
+## Common Patterns
+
+### Serde
+Often only needs `derive` (plus `std`):
+```toml
+serde = { version = "1.0", default-features = false, features = ["derive", "std"] }
+```
+
+### Tokio
+Identify which runtime features are actually used:
+```toml
+tokio = { version = "1.0", default-features = false, features = ["rt-multi-thread", "macros", "sync"] }
+```
+
+### Reqwest
+Often doesn't need all TLS backends:
+```toml
+reqwest = { version = "0.11", default-features = false, features = ["rustls-tls", "json"] }
+```
+
+## Rollback
+
+If changes cause issues:
+```bash
+git checkout quickwit/Cargo.toml
+cargo check --workspace
+```
+
+## Tips
+
+- Start with large crates that have many default features (tokio, reqwest, hyper)
+- Use `cargo bloat --crates` to identify large dependencies
+- Check `cargo tree -d` for duplicate dependencies that might indicate feature conflicts
+- Some features are needed only for tests - consider using `[dev-dependencies]` features
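To illustrate Step 2 of the rationalize-deps skill, here is a minimal sketch of how the "does NOT have `default-features = false`" scan could be automated. The function name `deps_with_default_features` is hypothetical and not part of the patch. It is a line-based heuristic, not a TOML parser: it assumes each dependency is declared on a single line inside a table whose header ends in `dependencies]`, and it will misreport dependencies written as `[dependencies.foo]` sub-tables or with `default-features` on a continuation line.

```rust
/// Returns the names of dependencies that do not set `default-features = false`.
/// Hypothetical helper: a line-based heuristic, not a real TOML parser.
fn deps_with_default_features(manifest: &str) -> Vec<String> {
    let mut in_deps = false;
    let mut candidates = Vec::new();
    for raw in manifest.lines() {
        let line = raw.trim();
        if line.starts_with('[') {
            // Covers [dependencies], [dev-dependencies], [workspace.dependencies], ...
            in_deps = line.ends_with("dependencies]");
            continue;
        }
        if !in_deps || line.is_empty() || line.starts_with('#') {
            continue;
        }
        // Continuation lines of multi-line arrays have no `=` and are skipped here.
        if let Some((name, spec)) = line.split_once('=') {
            // Drop whitespace so `default-features=false` and `default-features = false` both match.
            let spec: String = spec.chars().filter(|c| !c.is_whitespace()).collect();
            if !spec.contains("default-features=false") {
                candidates.push(name.trim().to_string());
            }
        }
    }
    candidates
}
```

Each returned name is only a candidate. Steps 3b and 3c (disable the defaults, then run `cargo check`) still decide whether the default features can actually be dropped.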
--- a/.claude/skills/simple-pr/SKILL.md
+++ b/.claude/skills/simple-pr/SKILL.md
@@ -0,0 +1,60 @@
+---
+name: simple-pr
+description: Create a simple PR from staged changes with an auto-generated commit message
+disable-model-invocation: true
+---
+
+# Simple PR
+
+Follow these steps to create a simple PR from staged changes:
+
+## Step 1: Check workspace state
+
+Run: `git status`
+
+Verify that all changes have been staged (no unstaged changes). If there are unstaged changes, abort and ask the user to stage their changes first with `git add`.
+
+Also verify that we are on the `main` branch. If not, abort and ask the user to switch to main first.
+
+## Step 2: Ensure main is up to date
+
+Run: `git pull origin main`
+
+This ensures we're working from the latest code.
+
+## Step 3: Review staged changes
+
+Run: `git diff --cached`
+
+Review the staged changes to understand what the PR will contain.
+
+## Step 4: Generate commit message
+
+Based on the staged changes, generate a concise commit message (1-2 sentences) that describes the "why" rather than the "what".
+
+Display the proposed commit message to the user and ask for confirmation before proceeding.
+
+## Step 5: Create a new branch
+
+Get the git username: `git config user.name | tr ' ' '-' | tr '[:upper:]' '[:lower:]'`
+
+Create a short, descriptive branch name based on the changes (e.g., `fix-typo-in-readme`, `add-retry-logic`, `update-deps`).
+
+Create and checkout the branch: `git checkout -b {username}/{short-descriptive-name}`
+
+## Step 6: Commit changes
+
+Commit with the message from step 4:
+```
+git commit -m "{commit-message}"
+```
+
+## Step 7: Push and open a PR
+
+Push the branch and open a PR:
+```
+git push -u origin {branch-name}
+gh pr create --title "{commit-message-title}" --body "{longer-description-if-needed}"
+```
+
+Report the PR URL to the user when complete.
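Step 5 of the simple-pr skill derives the branch prefix by replacing spaces with dashes and lowercasing the git user name. A minimal sketch of the same transformation follows. The function name `branch_name` is hypothetical, and `to_lowercase` is Unicode-aware, whereas `tr '[:upper:]' '[:lower:]'` depends on the locale, so non-ASCII names may differ.

```rust
/// Builds `{username}/{short-descriptive-name}` in the way Step 5 describes.
/// Hypothetical helper, not part of the patch.
fn branch_name(git_user_name: &str, short_name: &str) -> String {
    // `git config user.name` output ends with a newline, so trim it first.
    // Each space becomes one dash, matching `tr ' ' '-'`.
    let user = git_user_name.trim().replace(' ', "-").to_lowercase();
    format!("{user}/{short_name}")
}
```

For example, a user name of `Ada Lovelace` with the short name `fix-typo-in-readme` yields `ada-lovelace/fix-typo-in-readme`.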
--- a/.github/workflows/coverage.yml
+++ b/.github/workflows/coverage.yml
@@ -15,11 +15,11 @@ jobs:
    steps:
      - uses: actions/checkout@v4
      - name: Install Rust
-        run: rustup toolchain install nightly-2024-07-01 --profile minimal --component llvm-tools-preview
+        run: rustup toolchain install nightly-2025-12-01 --profile minimal --component llvm-tools-preview
      - uses: Swatinem/rust-cache@v2
      - uses: taiki-e/install-action@cargo-llvm-cov
      - name: Generate code coverage
-        run: cargo +nightly-2024-07-01 llvm-cov --all-features --workspace --doctests --lcov --output-path lcov.info
+        run: cargo +nightly-2025-12-01 llvm-cov --all-features --workspace --doctests --lcov --output-path lcov.info
      - name: Upload coverage to Codecov
        uses: codecov/codecov-action@v3
        continue-on-error: true
--- a/.github/workflows/test.yml
+++ b/.github/workflows/test.yml
@@ -39,11 +39,11 @@ jobs:

    - name: Check Formatting
      run: cargo +nightly fmt --all -- --check
-    
+
    - name: Check Stable Compilation
      run: cargo build --all-features

-    
+
    - name: Check Bench Compilation
      run: cargo +nightly bench --no-run --profile=dev --all-features

@@ -59,10 +59,10 @@ jobs:

    strategy:
      matrix:
-        features: [
-            { label: "all", flags: "mmap,stopwords,lz4-compression,zstd-compression,failpoints" },
-            { label: "quickwit", flags: "mmap,quickwit,failpoints" }
-        ]
+        features:
+          - { label: "all", flags: "mmap,stopwords,lz4-compression,zstd-compression,failpoints,stemmer" }
+          - { label: "quickwit", flags: "mmap,quickwit,failpoints" }
+          - { label: "none", flags: "" }

    name: test-${{ matrix.features.label}}

@@ -80,7 +80,21 @@ jobs:
    - uses: Swatinem/rust-cache@v2

    - name: Run tests
-      run: cargo +stable nextest run --features ${{ matrix.features.flags }} --verbose --workspace
+      run: |
+        # If matrix.features.flags is empty, run only --lib tests to avoid compiling the examples
+        # (most of them rely on mmap); otherwise run everything.
+        if [ -z "${{ matrix.features.flags }}" ]; then
+          cargo +stable nextest run --lib --no-default-features --verbose --workspace
+        else
+          cargo +stable nextest run --features ${{ matrix.features.flags }} --no-default-features --verbose --workspace
+        fi

    - name: Run doctests
-      run: cargo +stable test --doc --features ${{ matrix.features.flags }} --verbose --workspace
+      run: |
+        # If matrix.features.flags is empty, skip the doctests entirely;
+        # otherwise run them with the selected features.
+        if [ -z "${{ matrix.features.flags }}" ]; then
+          echo "no doctest for no feature flag"
+        else
+          cargo +stable test --doc --features ${{ matrix.features.flags }} --verbose --workspace
+        fi
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -78,7 +78,7 @@ This will slightly increase space and access time. [#2439](https://github.com/qu

 - **Store DateTime as nanoseconds in doc store** DateTime in the doc store was truncated to microseconds previously. This removes this truncation, while still keeping backwards compatibility. [#2486](https://github.com/quickwit-oss/tantivy/pull/2486)(@PSeitz)

- **Performace/Memory**
+- **Performance/Memory**
    - lift clauses in LogicalAst for optimized ast during execution [#2449](https://github.com/quickwit-oss/tantivy/pull/2449)(@PSeitz)
    - Use Vec instead of BTreeMap to back OwnedValue object [#2364](https://github.com/quickwit-oss/tantivy/pull/2364)(@fulmicoton)
    - Replace TantivyDocument with CompactDoc. CompactDoc is much smaller and provides similar performance. [#2402](https://github.com/quickwit-oss/tantivy/pull/2402)(@PSeitz)
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -1,6 +1,6 @@
 [package]
 name = "tantivy"
-version = "0.25.0"
+version = "0.26.0"
 authors = ["Paul Masurel <paul.masurel@gmail.com>"]
 license = "MIT"
 categories = ["database-implementations", "data-structures"]
@@ -15,7 +15,7 @@ rust-version = "1.85"
 exclude = ["benches/*.json", "benches/*.txt"]

 [dependencies]
-oneshot = "0.1.7"
+oneshot = "0.1.13"
 base64 = "0.22.0"
 byteorder = "1.4.3"
 crc32fast = "1.3.2"
@@ -27,7 +27,7 @@ regex = { version = "1.5.5", default-features = false, features = [
 aho-corasick = "1.0"
 tantivy-fst = "0.5"
 memmap2 = { version = "0.9.0", optional = true }
-lz4_flex = { version = "0.11", default-features = false, optional = true }
+lz4_flex = { version = "0.12", default-features = false, optional = true }
 zstd = { version = "0.13", optional = true, default-features = false }
 tempfile = { version = "3.12.0", optional = true }
 log = "0.4.16"
@@ -37,9 +37,9 @@ fs4 = { version = "0.13.1", optional = true }
 levenshtein_automata = "0.2.1"
 uuid = { version = "1.0.0", features = ["v4", "serde"] }
 crossbeam-channel = "0.5.4"
-rust-stemmers = "1.2.0"
+rust-stemmers = { version = "1.2.0", optional = true }
 downcast-rs = "2.0.1"
-bitpacking = { version = "0.9.2", default-features = false, features = [
+bitpacking = { version = "0.9.3", default-features = false, features = [
    "bitpacker4x",
 ] }
 census = "0.4.2"
@@ -50,7 +50,7 @@ fail = { version = "0.5.0", optional = true }
 time = { version = "0.3.35", features = ["serde-well-known"] }
 smallvec = "1.8.0"
 rayon = "1.5.2"
-lru = "0.12.0"
+lru = "0.16.3"
 fastdivide = "0.4.0"
 itertools = "0.14.0"
 measure_time = "0.9.0"
@@ -75,17 +75,17 @@ typetag = "0.2.21"
 winapi = "0.3.9"

 [dev-dependencies]
-binggan = "0.14.0"
-rand = "0.8.5"
+binggan = "0.14.2"
+rand = "0.9"
 maplit = "1.0.2"
 matches = "0.1.9"
 pretty_assertions = "1.2.1"
-proptest = "1.0.0"
+proptest = "1.7.0"
 test-log = "0.2.10"
 futures = "0.3.21"
 paste = "1.0.11"
 more-asserts = "0.3.1"
-rand_distr = "0.4.3"
+rand_distr = "0.5"
 time = { version = "0.3.10", features = ["serde-well-known", "macros"] }
 postcard = { version = "1.0.4", features = [
    "use-std",
@@ -113,7 +113,8 @@ debug-assertions = true
 overflow-checks = true

 [features]
-default = ["mmap", "stopwords", "lz4-compression", "columnar-zstd-compression"]
+default = ["mmap", "stopwords", "lz4-compression", "columnar-zstd-compression", "stemmer"]
+stemmer = ["rust-stemmers"]
 mmap = ["fs4", "tempfile", "memmap2"]
 stopwords = []

@@ -173,6 +174,31 @@ harness = false
 name = "exists_json"
 harness = false

+[[bench]]
+name = "range_query"
+harness = false
+
 [[bench]]
 name = "and_or_queries"
 harness = false
+
+[[bench]]
+name = "range_queries"
+harness = false
+
+[[bench]]
+name = "bool_queries_with_range"
+harness = false
+
+[[bench]]
+name = "str_search_and_get"
+harness = false
+
+[[bench]]
+name = "merge_segments"
+harness = false
+
+[[bench]]
+name = "regex_all_terms"
+harness = false
+
--- a/README.md
+++ b/README.md
@@ -123,6 +123,7 @@ You can also find other bindings on [GitHub](https://github.com/search?q=tantivy
 - [seshat](https://github.com/matrix-org/seshat/): A matrix message database/indexer
 - [tantiny](https://github.com/baygeldin/tantiny): Tiny full-text search for Ruby
 - [lnx](https://github.com/lnx-search/lnx): adaptable, typo tolerant search engine with a REST API
+- [Bichon](https://github.com/rustmailer/bichon): A lightweight, high-performance Rust email archiver with WebUI
 - and [more](https://github.com/search?q=tantivy)!

 ### On average, how much faster is Tantivy compared to Lucene?
--- a/TODO.txt
+++ b/TODO.txt
@@ -10,7 +10,7 @@ rename FastFieldReaders::open to load
 remove fast field reader

 find a way to unify the two DateTime.
-readd type check in the filter wrapper
+re-add type check in the filter wrapper

 add unit test on columnar list columns.

--- a/benches/agg_bench.rs
+++ b/benches/agg_bench.rs
@@ -1,7 +1,8 @@
 use binggan::plugins::PeakMemAllocPlugin;
 use binggan::{black_box, InputGroup, PeakMemAlloc, INSTRUMENTED_SYSTEM};
-use rand::prelude::SliceRandom;
+use rand::distr::weighted::WeightedIndex;
 use rand::rngs::StdRng;
+use rand::seq::IndexedRandom;
 use rand::{Rng, SeedableRng};
 use rand_distr::Distribution;
 use serde_json::json;
@@ -53,25 +54,33 @@ fn bench_agg(mut group: InputGroup<Index>) {
    register!(group, stats_f64);
    register!(group, extendedstats_f64);
    register!(group, percentiles_f64);
-    register!(group, terms_few);
-    register!(group, terms_many);
+    register!(group, terms_7);
+    register!(group, terms_all_unique);
+    register!(group, terms_150_000);
    register!(group, terms_many_top_1000);
    register!(group, terms_many_order_by_term);
    register!(group, terms_many_with_top_hits);
+    register!(group, terms_all_unique_with_avg_sub_agg);
    register!(group, terms_many_with_avg_sub_agg);
+    register!(group, terms_status_with_avg_sub_agg);
+    register!(group, terms_status_with_histogram);
+    register!(group, terms_zipf_1000);
+    register!(group, terms_zipf_1000_with_histogram);
+    register!(group, terms_zipf_1000_with_avg_sub_agg);
+
    register!(group, terms_many_json_mixed_type_with_avg_sub_agg);

    register!(group, cardinality_agg);
-    register!(group, terms_few_with_cardinality_agg);
+    register!(group, terms_status_with_cardinality_agg);

    register!(group, range_agg);
    register!(group, range_agg_with_avg_sub_agg);
-    register!(group, range_agg_with_term_agg_few);
+    register!(group, range_agg_with_term_agg_status);
    register!(group, range_agg_with_term_agg_many);
    register!(group, histogram);
    register!(group, histogram_hard_bounds);
    register!(group, histogram_with_avg_sub_agg);
-    register!(group, histogram_with_term_agg_few);
+    register!(group, histogram_with_term_agg_status);
    register!(group, avg_and_range_with_avg_sub_agg);

    // Filter aggregation benchmarks
@@ -130,12 +139,12 @@ fn extendedstats_f64(index: &Index) {
 }
 fn percentiles_f64(index: &Index) {
    let agg_req = json!({
-      "mypercentiles": {
-        "percentiles": {
-          "field": "score_f64",
-          "percents": [ 95, 99, 99.9 ]
+        "mypercentiles": {
+            "percentiles": {
+                "field": "score_f64",
+                "percents": [ 95, 99, 99.9 ]
+            }
        }
-      }
    });
    execute_agg(index, agg_req);
 }
@@ -150,10 +159,10 @@ fn cardinality_agg(index: &Index) {
    });
    execute_agg(index, agg_req);
 }
-fn terms_few_with_cardinality_agg(index: &Index) {
+fn terms_status_with_cardinality_agg(index: &Index) {
    let agg_req = json!({
        "my_texts": {
-            "terms": { "field": "text_few_terms" },
+            "terms": { "field": "text_few_terms_status" },
            "aggs": {
                "cardinality": {
                    "cardinality": {
@@ -166,13 +175,20 @@ fn terms_few_with_cardinality_agg(index: &Index) {
    execute_agg(index, agg_req);
 }

-fn terms_few(index: &Index) {
+fn terms_7(index: &Index) {
    let agg_req = json!({
-        "my_texts": { "terms": { "field": "text_few_terms" } },
+        "my_texts": { "terms": { "field": "text_few_terms_status" } },
    });
    execute_agg(index, agg_req);
 }
-fn terms_many(index: &Index) {
+fn terms_all_unique(index: &Index) {
+    let agg_req = json!({
+        "my_texts": { "terms": { "field": "text_all_unique_terms" } },
+    });
+    execute_agg(index, agg_req);
+}
+
+fn terms_150_000(index: &Index) {
    let agg_req = json!({
        "my_texts": { "terms": { "field": "text_many_terms" } },
    });
@@ -220,6 +236,72 @@ fn terms_many_with_avg_sub_agg(index: &Index) {
    });
    execute_agg(index, agg_req);
 }
+fn terms_all_unique_with_avg_sub_agg(index: &Index) {
+    let agg_req = json!({
+        "my_texts": {
+            "terms": { "field": "text_all_unique_terms" },
+            "aggs": {
+                "average_f64": { "avg": { "field": "score_f64" } }
+            }
+        },
+    });
+    execute_agg(index, agg_req);
+}
+fn terms_status_with_histogram(index: &Index) {
+    let agg_req = json!({
+        "my_texts": {
+            "terms": { "field": "text_few_terms_status" },
+            "aggs": {
+                "histo": {"histogram": { "field": "score_f64", "interval": 10 }}
+            }
+        }
+    });
+    execute_agg(index, agg_req);
+}
+
+fn terms_zipf_1000_with_histogram(index: &Index) {
+    let agg_req = json!({
+        "my_texts": {
+            "terms": { "field": "text_1000_terms_zipf" },
+            "aggs": {
+                "histo": {"histogram": { "field": "score_f64", "interval": 10 }}
+            }
+        }
+    });
+    execute_agg(index, agg_req);
+}
+
+fn terms_status_with_avg_sub_agg(index: &Index) {
+    let agg_req = json!({
+        "my_texts": {
+            "terms": { "field": "text_few_terms_status" },
+            "aggs": {
+                "average_f64": { "avg": { "field": "score_f64" } }
+            }
+        },
+    });
+    execute_agg(index, agg_req);
+}
+
+fn terms_zipf_1000_with_avg_sub_agg(index: &Index) {
+    let agg_req = json!({
+        "my_texts": {
+            "terms": { "field": "text_1000_terms_zipf" },
+            "aggs": {
+                "average_f64": { "avg": { "field": "score_f64" } }
+            }
+        },
+    });
+    execute_agg(index, agg_req);
+}
+
+fn terms_zipf_1000(index: &Index) {
+    let agg_req = json!({
+        "my_texts": { "terms": { "field": "text_1000_terms_zipf" } },
+    });
+    execute_agg(index, agg_req);
+}
+
 fn terms_many_json_mixed_type_with_avg_sub_agg(index: &Index) {
    let agg_req = json!({
        "my_texts": {
@@ -275,7 +357,7 @@ fn range_agg_with_avg_sub_agg(index: &Index) {
    execute_agg(index, agg_req);
 }

-fn range_agg_with_term_agg_few(index: &Index) {
+fn range_agg_with_term_agg_status(index: &Index) {
    let agg_req = json!({
        "rangef64": {
            "range": {
@@ -290,7 +372,7 @@ fn range_agg_with_term_agg_few(index: &Index) {
                ]
            },
            "aggs": {
-                "my_texts": { "terms": { "field": "text_few_terms" } },
+                "my_texts": { "terms": { "field": "text_few_terms_status" } },
            }
        },
    });
@@ -346,12 +428,12 @@ fn histogram_with_avg_sub_agg(index: &Index) {
    });
    execute_agg(index, agg_req);
 }
-fn histogram_with_term_agg_few(index: &Index) {
+fn histogram_with_term_agg_status(index: &Index) {
    let agg_req = json!({
        "rangef64": {
            "histogram": { "field": "score_f64", "interval": 10 },
            "aggs": {
-                "my_texts": { "terms": { "field": "text_few_terms" } }
+                "my_texts": { "terms": { "field": "text_few_terms_status" } }
            }
        }
    });
@@ -396,6 +478,13 @@ fn get_collector(agg_req: Aggregations) -> AggregationCollector {
 }

 fn get_test_index_bench(cardinality: Cardinality) -> tantivy::Result<Index> {
+    // Flag to use existing index
+    let reuse_index = std::env::var("REUSE_AGG_BENCH_INDEX").is_ok();
+    if reuse_index && std::path::Path::new("agg_bench").exists() {
+        return Index::open_in_dir("agg_bench");
+    }
+    // create dir
+    std::fs::create_dir_all("agg_bench")?;
    let mut schema_builder = Schema::builder();
    let text_fieldtype = tantivy::schema::TextOptions::default()
        .set_indexing_options(
@@ -404,20 +493,47 @@ fn get_test_index_bench(cardinality: Cardinality) -> tantivy::Result<Index> {
        .set_stored();
    let text_field = schema_builder.add_text_field("text", text_fieldtype);
    let json_field = schema_builder.add_json_field("json", FAST);
+    let text_field_all_unique_terms =
+        schema_builder.add_text_field("text_all_unique_terms", STRING | FAST);
    let text_field_many_terms = schema_builder.add_text_field("text_many_terms", STRING | FAST);
-    let text_field_few_terms = schema_builder.add_text_field("text_few_terms", STRING | FAST);
+    let text_field_few_terms_status =
+        schema_builder.add_text_field("text_few_terms_status", STRING | FAST);
+    let text_field_1000_terms_zipf =
+        schema_builder.add_text_field("text_1000_terms_zipf", STRING | FAST);
    let score_fieldtype = tantivy::schema::NumericOptions::default().set_fast();
    let score_field = schema_builder.add_u64_field("score", score_fieldtype.clone());
    let score_field_f64 = schema_builder.add_f64_field("score_f64", score_fieldtype.clone());
    let score_field_i64 = schema_builder.add_i64_field("score_i64", score_fieldtype);
-    let index = Index::create_from_tempdir(schema_builder.build())?;
-    let few_terms_data = ["INFO", "ERROR", "WARN", "DEBUG"];
+    // Persist to "agg_bench" when the index is reused, otherwise use a tmp dir
+    let index = if reuse_index {
+        Index::create_in_dir("agg_bench", schema_builder.build())?
+    } else {
+        Index::create_from_tempdir(schema_builder.build())?
+    };
+    // Approximate log proportions
+    let status_field_data = [
+        ("INFO", 8000),
+        ("ERROR", 300),
+        ("WARN", 1200),
+        ("DEBUG", 500),
+        ("OK", 500),
+        ("CRITICAL", 20),
+        ("EMERGENCY", 1),
+    ];
+    let log_level_distribution =
+        WeightedIndex::new(status_field_data.iter().map(|item| item.1)).unwrap();

    let lg_norm = rand_distr::LogNormal::new(2.996f64, 0.979f64).unwrap();

    let many_terms_data = (0..150_000)
        .map(|num| format!("author{num}"))
        .collect::<Vec<_>>();
+
+    // Prepare 1000 unique terms sampled using a Zipf distribution.
+    // With exponent 1.1, the top term alone covers ~18% and the top 20 terms ~57%.
+    let terms_1000: Vec<String> = (1..=1000).map(|i| format!("term_{i}")).collect();
+    let zipf_1000 = rand_distr::Zipf::new(1000.0, 1.1f64).unwrap();
+
    {
        let mut rng = StdRng::from_seed([1u8; 32]);
        let mut index_writer = index.writer_with_num_threads(1, 200_000_000)?;
@@ -427,15 +543,25 @@ fn get_test_index_bench(cardinality: Cardinality) -> tantivy::Result<Index> {
            index_writer.add_document(doc!())?;
        }
        if cardinality == Cardinality::Multivalued {
+            let log_level_sample_a = status_field_data[log_level_distribution.sample(&mut rng)].0;
+            let log_level_sample_b = status_field_data[log_level_distribution.sample(&mut rng)].0;
+            let idx_a = zipf_1000.sample(&mut rng) as usize - 1;
+            let idx_b = zipf_1000.sample(&mut rng) as usize - 1;
+            let term_1000_a = &terms_1000[idx_a];
+            let term_1000_b = &terms_1000[idx_b];
            index_writer.add_document(doc!(
                json_field => json!({"mixed_type": 10.0}),
                json_field => json!({"mixed_type": 10.0}),
                text_field => "cool",
                text_field => "cool",
+                text_field_all_unique_terms => "cool",
+                text_field_all_unique_terms => "coolo",
                text_field_many_terms => "cool",
                text_field_many_terms => "cool",
-                text_field_few_terms => "cool",
-                text_field_few_terms => "cool",
+                text_field_few_terms_status => log_level_sample_a,
+                text_field_few_terms_status => log_level_sample_b,
+                text_field_1000_terms_zipf => term_1000_a.as_str(),
+                text_field_1000_terms_zipf => term_1000_b.as_str(),
                score_field => 1u64,
                score_field => 1u64,
                score_field_f64 => lg_norm.sample(&mut rng),
@@ -450,8 +576,8 @@ fn get_test_index_bench(cardinality: Cardinality) -> tantivy::Result<Index> {
        }
        let _val_max = 1_000_000.0;
        for _ in 0..doc_with_value {
-            let val: f64 = rng.gen_range(0.0..1_000_000.0);
-            let json = if rng.gen_bool(0.1) {
+            let val: f64 = rng.random_range(0.0..1_000_000.0);
+            let json = if rng.random_bool(0.1) {
                // 10% are numeric values
                json!({ "mixed_type": val })
            } else {
@@ -460,8 +586,10 @@ fn get_test_index_bench(cardinality: Cardinality) -> tantivy::Result<Index> {
            index_writer.add_document(doc!(
                text_field => "cool",
                json_field => json,
+                text_field_all_unique_terms => format!("unique_term_{}", rng.random::<u64>()),
                text_field_many_terms => many_terms_data.choose(&mut rng).unwrap().to_string(),
-                text_field_few_terms => few_terms_data.choose(&mut rng).unwrap().to_string(),
+                text_field_few_terms_status => status_field_data[log_level_distribution.sample(&mut rng)].0,
+                text_field_1000_terms_zipf => terms_1000[zipf_1000.sample(&mut rng) as usize - 1].as_str(),
                score_field => val as u64,
                score_field_f64 => lg_norm.sample(&mut rng),
                score_field_i64 => val as i64,
@@ -513,7 +641,7 @@ fn filter_agg_all_query_with_sub_aggs(index: &Index) {
                "avg_score": { "avg": { "field": "score" } },
                "stats_score": { "stats": { "field": "score_f64" } },
                "terms_text": {
-                    "terms": { "field": "text_few_terms" }
+                    "terms": { "field": "text_few_terms_status" }
                }
            }
        }
@@ -529,7 +657,7 @@ fn filter_agg_term_query_with_sub_aggs(index: &Index) {
                "avg_score": { "avg": { "field": "score" } },
                "stats_score": { "stats": { "field": "score_f64" } },
                "terms_text": {
-                    "terms": { "field": "text_few_terms" }
+                    "terms": { "field": "text_few_terms_status" }
                }
            }
        }
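The comment on the `text_1000_terms_zipf` field in `get_test_index_bench` makes a claim about how much of the data the most frequent terms cover. That claim can be checked without sampling, assuming the usual Zipf definition in which rank `k` in `1..=n` has probability proportional to `k^(-s)`. This is a minimal sketch, and the helper name `zipf_top_k_share` is hypothetical:

```rust
/// Expected share of samples falling on ranks 1..=k for a Zipf distribution
/// over ranks 1..=n with exponent s, where P(rank) is proportional to rank^(-s).
/// Hypothetical helper for checking the benchmark's data skew.
fn zipf_top_k_share(n: u32, s: f64, k: u32) -> f64 {
    let weight = |rank: u32| (rank as f64).powf(-s);
    // Normalizing constant: the generalized harmonic number H(n, s).
    let total: f64 = (1..=n).map(&weight).sum();
    // Unnormalized mass of the k most frequent ranks.
    let head: f64 = (1..=k.min(n)).map(&weight).sum();
    head / total
}
```

With the benchmark's parameters (`n = 1000`, `s = 1.1`), this gives roughly 0.18 for `k = 1` and roughly 0.57 for `k = 20`.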
--- a/benches/and_or_queries.rs
+++ b/benches/and_or_queries.rs
@@ -16,14 +16,15 @@
 // - This bench isolates boolean iteration speed and intersection/union cost.
 // - Use `cargo bench --bench boolean_conjunction` to run.

-use binggan::{black_box, BenchRunner};
+use binggan::{black_box, BenchGroup, BenchRunner};
 use rand::prelude::*;
 use rand::rngs::StdRng;
 use rand::SeedableRng;
-use tantivy::collector::{Count, TopDocs};
-use tantivy::query::QueryParser;
-use tantivy::schema::{Schema, TEXT};
-use tantivy::{doc, Index, ReloadPolicy, Searcher};
+use tantivy::collector::sort_key::SortByStaticFastValue;
+use tantivy::collector::{Collector, Count, TopDocs};
+use tantivy::query::{Query, QueryParser};
+use tantivy::schema::{Schema, FAST, TEXT};
+use tantivy::{doc, Index, Order, ReloadPolicy, Searcher};

 #[derive(Clone)]
 struct BenchIndex {
@@ -33,23 +34,6 @@ struct BenchIndex {
    query_parser: QueryParser,
 }

-impl BenchIndex {
-    #[inline(always)]
-    fn count_query(&self, query_str: &str) -> usize {
-        let query = self.query_parser.parse_query(query_str).unwrap();
-        self.searcher.search(&query, &Count).unwrap()
-    }
-
-    #[inline(always)]
-    fn topk_len(&self, query_str: &str, k: usize) -> usize {
-        let query = self.query_parser.parse_query(query_str).unwrap();
-        self.searcher
-            .search(&query, &TopDocs::with_limit(k))
-            .unwrap()
-            .len()
-    }
-}
-
 /// Build a single index containing both fields (title, body) and
 /// return two BenchIndex views:
 /// - single_field: QueryParser defaults to only "body"
@@ -59,6 +43,8 @@ fn build_shared_indices(num_docs: usize, p_a: f32, p_b: f32, p_c: f32) -> (Bench
    let mut schema_builder = Schema::builder();
    let f_title = schema_builder.add_text_field("title", TEXT);
    let f_body = schema_builder.add_text_field("body", TEXT);
+    let f_score = schema_builder.add_u64_field("score", FAST);
+    let f_score2 = schema_builder.add_u64_field("score2", FAST);
    let schema = schema_builder.build();
    let index = Index::create_in_ram(schema.clone());

@@ -67,29 +53,31 @@ fn build_shared_indices(num_docs: usize, p_a: f32, p_b: f32, p_c: f32) -> (Bench

    // Populate: spread each present token 90/10 to body/title
    {
-        let mut writer = index.writer(500_000_000).unwrap();
+        let mut writer = index.writer_with_num_threads(1, 500_000_000).unwrap();
        for _ in 0..num_docs {
-            let has_a = rng.gen_bool(p_a as f64);
-            let has_b = rng.gen_bool(p_b as f64);
-            let has_c = rng.gen_bool(p_c as f64);
+            let has_a = rng.random_bool(p_a as f64);
+            let has_b = rng.random_bool(p_b as f64);
+            let has_c = rng.random_bool(p_c as f64);
+            let score = rng.random_range(0u64..100u64);
+            let score2 = rng.random_range(0u64..100_000u64);
            let mut title_tokens: Vec<&str> = Vec::new();
            let mut body_tokens: Vec<&str> = Vec::new();
            if has_a {
-                if rng.gen_bool(0.1) {
+                if rng.random_bool(0.1) {
                    title_tokens.push("a");
                } else {
                    body_tokens.push("a");
                }
            }
            if has_b {
-                if rng.gen_bool(0.1) {
+                if rng.random_bool(0.1) {
                    title_tokens.push("b");
                } else {
                    body_tokens.push("b");
                }
            }
            if has_c {
-                if rng.gen_bool(0.1) {
+                if rng.random_bool(0.1) {
                    title_tokens.push("c");
                } else {
                    body_tokens.push("c");
@@ -101,7 +89,9 @@ fn build_shared_indices(num_docs: usize, p_a: f32, p_b: f32, p_c: f32) -> (Bench
            writer
                .add_document(doc!(
                    f_title=>title_tokens.join(" "),
-                    f_body=>body_tokens.join(" ")
+                    f_body=>body_tokens.join(" "),
+                    f_score=>score,
+                    f_score2=>score2,
                ))
                .unwrap();
        }
@@ -153,72 +143,76 @@ fn main() {
        ),
    ];

+    let queries = &["a", "+a +b", "+a +b +c", "a OR b", "a OR b OR c"];
+
    let mut runner = BenchRunner::new();
    for (label, n, pa, pb, pc) in scenarios {
        let (single_view, multi_view) = build_shared_indices(n, pa, pb, pc);

-        // Single-field group: default field is body only
+        for (view_name, bench_index) in [("single_field", single_view), ("multi_field", multi_view)]
        {
+            // One group per view: single_field defaults to body, multi_field to [title, body].
            let mut group = runner.new_group();
-            group.set_name(format!("single_field — {}", label));
-            group.register_with_input("+a_+b_count", &single_view, |benv: &BenchIndex| {
-                black_box(benv.count_query("+a +b"))
-            });
-            group.register_with_input("+a_+b_+c_count", &single_view, |benv: &BenchIndex| {
-                black_box(benv.count_query("+a +b +c"))
-            });
-            group.register_with_input("+a_+b_top10", &single_view, |benv: &BenchIndex| {
-                black_box(benv.topk_len("+a +b", 10))
-            });
-            group.register_with_input("+a_+b_+c_top10", &single_view, |benv: &BenchIndex| {
-                black_box(benv.topk_len("+a +b +c", 10))
-            });
-            // OR queries
-            group.register_with_input("a_OR_b_count", &single_view, |benv: &BenchIndex| {
-                black_box(benv.count_query("a OR b"))
-            });
-            group.register_with_input("a_OR_b_OR_c_count", &single_view, |benv: &BenchIndex| {
-                black_box(benv.count_query("a OR b OR c"))
-            });
-            group.register_with_input("a_OR_b_top10", &single_view, |benv: &BenchIndex| {
-                black_box(benv.topk_len("a OR b", 10))
-            });
-            group.register_with_input("a_OR_b_OR_c_top10", &single_view, |benv: &BenchIndex| {
-                black_box(benv.topk_len("a OR b OR c", 10))
-            });
-            group.run();
-        }
-
-        // Multi-field group: default fields are [title, body]
-        {
-            let mut group = runner.new_group();
-            group.set_name(format!("multi_field — {}", label));
-            group.register_with_input("+a_+b_count", &multi_view, |benv: &BenchIndex| {
-                black_box(benv.count_query("+a +b"))
-            });
-            group.register_with_input("+a_+b_+c_count", &multi_view, |benv: &BenchIndex| {
-                black_box(benv.count_query("+a +b +c"))
-            });
-            group.register_with_input("+a_+b_top10", &multi_view, |benv: &BenchIndex| {
-                black_box(benv.topk_len("+a +b", 10))
-            });
-            group.register_with_input("+a_+b_+c_top10", &multi_view, |benv: &BenchIndex| {
-                black_box(benv.topk_len("+a +b +c", 10))
-            });
-            // OR queries
-            group.register_with_input("a_OR_b_count", &multi_view, |benv: &BenchIndex| {
-                black_box(benv.count_query("a OR b"))
-            });
-            group.register_with_input("a_OR_b_OR_c_count", &multi_view, |benv: &BenchIndex| {
-                black_box(benv.count_query("a OR b OR c"))
-            });
-            group.register_with_input("a_OR_b_top10", &multi_view, |benv: &BenchIndex| {
-                black_box(benv.topk_len("a OR b", 10))
-            });
-            group.register_with_input("a_OR_b_OR_c_top10", &multi_view, |benv: &BenchIndex| {
-                black_box(benv.topk_len("a OR b OR c", 10))
-            });
+            group.set_name(format!("{} — {}", view_name, label));
+            for query_str in queries {
+                add_bench_task(&mut group, &bench_index, query_str, Count, "count");
+                add_bench_task(
+                    &mut group,
+                    &bench_index,
+                    query_str,
+                    TopDocs::with_limit(10).order_by_score(),
+                    "top10",
+                );
+                add_bench_task(
+                    &mut group,
+                    &bench_index,
+                    query_str,
+                    TopDocs::with_limit(10).order_by_fast_field::<u64>("score", Order::Asc),
+                    "top10_by_ff",
+                );
+                add_bench_task(
+                    &mut group,
+                    &bench_index,
+                    query_str,
+                    TopDocs::with_limit(10).order_by((
+                        SortByStaticFastValue::<u64>::for_field("score"),
+                        SortByStaticFastValue::<u64>::for_field("score2"),
+                    )),
+                    "top10_by_2ff",
+                );
+            }
            group.run();
        }
    }
 }
+
+fn add_bench_task<C: Collector + 'static>(
+    bench_group: &mut BenchGroup,
+    bench_index: &BenchIndex,
+    query_str: &str,
+    collector: C,
+    collector_name: &str,
+) {
+    let task_name = format!("{}_{}", query_str.replace(" ", "_"), collector_name);
+    let query = bench_index.query_parser.parse_query(query_str).unwrap();
+    let search_task = SearchTask {
+        searcher: bench_index.searcher.clone(),
+        collector,
+        query,
+    };
+    bench_group.register(task_name, move |_| black_box(search_task.run()));
+}
+
+struct SearchTask<C: Collector> {
+    searcher: Searcher,
+    collector: C,
+    query: Box<dyn Query>,
+}
+
+impl<C: Collector> SearchTask<C> {
+    #[inline(never)]
+    pub fn run(&self) -> usize {
+        self.searcher.search(&self.query, &self.collector).unwrap();
+        1
+    }
+}
--- a/benches/bool_queries_with_range.rs
+++ b/benches/bool_queries_with_range.rs
@@ -0,0 +1,288 @@
+use binggan::{black_box, BenchGroup, BenchRunner};
+use rand::prelude::*;
+use rand::rngs::StdRng;
+use rand::SeedableRng;
+use tantivy::collector::{Collector, Count, DocSetCollector, TopDocs};
+use tantivy::query::{Query, QueryParser};
+use tantivy::schema::{Schema, FAST, INDEXED, TEXT};
+use tantivy::{doc, Index, Order, ReloadPolicy, Searcher};
+
+#[derive(Clone)]
+struct BenchIndex {
+    #[allow(dead_code)]
+    index: Index,
+    searcher: Searcher,
+    query_parser: QueryParser,
+}
+
+fn build_shared_indices(num_docs: usize, p_title_a: f32, distribution: &str) -> BenchIndex {
+    // Unified schema
+    let mut schema_builder = Schema::builder();
+    let f_title = schema_builder.add_text_field("title", TEXT);
+    let f_num_rand = schema_builder.add_u64_field("num_rand", INDEXED);
+    let f_num_asc = schema_builder.add_u64_field("num_asc", INDEXED);
+    let f_num_rand_fast = schema_builder.add_u64_field("num_rand_fast", INDEXED | FAST);
+    let f_num_asc_fast = schema_builder.add_u64_field("num_asc_fast", INDEXED | FAST);
+    let schema = schema_builder.build();
+    let index = Index::create_in_ram(schema.clone());
+
+    // Populate index with stable RNG for reproducibility.
+    let mut rng = StdRng::from_seed([7u8; 32]);
+
+    {
+        let mut writer = index.writer_with_num_threads(1, 4_000_000_000).unwrap();
+
+        match distribution {
+            "dense" => {
+                for doc_id in 0..num_docs {
+                    // Always add title to avoid empty documents
+                    let title_token = if rng.random_bool(p_title_a as f64) {
+                        "a"
+                    } else {
+                        "b"
+                    };
+
+                    let num_rand = rng.random_range(0u64..1000u64);
+
+                    let num_asc = (doc_id / 10000) as u64;
+
+                    writer
+                        .add_document(doc!(
+                            f_title=>title_token,
+                            f_num_rand=>num_rand,
+                            f_num_asc=>num_asc,
+                            f_num_rand_fast=>num_rand,
+                            f_num_asc_fast=>num_asc,
+                        ))
+                        .unwrap();
+                }
+            }
+            "sparse" => {
+                for doc_id in 0..num_docs {
+                    // Always add title to avoid empty documents
+                    let title_token = if rng.random_bool(p_title_a as f64) {
+                        "a"
+                    } else {
+                        "b"
+                    };
+
+                    let num_rand = rng.random_range(0u64..10000000u64);
+
+                    let num_asc = doc_id as u64;
+
+                    writer
+                        .add_document(doc!(
+                            f_title=>title_token,
+                            f_num_rand=>num_rand,
+                            f_num_asc=>num_asc,
+                            f_num_rand_fast=>num_rand,
+                            f_num_asc_fast=>num_asc,
+                        ))
+                        .unwrap();
+                }
+            }
+            _ => {
+                panic!("Unsupported distribution type");
+            }
+        }
+        writer.commit().unwrap();
+    }
+
+    // Prepare reader/searcher once.
+    let reader = index
+        .reader_builder()
+        .reload_policy(ReloadPolicy::Manual)
+        .try_into()
+        .unwrap();
+    let searcher = reader.searcher();
+
+    // Build query parser for title field
+    let qp_title = QueryParser::for_index(&index, vec![f_title]);
+
+    BenchIndex {
+        index,
+        searcher,
+        query_parser: qp_title,
+    }
+}
+
+fn main() {
+    // Prepare corpora with varying scenarios
+    let scenarios = vec![
+        (
+            "dense and 99% a".to_string(),
+            10_000_000,
+            0.99,
+            "dense",
+            0,
+            9,
+        ),
+        (
+            "dense and 99% a".to_string(),
+            10_000_000,
+            0.99,
+            "dense",
+            990,
+            999,
+        ),
+        (
+            "sparse and 99% a".to_string(),
+            10_000_000,
+            0.99,
+            "sparse",
+            0,
+            9,
+        ),
+        (
+            "sparse and 99% a".to_string(),
+            10_000_000,
+            0.99,
+            "sparse",
+            9_999_990,
+            9_999_999,
+        ),
+    ];
+
+    let mut runner = BenchRunner::new();
+    for (scenario_id, n, p_title_a, num_rand_distribution, range_low, range_high) in scenarios {
+        // Build index for this scenario
+        let bench_index = build_shared_indices(n, p_title_a, num_rand_distribution);
+
+        // Create benchmark group
+        let mut group = runner.new_group();
+
+        // Now set the name (this moves scenario_id)
+        group.set_name(scenario_id);
+
+        // The four numeric fields: indexed-only and indexed + fast variants
+        let field_names = ["num_rand", "num_asc", "num_rand_fast", "num_asc_fast"];
+
+        // Define the three terms we want to test with
+        let terms = ["a", "b", "z"];
+
+        // Generate all combinations of terms and field names
+        let mut queries = Vec::new();
+        for &term in &terms {
+            for &field_name in &field_names {
+                let query_str = format!(
+                    "{} AND {}:[{} TO {}]",
+                    term, field_name, range_low, range_high
+                );
+                queries.push((query_str, field_name.to_string()));
+            }
+        }
+
+        let query_str = format!(
+            "{}:[{} TO {}] AND {}:[{} TO {}]",
+            "num_rand_fast", range_low, range_high, "num_asc_fast", range_low, range_high
+        );
+        queries.push((query_str, "num_asc_fast".to_string()));
+
+        // Run all benchmark tasks for each query and its corresponding field name
+        for (query_str, field_name) in queries {
+            run_benchmark_tasks(&mut group, &bench_index, &query_str, &field_name);
+        }
+
+        group.run();
+    }
+}
+
+/// Run all benchmark tasks for a given query string and field name
+fn run_benchmark_tasks(
+    bench_group: &mut BenchGroup,
+    bench_index: &BenchIndex,
+    query_str: &str,
+    field_name: &str,
+) {
+    // Test count
+    add_bench_task(bench_group, bench_index, query_str, Count, "count");
+
+    // Test all results
+    add_bench_task(
+        bench_group,
+        bench_index,
+        query_str,
+        DocSetCollector,
+        "all_results",
+    );
+
+    // Test top 100 by the field (if it's a FAST field)
+    if field_name.ends_with("_fast") {
+        // Ascending order
+        {
+            let collector_name = format!("top100_by_{}_asc", field_name);
+            let field_name_owned = field_name.to_string();
+            add_bench_task(
+                bench_group,
+                bench_index,
+                query_str,
+                TopDocs::with_limit(100).order_by_fast_field::<u64>(field_name_owned, Order::Asc),
+                &collector_name,
+            );
+        }
+
+        // Descending order
+        {
+            let collector_name = format!("top100_by_{}_desc", field_name);
+            let field_name_owned = field_name.to_string();
+            add_bench_task(
+                bench_group,
+                bench_index,
+                query_str,
+                TopDocs::with_limit(100).order_by_fast_field::<u64>(field_name_owned, Order::Desc),
+                &collector_name,
+            );
+        }
+    }
+}
+
+fn add_bench_task<C: Collector + 'static>(
+    bench_group: &mut BenchGroup,
+    bench_index: &BenchIndex,
+    query_str: &str,
+    collector: C,
+    collector_name: &str,
+) {
+    let task_name = format!("{}_{}", query_str.replace(" ", "_"), collector_name);
+    let query = bench_index.query_parser.parse_query(query_str).unwrap();
+    let search_task = SearchTask {
+        searcher: bench_index.searcher.clone(),
+        collector,
+        query,
+    };
+    bench_group.register(task_name, move |_| black_box(search_task.run()));
+}
+
+struct SearchTask<C: Collector> {
+    searcher: Searcher,
+    collector: C,
+    query: Box<dyn Query>,
+}
+
+impl<C: Collector> SearchTask<C> {
+    #[inline(never)]
+    pub fn run(&self) -> usize {
+        let result = self.searcher.search(&self.query, &self.collector).unwrap();
+        if let Some(count) = (&result as &dyn std::any::Any).downcast_ref::<usize>() {
+            *count
+        } else if let Some(top_docs) = (&result as &dyn std::any::Any)
+            .downcast_ref::<Vec<(Option<u64>, tantivy::DocAddress)>>()
+        {
+            top_docs.len()
+        } else if let Some(top_docs) =
+            (&result as &dyn std::any::Any).downcast_ref::<Vec<(u64, tantivy::DocAddress)>>()
+        {
+            top_docs.len()
+        } else if let Some(doc_set) = (&result as &dyn std::any::Any)
+            .downcast_ref::<std::collections::HashSet<tantivy::DocAddress>>()
+        {
+            doc_set.len()
+        } else {
+            eprintln!(
+                "Unknown collector result type: {:?}",
+                std::any::type_name::<C::Fruit>()
+            );
+            0
+        }
+    }
+}
--- a/benches/merge_segments.rs
+++ b/benches/merge_segments.rs
@@ -0,0 +1,224 @@
+// Benchmarks segment merging
+//
+// Notes:
+// - Input segments are kept intact (no deletes / no IndexWriter merge).
+// - Output is written to a `NullDirectory` that discards all files except
+//   fieldnorms (needed for merging).
+
+use std::collections::HashMap;
+use std::io::{self, Write};
+use std::path::{Path, PathBuf};
+use std::sync::{Arc, RwLock};
+
+use binggan::{black_box, BenchRunner};
+use rand::prelude::*;
+use rand::rngs::StdRng;
+use rand::SeedableRng;
+use tantivy::directory::error::{DeleteError, OpenReadError, OpenWriteError};
+use tantivy::directory::{
+    AntiCallToken, Directory, FileHandle, OwnedBytes, TerminatingWrite, WatchCallback, WatchHandle,
+    WritePtr,
+};
+use tantivy::indexer::{merge_filtered_segments, NoMergePolicy};
+use tantivy::schema::{Schema, TEXT};
+use tantivy::{doc, HasLen, Index, IndexSettings, Segment};
+
+#[derive(Clone, Default, Debug)]
+struct NullDirectory {
+    blobs: Arc<RwLock<HashMap<PathBuf, OwnedBytes>>>,
+}
+
+struct NullWriter;
+
+impl Write for NullWriter {
+    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
+        Ok(buf.len())
+    }
+
+    fn flush(&mut self) -> io::Result<()> {
+        Ok(())
+    }
+}
+
+impl TerminatingWrite for NullWriter {
+    fn terminate_ref(&mut self, _token: AntiCallToken) -> io::Result<()> {
+        Ok(())
+    }
+}
+
+struct InMemoryWriter {
+    path: PathBuf,
+    buffer: Vec<u8>,
+    blobs: Arc<RwLock<HashMap<PathBuf, OwnedBytes>>>,
+}
+
+impl Write for InMemoryWriter {
+    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
+        self.buffer.extend_from_slice(buf);
+        Ok(buf.len())
+    }
+
+    fn flush(&mut self) -> io::Result<()> {
+        Ok(())
+    }
+}
+
+impl TerminatingWrite for InMemoryWriter {
+    fn terminate_ref(&mut self, _token: AntiCallToken) -> io::Result<()> {
+        let bytes = OwnedBytes::new(std::mem::take(&mut self.buffer));
+        self.blobs.write().unwrap().insert(self.path.clone(), bytes);
+        Ok(())
+    }
+}
+
+#[derive(Debug, Default)]
+struct NullFileHandle;
+impl HasLen for NullFileHandle {
+    fn len(&self) -> usize {
+        0
+    }
+}
+impl FileHandle for NullFileHandle {
+    fn read_bytes(&self, _range: std::ops::Range<usize>) -> io::Result<OwnedBytes> {
+        unimplemented!()
+    }
+}
+
+impl Directory for NullDirectory {
+    fn get_file_handle(&self, path: &Path) -> Result<Arc<dyn FileHandle>, OpenReadError> {
+        if let Some(bytes) = self.blobs.read().unwrap().get(path) {
+            return Ok(Arc::new(bytes.clone()));
+        }
+        Ok(Arc::new(NullFileHandle))
+    }
+
+    fn delete(&self, _path: &Path) -> Result<(), DeleteError> {
+        Ok(())
+    }
+
+    fn exists(&self, _path: &Path) -> Result<bool, OpenReadError> {
+        Ok(true)
+    }
+
+    fn open_write(&self, path: &Path) -> Result<WritePtr, OpenWriteError> {
+        let path_buf = path.to_path_buf();
+        if path.to_string_lossy().ends_with(".fieldnorm") {
+            let writer = InMemoryWriter {
+                path: path_buf,
+                buffer: Vec::new(),
+                blobs: Arc::clone(&self.blobs),
+            };
+            Ok(io::BufWriter::new(Box::new(writer)))
+        } else {
+            Ok(io::BufWriter::new(Box::new(NullWriter)))
+        }
+    }
+
+    fn atomic_read(&self, path: &Path) -> Result<Vec<u8>, OpenReadError> {
+        if let Some(bytes) = self.blobs.read().unwrap().get(path) {
+            return Ok(bytes.as_slice().to_vec());
+        }
+        Err(OpenReadError::FileDoesNotExist(path.to_path_buf()))
+    }
+
+    fn atomic_write(&self, _path: &Path, _data: &[u8]) -> io::Result<()> {
+        Ok(())
+    }
+
+    fn sync_directory(&self) -> io::Result<()> {
+        Ok(())
+    }
+
+    fn watch(&self, _watch_callback: WatchCallback) -> tantivy::Result<WatchHandle> {
+        Ok(WatchHandle::empty())
+    }
+}
+
+struct MergeScenario {
+    #[allow(dead_code)]
+    index: Index,
+    segments: Vec<Segment>,
+    settings: IndexSettings,
+    label: String,
+}
+
+fn build_index(
+    num_segments: usize,
+    docs_per_segment: usize,
+    tokens_per_doc: usize,
+    vocab_size: usize,
+) -> MergeScenario {
+    let mut schema_builder = Schema::builder();
+    let body = schema_builder.add_text_field("body", TEXT);
+    let schema = schema_builder.build();
+    let index = Index::create_in_ram(schema.clone());
+
+    assert!(vocab_size > 0);
+    let total_tokens = num_segments * docs_per_segment * tokens_per_doc;
+    let use_unique_terms = vocab_size >= total_tokens;
+    let mut rng = StdRng::from_seed([7u8; 32]);
+    let mut next_token_id: u64 = 0;
+
+    {
+        let mut writer = index.writer_with_num_threads(1, 256_000_000).unwrap();
+        writer.set_merge_policy(Box::new(NoMergePolicy));
+        for _ in 0..num_segments {
+            for _ in 0..docs_per_segment {
+                let mut tokens = Vec::with_capacity(tokens_per_doc);
+                for _ in 0..tokens_per_doc {
+                    let token_id = if use_unique_terms {
+                        let id = next_token_id;
+                        next_token_id += 1;
+                        id
+                    } else {
+                        rng.random_range(0..vocab_size as u64)
+                    };
+                    tokens.push(format!("term_{token_id}"));
+                }
+                writer.add_document(doc!(body => tokens.join(" "))).unwrap();
+            }
+            writer.commit().unwrap();
+        }
+    }
+
+    let segments = index.searchable_segments().unwrap();
+    let settings = index.settings().clone();
+    let label = format!(
+        "segments={}, docs/seg={}, tokens/doc={}, vocab={}",
+        num_segments, docs_per_segment, tokens_per_doc, vocab_size
+    );
+
+    MergeScenario {
+        index,
+        segments,
+        settings,
+        label,
+    }
+}
+
+fn main() {
+    let scenarios = vec![
+        build_index(8, 50_000, 12, 8),
+        build_index(16, 50_000, 12, 8),
+        build_index(16, 100_000, 12, 8),
+        build_index(8, 50_000, 8, 8 * 50_000 * 8),
+    ];
+
+    let mut runner = BenchRunner::new();
+    for scenario in scenarios {
+        let mut group = runner.new_group();
+        group.set_name(format!("merge_segments inv_index — {}", scenario.label));
+        let segments = scenario.segments.clone();
+        let settings = scenario.settings.clone();
+        group.register("merge", move |_| {
+            let output_dir = NullDirectory::default();
+            let filter_doc_ids = vec![None; segments.len()];
+            let merged_index =
+                merge_filtered_segments(&segments, settings.clone(), filter_doc_ids, output_dir)
+                    .unwrap();
+            black_box(merged_index);
+        });
+
+        group.run();
+    }
+}
--- a/benches/range_queries.rs
+++ b/benches/range_queries.rs
@@ -0,0 +1,365 @@
+use std::ops::Bound;
+
+use binggan::{black_box, BenchGroup, BenchRunner};
+use rand::prelude::*;
+use rand::rngs::StdRng;
+use rand::SeedableRng;
+use tantivy::collector::{Count, DocSetCollector, TopDocs};
+use tantivy::query::RangeQuery;
+use tantivy::schema::{Schema, FAST, INDEXED};
+use tantivy::{doc, Index, Order, ReloadPolicy, Searcher, Term};
+
+#[derive(Clone)]
+struct BenchIndex {
+    #[allow(dead_code)]
+    index: Index,
+    searcher: Searcher,
+}
+
+fn build_shared_indices(num_docs: usize, distribution: &str) -> BenchIndex {
+    // Schema with fast fields only
+    let mut schema_builder = Schema::builder();
+    let f_num_rand_fast = schema_builder.add_u64_field("num_rand_fast", INDEXED | FAST);
+    let f_num_asc_fast = schema_builder.add_u64_field("num_asc_fast", INDEXED | FAST);
+    let schema = schema_builder.build();
+    let index = Index::create_in_ram(schema.clone());
+
+    // Populate index with stable RNG for reproducibility.
+    let mut rng = StdRng::from_seed([7u8; 32]);
+
+    {
+        let mut writer = index.writer_with_num_threads(1, 4_000_000_000).unwrap();
+
+        match distribution {
+            "dense" => {
+                for doc_id in 0..num_docs {
+                    let num_rand = rng.random_range(0u64..1000u64);
+                    let num_asc = (doc_id / 10000) as u64;
+
+                    writer
+                        .add_document(doc!(
+                            f_num_rand_fast=>num_rand,
+                            f_num_asc_fast=>num_asc,
+                        ))
+                        .unwrap();
+                }
+            }
+            "sparse" => {
+                for doc_id in 0..num_docs {
+                    let num_rand = rng.random_range(0u64..10000000u64);
+                    let num_asc = doc_id as u64;
+
+                    writer
+                        .add_document(doc!(
+                            f_num_rand_fast=>num_rand,
+                            f_num_asc_fast=>num_asc,
+                        ))
+                        .unwrap();
+                }
+            }
+            _ => {
+                panic!("Unsupported distribution type");
+            }
+        }
+        writer.commit().unwrap();
+    }
+
+    // Prepare reader/searcher once.
+    let reader = index
+        .reader_builder()
+        .reload_policy(ReloadPolicy::Manual)
+        .try_into()
+        .unwrap();
+    let searcher = reader.searcher();
+
+    BenchIndex { index, searcher }
+}
+
+fn main() {
+    // Prepare corpora with varying scenarios
+    let scenarios = vec![
+        // Dense distribution - random values in small range (0-999)
+        (
+            "dense_values_search_low_value_range".to_string(),
+            10_000_000,
+            "dense",
+            0,
+            9,
+        ),
+        (
+            "dense_values_search_high_value_range".to_string(),
+            10_000_000,
+            "dense",
+            990,
+            999,
+        ),
+        (
+            "dense_values_search_out_of_range".to_string(),
+            10_000_000,
+            "dense",
+            1000,
+            1002,
+        ),
+        (
+            "sparse_values_search_low_value_range".to_string(),
+            10_000_000,
+            "sparse",
+            0,
+            9,
+        ),
+        (
+            "sparse_values_search_high_value_range".to_string(),
+            10_000_000,
+            "sparse",
+            9_999_990,
+            9_999_999,
+        ),
+        (
+            "sparse_values_search_out_of_range".to_string(),
+            10_000_000,
+            "sparse",
+            10_000_000,
+            10_000_002,
+        ),
+    ];
+
+    let mut runner = BenchRunner::new();
+    for (scenario_id, n, num_rand_distribution, range_low, range_high) in scenarios {
+        // Build index for this scenario
+        let bench_index = build_shared_indices(n, num_rand_distribution);
+
+        // Create benchmark group
+        let mut group = runner.new_group();
+
+        // Now set the name (this moves scenario_id)
+        group.set_name(scenario_id);
+
+        // Fast fields to run range queries against
+        let field_names = ["num_rand_fast", "num_asc_fast"];
+
+        // Generate range queries for fast fields
+        for &field_name in &field_names {
+            // Create the range query
+            let field = bench_index.searcher.schema().get_field(field_name).unwrap();
+            let lower_term = Term::from_field_u64(field, range_low);
+            let upper_term = Term::from_field_u64(field, range_high);
+
+            let query = RangeQuery::new(Bound::Included(lower_term), Bound::Included(upper_term));
+
+            run_benchmark_tasks(
+                &mut group,
+                &bench_index,
+                query,
+                field_name,
+                range_low,
+                range_high,
+            );
+        }
+
+        group.run();
+    }
+}
+
+/// Run all benchmark tasks for a given range query and field name
+fn run_benchmark_tasks(
+    bench_group: &mut BenchGroup,
+    bench_index: &BenchIndex,
+    query: RangeQuery,
+    field_name: &str,
+    range_low: u64,
+    range_high: u64,
+) {
+    // Test count
+    add_bench_task_count(
+        bench_group,
+        bench_index,
+        query.clone(),
+        "count",
+        field_name,
+        range_low,
+        range_high,
+    );
+
+    // Test top 100 by the field (ascending order)
+    {
+        let collector_name = format!("top100_by_{}_asc", field_name);
+        let field_name_owned = field_name.to_string();
+        add_bench_task_top100_asc(
+            bench_group,
+            bench_index,
+            query.clone(),
+            &collector_name,
+            field_name,
+            range_low,
+            range_high,
+            field_name_owned,
+        );
+    }
+
+    // Test top 100 by the field (descending order)
+    {
+        let collector_name = format!("top100_by_{}_desc", field_name);
+        let field_name_owned = field_name.to_string();
+        add_bench_task_top100_desc(
+            bench_group,
+            bench_index,
+            query,
+            &collector_name,
+            field_name,
+            range_low,
+            range_high,
+            field_name_owned,
+        );
+    }
+}
+
+fn add_bench_task_count(
+    bench_group: &mut BenchGroup,
+    bench_index: &BenchIndex,
+    query: RangeQuery,
+    collector_name: &str,
+    field_name: &str,
+    range_low: u64,
+    range_high: u64,
+) {
+    let task_name = format!(
+        "range_{}_[{} TO {}]_{}",
+        field_name, range_low, range_high, collector_name
+    );
+
+    let search_task = CountSearchTask {
+        searcher: bench_index.searcher.clone(),
+        query,
+    };
+    bench_group.register(task_name, move |_| black_box(search_task.run()));
+}
+
+fn add_bench_task_docset(
+    bench_group: &mut BenchGroup,
+    bench_index: &BenchIndex,
+    query: RangeQuery,
+    collector_name: &str,
+    field_name: &str,
+    range_low: u64,
+    range_high: u64,
+) {
+    let task_name = format!(
+        "range_{}_[{} TO {}]_{}",
+        field_name, range_low, range_high, collector_name
+    );
+
+    let search_task = DocSetSearchTask {
+        searcher: bench_index.searcher.clone(),
+        query,
+    };
+    bench_group.register(task_name, move |_| black_box(search_task.run()));
+}
+
+fn add_bench_task_top100_asc(
+    bench_group: &mut BenchGroup,
+    bench_index: &BenchIndex,
+    query: RangeQuery,
+    collector_name: &str,
+    field_name: &str,
+    range_low: u64,
+    range_high: u64,
+    field_name_owned: String,
+) {
+    let task_name = format!(
+        "range_{}_[{} TO {}]_{}",
+        field_name, range_low, range_high, collector_name
+    );
+
+    let search_task = Top100AscSearchTask {
+        searcher: bench_index.searcher.clone(),
+        query,
+        field_name: field_name_owned,
+    };
+    bench_group.register(task_name, move |_| black_box(search_task.run()));
+}
+
+fn add_bench_task_top100_desc(
+    bench_group: &mut BenchGroup,
+    bench_index: &BenchIndex,
+    query: RangeQuery,
+    collector_name: &str,
+    field_name: &str,
+    range_low: u64,
+    range_high: u64,
+    field_name_owned: String,
+) {
+    let task_name = format!(
+        "range_{}_[{} TO {}]_{}",
+        field_name, range_low, range_high, collector_name
+    );
+
+    let search_task = Top100DescSearchTask {
+        searcher: bench_index.searcher.clone(),
+        query,
+        field_name: field_name_owned,
+    };
+    bench_group.register(task_name, move |_| black_box(search_task.run()));
+}
+
+struct CountSearchTask {
+    searcher: Searcher,
+    query: RangeQuery,
+}
+
+impl CountSearchTask {
+    #[inline(never)]
+    pub fn run(&self) -> usize {
+        self.searcher.search(&self.query, &Count).unwrap()
+    }
+}
+
+struct DocSetSearchTask {
+    searcher: Searcher,
+    query: RangeQuery,
+}
+
+impl DocSetSearchTask {
+    #[inline(never)]
+    pub fn run(&self) -> usize {
+        let result = self.searcher.search(&self.query, &DocSetCollector).unwrap();
+        result.len()
+    }
+}
+
+struct Top100AscSearchTask {
+    searcher: Searcher,
+    query: RangeQuery,
+    field_name: String,
+}
+
+impl Top100AscSearchTask {
+    #[inline(never)]
+    pub fn run(&self) -> usize {
+        let collector =
+            TopDocs::with_limit(100).order_by_fast_field::<u64>(&self.field_name, Order::Asc);
+        let result = self.searcher.search(&self.query, &collector).unwrap();
+        for (_sort_value, doc_address) in &result {
+            let _doc: tantivy::TantivyDocument = self.searcher.doc(*doc_address).unwrap();
+        }
+        result.len()
+    }
+}
+
+struct Top100DescSearchTask {
+    searcher: Searcher,
+    query: RangeQuery,
+    field_name: String,
+}
+
+impl Top100DescSearchTask {
+    #[inline(never)]
+    pub fn run(&self) -> usize {
+        let collector =
+            TopDocs::with_limit(100).order_by_fast_field::<u64>(&self.field_name, Order::Desc);
+        let result = self.searcher.search(&self.query, &collector).unwrap();
+        for (_sort_value, doc_address) in &result {
+            let _doc: tantivy::TantivyDocument = self.searcher.doc(*doc_address).unwrap();
+        }
+        result.len()
+    }
+}
--- a/benches/range_query.rs
+++ b/benches/range_query.rs
@@ -0,0 +1,260 @@
+use std::fmt::Display;
+use std::net::Ipv6Addr;
+use std::ops::RangeInclusive;
+
+use binggan::plugins::PeakMemAllocPlugin;
+use binggan::{black_box, BenchRunner, OutputValue, PeakMemAlloc, INSTRUMENTED_SYSTEM};
+use columnar::MonotonicallyMappableToU128;
+use rand::rngs::StdRng;
+use rand::{Rng, SeedableRng};
+use tantivy::collector::{Count, TopDocs};
+use tantivy::query::QueryParser;
+use tantivy::schema::*;
+use tantivy::{doc, Index};
+
+#[global_allocator]
+pub static GLOBAL: &PeakMemAlloc<std::alloc::System> = &INSTRUMENTED_SYSTEM;
+
+fn main() {
+    bench_range_query();
+}
+
+fn bench_range_query() {
+    let index = get_index_0_to_100();
+    let mut runner = BenchRunner::new();
+    runner.add_plugin(PeakMemAllocPlugin::new(GLOBAL));
+
+    runner.set_name("range_query on u64");
+    let field_name_and_descr: Vec<_> = vec![
+        ("id", "Single Valued Range Field"),
+        ("ids", "Multi Valued Range Field"),
+    ];
+    let range_num_hits = vec![
+        ("90_percent", get_90_percent()),
+        ("10_percent", get_10_percent()),
+        ("1_percent", get_1_percent()),
+    ];
+
+    test_range(&mut runner, &index, &field_name_and_descr, range_num_hits);
+
+    runner.set_name("range_query on ip");
+    let field_name_and_descr: Vec<_> = vec![
+        ("ip", "Single Valued Range Field"),
+        ("ips", "Multi Valued Range Field"),
+    ];
+    let range_num_hits = vec![
+        ("90_percent", get_90_percent_ip()),
+        ("10_percent", get_10_percent_ip()),
+        ("1_percent", get_1_percent_ip()),
+    ];
+
+    test_range(&mut runner, &index, &field_name_and_descr, range_num_hits);
+}
+
+fn test_range<T: Display>(
+    runner: &mut BenchRunner,
+    index: &Index,
+    field_name_and_descr: &[(&str, &str)],
+    range_num_hits: Vec<(&str, RangeInclusive<T>)>,
+) {
+    for (field, suffix) in field_name_and_descr {
+        let term_num_hits = vec![
+            ("", ""),
+            ("1_percent", "veryfew"),
+            ("10_percent", "few"),
+            ("90_percent", "most"),
+        ];
+        let mut group = runner.new_group();
+        group.set_name(suffix);
+        // all intersect combinations
+        for (range_name, range) in &range_num_hits {
+            for (term_name, term) in &term_num_hits {
+                let index = &index;
+                let test_name = if term_name.is_empty() {
+                    format!("id_range_hit_{}", range_name)
+                } else {
+                    format!(
+                        "id_range_hit_{}_intersect_with_term_{}",
+                        range_name, term_name
+                    )
+                };
+                group.register(test_name, move |_| {
+                    let query = if term_name.is_empty() {
+                        "".to_string()
+                    } else {
+                        format!("AND id_name:{}", term)
+                    };
+                    black_box(execute_query(field, range, &query, index));
+                });
+            }
+        }
+        group.run();
+    }
+}
+
+fn get_index_0_to_100() -> Index {
+    let mut rng = StdRng::from_seed([1u8; 32]);
+    let num_vals = 100_000;
+    let docs: Vec<_> = (0..num_vals)
+        .map(|_i| {
+            let id_name = if rng.random_bool(0.01) {
+                "veryfew".to_string() // 1%
+            } else if rng.random_bool(0.1) {
+                "few".to_string() // ~9.9% (10% of the remaining 99%)
+            } else {
+                "most".to_string() // ~89.1%
+            };
+            Doc {
+                id_name,
+                id: rng.random_range(0..100),
+                // Multiply by 1000 so that the values are spread out and occupy many buckets
+                // in the compact space. The benches depend on this range to select n-percent
+                // of elements with the methods below.
+                ip: Ipv6Addr::from_u128(rng.random_range(0..100) * 1000),
+            }
+        })
+        .collect();
+
+    create_index_from_docs(&docs)
+}
+
+#[derive(Clone, Debug)]
+pub struct Doc {
+    pub id_name: String,
+    pub id: u64,
+    pub ip: Ipv6Addr,
+}
+
+pub fn create_index_from_docs(docs: &[Doc]) -> Index {
+    let mut schema_builder = Schema::builder();
+    let id_u64_field = schema_builder.add_u64_field("id", INDEXED | STORED | FAST);
+    let ids_u64_field =
+        schema_builder.add_u64_field("ids", NumericOptions::default().set_fast().set_indexed());
+
+    let id_f64_field = schema_builder.add_f64_field("id_f64", INDEXED | STORED | FAST);
+    let ids_f64_field = schema_builder.add_f64_field(
+        "ids_f64",
+        NumericOptions::default().set_fast().set_indexed(),
+    );
+
+    let id_i64_field = schema_builder.add_i64_field("id_i64", INDEXED | STORED | FAST);
+    let ids_i64_field = schema_builder.add_i64_field(
+        "ids_i64",
+        NumericOptions::default().set_fast().set_indexed(),
+    );
+
+    let text_field = schema_builder.add_text_field("id_name", STRING | STORED);
+    let text_field2 = schema_builder.add_text_field("id_name_fast", STRING | STORED | FAST);
+
+    let ip_field = schema_builder.add_ip_addr_field("ip", FAST);
+    let ips_field = schema_builder.add_ip_addr_field("ips", FAST);
+
+    let schema = schema_builder.build();
+
+    let index = Index::create_in_ram(schema);
+
+    {
+        let mut index_writer = index.writer_with_num_threads(1, 50_000_000).unwrap();
+        for doc in docs.iter() {
+            index_writer
+                .add_document(doc!(
+                    ids_i64_field => doc.id as i64,
+                    ids_i64_field => doc.id as i64,
+                    ids_f64_field => doc.id as f64,
+                    ids_f64_field => doc.id as f64,
+                    ids_u64_field => doc.id,
+                    ids_u64_field => doc.id,
+                    id_u64_field => doc.id,
+                    id_f64_field => doc.id as f64,
+                    id_i64_field => doc.id as i64,
+                    text_field => doc.id_name.to_string(),
+                    text_field2 => doc.id_name.to_string(),
+                    ips_field => doc.ip,
+                    ips_field => doc.ip,
+                    ip_field => doc.ip,
+                ))
+                .unwrap();
+        }
+
+        index_writer.commit().unwrap();
+    }
+    index
+}
+
+fn get_90_percent() -> RangeInclusive<u64> {
+    0..=90
+}
+
+fn get_10_percent() -> RangeInclusive<u64> {
+    0..=10
+}
+
+fn get_1_percent() -> RangeInclusive<u64> {
+    10..=10
+}
+
+fn get_90_percent_ip() -> RangeInclusive<Ipv6Addr> {
+    let start = Ipv6Addr::from_u128(0);
+    let end = Ipv6Addr::from_u128(90 * 1000);
+    start..=end
+}
+
+fn get_10_percent_ip() -> RangeInclusive<Ipv6Addr> {
+    let start = Ipv6Addr::from_u128(0);
+    let end = Ipv6Addr::from_u128(10 * 1000);
+    start..=end
+}
+
+fn get_1_percent_ip() -> RangeInclusive<Ipv6Addr> {
+    let start = Ipv6Addr::from_u128(10 * 1000);
+    let end = Ipv6Addr::from_u128(10 * 1000);
+    start..=end
+}
+
+struct NumHits {
+    count: usize,
+}
+impl OutputValue for NumHits {
+    fn column_title() -> &'static str {
+        "NumHits"
+    }
+    fn format(&self) -> Option<String> {
+        Some(self.count.to_string())
+    }
+}
+
+fn execute_query<T: Display>(
+    field: &str,
+    id_range: &RangeInclusive<T>,
+    suffix: &str,
+    index: &Index,
+) -> NumHits {
+    let gen_query_inclusive = |from: &T, to: &T| {
+        format!(
+            "{}:[{} TO {}] {}",
+            field,
+            &from.to_string(),
+            &to.to_string(),
+            suffix
+        )
+    };
+
+    let query = gen_query_inclusive(id_range.start(), id_range.end());
+    execute_query_(&query, index)
+}
+
+fn execute_query_(query: &str, index: &Index) -> NumHits {
+    let query_from_text = |text: &str| {
+        QueryParser::for_index(index, vec![])
+            .parse_query(text)
+            .unwrap()
+    };
+    let query = query_from_text(query);
+    let reader = index.reader().unwrap();
+    let searcher = reader.searcher();
+    let num_hits = searcher
+        .search(&query, &(TopDocs::with_limit(10).order_by_score(), Count))
+        .unwrap()
+        .1;
+    NumHits { count: num_hits }
+}
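Note on the `id_name` distribution in `get_index_0_to_100`: the two `random_bool` draws are chained, so the second one only applies to documents that did not already become `veryfew`. A minimal sketch of the resulting proportions, using a hypothetical helper `id_name_probabilities` that is not part of the benchmark:

```rust
/// Probabilities of the three outcomes of two chained Bernoulli draws:
/// the first draw succeeds with `p_first`; only if it fails is the second
/// draw (success probability `p_second`) consulted.
/// Hypothetical helper for illustration, not part of the benchmark.
fn id_name_probabilities(p_first: f64, p_second: f64) -> (f64, f64, f64) {
    let veryfew = p_first;
    let few = (1.0 - p_first) * p_second;
    let most = (1.0 - p_first) * (1.0 - p_second);
    (veryfew, few, most)
}
```

With the benchmark's parameters (`0.01`, then `0.1`) this gives roughly 1%, 9.9% and 89.1%, so the `10_percent` and `90_percent` term labels are approximate.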
--- a/benches/regex_all_terms.rs
+++ b/benches/regex_all_terms.rs
@@ -0,0 +1,113 @@
+// Benchmarks regex query that matches all terms in a synthetic index.
+//
+// Corpus model:
+// - N unique terms: t000000, t000001, ...
+// - M docs
+// - K tokens per doc: doc i gets terms derived from (i, token_index)
+//
+// Query:
+// - Regex "t.*" to match all terms
+//
+// Run with:
+// - cargo bench --bench regex_all_terms
+//
+
+use std::fmt::Write;
+
+use binggan::{black_box, BenchRunner};
+use tantivy::collector::Count;
+use tantivy::query::RegexQuery;
+use tantivy::schema::{Schema, TEXT};
+use tantivy::{doc, Index, ReloadPolicy};
+
+const HEAP_SIZE_BYTES: usize = 200_000_000;
+
+#[derive(Clone, Copy)]
+struct BenchConfig {
+    num_terms: usize,
+    num_docs: usize,
+    tokens_per_doc: usize,
+}
+
+fn main() {
+    let configs = default_configs();
+
+    let mut runner = BenchRunner::new();
+    for config in configs {
+        let (index, text_field) = build_index(config, HEAP_SIZE_BYTES);
+        let reader = index
+            .reader_builder()
+            .reload_policy(ReloadPolicy::Manual)
+            .try_into()
+            .expect("reader");
+        let searcher = reader.searcher();
+        let query = RegexQuery::from_pattern("t.*", text_field).expect("regex query");
+
+        let mut group = runner.new_group();
+        group.set_name(format!(
+            "regex_all_terms_t{}_d{}_k{}",
+            config.num_terms, config.num_docs, config.tokens_per_doc
+        ));
+        group.register("regex_count", move |_| {
+            let count = searcher.search(&query, &Count).expect("search");
+            black_box(count);
+        });
+        group.run();
+    }
+}
+
+fn default_configs() -> Vec<BenchConfig> {
+    vec![
+        BenchConfig {
+            num_terms: 10_000,
+            num_docs: 100_000,
+            tokens_per_doc: 1,
+        },
+        BenchConfig {
+            num_terms: 10_000,
+            num_docs: 100_000,
+            tokens_per_doc: 8,
+        },
+        BenchConfig {
+            num_terms: 100_000,
+            num_docs: 100_000,
+            tokens_per_doc: 1,
+        },
+        BenchConfig {
+            num_terms: 100_000,
+            num_docs: 100_000,
+            tokens_per_doc: 8,
+        },
+    ]
+}
+
+fn build_index(config: BenchConfig, heap_size_bytes: usize) -> (Index, tantivy::schema::Field) {
+    let mut schema_builder = Schema::builder();
+    let text_field = schema_builder.add_text_field("text", TEXT);
+    let schema = schema_builder.build();
+    let index = Index::create_in_ram(schema);
+
+    let term_width = config.num_terms.to_string().len();
+    {
+        let mut writer = index
+            .writer_with_num_threads(1, heap_size_bytes)
+            .expect("writer");
+        let mut buffer = String::new();
+        for doc_id in 0..config.num_docs {
+            buffer.clear();
+            for token_idx in 0..config.tokens_per_doc {
+                if token_idx > 0 {
+                    buffer.push(' ');
+                }
+                let term_id = (doc_id * config.tokens_per_doc + token_idx) % config.num_terms;
+                write!(&mut buffer, "t{term_id:0term_width$}").expect("write token");
+            }
+            writer
+                .add_document(doc!(text_field => buffer.as_str()))
+                .expect("add_document");
+        }
+        writer.commit().expect("commit");
+    }
+
+    (index, text_field)
+}
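Note on the corpus model in `build_index`: the token layout can be reproduced with the standard library alone. The sketch below mirrors the indexing loop for a single document; the helper name `doc_text` is hypothetical and exists only for illustration.

```rust
use std::fmt::Write;

/// Builds the text of one synthetic document the same way the indexing loop
/// does: `tokens_per_doc` space-separated tokens, each a zero-padded term id
/// prefixed with `t`. Hypothetical helper, not part of the benchmark.
fn doc_text(doc_id: usize, num_terms: usize, tokens_per_doc: usize) -> String {
    // Pad to the number of decimal digits of `num_terms`.
    let term_width = num_terms.to_string().len();
    let mut buffer = String::new();
    for token_idx in 0..tokens_per_doc {
        if token_idx > 0 {
            buffer.push(' ');
        }
        // Term ids wrap around once `num_terms` is reached.
        let term_id = (doc_id * tokens_per_doc + token_idx) % num_terms;
        write!(&mut buffer, "t{term_id:0term_width$}").expect("writing to a String");
    }
    buffer
}
```

For `num_terms = 10_000` the padding width is 5 (`t00000`), while the six-digit form `t000000` from the header comment corresponds to `num_terms = 100_000`. Every token starts with `t`, which is why the pattern `t.*` matches all terms.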
--- a/benches/str_search_and_get.rs
+++ b/benches/str_search_and_get.rs
@@ -0,0 +1,420 @@
+// This benchmark compares two approaches for retrieving string values:
+//
+// 1. Fast Field Approach: retrieves string values via term_ords() and ord_to_str()
+//
+// 2. Doc Store Approach: retrieves string values via searcher.doc() and field extraction
+//
+// The benchmark includes various data distributions:
+// - Dense Sequential: value derived from the doc id, 1,000 distinct values
+// - Dense Random: value drawn at random from 1,000 distinct values
+// - Sparse Sequential: value derived from the doc id, one distinct value per doc
+// - Sparse Random: value drawn at random from 1,000,000 possible values
+use std::ops::Bound;
+
+use binggan::{black_box, BenchGroup, BenchRunner};
+use rand::prelude::*;
+use rand::rngs::StdRng;
+use rand::SeedableRng;
+use tantivy::collector::{Count, DocSetCollector};
+use tantivy::query::RangeQuery;
+use tantivy::schema::{Schema, Value, FAST, STORED, STRING};
+use tantivy::{doc, Index, ReloadPolicy, Searcher, Term};
+
+#[derive(Clone)]
+struct BenchIndex {
+    #[allow(dead_code)]
+    index: Index,
+    searcher: Searcher,
+}
+
+fn build_shared_indices(num_docs: usize, distribution: &str) -> BenchIndex {
+    // Schema with string fast field and stored field for doc access
+    let mut schema_builder = Schema::builder();
+    let f_str_fast = schema_builder.add_text_field("str_fast", STRING | STORED | FAST);
+    let f_str_stored = schema_builder.add_text_field("str_stored", STRING | STORED);
+    let schema = schema_builder.build();
+    let index = Index::create_in_ram(schema.clone());
+
+    // Populate index with stable RNG for reproducibility.
+    let mut rng = StdRng::from_seed([7u8; 32]);
+
+    {
+        let mut writer = index.writer_with_num_threads(1, 4_000_000_000).unwrap();
+
+        match distribution {
+            "dense_random" => {
+                for _doc_id in 0..num_docs {
+                    let suffix = rng.random_range(0u64..1000u64);
+                    let str_val = format!("str_{:03}", suffix);
+
+                    writer
+                        .add_document(doc!(
+                            f_str_fast=>str_val.clone(),
+                            f_str_stored=>str_val,
+                        ))
+                        .unwrap();
+                }
+            }
+            "dense_sequential" => {
+                for doc_id in 0..num_docs {
+                    let suffix = doc_id as u64 % 1000;
+                    let str_val = format!("str_{:03}", suffix);
+
+                    writer
+                        .add_document(doc!(
+                            f_str_fast=>str_val.clone(),
+                            f_str_stored=>str_val,
+                        ))
+                        .unwrap();
+                }
+            }
+            "sparse_random" => {
+                for _doc_id in 0..num_docs {
+                    let suffix = rng.random_range(0u64..1000000u64);
+                    let str_val = format!("str_{:07}", suffix);
+
+                    writer
+                        .add_document(doc!(
+                            f_str_fast=>str_val.clone(),
+                            f_str_stored=>str_val,
+                        ))
+                        .unwrap();
+                }
+            }
+            "sparse_sequential" => {
+                for doc_id in 0..num_docs {
+                    let suffix = doc_id as u64;
+                    let str_val = format!("str_{:07}", suffix);
+
+                    writer
+                        .add_document(doc!(
+                            f_str_fast=>str_val.clone(),
+                            f_str_stored=>str_val,
+                        ))
+                        .unwrap();
+                }
+            }
+            _ => {
+                panic!("Unsupported distribution type");
+            }
+        }
+        writer.commit().unwrap();
+    }
+
+    // Prepare reader/searcher once.
+    let reader = index
+        .reader_builder()
+        .reload_policy(ReloadPolicy::Manual)
+        .try_into()
+        .unwrap();
+    let searcher = reader.searcher();
+
+    BenchIndex { index, searcher }
+}
+
+fn main() {
+    // Prepare corpora with varying scenarios
+    let scenarios = vec![
+        (
+            "dense_random_search_low_range".to_string(),
+            1_000_000,
+            "dense_random",
+            0,
+            9,
+        ),
+        (
+            "dense_random_search_high_range".to_string(),
+            1_000_000,
+            "dense_random",
+            990,
+            999,
+        ),
+        (
+            "dense_sequential_search_low_range".to_string(),
+            1_000_000,
+            "dense_sequential",
+            0,
+            9,
+        ),
+        (
+            "dense_sequential_search_high_range".to_string(),
+            1_000_000,
+            "dense_sequential",
+            990,
+            999,
+        ),
+        (
+            "sparse_random_search_low_range".to_string(),
+            1_000_000,
+            "sparse_random",
+            0,
+            9999,
+        ),
+        (
+            "sparse_random_search_high_range".to_string(),
+            1_000_000,
+            "sparse_random",
+            990_000,
+            999_999,
+        ),
+        (
+            "sparse_sequential_search_low_range".to_string(),
+            1_000_000,
+            "sparse_sequential",
+            0,
+            9999,
+        ),
+        (
+            "sparse_sequential_search_high_range".to_string(),
+            1_000_000,
+            "sparse_sequential",
+            990_000,
+            999_999,
+        ),
+    ];
+
+    let mut runner = BenchRunner::new();
+    for (scenario_id, n, distribution, range_low, range_high) in scenarios {
+        let bench_index = build_shared_indices(n, distribution);
+        let mut group = runner.new_group();
+        group.set_name(scenario_id);
+
+        let field = bench_index.searcher.schema().get_field("str_fast").unwrap();
+
+        let (lower_str, upper_str) =
+            if distribution == "dense_sequential" || distribution == "dense_random" {
+                (
+                    format!("str_{:03}", range_low),
+                    format!("str_{:03}", range_high),
+                )
+            } else {
+                (
+                    format!("str_{:07}", range_low),
+                    format!("str_{:07}", range_high),
+                )
+            };
+
+        let lower_term = Term::from_field_text(field, &lower_str);
+        let upper_term = Term::from_field_text(field, &upper_str);
+
+        let query = RangeQuery::new(Bound::Included(lower_term), Bound::Included(upper_term));
+
+        run_benchmark_tasks(&mut group, &bench_index, query, range_low, range_high);
+
+        group.run();
+    }
+}
+
+/// Run all benchmark tasks for a given range query
+fn run_benchmark_tasks(
+    bench_group: &mut BenchGroup,
+    bench_index: &BenchIndex,
+    query: RangeQuery,
+    range_low: u64,
+    range_high: u64,
+) {
+    // Test count of matching documents
+    add_bench_task_count(
+        bench_group,
+        bench_index,
+        query.clone(),
+        range_low,
+        range_high,
+    );
+
+    // Test fetching all DocIds of matching documents
+    add_bench_task_docset(
+        bench_group,
+        bench_index,
+        query.clone(),
+        range_low,
+        range_high,
+    );
+
+    // Test fetching all string fast field values of matching documents
+    add_bench_task_fetch_all_strings(
+        bench_group,
+        bench_index,
+        query.clone(),
+        range_low,
+        range_high,
+    );
+
+    // Test fetching all string values of matching documents through doc() method
+    add_bench_task_fetch_all_strings_from_doc(
+        bench_group,
+        bench_index,
+        query,
+        range_low,
+        range_high,
+    );
+}
+
+fn add_bench_task_count(
+    bench_group: &mut BenchGroup,
+    bench_index: &BenchIndex,
+    query: RangeQuery,
+    range_low: u64,
+    range_high: u64,
+) {
+    let task_name = format!("string_search_count_[{}-{}]", range_low, range_high);
+
+    let search_task = CountSearchTask {
+        searcher: bench_index.searcher.clone(),
+        query,
+    };
+    bench_group.register(task_name, move |_| black_box(search_task.run()));
+}
+
+fn add_bench_task_docset(
+    bench_group: &mut BenchGroup,
+    bench_index: &BenchIndex,
+    query: RangeQuery,
+    range_low: u64,
+    range_high: u64,
+) {
+    let task_name = format!("string_fetch_all_docset_[{}-{}]", range_low, range_high);
+
+    let search_task = DocSetSearchTask {
+        searcher: bench_index.searcher.clone(),
+        query,
+    };
+    bench_group.register(task_name, move |_| black_box(search_task.run()));
+}
+
+fn add_bench_task_fetch_all_strings(
+    bench_group: &mut BenchGroup,
+    bench_index: &BenchIndex,
+    query: RangeQuery,
+    range_low: u64,
+    range_high: u64,
+) {
+    let task_name = format!(
+        "string_fastfield_fetch_all_strings_[{}-{}]",
+        range_low, range_high
+    );
+
+    let search_task = FetchAllStringsSearchTask {
+        searcher: bench_index.searcher.clone(),
+        query,
+    };
+
+    bench_group.register(task_name, move |_| {
+        let result = black_box(search_task.run());
+        result.len()
+    });
+}
+
+fn add_bench_task_fetch_all_strings_from_doc(
+    bench_group: &mut BenchGroup,
+    bench_index: &BenchIndex,
+    query: RangeQuery,
+    range_low: u64,
+    range_high: u64,
+) {
+    let task_name = format!(
+        "string_doc_fetch_all_strings_[{}-{}]",
+        range_low, range_high
+    );
+
+    let search_task = FetchAllStringsFromDocTask {
+        searcher: bench_index.searcher.clone(),
+        query,
+    };
+
+    bench_group.register(task_name, move |_| {
+        let result = black_box(search_task.run());
+        result.len()
+    });
+}
+
+struct CountSearchTask {
+    searcher: Searcher,
+    query: RangeQuery,
+}
+
+impl CountSearchTask {
+    #[inline(never)]
+    pub fn run(&self) -> usize {
+        self.searcher.search(&self.query, &Count).unwrap()
+    }
+}
+
+struct DocSetSearchTask {
+    searcher: Searcher,
+    query: RangeQuery,
+}
+
+impl DocSetSearchTask {
+    #[inline(never)]
+    pub fn run(&self) -> usize {
+        let result = self.searcher.search(&self.query, &DocSetCollector).unwrap();
+        result.len()
+    }
+}
+
+struct FetchAllStringsSearchTask {
+    searcher: Searcher,
+    query: RangeQuery,
+}
+
+impl FetchAllStringsSearchTask {
+    #[inline(never)]
+    pub fn run(&self) -> Vec<String> {
+        let doc_addresses = self.searcher.search(&self.query, &DocSetCollector).unwrap();
+        let mut docs = doc_addresses.into_iter().collect::<Vec<_>>();
+        docs.sort();
+        let mut strings = Vec::with_capacity(docs.len());
+
+        for doc_address in docs {
+            let segment_reader = &self.searcher.segment_readers()[doc_address.segment_ord as usize];
+            let str_column_opt = segment_reader.fast_fields().str("str_fast");
+
+            if let Ok(Some(str_column)) = str_column_opt {
+                let doc_id = doc_address.doc_id;
+                let term_ord = str_column.term_ords(doc_id).next().unwrap();
+                let mut str_buffer = String::new();
+                if str_column.ord_to_str(term_ord, &mut str_buffer).is_ok() {
+                    strings.push(str_buffer);
+                }
+            }
+        }
+
+        strings
+    }
+}
+
+struct FetchAllStringsFromDocTask {
+    searcher: Searcher,
+    query: RangeQuery,
+}
+
+impl FetchAllStringsFromDocTask {
+    #[inline(never)]
+    pub fn run(&self) -> Vec<String> {
+        let doc_addresses = self.searcher.search(&self.query, &DocSetCollector).unwrap();
+        let mut docs = doc_addresses.into_iter().collect::<Vec<_>>();
+        docs.sort();
+        let mut strings = Vec::with_capacity(docs.len());
+
+        let str_stored_field = self
+            .searcher
+            .schema()
+            .get_field("str_stored")
+            .expect("str_stored field should exist");
+
+        for doc_address in docs {
+            // Get the document from the doc store (row store access)
+            if let Ok(doc) = self.searcher.doc(doc_address) {
+                // Extract string values from the stored field
+                if let Some(field_value) = doc.get_first(str_stored_field) {
+                    if let Some(text) = field_value.as_value().as_str() {
+                        strings.push(text.to_string());
+                    }
+                }
+            }
+        }
+
+        strings
+    }
+}
--- a/bitpacker/Cargo.toml
+++ b/bitpacker/Cargo.toml
@@ -18,5 +18,5 @@ homepage = "https://github.com/quickwit-oss/tantivy"
 bitpacking = { version = "0.9.2", default-features = false, features = ["bitpacker1x"] }

 [dev-dependencies]
-rand = "0.8"
+rand = "0.9"
 proptest = "1"
--- a/bitpacker/benches/bench.rs
+++ b/bitpacker/benches/bench.rs
@@ -4,8 +4,8 @@ extern crate test;

 #[cfg(test)]
 mod tests {
+    use rand::rng;
    use rand::seq::IteratorRandom;
-    use rand::thread_rng;
    use tantivy_bitpacker::{BitPacker, BitUnpacker, BlockedBitpacker};
    use test::Bencher;

@@ -27,7 +27,7 @@ mod tests {
        let num_els = 1_000_000u32;
        let bit_unpacker = BitUnpacker::new(bit_width);
        let data = create_bitpacked_data(bit_width, num_els);
-        let idxs: Vec<u32> = (0..num_els).choose_multiple(&mut thread_rng(), 100_000);
+        let idxs: Vec<u32> = (0..num_els).choose_multiple(&mut rng(), 100_000);
        b.iter(|| {
            let mut out = 0u64;
            for &idx in &idxs {
--- a/bitpacker/src/bitpacker.rs
+++ b/bitpacker/src/bitpacker.rs
@@ -258,7 +258,7 @@ mod test {
            bitpacker.write(val, num_bits, &mut data).unwrap();
        }
        bitpacker.close(&mut data).unwrap();
-        assert_eq!(data.len(), ((num_bits as usize) * len + 7) / 8);
+        assert_eq!(data.len(), ((num_bits as usize) * len).div_ceil(8));
        let bitunpacker = BitUnpacker::new(num_bits);
        (bitunpacker, vals, data)
    }
@@ -304,7 +304,7 @@ mod test {
            bitpacker.write(val, num_bits, &mut buffer).unwrap();
        }
        bitpacker.flush(&mut buffer).unwrap();
-        assert_eq!(buffer.len(), (vals.len() * num_bits as usize + 7) / 8);
+        assert_eq!(buffer.len(), (vals.len() * num_bits as usize).div_ceil(8));
        let bitunpacker = BitUnpacker::new(num_bits);
        let max_val = if num_bits == 64 {
            u64::MAX
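Note on the `div_ceil` change in the two assertions above: both forms compute the number of bytes needed for a given number of bits, rounded up. A minimal sketch comparing them, with hypothetical helper names that do not appear in the crate:

```rust
/// Bytes needed to hold `num_vals` values of `num_bits` bits each,
/// using the form introduced by this change.
/// Hypothetical helper for illustration.
fn packed_len_bytes(num_bits: u8, num_vals: usize) -> usize {
    (num_bits as usize * num_vals).div_ceil(8)
}

/// The same quantity in the form used before this change.
fn packed_len_bytes_old(num_bits: u8, num_vals: usize) -> usize {
    (num_bits as usize * num_vals + 7) / 8
}
```

The two agree for all inputs where `bits + 7` does not overflow; `div_ceil` also avoids that addition, so it stays correct for bit counts close to `usize::MAX`.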
--- a/bitpacker/src/filter_vec/avx2.rs
+++ b/bitpacker/src/filter_vec/avx2.rs
@@ -19,7 +19,7 @@ fn u32_to_i32(val: u32) -> i32 {
 #[inline]
 unsafe fn u32_to_i32_avx2(vals_u32x8s: DataType) -> DataType {
    const HIGHEST_BIT_MASK: DataType = from_u32x8([HIGHEST_BIT; NUM_LANES]);
-    op_xor(vals_u32x8s, HIGHEST_BIT_MASK)
+    unsafe { op_xor(vals_u32x8s, HIGHEST_BIT_MASK) }
 }

 pub fn filter_vec_in_place(range: RangeInclusive<u32>, offset: u32, output: &mut Vec<u32>) {
@@ -66,17 +66,19 @@ unsafe fn filter_vec_avx2_aux(
    ]);
    const SHIFT: __m256i = from_u32x8([NUM_LANES as u32; NUM_LANES]);
    for _ in 0..num_words {
-        let word = load_unaligned(input);
-        let word = u32_to_i32_avx2(word);
-        let keeper_bitset = compute_filter_bitset(word, range_simd.clone());
-        let added_len = keeper_bitset.count_ones();
-        let filtered_doc_ids = compact(ids, keeper_bitset);
-        store_unaligned(output_tail as *mut __m256i, filtered_doc_ids);
-        output_tail = output_tail.offset(added_len as isize);
-        ids = op_add(ids, SHIFT);
-        input = input.offset(1);
+        unsafe {
+            let word = load_unaligned(input);
+            let word = u32_to_i32_avx2(word);
+            let keeper_bitset = compute_filter_bitset(word, range_simd.clone());
+            let added_len = keeper_bitset.count_ones();
+            let filtered_doc_ids = compact(ids, keeper_bitset);
+            store_unaligned(output_tail as *mut __m256i, filtered_doc_ids);
+            output_tail = output_tail.offset(added_len as isize);
+            ids = op_add(ids, SHIFT);
+            input = input.offset(1);
+        }
    }
-    output_tail.offset_from(output) as usize
+    unsafe { output_tail.offset_from(output) as usize }
 }

 #[inline]
@@ -92,8 +94,7 @@ unsafe fn compute_filter_bitset(val: __m256i, range: std::ops::RangeInclusive<__
    let too_low = op_greater(*range.start(), val);
    let too_high = op_greater(val, *range.end());
    let inside = op_or(too_low, too_high);
-    255 - std::arch::x86_64::_mm256_movemask_ps(std::mem::transmute::<DataType, __m256>(inside))
-        as u8
+    255 - std::arch::x86_64::_mm256_movemask_ps(_mm256_castsi256_ps(inside)) as u8
 }

 union U8x32 {
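Note on the filter in `avx2.rs`: the hunks above XOR each lane with a highest-bit mask and then do signed comparisons to build an 8-bit "keep" mask. The scalar sketch below illustrates that idea without intrinsics. It assumes `HIGHEST_BIT` is `1 << 31` (the value is not visible in this diff), and `filter_bitset` is a hypothetical stand-in for the SIMD routine, not the crate's code.

```rust
use std::ops::RangeInclusive;

// Assumption: the mask flips the sign bit of a 32-bit lane.
const HIGHEST_BIT: u32 = 1 << 31;

/// Maps a u32 to an i32 so that unsigned order is preserved under
/// signed comparison.
fn u32_to_i32(val: u32) -> i32 {
    (val ^ HIGHEST_BIT) as i32
}

/// Scalar stand-in for the 8-lane filter: bit `i` of the result is set
/// when `lanes[i]` lies inside `range`.
fn filter_bitset(lanes: [u32; 8], range: RangeInclusive<u32>) -> u8 {
    let low = u32_to_i32(*range.start());
    let high = u32_to_i32(*range.end());
    let mut outside = 0u8;
    for (i, &val) in lanes.iter().enumerate() {
        let v = u32_to_i32(val);
        // Same test as the SIMD code: too low OR too high.
        if low > v || v > high {
            outside |= 1 << i;
        }
    }
    // Invert the "outside" mask to get the lanes to keep.
    255 - outside
}
```

The sign flip is needed because the comparison used here is signed, while doc ids are unsigned; flipping the top bit keeps their relative order intact.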
--- a/columnar/Cargo.toml
+++ b/columnar/Cargo.toml
@@ -22,7 +22,7 @@ downcast-rs = "2.0.1"
 [dev-dependencies]
 proptest = "1"
 more-asserts = "0.3.1"
-rand = "0.8"
+rand = "0.9"
 binggan = "0.14.0"

 [[bench]]
--- a/columnar/README.md
+++ b/columnar/README.md
@@ -73,7 +73,7 @@ The crate introduces the following concepts.
 `Columnar` is an equivalent of a dataframe.
 It maps `column_key` to `Column`.

-A `Column<T>` asssociates a `RowId` (u32) to any
+A `Column<T>` associates a `RowId` (u32) to any
 number of values.

 This is made possible by wrapping a `ColumnIndex` and a `ColumnValue` object.
--- a/columnar/benches/bench_column_values_get.rs
+++ b/columnar/benches/bench_column_values_get.rs
@@ -9,7 +9,7 @@ use tantivy_columnar::column_values::{CodecType, serialize_and_load_u64_based_co
 fn get_data() -> Vec<u64> {
    let mut rng = StdRng::seed_from_u64(2u64);
    let mut data: Vec<_> = (100..55_000_u64)
-        .map(|num| num + rng.r#gen::<u8>() as u64)
+        .map(|num| num + rng.random::<u8>() as u64)
        .collect();
    data.push(99_000);
    data.insert(1000, 2000);
--- a/columnar/benches/bench_create_column_values.rs
+++ b/columnar/benches/bench_create_column_values.rs
@@ -6,7 +6,7 @@ use tantivy_columnar::column_values::{CodecType, serialize_u64_based_column_valu
 fn get_data() -> Vec<u64> {
    let mut rng = StdRng::seed_from_u64(2u64);
    let mut data: Vec<_> = (100..55_000_u64)
-        .map(|num| num + rng.r#gen::<u8>() as u64)
+        .map(|num| num + rng.random::<u8>() as u64)
        .collect();
    data.push(99_000);
    data.insert(1000, 2000);
--- a/columnar/benches/bench_first_vals.rs
+++ b/columnar/benches/bench_first_vals.rs
@@ -89,13 +89,6 @@ fn main() {
        black_box(sum);
    });

-    group.register("first_block_fetch", |column| {
-        let mut block: Vec<Option<u64>> = vec![None; 64];
-        let fetch_docids = (0..64).collect::<Vec<_>>();
-        column.first_vals(&fetch_docids, &mut block);
-        black_box(block[0]);
-    });
-
    group.register("first_block_single_calls", |column| {
        let mut block: Vec<Option<u64>> = vec![None; 64];
        let fetch_docids = (0..64).collect::<Vec<_>>();
--- a/columnar/benches/bench_optional_index.rs
+++ b/columnar/benches/bench_optional_index.rs
@@ -8,7 +8,7 @@ const TOTAL_NUM_VALUES: u32 = 1_000_000;
 fn gen_optional_index(fill_ratio: f64) -> OptionalIndex {
    let mut rng: StdRng = StdRng::from_seed([1u8; 32]);
    let vals: Vec<u32> = (0..TOTAL_NUM_VALUES)
-        .map(|_| rng.gen_bool(fill_ratio))
+        .map(|_| rng.random_bool(fill_ratio))
        .enumerate()
        .filter(|(_pos, val)| *val)
        .map(|(pos, _)| pos as u32)
@@ -25,7 +25,7 @@ fn random_range_iterator(
    let mut rng: StdRng = StdRng::from_seed([1u8; 32]);
    let mut current = start;
    std::iter::from_fn(move || {
-        current += rng.gen_range(avg_step_size - avg_deviation..=avg_step_size + avg_deviation);
+        current += rng.random_range(avg_step_size - avg_deviation..=avg_step_size + avg_deviation);
        if current >= end { None } else { Some(current) }
    })
 }
--- a/columnar/benches/bench_values_u128.rs
+++ b/columnar/benches/bench_values_u128.rs
@@ -39,7 +39,7 @@ fn get_data_50percent_item() -> Vec<u128> {

    let mut data = vec![];
    for _ in 0..300_000 {
-        let val = rng.gen_range(1..=100);
+        let val = rng.random_range(1..=100);
        data.push(val);
    }
    data.push(SINGLE_ITEM);
--- a/columnar/benches/bench_values_u64.rs
+++ b/columnar/benches/bench_values_u64.rs
@@ -34,7 +34,7 @@ fn get_data_50percent_item() -> Vec<u128> {

    let mut data = vec![];
    for _ in 0..300_000 {
-        let val = rng.gen_range(1..=100);
+        let val = rng.random_range(1..=100);
        data.push(val);
    }
    data.push(SINGLE_ITEM);
--- a/columnar/src/block_accessor.rs
+++ b/columnar/src/block_accessor.rs
@@ -29,12 +29,20 @@ impl<T: PartialOrd + Copy + std::fmt::Debug + Send + Sync + 'static + Default>
        }
    }
    #[inline]
-    pub fn fetch_block_with_missing(&mut self, docs: &[u32], accessor: &Column<T>, missing: T) {
+    pub fn fetch_block_with_missing(
+        &mut self,
+        docs: &[u32],
+        accessor: &Column<T>,
+        missing: Option<T>,
+    ) {
        self.fetch_block(docs, accessor);
        // no missing values
        if accessor.index.get_cardinality().is_full() {
            return;
        }
+        let Some(missing) = missing else {
+            return;
+        };

        // We can compare docid_cache length with docs to find missing docs
        // For multi value columns we can't rely on the length and always need to scan
--- a/columnar/src/column/mod.rs
+++ b/columnar/src/column/mod.rs
@@ -85,8 +85,8 @@ impl<T: PartialOrd + Copy + Debug + Send + Sync + 'static> Column<T> {
    }

    #[inline]
-    pub fn first(&self, row_id: RowId) -> Option<T> {
-        self.values_for_doc(row_id).next()
+    pub fn first(&self, doc_id: DocId) -> Option<T> {
+        self.values_for_doc(doc_id).next()
    }

    /// Load the first value for each docid in the provided slice.
@@ -131,6 +131,8 @@ impl<T: PartialOrd + Copy + Debug + Send + Sync + 'static> Column<T> {
        self.index.docids_to_rowids(doc_ids, doc_ids_out, row_ids)
    }

+    /// Get an iterator over the values for the provided docid.
+    #[inline]
    pub fn values_for_doc(&self, doc_id: DocId) -> impl Iterator<Item = T> + '_ {
        self.index
            .value_row_ids(doc_id)
@@ -158,15 +160,6 @@ impl<T: PartialOrd + Copy + Debug + Send + Sync + 'static> Column<T> {
            .select_batch_in_place(selected_docid_range.start, doc_ids);
    }

-    /// Fills the output vector with the (possibly multiple values that are associated_with
-    /// `row_id`.
-    ///
-    /// This method clears the `output` vector.
-    pub fn fill_vals(&self, row_id: RowId, output: &mut Vec<T>) {
-        output.clear();
-        output.extend(self.values_for_doc(row_id));
-    }
-
    pub fn first_or_default_col(self, default_value: T) -> Arc<dyn ColumnValues<T>> {
        Arc::new(FirstValueWithDefault {
            column: self,
--- a/columnar/src/column_values/monotonic_mapping_u128.rs
+++ b/columnar/src/column_values/monotonic_mapping_u128.rs
@@ -1,7 +1,7 @@
 use std::fmt::Debug;
 use std::net::Ipv6Addr;

-/// Montonic maps a value to u128 value space
+/// Monotonically maps a value to the u128 value space.
 /// Monotonic mapping enables `PartialOrd` on u128 space without conversion to original space.
 pub trait MonotonicallyMappableToU128: 'static + PartialOrd + Copy + Debug + Send + Sync {
    /// Converts a value to u128.
--- a/columnar/src/column_values/u64_based/bitpacked.rs
+++ b/columnar/src/column_values/u64_based/bitpacked.rs
@@ -41,12 +41,6 @@ fn transform_range_before_linear_transformation(
    if range.is_empty() {
        return None;
    }
-    if stats.min_value > *range.end() {
-        return None;
-    }
-    if stats.max_value < *range.start() {
-        return None;
-    }
    let shifted_range =
        range.start().saturating_sub(stats.min_value)..=range.end().saturating_sub(stats.min_value);
    let start_before_gcd_multiplication: u64 = div_ceil(*shifted_range.start(), stats.gcd);
--- a/columnar/src/column_values/u64_based/line.rs
+++ b/columnar/src/column_values/u64_based/line.rs
@@ -8,7 +8,7 @@ use crate::column_values::ColumnValues;
 const MID_POINT: u64 = (1u64 << 32) - 1u64;

 /// `Line` describes a line function `y: ax + b` using integer
-/// arithmetics.
+/// arithmetic.
 ///
 /// The slope is in fact a decimal split into a 32 bit integer value,
 /// and a 32-bit decimal value.
@@ -94,7 +94,7 @@ impl Line {
        // `(i, ys[])`.
        //
        // The best intercept therefore has the form
-        // `y[i] - line.eval(i)` (using wrapping arithmetics).
+        // `y[i] - line.eval(i)` (using wrapping arithmetic).
        // In other words, the best intercept is one of the `y - Line::eval(ys[i])`
        // and our task is just to pick the one that minimizes our error.
        //
--- a/columnar/src/column_values/u64_based/linear.rs
+++ b/columnar/src/column_values/u64_based/linear.rs
@@ -268,7 +268,7 @@ mod tests {

    #[test]
    fn linear_interpol_fast_field_rand() {
-        let mut rng = rand::thread_rng();
+        let mut rng = rand::rng();
        for _ in 0..50 {
            let mut data = (0..10_000).map(|_| rng.next_u64()).collect::<Vec<_>>();
            create_and_validate::<LinearCodec>(&data, "random");
--- a/columnar/src/column_values/u64_based/mod.rs
+++ b/columnar/src/column_values/u64_based/mod.rs
@@ -52,7 +52,7 @@ pub trait ColumnCodecEstimator<T = u64>: 'static {
    ) -> io::Result<()>;
 }

-/// A column codec describes a colunm serialization format.
+/// A column codec describes a column serialization format.
 pub trait ColumnCodec<T: PartialOrd = u64> {
    /// Specialized `ColumnValues` type.
    type ColumnValues: ColumnValues<T> + 'static;
--- a/columnar/src/column_values/u64_based/tests.rs
+++ b/columnar/src/column_values/u64_based/tests.rs
@@ -122,7 +122,7 @@ pub(crate) fn create_and_validate<TColumnCodec: ColumnCodec>(
    assert_eq!(vals, buffer);

    if !vals.is_empty() {
-        let test_rand_idx = rand::thread_rng().gen_range(0..=vals.len() - 1);
+        let test_rand_idx = rand::rng().random_range(0..=vals.len() - 1);
        let expected_positions: Vec<u32> = vals
            .iter()
            .enumerate()
--- a/columnar/src/dynamic_column.rs
+++ b/columnar/src/dynamic_column.rs
@@ -3,7 +3,8 @@ use std::sync::Arc;
 use std::{fmt, io};

 use common::file_slice::FileSlice;
-use common::{ByteCount, DateTime, HasLen, OwnedBytes};
+use common::{ByteCount, DateTime, OwnedBytes};
+use serde::{Deserialize, Serialize};

 use crate::column::{BytesColumn, Column, StrColumn};
 use crate::column_values::{StrictlyMonotonicFn, monotonic_map_column};
@@ -317,10 +318,89 @@ impl DynamicColumnHandle {
    }

    pub fn num_bytes(&self) -> ByteCount {
-        self.file_slice.len().into()
+        self.file_slice.num_bytes()
+    }
+
+    /// Legacy helper; delegates to [`Self::space_usage`].
+    pub fn column_and_dictionary_num_bytes(&self) -> io::Result<ColumnSpaceUsage> {
+        self.space_usage()
+    }
+
+    /// Return the space usage of the column, optionally broken down by dictionary and column
+    /// values.
+    ///
+    /// For dictionary encoded columns (strings and bytes), this splits the total footprint into
+    /// the dictionary and the remaining column data (including index and values).
+    /// For all other column types, the dictionary size is `None` and the column size
+    /// equals the total bytes.
+    pub fn space_usage(&self) -> io::Result<ColumnSpaceUsage> {
+        let total_num_bytes = self.num_bytes();
+        let dynamic_column = self.open()?;
+        let dictionary_num_bytes = match &dynamic_column {
+            DynamicColumn::Bytes(bytes_column) => bytes_column.dictionary().num_bytes(),
+            DynamicColumn::Str(str_column) => str_column.dictionary().num_bytes(),
+            _ => {
+                return Ok(ColumnSpaceUsage::new(total_num_bytes, None));
+            }
+        };
+        assert!(dictionary_num_bytes <= total_num_bytes);
+        let column_num_bytes =
+            ByteCount::from(total_num_bytes.get_bytes() - dictionary_num_bytes.get_bytes());
+        Ok(ColumnSpaceUsage::new(
+            column_num_bytes,
+            Some(dictionary_num_bytes),
+        ))
    }

    pub fn column_type(&self) -> ColumnType {
        self.column_type
    }
 }
+
+/// Represents space usage of a column.
+///
+/// `column_num_bytes` tracks the column payload (index, values and footer).
+/// For dictionary encoded columns, `dictionary_num_bytes` captures the dictionary footprint.
+/// [`ColumnSpaceUsage::total_num_bytes`] returns the sum of both parts.
+#[derive(Clone, Debug, Serialize, Deserialize)]
+pub struct ColumnSpaceUsage {
+    column_num_bytes: ByteCount,
+    dictionary_num_bytes: Option<ByteCount>,
+}
+
+impl ColumnSpaceUsage {
+    pub(crate) fn new(
+        column_num_bytes: ByteCount,
+        dictionary_num_bytes: Option<ByteCount>,
+    ) -> Self {
+        ColumnSpaceUsage {
+            column_num_bytes,
+            dictionary_num_bytes,
+        }
+    }
+
+    pub fn column_num_bytes(&self) -> ByteCount {
+        self.column_num_bytes
+    }
+
+    pub fn dictionary_num_bytes(&self) -> Option<ByteCount> {
+        self.dictionary_num_bytes
+    }
+
+    pub fn total_num_bytes(&self) -> ByteCount {
+        self.column_num_bytes + self.dictionary_num_bytes.unwrap_or_default()
+    }
+
+    /// Merge two space usage values by summing their components.
+    pub fn merge(&self, other: &ColumnSpaceUsage) -> ColumnSpaceUsage {
+        let dictionary_num_bytes = match (self.dictionary_num_bytes, other.dictionary_num_bytes) {
+            (Some(lhs), Some(rhs)) => Some(lhs + rhs),
+            (Some(val), None) | (None, Some(val)) => Some(val),
+            (None, None) => None,
+        };
+        ColumnSpaceUsage {
+            column_num_bytes: self.column_num_bytes + other.column_num_bytes,
+            dictionary_num_bytes,
+        }
+    }
+}
--- a/columnar/src/lib.rs
+++ b/columnar/src/lib.rs
@@ -48,7 +48,7 @@ pub use columnar::{
 use sstable::VoidSSTable;
 pub use value::{NumericalType, NumericalValue};

-pub use self::dynamic_column::{DynamicColumn, DynamicColumnHandle};
+pub use self::dynamic_column::{ColumnSpaceUsage, DynamicColumn, DynamicColumnHandle};

 pub type RowId = u32;
 pub type DocId = u32;
--- a/columnar/src/tests.rs
+++ b/columnar/src/tests.rs
@@ -60,7 +60,7 @@ fn test_dataframe_writer_bool() {
    let DynamicColumn::Bool(bool_col) = dyn_bool_col else {
        panic!();
    };
-    let vals: Vec<Option<bool>> = (0..5).map(|row_id| bool_col.first(row_id)).collect();
+    let vals: Vec<Option<bool>> = (0..5).map(|doc_id| bool_col.first(doc_id)).collect();
    assert_eq!(&vals, &[None, Some(false), None, Some(true), None,]);
 }

@@ -108,7 +108,7 @@ fn test_dataframe_writer_ip_addr() {
    let DynamicColumn::IpAddr(ip_col) = dyn_bool_col else {
        panic!();
    };
-    let vals: Vec<Option<Ipv6Addr>> = (0..5).map(|row_id| ip_col.first(row_id)).collect();
+    let vals: Vec<Option<Ipv6Addr>> = (0..5).map(|doc_id| ip_col.first(doc_id)).collect();
    assert_eq!(
        &vals,
        &[
@@ -169,7 +169,7 @@ fn test_dictionary_encoded_str() {
    let DynamicColumn::Str(str_col) = col_handles[0].open().unwrap() else {
        panic!();
    };
-    let index: Vec<Option<u64>> = (0..5).map(|row_id| str_col.ords().first(row_id)).collect();
+    let index: Vec<Option<u64>> = (0..5).map(|doc_id| str_col.ords().first(doc_id)).collect();
    assert_eq!(index, &[None, Some(0), None, Some(2), Some(1)]);
    assert_eq!(str_col.num_rows(), 5);
    let mut term_buffer = String::new();
@@ -204,7 +204,7 @@ fn test_dictionary_encoded_bytes() {
        panic!();
    };
    let index: Vec<Option<u64>> = (0..5)
-        .map(|row_id| bytes_col.ords().first(row_id))
+        .map(|doc_id| bytes_col.ords().first(doc_id))
        .collect();
    assert_eq!(index, &[None, Some(0), None, Some(2), Some(1)]);
    assert_eq!(bytes_col.num_rows(), 5);
--- a/common/Cargo.toml
+++ b/common/Cargo.toml
@@ -21,5 +21,5 @@ serde = { version = "1.0.136", features = ["derive"] }
 [dev-dependencies]
 binggan = "0.14.0"
 proptest = "1.0.0"
-rand = "0.8.4"
+rand = "0.9"

--- a/common/benches/bench.rs
+++ b/common/benches/bench.rs
@@ -1,6 +1,6 @@
 use binggan::{BenchRunner, black_box};
+use rand::rng;
 use rand::seq::IteratorRandom;
-use rand::thread_rng;
 use tantivy_common::{BitSet, TinySet, serialize_vint_u32};

 fn bench_vint() {
@@ -17,7 +17,7 @@ fn bench_vint() {
        black_box(out);
    });

-    let vals: Vec<u32> = (0..20_000).choose_multiple(&mut thread_rng(), 100_000);
+    let vals: Vec<u32> = (0..20_000).choose_multiple(&mut rng(), 100_000);
    runner.bench_function("bench_vint_rand", move |_| {
        let mut out = 0u64;
        for val in vals.iter().cloned() {
--- a/common/src/bitset.rs
+++ b/common/src/bitset.rs
@@ -178,9 +178,15 @@ impl TinySet {
 #[derive(Clone)]
 pub struct BitSet {
    tinysets: Box<[TinySet]>,
-    len: u64,
    max_value: u32,
 }
+impl std::fmt::Debug for BitSet {
+    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
+        f.debug_struct("BitSet")
+            .field("max_value", &self.max_value)
+            .finish()
+    }
+}

 fn num_buckets(max_val: u32) -> u32 {
    max_val.div_ceil(64u32)
@@ -204,7 +210,6 @@ impl BitSet {
        let tinybitsets = vec![TinySet::empty(); num_buckets as usize].into_boxed_slice();
        BitSet {
            tinysets: tinybitsets,
-            len: 0,
            max_value,
        }
    }
@@ -222,7 +227,6 @@ impl BitSet {
        }
        BitSet {
            tinysets: tinybitsets,
-            len: max_value as u64,
            max_value,
        }
    }
@@ -241,17 +245,19 @@ impl BitSet {

    /// Intersect with tinysets
    fn intersect_update_with_iter(&mut self, other: impl Iterator<Item = TinySet>) {
-        self.len = 0;
        for (left, right) in self.tinysets.iter_mut().zip(other) {
            *left = left.intersect(right);
-            self.len += left.len() as u64;
        }
    }

    /// Returns the number of elements in the `BitSet`.
    #[inline]
    pub fn len(&self) -> usize {
-        self.len as usize
+        self.tinysets
+            .iter()
+            .copied()
+            .map(|tinyset| tinyset.len())
+            .sum::<u32>() as usize
    }

    /// Inserts an element in the `BitSet`
@@ -260,7 +266,7 @@ impl BitSet {
        // we do not check saturated els.
        let higher = el / 64u32;
        let lower = el % 64u32;
-        self.len += u64::from(self.tinysets[higher as usize].insert_mut(lower));
+        self.tinysets[higher as usize].insert_mut(lower);
    }

    /// Inserts an element in the `BitSet`
@@ -269,7 +275,7 @@ impl BitSet {
        // we do not check saturated els.
        let higher = el / 64u32;
        let lower = el % 64u32;
-        self.len -= u64::from(self.tinysets[higher as usize].remove_mut(lower));
+        self.tinysets[higher as usize].remove_mut(lower);
    }

    /// Returns true iff the elements is in the `BitSet`.
@@ -291,6 +297,9 @@ impl BitSet {
            .map(|delta_bucket| bucket + delta_bucket as u32)
    }

+    /// Returns the maximum number of elements in the bitset.
+    ///
+    /// Warning: The largest element the bitset can contain is `max_value - 1`.
    #[inline]
    pub fn max_value(&self) -> u32 {
        self.max_value
@@ -408,7 +417,7 @@ mod tests {
    use std::collections::HashSet;

    use ownedbytes::OwnedBytes;
-    use rand::distributions::Bernoulli;
+    use rand::distr::Bernoulli;
    use rand::rngs::StdRng;
    use rand::{Rng, SeedableRng};

--- a/common/src/vint.rs
+++ b/common/src/vint.rs
@@ -28,7 +28,9 @@ impl BinarySerializable for VIntU128 {
        writer.write_all(&buffer)
    }

+    #[allow(clippy::unbuffered_bytes)]
    fn deserialize<R: Read>(reader: &mut R) -> io::Result<Self> {
+        #[allow(clippy::unbuffered_bytes)]
        let mut bytes = reader.bytes();
        let mut result = 0u128;
        let mut shift = 0u64;
@@ -195,7 +197,9 @@ impl BinarySerializable for VInt {
        writer.write_all(&buffer[0..num_bytes])
    }

+    #[allow(clippy::unbuffered_bytes)]
    fn deserialize<R: Read>(reader: &mut R) -> io::Result<Self> {
+        #[allow(clippy::unbuffered_bytes)]
        let mut bytes = reader.bytes();
        let mut result = 0u64;
        let mut shift = 0u64;
--- a/common/src/writer.rs
+++ b/common/src/writer.rs
@@ -62,7 +62,9 @@ impl<W: TerminatingWrite> TerminatingWrite for CountingWriter<W> {
 pub struct AntiCallToken(());

 /// Trait used to indicate when no more write need to be done on a writer
-pub trait TerminatingWrite: Write + Send + Sync {
+///
+/// Thread-safety is enforced at the call sites that require it.
+pub trait TerminatingWrite: Write {
    /// Indicate that the writer will no longer be used. Internally call terminate_ref.
    fn terminate(mut self) -> io::Result<()>
    where Self: Sized {
--- a/doc/src/json.md
+++ b/doc/src/json.md
@@ -60,7 +60,7 @@ At indexing, tantivy will try to interpret number and strings as different type
 priority order.

 Numbers will be interpreted as u64, i64 and f64 in that order.
-Strings will be interpreted as rfc3999 dates or simple strings.
+Strings will be interpreted as rfc3339 dates or simple strings.

 The first working type is picked and is the only term that is emitted for indexing.
 Note this interpretation happens on a per-document basis, and there is no effort to try to sniff
@@ -81,7 +81,7 @@ Will be interpreted as
 (my_path.my_segment, String, 233) or (my_path.my_segment, u64, 233)
 ```

-Likewise, we need to emit two tokens if the query contains an rfc3999 date.
+Likewise, we need to emit two tokens if the query contains an rfc3339 date.
 Indeed the date could have been actually a single token inside the text of a document at ingestion time. Generally speaking, we will always at least emit a string token in query parsing, and sometimes more.

 If one more json field is defined, things get even more complicated.
--- a/examples/basic_search.rs
+++ b/examples/basic_search.rs
@@ -208,7 +208,7 @@ fn main() -> tantivy::Result<()> {
    // is the role of the `TopDocs` collector.

    // We can now perform our query.
-    let top_docs = searcher.search(&query, &TopDocs::with_limit(10))?;
+    let top_docs = searcher.search(&query, &TopDocs::with_limit(10).order_by_score())?;

    // The actual documents still need to be
    // retrieved from Tantivy's store.
@@ -226,7 +226,7 @@ fn main() -> tantivy::Result<()> {
    let query = query_parser.parse_query("title:sea^20 body:whale^70")?;

    let (_score, doc_address) = searcher
-        .search(&query, &TopDocs::with_limit(1))?
+        .search(&query, &TopDocs::with_limit(1).order_by_score())?
        .into_iter()
        .next()
        .unwrap();
--- a/examples/custom_collector.rs
+++ b/examples/custom_collector.rs
@@ -70,7 +70,7 @@ impl Collector for StatsCollector {
    fn for_segment(
        &self,
        _segment_local_id: u32,
-        segment_reader: &SegmentReader,
+        segment_reader: &dyn SegmentReader,
    ) -> tantivy::Result<StatsSegmentCollector> {
        let fast_field_reader = segment_reader.fast_fields().u64(&self.field)?;
        Ok(StatsSegmentCollector {
--- a/examples/custom_tokenizer.rs
+++ b/examples/custom_tokenizer.rs
@@ -100,7 +100,7 @@ fn main() -> tantivy::Result<()> {
    // here we want to get a hit on the 'ken' in Frankenstein
    let query = query_parser.parse_query("ken")?;

-    let top_docs = searcher.search(&query, &TopDocs::with_limit(10))?;
+    let top_docs = searcher.search(&query, &TopDocs::with_limit(10).order_by_score())?;

    for (_, doc_address) in top_docs {
        let retrieved_doc: TantivyDocument = searcher.doc(doc_address)?;
--- a/examples/date_time_field.rs
+++ b/examples/date_time_field.rs
@@ -50,17 +50,17 @@ fn main() -> tantivy::Result<()> {
    {
        // Simple exact search on the date
        let query = query_parser.parse_query("occurred_at:\"2022-06-22T12:53:50.53Z\"")?;
-        let count_docs = searcher.search(&*query, &TopDocs::with_limit(5))?;
+        let count_docs = searcher.search(&*query, &TopDocs::with_limit(5).order_by_score())?;
        assert_eq!(count_docs.len(), 1);
    }
    {
        // Range query on the date field
        let query = query_parser
            .parse_query(r#"occurred_at:[2022-06-22T12:58:00Z TO 2022-06-23T00:00:00Z}"#)?;
-        let count_docs = searcher.search(&*query, &TopDocs::with_limit(4))?;
+        let count_docs = searcher.search(&*query, &TopDocs::with_limit(4).order_by_score())?;
        assert_eq!(count_docs.len(), 1);
        for (_score, doc_address) in count_docs {
-            let retrieved_doc = searcher.doc::<TantivyDocument>(doc_address)?;
+            let retrieved_doc = searcher.doc(doc_address)?;
            assert!(retrieved_doc
                .get_first(occurred_at)
                .unwrap()
--- a/examples/deleting_updating_documents.rs
+++ b/examples/deleting_updating_documents.rs
@@ -28,7 +28,7 @@ fn extract_doc_given_isbn(
    // The second argument is here to tell we don't care about decoding positions,
    // or term frequencies.
    let term_query = TermQuery::new(isbn_term.clone(), IndexRecordOption::Basic);
-    let top_docs = searcher.search(&term_query, &TopDocs::with_limit(1))?;
+    let top_docs = searcher.search(&term_query, &TopDocs::with_limit(1).order_by_score())?;

    if let Some((_score, doc_address)) = top_docs.first() {
        let doc = searcher.doc(*doc_address)?;
--- a/examples/faceted_search_with_tweaked_score.rs
+++ b/examples/faceted_search_with_tweaked_score.rs
@@ -65,7 +65,7 @@ fn main() -> tantivy::Result<()> {
        );
        let top_docs_by_custom_score =
            // Call TopDocs with a custom tweak score
-            TopDocs::with_limit(2).tweak_score(move |segment_reader: &SegmentReader| {
+            TopDocs::with_limit(2).tweak_score(move |segment_reader: &dyn SegmentReader| {
                let ingredient_reader = segment_reader.facet_reader("ingredient").unwrap();
                let facet_dict = ingredient_reader.facet_dict();

@@ -91,7 +91,7 @@ fn main() -> tantivy::Result<()> {
            .iter()
            .map(|(_, doc_id)| {
                searcher
-                    .doc::<TantivyDocument>(*doc_id)
+                    .doc(*doc_id)
                    .unwrap()
                    .get_first(title)
                    .and_then(|v| v.as_str().map(|el| el.to_string()))
--- a/examples/fuzzy_search.rs
+++ b/examples/fuzzy_search.rs
@@ -145,7 +145,7 @@ fn main() -> tantivy::Result<()> {
        let query = FuzzyTermQuery::new(term, 2, true);

        let (top_docs, count) = searcher
-            .search(&query, &(TopDocs::with_limit(5), Count))
+            .search(&query, &(TopDocs::with_limit(5).order_by_score(), Count))
            .unwrap();
        assert_eq!(count, 3);
        assert_eq!(top_docs.len(), 3);
--- a/examples/ip_field.rs
+++ b/examples/ip_field.rs
@@ -69,25 +69,25 @@ fn main() -> tantivy::Result<()> {
    {
        // Inclusive range queries
        let query = query_parser.parse_query("ip:[192.168.0.80 TO 192.168.0.100]")?;
-        let count_docs = searcher.search(&*query, &TopDocs::with_limit(5))?;
+        let count_docs = searcher.search(&*query, &TopDocs::with_limit(5).order_by_score())?;
        assert_eq!(count_docs.len(), 1);
    }
    {
        // Exclusive range queries
        let query = query_parser.parse_query("ip:{192.168.0.80 TO 192.168.1.100]")?;
-        let count_docs = searcher.search(&*query, &TopDocs::with_limit(2))?;
+        let count_docs = searcher.search(&*query, &TopDocs::with_limit(2).order_by_score())?;
        assert_eq!(count_docs.len(), 0);
    }
    {
        // Find docs with IP addresses smaller equal 192.168.1.100
        let query = query_parser.parse_query("ip:[* TO 192.168.1.100]")?;
-        let count_docs = searcher.search(&*query, &TopDocs::with_limit(2))?;
+        let count_docs = searcher.search(&*query, &TopDocs::with_limit(2).order_by_score())?;
        assert_eq!(count_docs.len(), 2);
    }
    {
        // Find docs with IP addresses smaller than 192.168.1.100
        let query = query_parser.parse_query("ip:[* TO 192.168.1.100}")?;
-        let count_docs = searcher.search(&*query, &TopDocs::with_limit(2))?;
+        let count_docs = searcher.search(&*query, &TopDocs::with_limit(2).order_by_score())?;
        assert_eq!(count_docs.len(), 2);
    }

--- a/examples/iterating_docs_and_positions.rs
+++ b/examples/iterating_docs_and_positions.rs
@@ -91,46 +91,10 @@ fn main() -> tantivy::Result<()> {
        }
    }

-    // A `Term` is a text token associated with a field.
-    // Let's go through all docs containing the term `title:the` and access their position
-    let term_the = Term::from_field_text(title, "the");
-
-    // Some other powerful operations (especially `.skip_to`) may be useful to consume these
+    // Some other powerful operations (especially `.seek`) may be useful to consume these
    // posting lists rapidly.
    // You can check for them in the [`DocSet`](https://docs.rs/tantivy/~0/tantivy/trait.DocSet.html) trait
    // and the [`Postings`](https://docs.rs/tantivy/~0/tantivy/trait.Postings.html) trait

-    // Also, for some VERY specific high performance use case like an OLAP analysis of logs,
-    // you can get better performance by accessing directly the blocks of doc ids.
-    for segment_reader in searcher.segment_readers() {
-        // A segment contains different data structure.
-        // Inverted index stands for the combination of
-        // - the term dictionary
-        // - the inverted lists associated with each terms and their positions
-        let inverted_index = segment_reader.inverted_index(title)?;
-
-        // This segment posting object is like a cursor over the documents matching the term.
-        // The `IndexRecordOption` arguments tells tantivy we will be interested in both term
-        // frequencies and positions.
-        //
-        // If you don't need all this information, you may get better performance by decompressing
-        // less information.
-        if let Some(mut block_segment_postings) =
-            inverted_index.read_block_postings(&term_the, IndexRecordOption::Basic)?
-        {
-            loop {
-                let docs = block_segment_postings.docs();
-                if docs.is_empty() {
-                    break;
-                }
-                // Once again these docs MAY contains deleted documents as well.
-                let docs = block_segment_postings.docs();
-                // Prints `Docs [0, 2].`
-                println!("Docs {docs:?}");
-                block_segment_postings.advance();
-            }
-        }
-    }
-
    Ok(())
 }
--- a/examples/json_field.rs
+++ b/examples/json_field.rs
@@ -59,12 +59,12 @@ fn main() -> tantivy::Result<()> {
    let query_parser = QueryParser::for_index(&index, vec![event_type, attributes]);
    {
        let query = query_parser.parse_query("target:submit-button")?;
-        let count_docs = searcher.search(&*query, &TopDocs::with_limit(2))?;
+        let count_docs = searcher.search(&*query, &TopDocs::with_limit(2).order_by_score())?;
        assert_eq!(count_docs.len(), 2);
    }
    {
        let query = query_parser.parse_query("target:submit")?;
-        let count_docs = searcher.search(&*query, &TopDocs::with_limit(2))?;
+        let count_docs = searcher.search(&*query, &TopDocs::with_limit(2).order_by_score())?;
        assert_eq!(count_docs.len(), 2);
    }
    {
@@ -74,33 +74,33 @@ fn main() -> tantivy::Result<()> {
    }
    {
        let query = query_parser.parse_query("click AND cart.product_id:133")?;
-        let hits = searcher.search(&*query, &TopDocs::with_limit(2))?;
+        let hits = searcher.search(&*query, &TopDocs::with_limit(2).order_by_score())?;
        assert_eq!(hits.len(), 1);
    }
    {
        // The sub-fields in the json field marked as default field still need to be explicitly
        // addressed
        let query = query_parser.parse_query("click AND 133")?;
-        let hits = searcher.search(&*query, &TopDocs::with_limit(2))?;
+        let hits = searcher.search(&*query, &TopDocs::with_limit(2).order_by_score())?;
        assert_eq!(hits.len(), 0);
    }
    {
        // Default json fields are ignored if they collide with the schema
        let query = query_parser.parse_query("event_type:holiday-sale")?;
-        let hits = searcher.search(&*query, &TopDocs::with_limit(2))?;
+        let hits = searcher.search(&*query, &TopDocs::with_limit(2).order_by_score())?;
        assert_eq!(hits.len(), 0);
    }
    // # Query via full attribute path
    {
        // This only searches in our schema's `event_type` field
        let query = query_parser.parse_query("event_type:click")?;
-        let hits = searcher.search(&*query, &TopDocs::with_limit(2))?;
+        let hits = searcher.search(&*query, &TopDocs::with_limit(2).order_by_score())?;
        assert_eq!(hits.len(), 2);
    }
    {
        // Default json fields can still be accessed by full path
        let query = query_parser.parse_query("attributes.event_type:holiday-sale")?;
-        let hits = searcher.search(&*query, &TopDocs::with_limit(2))?;
+        let hits = searcher.search(&*query, &TopDocs::with_limit(2).order_by_score())?;
        assert_eq!(hits.len(), 1);
    }
    Ok(())
--- a/examples/phrase_prefix_search.rs
+++ b/examples/phrase_prefix_search.rs
@@ -63,11 +63,11 @@ fn main() -> Result<()> {
    // but not "in the Gulf Stream".
    let query = query_parser.parse_query("\"in the su\"*")?;

-    let top_docs = searcher.search(&query, &TopDocs::with_limit(10))?;
+    let top_docs = searcher.search(&query, &TopDocs::with_limit(10).order_by_score())?;
    let mut titles = top_docs
        .into_iter()
        .map(|(_score, doc_address)| {
-            let doc = searcher.doc::<TantivyDocument>(doc_address)?;
+            let doc = searcher.doc(doc_address)?;
            let title = doc
                .get_first(title)
                .and_then(|v| v.as_str())
--- a/examples/pre_tokenized_text.rs
+++ b/examples/pre_tokenized_text.rs
@@ -107,7 +107,8 @@ fn main() -> tantivy::Result<()> {
        IndexRecordOption::Basic,
    );

-    let (top_docs, count) = searcher.search(&query, &(TopDocs::with_limit(2), Count))?;
+    let (top_docs, count) =
+        searcher.search(&query, &(TopDocs::with_limit(2).order_by_score(), Count))?;

    assert_eq!(count, 2);

@@ -128,7 +129,8 @@ fn main() -> tantivy::Result<()> {
        IndexRecordOption::Basic,
    );

-    let (_top_docs, count) = searcher.search(&query, &(TopDocs::with_limit(2), Count))?;
+    let (_top_docs, count) =
+        searcher.search(&query, &(TopDocs::with_limit(2).order_by_score(), Count))?;

    assert_eq!(count, 0);

--- a/examples/snippet.rs
+++ b/examples/snippet.rs
@@ -50,12 +50,12 @@ fn main() -> tantivy::Result<()> {
    let query_parser = QueryParser::for_index(&index, vec![title, body]);
    let query = query_parser.parse_query("sycamore spring")?;

-    let top_docs = searcher.search(&query, &TopDocs::with_limit(10))?;
+    let top_docs = searcher.search(&query, &TopDocs::with_limit(10).order_by_score())?;

    let snippet_generator = SnippetGenerator::create(&searcher, &*query, body)?;

    for (score, doc_address) in top_docs {
-        let doc = searcher.doc::<TantivyDocument>(doc_address)?;
+        let doc = searcher.doc(doc_address)?;
        let snippet = snippet_generator.snippet_from_doc(&doc);
        println!("Document score {score}:");
        println!("title: {}", doc.get_first(title).unwrap().as_str().unwrap());
--- a/examples/stop_words.rs
+++ b/examples/stop_words.rs
@@ -102,7 +102,7 @@ fn main() -> tantivy::Result<()> {
    // stop words are applied on the query as well.
    // The following will be equivalent to `title:frankenstein`
    let query = query_parser.parse_query("title:\"the Frankenstein\"")?;
-    let top_docs = searcher.search(&query, &TopDocs::with_limit(10))?;
+    let top_docs = searcher.search(&query, &TopDocs::with_limit(10).order_by_score())?;

    for (score, doc_address) in top_docs {
        let retrieved_doc: TantivyDocument = searcher.doc(doc_address)?;
--- a/examples/warmer.rs
+++ b/examples/warmer.rs
@@ -43,7 +43,7 @@ impl DynamicPriceColumn {
        }
    }

-    pub fn price_for_segment(&self, segment_reader: &SegmentReader) -> Option<Arc<Vec<Price>>> {
+    pub fn price_for_segment(&self, segment_reader: &dyn SegmentReader) -> Option<Arc<Vec<Price>>> {
        let segment_key = (segment_reader.segment_id(), segment_reader.delete_opstamp());
        self.price_cache.read().unwrap().get(&segment_key).cloned()
    }
@@ -157,14 +157,14 @@ fn main() -> tantivy::Result<()> {
    let query = query_parser.parse_query("cooking")?;

    let searcher = reader.searcher();
-    let score_by_price = move |segment_reader: &SegmentReader| {
+    let score_by_price = move |segment_reader: &dyn SegmentReader| {
        let price = price_dynamic_column
            .price_for_segment(segment_reader)
            .unwrap();
        move |doc_id: DocId| Reverse(price[doc_id as usize])
    };

-    let most_expensive_first = TopDocs::with_limit(10).custom_score(score_by_price);
+    let most_expensive_first = TopDocs::with_limit(10).order_by(score_by_price);

    let hits = searcher.search(&query, &most_expensive_first)?;
    assert_eq!(
--- a/query-grammar/src/query_grammar.rs
+++ b/query-grammar/src/query_grammar.rs
@@ -560,7 +560,7 @@ fn range_infallible(inp: &str) -> JResult<&str, UserInputLeaf> {
            (
                (
                    value((), tag(">=")),
-                    map(word_infallible("", false), |(bound, err)| {
+                    map(word_infallible(")", false), |(bound, err)| {
                        (
                            (
                                bound
@@ -574,7 +574,7 @@ fn range_infallible(inp: &str) -> JResult<&str, UserInputLeaf> {
                ),
                (
                    value((), tag("<=")),
-                    map(word_infallible("", false), |(bound, err)| {
+                    map(word_infallible(")", false), |(bound, err)| {
                        (
                            (
                                UserInputBound::Unbounded,
@@ -588,7 +588,7 @@ fn range_infallible(inp: &str) -> JResult<&str, UserInputLeaf> {
                ),
                (
                    value((), tag(">")),
-                    map(word_infallible("", false), |(bound, err)| {
+                    map(word_infallible(")", false), |(bound, err)| {
                        (
                            (
                                bound
@@ -602,7 +602,7 @@ fn range_infallible(inp: &str) -> JResult<&str, UserInputLeaf> {
                ),
                (
                    value((), tag("<")),
-                    map(word_infallible("", false), |(bound, err)| {
+                    map(word_infallible(")", false), |(bound, err)| {
                        (
                            (
                                UserInputBound::Unbounded,
@@ -704,7 +704,11 @@ fn regex(inp: &str) -> IResult<&str, UserInputLeaf> {
                many1(alt((preceded(char('\\'), char('/')), none_of("/")))),
                char('/'),
            ),
-            peek(alt((multispace1, eof))),
+            peek(alt((
+                value((), multispace1),
+                value((), char(')')),
+                value((), eof),
+            ))),
        ),
        |elements| UserInputLeaf::Regex {
            field: None,
@@ -721,8 +725,12 @@ fn regex_infallible(inp: &str) -> JResult<&str, UserInputLeaf> {
            opt_i_err(char('/'), "missing delimiter /"),
        ),
        opt_i_err(
-            peek(alt((multispace1, eof))),
-            "expected whitespace or end of input",
+            peek(alt((
+                value((), multispace1),
+                value((), char(')')),
+                value((), eof),
+            ))),
+            "expected whitespace, closing parenthesis, or end of input",
        ),
    )(inp)
    {
@@ -758,7 +766,17 @@ fn negate(expr: UserInputAst) -> UserInputAst {
 fn leaf(inp: &str) -> IResult<&str, UserInputAst> {
    alt((
        delimited(char('('), ast, char(')')),
-        map(char('*'), |_| UserInputAst::from(UserInputLeaf::All)),
+        map(
+            terminated(
+                char('*'),
+                peek(alt((
+                    value((), multispace1),
+                    value((), char(')')),
+                    value((), eof),
+                ))),
+            ),
+            |_| UserInputAst::from(UserInputLeaf::All),
+        ),
        map(preceded(tuple((tag("NOT"), multispace1)), leaf), negate),
        literal,
    ))(inp)
@@ -779,7 +797,17 @@ fn leaf_infallible(inp: &str) -> JResult<&str, Option<UserInputAst>> {
                ),
            ),
            (
-                value((), char('*')),
+                value(
+                    (),
+                    terminated(
+                        char('*'),
+                        peek(alt((
+                            value((), multispace1),
+                            value((), char(')')),
+                            value((), eof),
+                        ))),
+                    ),
+                ),
                map(nothing, |_| {
                    (Some(UserInputAst::from(UserInputLeaf::All)), Vec::new())
                }),
@@ -1303,6 +1331,14 @@ mod test {
        test_parse_query_to_ast_helper("<a", "{\"*\" TO \"a\"}");
        test_parse_query_to_ast_helper("<=a", "{\"*\" TO \"a\"]");
        test_parse_query_to_ast_helper("<=bsd", "{\"*\" TO \"bsd\"]");
+
+        test_parse_query_to_ast_helper("(<=42)", "{\"*\" TO \"42\"]");
+        test_parse_query_to_ast_helper("(<=42 )", "{\"*\" TO \"42\"]");
+        test_parse_query_to_ast_helper("(age:>5)", "\"age\":{\"5\" TO \"*\"}");
+        test_parse_query_to_ast_helper(
+            "(title:bar AND age:>12)",
+            "(+\"title\":bar +\"age\":{\"12\" TO \"*\"})",
+        );
    }

    #[test]
@@ -1671,6 +1707,25 @@ mod test {
        test_parse_query_to_ast_helper("abc:a b", "(*\"abc\":a *b)");
        test_parse_query_to_ast_helper("abc:\"a b\"", "\"abc\":\"a b\"");
        test_parse_query_to_ast_helper("foo:[1 TO 5]", "\"foo\":[\"1\" TO \"5\"]");
+
+        // Phrase prefixed with *
+        test_parse_query_to_ast_helper("foo:(*A)", "\"foo\":*A");
+        test_parse_query_to_ast_helper("*A", "*A");
+        test_parse_query_to_ast_helper("(*A)", "*A");
+        test_parse_query_to_ast_helper("foo:(A OR B)", "(?\"foo\":A ?\"foo\":B)");
+        test_parse_query_to_ast_helper("foo:(A* OR B*)", "(?\"foo\":A* ?\"foo\":B*)");
+        test_parse_query_to_ast_helper("foo:(*A OR *B)", "(?\"foo\":*A ?\"foo\":*B)");
+
+        // Regexes between parentheses
+        test_parse_query_to_ast_helper("foo:(/A.*/)", "\"foo\":/A.*/");
+        test_parse_query_to_ast_helper("foo:(/A.*/ OR /B.*/)", "(?\"foo\":/A.*/ ?\"foo\":/B.*/)");
+    }
+
+    #[test]
+    fn test_parse_query_all() {
+        test_parse_query_to_ast_helper("*", "*");
+        test_parse_query_to_ast_helper("(*)", "*");
+        test_parse_query_to_ast_helper("(* )", "*");
    }

    #[test]
--- a/query-grammar/src/user_input_ast.rs
+++ b/query-grammar/src/user_input_ast.rs
@@ -66,6 +66,7 @@ impl UserInputLeaf {
            }
            UserInputLeaf::Range { field, .. } if field.is_none() => *field = Some(default_field),
            UserInputLeaf::Set { field, .. } if field.is_none() => *field = Some(default_field),
+            UserInputLeaf::Regex { field, .. } if field.is_none() => *field = Some(default_field),
            _ => (), // field was already set, do nothing
        }
    }
--- a/src/aggregation/accessor_helpers.rs
+++ b/src/aggregation/accessor_helpers.rs
@@ -16,15 +16,16 @@ use crate::index::SegmentReader;
 /// That way we can use it the same way as if it would come from the fastfield.
 pub(crate) fn get_missing_val_as_u64_lenient(
    column_type: ColumnType,
+    column_max_value: u64,
    missing: &Key,
    field_name: &str,
 ) -> crate::Result<Option<u64>> {
    let missing_val = match missing {
-        Key::Str(_) if column_type == ColumnType::Str => Some(u64::MAX),
+        Key::Str(_) if column_type == ColumnType::Str => Some(column_max_value + 1),
        // Allow fallback to number on text fields
-        Key::F64(_) if column_type == ColumnType::Str => Some(u64::MAX),
-        Key::U64(_) if column_type == ColumnType::Str => Some(u64::MAX),
-        Key::I64(_) if column_type == ColumnType::Str => Some(u64::MAX),
+        Key::F64(_) if column_type == ColumnType::Str => Some(column_max_value + 1),
+        Key::U64(_) if column_type == ColumnType::Str => Some(column_max_value + 1),
+        Key::I64(_) if column_type == ColumnType::Str => Some(column_max_value + 1),
        Key::F64(val) if column_type.numerical_type().is_some() => {
            f64_to_fastfield_u64(*val, &column_type)
        }
@@ -56,7 +57,7 @@ pub(crate) fn get_numeric_or_date_column_types() -> &'static [ColumnType] {

 /// Get fast field reader or empty as default.
 pub(crate) fn get_ff_reader(
-    reader: &SegmentReader,
+    reader: &dyn SegmentReader,
    field_name: &str,
    allowed_column_types: Option<&[ColumnType]>,
 ) -> crate::Result<(columnar::Column<u64>, ColumnType)> {
@@ -73,7 +74,7 @@ pub(crate) fn get_ff_reader(
 }

 pub(crate) fn get_dynamic_columns(
-    reader: &SegmentReader,
+    reader: &dyn SegmentReader,
    field_name: &str,
 ) -> crate::Result<Vec<columnar::DynamicColumn>> {
    let ff_fields = reader.fast_fields().dynamic_column_handles(field_name)?;
@@ -89,7 +90,7 @@ pub(crate) fn get_dynamic_columns(
 ///
 /// Is guaranteed to return at least one column.
 pub(crate) fn get_all_ff_reader_or_empty(
-    reader: &SegmentReader,
+    reader: &dyn SegmentReader,
    field_name: &str,
    allowed_column_types: Option<&[ColumnType]>,
    fallback_type: ColumnType,
--- a/src/aggregation/agg_data.rs
+++ b/src/aggregation/agg_data.rs
@@ -1,4 +1,4 @@
-use columnar::{Column, ColumnType, StrColumn};
+use columnar::{Column, ColumnBlockAccessor, ColumnType, StrColumn};
 use common::BitSet;
 use rustc_hash::FxHashSet;
 use serde::Serialize;
@@ -10,16 +10,16 @@ use crate::aggregation::accessor_helpers::{
 };
 use crate::aggregation::agg_req::{Aggregation, AggregationVariants, Aggregations};
 use crate::aggregation::bucket::{
-    build_segment_aggregation_collector, FilterAggReqData, HistogramAggReqData, HistogramBounds,
-    IncludeExcludeParam, MissingTermAggReqData, RangeAggReqData, SegmentFilterCollector,
-    SegmentHistogramCollector, SegmentRangeCollector, TermMissingAgg, TermsAggReqData,
-    TermsAggregation, TermsAggregationInternal,
+    build_segment_filter_collector, build_segment_range_collector, FilterAggReqData,
+    HistogramAggReqData, HistogramBounds, IncludeExcludeParam, MissingTermAggReqData,
+    RangeAggReqData, SegmentHistogramCollector, TermMissingAgg, TermsAggReqData, TermsAggregation,
+    TermsAggregationInternal,
 };
 use crate::aggregation::metric::{
-    AverageAggregation, CardinalityAggReqData, CardinalityAggregationReq, CountAggregation,
-    ExtendedStatsAggregation, MaxAggregation, MetricAggReqData, MinAggregation,
-    SegmentCardinalityCollector, SegmentExtendedStatsCollector, SegmentPercentilesCollector,
-    SegmentStatsCollector, StatsAggregation, StatsType, SumAggregation, TopHitsAggReqData,
+    build_segment_stats_collector, AverageAggregation, CardinalityAggReqData,
+    CardinalityAggregationReq, CountAggregation, ExtendedStatsAggregation, MaxAggregation,
+    MetricAggReqData, MinAggregation, SegmentCardinalityCollector, SegmentExtendedStatsCollector,
+    SegmentPercentilesCollector, StatsAggregation, StatsType, SumAggregation, TopHitsAggReqData,
    TopHitsSegmentCollector,
 };
 use crate::aggregation::segment_agg_result::{
@@ -35,6 +35,7 @@ pub struct AggregationsSegmentCtx {
    /// Request data for each aggregation type.
    pub per_request: PerRequestAggSegCtx,
    pub context: AggContextParams,
+    pub column_block_accessor: ColumnBlockAccessor<u64>,
 }

 impl AggregationsSegmentCtx {
@@ -107,21 +108,14 @@ impl AggregationsSegmentCtx {
            .as_deref()
            .expect("range_req_data slot is empty (taken)")
    }
-    #[inline]
-    pub(crate) fn get_filter_req_data(&self, idx: usize) -> &FilterAggReqData {
-        self.per_request.filter_req_data[idx]
-            .as_deref()
-            .expect("filter_req_data slot is empty (taken)")
-    }

    // ---------- mutable getters ----------

    #[inline]
-    pub(crate) fn get_term_req_data_mut(&mut self, idx: usize) -> &mut TermsAggReqData {
-        self.per_request.term_req_data[idx]
-            .as_deref_mut()
-            .expect("term_req_data slot is empty (taken)")
+    pub(crate) fn get_metric_req_data_mut(&mut self, idx: usize) -> &mut MetricAggReqData {
+        &mut self.per_request.stats_metric_req_data[idx]
    }
+
    #[inline]
    pub(crate) fn get_cardinality_req_data_mut(
        &mut self,
@@ -129,10 +123,7 @@ impl AggregationsSegmentCtx {
    ) -> &mut CardinalityAggReqData {
        &mut self.per_request.cardinality_req_data[idx]
    }
-    #[inline]
-    pub(crate) fn get_metric_req_data_mut(&mut self, idx: usize) -> &mut MetricAggReqData {
-        &mut self.per_request.stats_metric_req_data[idx]
-    }
+
    #[inline]
    pub(crate) fn get_histogram_req_data_mut(&mut self, idx: usize) -> &mut HistogramAggReqData {
        self.per_request.histogram_req_data[idx]
@@ -142,21 +133,6 @@ impl AggregationsSegmentCtx {

    // ---------- take / put (terms, histogram, range) ----------

-    /// Move out the boxed Terms request at `idx`, leaving `None`.
-    #[inline]
-    pub(crate) fn take_term_req_data(&mut self, idx: usize) -> Box<TermsAggReqData> {
-        self.per_request.term_req_data[idx]
-            .take()
-            .expect("term_req_data slot is empty (taken)")
-    }
-
-    /// Put back a Terms request into an empty slot at `idx`.
-    #[inline]
-    pub(crate) fn put_back_term_req_data(&mut self, idx: usize, value: Box<TermsAggReqData>) {
-        debug_assert!(self.per_request.term_req_data[idx].is_none());
-        self.per_request.term_req_data[idx] = Some(value);
-    }
-
    /// Move out the boxed Histogram request at `idx`, leaving `None`.
    #[inline]
    pub(crate) fn take_histogram_req_data(&mut self, idx: usize) -> Box<HistogramAggReqData> {
@@ -320,6 +296,7 @@ impl PerRequestAggSegCtx {

    /// Convert the aggregation tree into a serializable struct representation.
    /// Each node contains: { name, kind, children }.
+    #[allow(dead_code)]
    pub fn get_view_tree(&self) -> Vec<AggTreeViewNode> {
        fn node_to_view(node: &AggRefNode, pr: &PerRequestAggSegCtx) -> AggTreeViewNode {
            let mut children: Vec<AggTreeViewNode> =
@@ -345,12 +322,19 @@ impl PerRequestAggSegCtx {
 pub(crate) fn build_segment_agg_collectors_root(
    req: &mut AggregationsSegmentCtx,
 ) -> crate::Result<Box<dyn SegmentAggregationCollector>> {
-    build_segment_agg_collectors(req, &req.per_request.agg_tree.clone())
+    build_segment_agg_collectors_generic(req, &req.per_request.agg_tree.clone())
 }

 pub(crate) fn build_segment_agg_collectors(
    req: &mut AggregationsSegmentCtx,
    nodes: &[AggRefNode],
+) -> crate::Result<Box<dyn SegmentAggregationCollector>> {
+    build_segment_agg_collectors_generic(req, nodes)
+}
+
+fn build_segment_agg_collectors_generic(
+    req: &mut AggregationsSegmentCtx,
+    nodes: &[AggRefNode],
 ) -> crate::Result<Box<dyn SegmentAggregationCollector>> {
    let mut collectors = Vec::new();
    for node in nodes.iter() {
@@ -373,7 +357,7 @@ pub(crate) fn build_segment_agg_collector(
    node: &AggRefNode,
 ) -> crate::Result<Box<dyn SegmentAggregationCollector>> {
    match node.kind {
-        AggKind::Terms => build_segment_aggregation_collector(req, node),
+        AggKind::Terms => crate::aggregation::bucket::build_segment_term_collector(req, node),
        AggKind::MissingTerm => {
            let req_data = &mut req.per_request.missing_term_req_data[node.idx_in_req_data];
            if req_data.accessors.is_empty() {
@@ -388,6 +372,8 @@ pub(crate) fn build_segment_agg_collector(
            Ok(Box::new(SegmentCardinalityCollector::from_req(
                req_data.column_type,
                node.idx_in_req_data,
+                req_data.accessor.clone(),
+                req_data.missing_value_for_accessor,
            )))
        }
        AggKind::StatsKind(stats_type) => {
@@ -398,20 +384,21 @@ pub(crate) fn build_segment_agg_collector(
                | StatsType::Count
                | StatsType::Max
                | StatsType::Min
-                | StatsType::Stats => Ok(Box::new(SegmentStatsCollector::from_req(
-                    node.idx_in_req_data,
-                ))),
-                StatsType::ExtendedStats(sigma) => {
-                    Ok(Box::new(SegmentExtendedStatsCollector::from_req(
-                        req_data.field_type,
-                        sigma,
-                        node.idx_in_req_data,
-                        req_data.missing,
-                    )))
-                }
-                StatsType::Percentiles => Ok(Box::new(
-                    SegmentPercentilesCollector::from_req_and_validate(node.idx_in_req_data)?,
+                | StatsType::Stats => build_segment_stats_collector(req_data),
+                StatsType::ExtendedStats(sigma) => Ok(Box::new(
+                    SegmentExtendedStatsCollector::from_req(req_data, sigma),
                )),
+                StatsType::Percentiles => {
+                    let req_data = req.get_metric_req_data_mut(node.idx_in_req_data);
+                    Ok(Box::new(
+                        SegmentPercentilesCollector::from_req_and_validate(
+                            req_data.field_type,
+                            req_data.missing_u64,
+                            req_data.accessor.clone(),
+                            node.idx_in_req_data,
+                        ),
+                    ))
+                }
            }
        }
        AggKind::TopHits => {
@@ -428,12 +415,8 @@ pub(crate) fn build_segment_agg_collector(
        AggKind::DateHistogram => Ok(Box::new(SegmentHistogramCollector::from_req_and_validate(
            req, node,
        )?)),
-        AggKind::Range => Ok(Box::new(SegmentRangeCollector::from_req_and_validate(
-            req, node,
-        )?)),
-        AggKind::Filter => Ok(Box::new(SegmentFilterCollector::from_req_and_validate(
-            req, node,
-        )?)),
+        AggKind::Range => Ok(build_segment_range_collector(req, node)?),
+        AggKind::Filter => build_segment_filter_collector(req, node),
    }
 }

@@ -486,17 +469,18 @@ impl AggKind {
 /// Build AggregationsData by walking the request tree.
 pub(crate) fn build_aggregations_data_from_req(
    aggs: &Aggregations,
-    reader: &SegmentReader,
+    reader: &dyn SegmentReader,
    segment_ordinal: SegmentOrdinal,
    context: AggContextParams,
 ) -> crate::Result<AggregationsSegmentCtx> {
    let mut data = AggregationsSegmentCtx {
        per_request: Default::default(),
        context,
+        column_block_accessor: ColumnBlockAccessor::default(),
    };

    for (name, agg) in aggs.iter() {
-        let nodes = build_nodes(name, agg, reader, segment_ordinal, &mut data)?;
+        let nodes = build_nodes(name, agg, reader, segment_ordinal, &mut data, true)?;
        data.per_request.agg_tree.extend(nodes);
    }
    Ok(data)
@@ -505,9 +489,10 @@ pub(crate) fn build_aggregations_data_from_req(
 fn build_nodes(
    agg_name: &str,
    req: &Aggregation,
-    reader: &SegmentReader,
+    reader: &dyn SegmentReader,
    segment_ordinal: SegmentOrdinal,
    data: &mut AggregationsSegmentCtx,
+    is_top_level: bool,
 ) -> crate::Result<Vec<AggRefNode>> {
    use AggregationVariants::*;
    match &req.agg {
@@ -520,9 +505,9 @@ fn build_nodes(
            let idx_in_req_data = data.push_range_req_data(RangeAggReqData {
                accessor,
                field_type,
-                column_block_accessor: Default::default(),
                name: agg_name.to_string(),
                req: range_req.clone(),
+                is_top_level,
            });
            let children = build_children(&req.sub_aggregation, reader, segment_ordinal, data)?;
            Ok(vec![AggRefNode {
@@ -540,9 +525,7 @@ fn build_nodes(
            let idx_in_req_data = data.push_histogram_req_data(HistogramAggReqData {
                accessor,
                field_type,
-                column_block_accessor: Default::default(),
                name: agg_name.to_string(),
-                sub_aggregation_blueprint: None,
                req: histo_req.clone(),
                is_date_histogram: false,
                bounds: HistogramBounds {
@@ -567,9 +550,7 @@ fn build_nodes(
            let idx_in_req_data = data.push_histogram_req_data(HistogramAggReqData {
                accessor,
                field_type,
-                column_block_accessor: Default::default(),
                name: agg_name.to_string(),
-                sub_aggregation_blueprint: None,
                req: histo_req,
                is_date_histogram: true,
                bounds: HistogramBounds {
@@ -594,6 +575,7 @@ fn build_nodes(
            data,
            &req.sub_aggregation,
            TermsOrCardinalityRequest::Terms(terms_req.clone()),
+            is_top_level,
        ),
        Cardinality(card_req) => build_terms_or_cardinality_nodes(
            agg_name,
@@ -604,6 +586,7 @@ fn build_nodes(
            data,
            &req.sub_aggregation,
            TermsOrCardinalityRequest::Cardinality(card_req.clone()),
+            is_top_level,
        ),
        Average(AverageAggregation { field, missing, .. })
        | Max(MaxAggregation { field, missing, .. })
@@ -647,7 +630,6 @@ fn build_nodes(
            let idx_in_req_data = data.push_metric_req_data(MetricAggReqData {
                accessor,
                field_type,
-                column_block_accessor: Default::default(),
                name: agg_name.to_string(),
                collecting_for,
                missing: *missing,
@@ -675,7 +657,6 @@ fn build_nodes(
            let idx_in_req_data = data.push_metric_req_data(MetricAggReqData {
                accessor,
                field_type,
-                column_block_accessor: Default::default(),
                name: agg_name.to_string(),
                collecting_for: StatsType::Percentiles,
                missing: percentiles_req.missing,
@@ -732,7 +713,7 @@ fn build_nodes(
            // Build the query and evaluator upfront
            let schema = reader.schema();
            let tokenizers = &data.context.tokenizers;
-            let query = filter_req.parse_query(&schema, tokenizers)?;
+            let query = filter_req.parse_query(schema, tokenizers)?;
            let evaluator = crate::aggregation::bucket::DocumentQueryEvaluator::new(
                query,
                schema.clone(),
@@ -747,9 +728,10 @@ fn build_nodes(
            let idx_in_req_data = data.push_filter_req_data(FilterAggReqData {
                name: agg_name.to_string(),
                req: filter_req.clone(),
-                segment_reader: reader.clone(),
+                segment_reader: reader.clone_arc(),
                evaluator,
                matching_docs_buffer,
+                is_top_level,
            });
            let children = build_children(&req.sub_aggregation, reader, segment_ordinal, data)?;
            Ok(vec![AggRefNode {
@@ -763,19 +745,26 @@ fn build_nodes(

 fn build_children(
    aggs: &Aggregations,
-    reader: &SegmentReader,
+    reader: &dyn SegmentReader,
    segment_ordinal: SegmentOrdinal,
    data: &mut AggregationsSegmentCtx,
 ) -> crate::Result<Vec<AggRefNode>> {
    let mut children = Vec::new();
    for (name, agg) in aggs.iter() {
-        children.extend(build_nodes(name, agg, reader, segment_ordinal, data)?);
+        children.extend(build_nodes(
+            name,
+            agg,
+            reader,
+            segment_ordinal,
+            data,
+            false,
+        )?);
    }
    Ok(children)
 }

 fn get_term_agg_accessors(
-    reader: &SegmentReader,
+    reader: &dyn SegmentReader,
    field_name: &str,
    missing: &Option<Key>,
 ) -> crate::Result<Vec<(Column<u64>, ColumnType)>> {
@@ -828,11 +817,12 @@ fn build_terms_or_cardinality_nodes(
    agg_name: &str,
    field_name: &str,
    missing: &Option<Key>,
-    reader: &SegmentReader,
+    reader: &dyn SegmentReader,
    segment_ordinal: SegmentOrdinal,
    data: &mut AggregationsSegmentCtx,
    sub_aggs: &Aggregations,
    req: TermsOrCardinalityRequest,
+    is_top_level: bool,
 ) -> crate::Result<Vec<AggRefNode>> {
    let mut nodes = Vec::new();

@@ -884,12 +874,12 @@ fn build_terms_or_cardinality_nodes(
        });
    }

-    // Add one node per accessor to mirror previous behavior and allow per-type missing handling.
+    // Add one node per accessor
    for (accessor, column_type) in column_and_types {
        let missing_value_for_accessor = if use_special_missing_agg {
            None
        } else if let Some(m) = missing.as_ref() {
-            get_missing_val_as_u64_lenient(column_type, m, field_name)?
+            get_missing_val_as_u64_lenient(column_type, accessor.max_value(), m, field_name)?
        } else {
            None
        };
@@ -915,13 +905,11 @@ fn build_terms_or_cardinality_nodes(
                    column_type,
                    str_dict_column: str_dict_column.clone(),
                    missing_value_for_accessor,
-                    column_block_accessor: Default::default(),
                    name: agg_name.to_string(),
                    req: TermsAggregationInternal::from_req(req),
-                    // Will be filled later when building collectors
-                    sub_aggregation_blueprint: None,
                    sug_aggregations: sub_aggs.clone(),
                    allowed_term_ids,
+                    is_top_level,
                });
                (idx_in_req_data, AggKind::Terms)
            }
@@ -931,7 +919,6 @@ fn build_terms_or_cardinality_nodes(
                    column_type,
                    str_dict_column: str_dict_column.clone(),
                    missing_value_for_accessor,
-                    column_block_accessor: Default::default(),
                    name: agg_name.to_string(),
                    req: req.clone(),
                });
--- a/src/aggregation/agg_limits.rs
+++ b/src/aggregation/agg_limits.rs
@@ -35,6 +35,7 @@ pub struct AggregationLimitsGuard {
    /// Allocated memory with this guard.
    allocated_with_the_guard: u64,
 }
+
 impl Clone for AggregationLimitsGuard {
    fn clone(&self) -> Self {
        Self {
--- a/src/aggregation/agg_result.rs
+++ b/src/aggregation/agg_result.rs
@@ -16,7 +16,7 @@ use super::{AggregationError, Key};
 use crate::TantivyError;

 #[derive(Clone, Default, Debug, PartialEq, Serialize, Deserialize)]
-/// The final aggegation result.
+/// The final aggregation result.
 pub struct AggregationResults(pub FxHashMap<String, AggregationResult>);

 impl AggregationResults {
--- a/src/aggregation/agg_tests.rs
+++ b/src/aggregation/agg_tests.rs
@@ -2,15 +2,441 @@ use serde_json::Value;

 use crate::aggregation::agg_req::{Aggregation, Aggregations};
 use crate::aggregation::agg_result::AggregationResults;
-use crate::aggregation::buf_collector::DOC_BLOCK_SIZE;
 use crate::aggregation::collector::AggregationCollector;
 use crate::aggregation::intermediate_agg_result::IntermediateAggregationResults;
 use crate::aggregation::tests::{get_test_index_2_segments, get_test_index_from_values_and_terms};
 use crate::aggregation::DistributedAggregationCollector;
+use crate::docset::COLLECT_BLOCK_BUFFER_LEN;
 use crate::query::{AllQuery, TermQuery};
 use crate::schema::{IndexRecordOption, Schema, FAST};
 use crate::{Index, IndexWriter, Term};

+// The following tests ensure that each bucket aggregation type works correctly as a
+// sub-aggregation of another bucket aggregation in two scenarios:
+// 1) The parent has more buckets than the child sub-aggregation
+// 2) The child sub-aggregation has more buckets than the parent
+//
+// These scenarios exercise the bucket id mapping and sub-aggregation routing logic.
+
+#[test]
+fn test_terms_as_subagg_parent_more_vs_child_more() -> crate::Result<()> {
+    let index = get_test_index_2_segments(false)?;
+
+    // Case A: parent has more buckets than child
+    // Parent: range with 4 buckets
+    // Child: terms on text -> 2 buckets
+    let agg_parent_more: Aggregations = serde_json::from_value(json!({
+        "parent_range": {
+            "range": {
+                "field": "score",
+                "ranges": [
+                    {"to": 3.0},
+                    {"from": 3.0, "to": 7.0},
+                    {"from": 7.0, "to": 20.0},
+                    {"from": 20.0}
+                ]
+            },
+            "aggs": {
+                "child_terms": {"terms": {"field": "text", "order": {"_key": "asc"}}}
+            }
+        }
+    }))
+    .unwrap();
+
+    let res = crate::aggregation::tests::exec_request(agg_parent_more, &index)?;
+    // Exact expected structure and counts
+    assert_eq!(
+        res["parent_range"]["buckets"],
+        json!([
+            {
+                "key": "*-3",
+                "doc_count": 1,
+                "to": 3.0,
+                "child_terms": {
+                    "buckets": [
+                        {"doc_count": 1, "key": "cool"}
+                    ],
+                    "sum_other_doc_count": 0
+                }
+            },
+            {
+                "key": "3-7",
+                "doc_count": 3,
+                "from": 3.0,
+                "to": 7.0,
+                "child_terms": {
+                    "buckets": [
+                        {"doc_count": 2, "key": "cool"},
+                        {"doc_count": 1, "key": "nohit"}
+                    ],
+                    "sum_other_doc_count": 0
+                }
+            },
+            {
+                "key": "7-20",
+                "doc_count": 3,
+                "from": 7.0,
+                "to": 20.0,
+                "child_terms": {
+                    "buckets": [
+                        {"doc_count": 3, "key": "cool"}
+                    ],
+                    "sum_other_doc_count": 0
+                }
+            },
+            {
+                "key": "20-*",
+                "doc_count": 2,
+                "from": 20.0,
+                "child_terms": {
+                    "buckets": [
+                        {"doc_count": 1, "key": "cool"},
+                        {"doc_count": 1, "key": "nohit"}
+                    ],
+                    "sum_other_doc_count": 0
+                }
+            }
+        ])
+    );
+
+    // Case B: child has more buckets than parent
+    // Parent: histogram on score with large interval -> 1 bucket
+    // Child: terms on text -> 2 buckets (cool/nohit)
+    let agg_child_more: Aggregations = serde_json::from_value(json!({
+        "parent_hist": {
+            "histogram": {"field": "score", "interval": 100.0},
+            "aggs": {
+                "child_terms": {"terms": {"field": "text", "order": {"_key": "asc"}}}
+            }
+        }
+    }))
+    .unwrap();
+
+    let res = crate::aggregation::tests::exec_request(agg_child_more, &index)?;
+    assert_eq!(
+        res["parent_hist"],
+        json!({
+            "buckets": [
+                {
+                    "key": 0.0,
+                    "doc_count": 9,
+                    "child_terms": {
+                        "buckets": [
+                            {"doc_count": 7, "key": "cool"},
+                            {"doc_count": 2, "key": "nohit"}
+                        ],
+                        "sum_other_doc_count": 0
+                    }
+                }
+            ]
+        })
+    );
+
+    Ok(())
+}
+
+#[test]
+fn test_range_as_subagg_parent_more_vs_child_more() -> crate::Result<()> {
+    let index = get_test_index_2_segments(false)?;
+
+    // Case A: parent has more buckets than child
+    // Parent: range with 5 buckets
+    // Child: coarse range with 2 explicit ranges -> 3 buckets (incl. the implicit 20-* bucket)
+    let agg_parent_more: Aggregations = serde_json::from_value(json!({
+        "parent_range": {
+            "range": {
+                "field": "score",
+                "ranges": [
+                    {"to": 3.0},
+                    {"from": 3.0, "to": 7.0},
+                    {"from": 7.0, "to": 11.0},
+                    {"from": 11.0, "to": 20.0},
+                    {"from": 20.0}
+                ]
+            },
+            "aggs": {
+                "child_range": {
+                    "range": {
+                        "field": "score",
+                        "ranges": [
+                            {"to": 3.0},
+                            {"from": 3.0, "to": 20.0}
+                        ]
+                    }
+                }
+            }
+        }
+    }))
+    .unwrap();
+    let res = crate::aggregation::tests::exec_request(agg_parent_more, &index)?;
+    assert_eq!(
+        res["parent_range"]["buckets"],
+        json!([
+            {"key": "*-3", "doc_count": 1, "to": 3.0,
+                "child_range": {"buckets": [
+                    {"key": "*-3", "doc_count": 1, "to": 3.0},
+                    {"key": "3-20", "doc_count": 0, "from": 3.0, "to": 20.0},
+                    {"key": "20-*", "doc_count": 0, "from": 20.0}
+                ]}
+            },
+            {"key": "3-7", "doc_count": 3, "from": 3.0, "to": 7.0,
+                "child_range": {"buckets": [
+                    {"key": "*-3", "doc_count": 0, "to": 3.0},
+                    {"key": "3-20", "doc_count": 3, "from": 3.0, "to": 20.0},
+                    {"key": "20-*", "doc_count": 0, "from": 20.0}
+                ]}
+            },
+            {"key": "7-11", "doc_count": 1, "from": 7.0, "to": 11.0,
+                "child_range": {"buckets": [
+                    {"key": "*-3", "doc_count": 0, "to": 3.0},
+                    {"key": "3-20", "doc_count": 1, "from": 3.0, "to": 20.0},
+                    {"key": "20-*", "doc_count": 0, "from": 20.0}
+                ]}
+            },
+            {"key": "11-20", "doc_count": 2, "from": 11.0, "to": 20.0,
+                "child_range": {"buckets": [
+                    {"key": "*-3", "doc_count": 0, "to": 3.0},
+                    {"key": "3-20", "doc_count": 2, "from": 3.0, "to": 20.0},
+                    {"key": "20-*", "doc_count": 0, "from": 20.0}
+                ]}
+            },
+            {"key": "20-*", "doc_count": 2, "from": 20.0,
+                "child_range": {"buckets": [
+                    {"key": "*-3", "doc_count": 0, "to": 3.0},
+                    {"key": "3-20", "doc_count": 0, "from": 3.0, "to": 20.0},
+                    {"key": "20-*", "doc_count": 2, "from": 20.0}
+                ]}
+            }
+        ])
+    );
+
+    // Case B: child has more buckets than parent
+    // Parent: terms on text (2 buckets)
+    // Child: range with 3 explicit ranges -> 4 buckets (incl. the implicit 20-* bucket)
+    let agg_child_more: Aggregations = serde_json::from_value(json!({
+        "parent_terms": {
+            "terms": {"field": "text"},
+            "aggs": {
+                "child_range": {
+                    "range": {
+                        "field": "score",
+                        "ranges": [
+                            {"to": 3.0},
+                            {"from": 3.0, "to": 7.0},
+                            {"from": 7.0, "to": 20.0}
+                        ]
+                    }
+                }
+            }
+        }
+    }))
+    .unwrap();
+    let res = crate::aggregation::tests::exec_request(agg_child_more, &index)?;
+
+    assert_eq!(
+        res["parent_terms"],
+        json!({
+            "buckets": [
+                {
+                    "key": "cool",
+                    "doc_count": 7,
+                    "child_range": {
+                        "buckets": [
+                            {"key": "*-3", "doc_count": 1, "to": 3.0},
+                            {"key": "3-7", "doc_count": 2, "from": 3.0, "to": 7.0},
+                            {"key": "7-20", "doc_count": 3, "from": 7.0, "to": 20.0},
+                            {"key": "20-*", "doc_count": 1, "from": 20.0}
+                        ]
+                    }
+                },
+                {
+                    "key": "nohit",
+                    "doc_count": 2,
+                    "child_range": {
+                        "buckets": [
+                            {"key": "*-3", "doc_count": 0, "to": 3.0},
+                            {"key": "3-7", "doc_count": 1, "from": 3.0, "to": 7.0},
+                            {"key": "7-20", "doc_count": 0, "from": 7.0, "to": 20.0},
+                            {"key": "20-*", "doc_count": 1, "from": 20.0}
+                        ]
+                    }
+                }
+            ],
+            "doc_count_error_upper_bound": 0,
+            "sum_other_doc_count": 0
+        })
+    );
+
+    Ok(())
+}
+
+#[test]
+fn test_histogram_as_subagg_parent_more_vs_child_more() -> crate::Result<()> {
+    let index = get_test_index_2_segments(false)?;
+
+    // Case A: parent has more buckets than child
+    // Parent: range with several ranges
+    // Child: histogram with large interval (single bucket per parent)
+    let agg_parent_more: Aggregations = serde_json::from_value(json!({
+        "parent_range": {
+            "range": {
+                "field": "score",
+                "ranges": [
+                    {"to": 3.0},
+                    {"from": 3.0, "to": 7.0},
+                    {"from": 7.0, "to": 11.0},
+                    {"from": 11.0, "to": 20.0},
+                    {"from": 20.0}
+                ]
+            },
+            "aggs": {
+                "child_hist": {"histogram": {"field": "score", "interval": 100.0}}
+            }
+        }
+    }))
+    .unwrap();
+    let res = crate::aggregation::tests::exec_request(agg_parent_more, &index)?;
+    assert_eq!(
+        res["parent_range"]["buckets"],
+        json!([
+            {"key": "*-3", "doc_count": 1, "to": 3.0,
+                "child_hist": {"buckets": [ {"key": 0.0, "doc_count": 1} ]}
+            },
+            {"key": "3-7", "doc_count": 3, "from": 3.0, "to": 7.0,
+                "child_hist": {"buckets": [ {"key": 0.0, "doc_count": 3} ]}
+            },
+            {"key": "7-11", "doc_count": 1, "from": 7.0, "to": 11.0,
+                "child_hist": {"buckets": [ {"key": 0.0, "doc_count": 1} ]}
+            },
+            {"key": "11-20", "doc_count": 2, "from": 11.0, "to": 20.0,
+                "child_hist": {"buckets": [ {"key": 0.0, "doc_count": 2} ]}
+            },
+            {"key": "20-*", "doc_count": 2, "from": 20.0,
+                "child_hist": {"buckets": [ {"key": 0.0, "doc_count": 2} ]}
+            }
+        ])
+    );
+
+    // Case B: child has more buckets than parent
+    // Parent: terms on text -> 2 buckets
+    // Child: histogram with small interval -> multiple buckets including empties
+    let agg_child_more: Aggregations = serde_json::from_value(json!({
+        "parent_terms": {
+            "terms": {"field": "text"},
+            "aggs": {
+                "child_hist": {"histogram": {"field": "score", "interval": 10.0}}
+            }
+        }
+    }))
+    .unwrap();
+    let res = crate::aggregation::tests::exec_request(agg_child_more, &index)?;
+    assert_eq!(
+        res["parent_terms"],
+        json!({
+            "buckets": [
+                {
+                    "key": "cool",
+                    "doc_count": 7,
+                    "child_hist": {
+                        "buckets": [
+                            {"key": 0.0, "doc_count": 4},
+                            {"key": 10.0, "doc_count": 2},
+                            {"key": 20.0, "doc_count": 0},
+                            {"key": 30.0, "doc_count": 0},
+                            {"key": 40.0, "doc_count": 1}
+                        ]
+                    }
+                },
+                {
+                    "key": "nohit",
+                    "doc_count": 2,
+                    "child_hist": {
+                        "buckets": [
+                            {"key": 0.0, "doc_count": 1},
+                            {"key": 10.0, "doc_count": 0},
+                            {"key": 20.0, "doc_count": 0},
+                            {"key": 30.0, "doc_count": 0},
+                            {"key": 40.0, "doc_count": 1}
+                        ]
+                    }
+                }
+            ],
+            "doc_count_error_upper_bound": 0,
+            "sum_other_doc_count": 0
+        })
+    );
+
+    Ok(())
+}
+
+#[test]
+fn test_date_histogram_as_subagg_parent_more_vs_child_more() -> crate::Result<()> {
+    let index = get_test_index_2_segments(false)?;
+
+    // Case A: parent has more buckets than child
+    // Parent: range with several buckets
+    // Child: date_histogram with 30d -> single bucket per parent
+    let agg_parent_more: Aggregations = serde_json::from_value(json!({
+        "parent_range": {
+            "range": {
+                "field": "score",
+                "ranges": [
+                    {"to": 3.0},
+                    {"from": 3.0, "to": 7.0},
+                    {"from": 7.0, "to": 11.0},
+                    {"from": 11.0, "to": 20.0},
+                    {"from": 20.0}
+                ]
+            },
+            "aggs": {
+                "child_date_hist": {"date_histogram": {"field": "date", "fixed_interval": "30d"}}
+            }
+        }
+    }))
+    .unwrap();
+    let res = crate::aggregation::tests::exec_request(agg_parent_more, &index)?;
+    let buckets = res["parent_range"]["buckets"].as_array().unwrap();
+    // Verify each parent bucket has exactly one child date bucket with matching doc_count
+    for bucket in buckets {
+        let parent_count = bucket["doc_count"].as_u64().unwrap();
+        let child_buckets = bucket["child_date_hist"]["buckets"].as_array().unwrap();
+        assert_eq!(child_buckets.len(), 1);
+        assert_eq!(child_buckets[0]["doc_count"], parent_count);
+    }
+
+    // Case B: child has more buckets than parent
+    // Parent: terms on text (2 buckets)
+    // Child: date_histogram with 1d -> multiple buckets
+    let agg_child_more: Aggregations = serde_json::from_value(json!({
+        "parent_terms": {
+            "terms": {"field": "text"},
+            "aggs": {
+                "child_date_hist": {"date_histogram": {"field": "date", "fixed_interval": "1d"}}
+            }
+        }
+    }))
+    .unwrap();
+    let res = crate::aggregation::tests::exec_request(agg_child_more, &index)?;
+    let buckets = res["parent_terms"]["buckets"].as_array().unwrap();
+
+    // cool bucket
+    assert_eq!(buckets[0]["key"], "cool");
+    let cool_buckets = buckets[0]["child_date_hist"]["buckets"].as_array().unwrap();
+    assert_eq!(cool_buckets.len(), 3);
+    assert_eq!(cool_buckets[0]["doc_count"], 1); // day 0
+    assert_eq!(cool_buckets[1]["doc_count"], 4); // day 1
+    assert_eq!(cool_buckets[2]["doc_count"], 2); // day 2
+
+    // nohit bucket
+    assert_eq!(buckets[1]["key"], "nohit");
+    let nohit_buckets = buckets[1]["child_date_hist"]["buckets"].as_array().unwrap();
+    assert_eq!(nohit_buckets.len(), 2);
+    assert_eq!(nohit_buckets[0]["doc_count"], 1); // day 1
+    assert_eq!(nohit_buckets[1]["doc_count"], 1); // day 2
+
+    Ok(())
+}
+
 fn get_avg_req(field_name: &str) -> Aggregation {
    serde_json::from_value(json!({
        "avg": {
@@ -25,6 +451,10 @@ fn get_collector(agg_req: Aggregations) -> AggregationCollector {
 }

 // *** EVERY BUCKET-TYPE SHOULD BE TESTED HERE ***
+// Note: The flushing part of these tests is outdated, since buffering changed when collection
+// was converted to one collector per request instead of one per bucket.
+//
+// However, they are still useful because they test complex aggregation requests.
 fn test_aggregation_flushing(
    merge_segments: bool,
    use_distributed_collector: bool,
@@ -37,8 +467,9 @@ fn test_aggregation_flushing(

    let reader = index.reader()?;

-    assert_eq!(DOC_BLOCK_SIZE, 64);
-    // In the tree we cache Documents of DOC_BLOCK_SIZE, before passing them down as one block.
+    assert_eq!(COLLECT_BLOCK_BUFFER_LEN, 64);
+    // In the tree we cache documents of COLLECT_BLOCK_BUFFER_LEN before passing them down as one
+    // block.
    //
    // Build a request so that on the first level we have one full cache, which is then flushed.
    // The same cache should have some residue docs at the end, which are flushed (Range 0-70)
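
The buffering described in the comment above can be illustrated with a minimal, self-contained sketch. This is not tantivy's implementation: `BlockBuffer`, `BLOCK_LEN`, and `flushed_blocks` are hypothetical names, and `BLOCK_LEN` merely mirrors the value 64 asserted for `COLLECT_BLOCK_BUFFER_LEN`. The sketch assumes the buffer is flushed as soon as it is full and once more at the end for any residue.

```rust
/// Hypothetical stand-in for the block size asserted in the test above.
const BLOCK_LEN: usize = 64;

/// Minimal buffer that forwards doc ids downstream in blocks (illustration only).
struct BlockBuffer {
    docs: Vec<u32>,
    /// Sizes of the blocks that were flushed, recorded so the behaviour is observable.
    flushed_block_sizes: Vec<usize>,
}

impl BlockBuffer {
    fn new() -> Self {
        BlockBuffer {
            docs: Vec::with_capacity(BLOCK_LEN),
            flushed_block_sizes: Vec::new(),
        }
    }

    /// Buffers one doc id and flushes as soon as the buffer is full.
    fn push(&mut self, doc: u32) {
        self.docs.push(doc);
        if self.docs.len() == BLOCK_LEN {
            self.flush();
        }
    }

    /// Passes the buffered docs down as one block; does nothing when the buffer is empty.
    fn flush(&mut self) {
        if self.docs.is_empty() {
            return;
        }
        self.flushed_block_sizes.push(self.docs.len());
        self.docs.clear();
    }
}

/// Feeds `num_docs` doc ids through the buffer and returns the sizes of the flushed blocks.
fn flushed_blocks(num_docs: u32) -> Vec<usize> {
    let mut buffer = BlockBuffer::new();
    for doc in 0..num_docs {
        buffer.push(doc);
    }
    // Final flush for the residue docs that did not fill a whole block.
    buffer.flush();
    buffer.flushed_block_sizes
}
```

Under these assumptions, `flushed_blocks(70)` yields one full block of 64 docs followed by a residue block of 6 docs, which is the situation the "Range 0-70" request is meant to trigger.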
--- a/src/aggregation/bucket/filter.rs
+++ b/src/aggregation/bucket/filter.rs
@@ -1,4 +1,5 @@
 use std::fmt::Debug;
+use std::sync::Arc;

 use common::BitSet;
 use serde::{Deserialize, Deserializer, Serialize, Serializer};
@@ -6,10 +7,14 @@ use serde::{Deserialize, Deserializer, Serialize, Serializer};
 use crate::aggregation::agg_data::{
    build_segment_agg_collectors, AggRefNode, AggregationsSegmentCtx,
 };
+use crate::aggregation::cached_sub_aggs::{
+    CachedSubAggs, HighCardSubAggCache, LowCardSubAggCache, SubAggCache,
+};
 use crate::aggregation::intermediate_agg_result::{
    IntermediateAggregationResult, IntermediateAggregationResults, IntermediateBucketResult,
 };
-use crate::aggregation::segment_agg_result::{CollectorClone, SegmentAggregationCollector};
+use crate::aggregation::segment_agg_result::{BucketIdProvider, SegmentAggregationCollector};
+use crate::aggregation::BucketId;
 use crate::docset::DocSet;
 use crate::query::{AllQuery, EnableScoring, Query, QueryParser};
 use crate::schema::Schema;
@@ -32,7 +37,7 @@ use crate::{DocId, SegmentReader, TantivyError};
 ///
 /// # Implementation Requirements
 ///
-/// Implementors must:
+/// Implementers must:
 /// 1. Derive `Debug`, `Clone`, `Serialize`, and `Deserialize`
 /// 2. Use `#[typetag::serde]` attribute on the impl block
 /// 3. Implement `build_query()` to construct the query from schema/tokenizers
@@ -398,21 +403,24 @@ pub struct FilterAggReqData {
    /// The filter aggregation
    pub req: FilterAggregation,
    /// The segment reader
-    pub segment_reader: SegmentReader,
+    pub segment_reader: Arc<dyn SegmentReader>,
    /// Document evaluator for the filter query (precomputed BitSet)
    /// This is built once when the request data is created
    pub evaluator: DocumentQueryEvaluator,
    /// Reusable buffer for matching documents to minimize allocations during collection
    pub matching_docs_buffer: Vec<DocId>,
+    /// True if this filter aggregation is at the top level of the aggregation tree (not nested).
+    pub is_top_level: bool,
 }

 impl FilterAggReqData {
    pub(crate) fn get_memory_consumption(&self) -> usize {
        // Estimate: name + segment reader reference + bitset + buffer capacity
        self.name.len()
-            + std::mem::size_of::<SegmentReader>()
-            + self.evaluator.bitset.len() / 8 // BitSet memory (bits to bytes)
-            + self.matching_docs_buffer.capacity() * std::mem::size_of::<DocId>()
+        + std::mem::size_of::<Arc<dyn SegmentReader>>()
+        + self.evaluator.bitset.len() / 8 // BitSet memory (bits to bytes)
+        + self.matching_docs_buffer.capacity() * std::mem::size_of::<DocId>()
+        + std::mem::size_of::<bool>()
    }
 }

@@ -431,7 +439,7 @@ impl DocumentQueryEvaluator {
    pub(crate) fn new(
        query: Box<dyn Query>,
        schema: Schema,
-        segment_reader: &SegmentReader,
+        segment_reader: &dyn SegmentReader,
    ) -> crate::Result<Self> {
        let max_doc = segment_reader.max_doc();

@@ -489,17 +497,24 @@ impl Debug for DocumentQueryEvaluator {
    }
 }

-/// Segment collector for filter aggregation
-pub struct SegmentFilterCollector {
-    /// Document count in this bucket
+#[derive(Debug, Clone, PartialEq, Copy)]
+struct DocCount {
    doc_count: u64,
+    bucket_id: BucketId,
+}
+
+/// Segment collector for filter aggregation
+pub struct SegmentFilterCollector<C: SubAggCache> {
+    /// Document counts per parent bucket
+    parent_buckets: Vec<DocCount>,
    /// Sub-aggregation collectors
-    sub_aggregations: Option<Box<dyn SegmentAggregationCollector>>,
+    sub_aggregations: Option<CachedSubAggs<C>>,
+    bucket_id_provider: BucketIdProvider,
    /// Accessor index for this filter aggregation (to access FilterAggReqData)
    accessor_idx: usize,
 }

-impl SegmentFilterCollector {
+impl<C: SubAggCache> SegmentFilterCollector<C> {
    /// Create a new filter segment collector following the new agg_data pattern
    pub(crate) fn from_req_and_validate(
        req: &mut AggregationsSegmentCtx,
@@ -511,47 +526,75 @@ impl SegmentFilterCollector {
        } else {
            None
        };
+        let sub_agg_collector = sub_agg_collector.map(CachedSubAggs::new);

        Ok(SegmentFilterCollector {
-            doc_count: 0,
+            parent_buckets: Vec::new(),
            sub_aggregations: sub_agg_collector,
            accessor_idx: node.idx_in_req_data,
+            bucket_id_provider: BucketIdProvider::default(),
        })
    }
 }

-impl Debug for SegmentFilterCollector {
+pub(crate) fn build_segment_filter_collector(
+    req: &mut AggregationsSegmentCtx,
+    node: &AggRefNode,
+) -> crate::Result<Box<dyn SegmentAggregationCollector>> {
+    let is_top_level = req.per_request.filter_req_data[node.idx_in_req_data]
+        .as_ref()
+        .expect("filter_req_data slot is empty")
+        .is_top_level;
+
+    if is_top_level {
+        Ok(Box::new(
+            SegmentFilterCollector::<LowCardSubAggCache>::from_req_and_validate(req, node)?,
+        ))
+    } else {
+        Ok(Box::new(
+            SegmentFilterCollector::<HighCardSubAggCache>::from_req_and_validate(req, node)?,
+        ))
+    }
+}
+
+impl<C: SubAggCache> Debug for SegmentFilterCollector<C> {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        f.debug_struct("SegmentFilterCollector")
-            .field("doc_count", &self.doc_count)
+            .field("buckets", &self.parent_buckets)
            .field("has_sub_aggs", &self.sub_aggregations.is_some())
            .field("accessor_idx", &self.accessor_idx)
            .finish()
    }
 }

-impl CollectorClone for SegmentFilterCollector {
-    fn clone_box(&self) -> Box<dyn SegmentAggregationCollector> {
-        // For now, panic - this needs proper implementation with weight recreation
-        panic!("SegmentFilterCollector cloning not yet implemented - requires weight recreation")
-    }
-}
-
-impl SegmentAggregationCollector for SegmentFilterCollector {
+impl<C: SubAggCache> SegmentAggregationCollector for SegmentFilterCollector<C> {
    fn add_intermediate_aggregation_result(
-        self: Box<Self>,
+        &mut self,
        agg_data: &AggregationsSegmentCtx,
        results: &mut IntermediateAggregationResults,
+        parent_bucket_id: BucketId,
    ) -> crate::Result<()> {
        let mut sub_results = IntermediateAggregationResults::default();
+        let bucket_opt = self.parent_buckets.get(parent_bucket_id as usize);

-        if let Some(sub_aggs) = self.sub_aggregations {
-            sub_aggs.add_intermediate_aggregation_result(agg_data, &mut sub_results)?;
+        if let Some(sub_aggs) = &mut self.sub_aggregations {
+            sub_aggs
+                .get_sub_agg_collector()
+                .add_intermediate_aggregation_result(
+                    agg_data,
+                    &mut sub_results,
+                    // Here we create a new bucket ID for sub-aggregations if the bucket doesn't
+                    // exist, so that sub-aggregations can still produce results (e.g., zero doc
+                    // count)
+                    bucket_opt
+                        .map(|bucket| bucket.bucket_id)
+                        .unwrap_or_else(|| self.bucket_id_provider.next_bucket_id()),
+                )?;
        }

        // Create the filter bucket result
        let filter_bucket_result = IntermediateBucketResult::Filter {
-            doc_count: self.doc_count,
+            doc_count: bucket_opt.map(|b| b.doc_count).unwrap_or(0),
            sub_aggregations: sub_results,
        };

@@ -570,32 +613,17 @@ impl SegmentAggregationCollector for SegmentFilterCollector {
        Ok(())
    }

-    fn collect(&mut self, doc: DocId, agg_data: &mut AggregationsSegmentCtx) -> crate::Result<()> {
-        // Access the evaluator from FilterAggReqData
-        let req_data = agg_data.get_filter_req_data(self.accessor_idx);
-
-        // O(1) BitSet lookup to check if document matches filter
-        if req_data.evaluator.matches_document(doc) {
-            self.doc_count += 1;
-
-            // If we have sub-aggregations, collect on them for this filtered document
-            if let Some(sub_aggs) = &mut self.sub_aggregations {
-                sub_aggs.collect(doc, agg_data)?;
-            }
-        }
-        Ok(())
-    }
-
-    #[inline]
-    fn collect_block(
+    fn collect(
        &mut self,
-        docs: &[DocId],
+        parent_bucket_id: BucketId,
+        docs: &[crate::DocId],
        agg_data: &mut AggregationsSegmentCtx,
    ) -> crate::Result<()> {
        if docs.is_empty() {
            return Ok(());
        }

+        let mut bucket = self.parent_buckets[parent_bucket_id as usize];
        // Take the request data to avoid borrow checker issues with sub-aggregations
        let mut req = agg_data.take_filter_req_data(self.accessor_idx);

@@ -604,18 +632,24 @@ impl SegmentAggregationCollector for SegmentFilterCollector {
        req.evaluator
            .filter_batch(docs, &mut req.matching_docs_buffer);

-        self.doc_count += req.matching_docs_buffer.len() as u64;
+        bucket.doc_count += req.matching_docs_buffer.len() as u64;

        // Batch process sub-aggregations if we have matches
        if !req.matching_docs_buffer.is_empty() {
            if let Some(sub_aggs) = &mut self.sub_aggregations {
-                // Use collect_block for better sub-aggregation performance
-                sub_aggs.collect_block(&req.matching_docs_buffer, agg_data)?;
+                for &doc_id in &req.matching_docs_buffer {
+                    sub_aggs.push(bucket.bucket_id, doc_id);
+                }
            }
        }

        // Put the request data back
        agg_data.put_back_filter_req_data(self.accessor_idx, req);
+        if let Some(sub_aggs) = &mut self.sub_aggregations {
+            sub_aggs.check_flush_local(agg_data)?;
+        }
+        // `bucket` is a copy (`DocCount` is `Copy`), so write the updated count back
+        self.parent_buckets[parent_bucket_id as usize] = bucket;

        Ok(())
    }
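
The per-parent-bucket counting done by `collect` above can be sketched in isolation. This is a simplified illustration under stated assumptions, not the collector itself: `FilterCounter`, `IdProvider`, and the `matches` closure are hypothetical stand-ins (the real code filters through a precomputed `BitSet` and forwards matching docs to cached sub-aggregations), and bucket ids are assumed to be handed out consecutively. The point is the copy, update, write-back pattern on a `Copy` struct stored in a `Vec` indexed by parent bucket id.

```rust
/// Hypothetical, simplified counterpart of the per-parent-bucket state.
#[derive(Debug, Clone, Copy, PartialEq)]
struct DocCount {
    doc_count: u64,
    bucket_id: u32,
}

/// Hands out consecutive bucket ids (assumed behaviour, for illustration only).
#[derive(Default)]
struct IdProvider(u32);

impl IdProvider {
    fn next_bucket_id(&mut self) -> u32 {
        let id = self.0;
        self.0 += 1;
        id
    }
}

#[derive(Default)]
struct FilterCounter {
    parent_buckets: Vec<DocCount>,
    ids: IdProvider,
}

impl FilterCounter {
    /// Grows the table so that every parent bucket id up to `max_bucket` is a valid index.
    fn prepare_max_bucket(&mut self, max_bucket: u32) {
        while self.parent_buckets.len() <= max_bucket as usize {
            let bucket_id = self.ids.next_bucket_id();
            self.parent_buckets.push(DocCount { doc_count: 0, bucket_id });
        }
    }

    /// Counts the docs accepted by `matches` for one parent bucket.
    fn collect(&mut self, parent_bucket_id: u32, docs: &[u32], matches: impl Fn(u32) -> bool) {
        // `DocCount` is `Copy`: this is a local copy, not a reference into the Vec.
        let mut bucket = self.parent_buckets[parent_bucket_id as usize];
        bucket.doc_count += docs.iter().filter(|&&doc| matches(doc)).count() as u64;
        // Without this write-back the update would be lost.
        self.parent_buckets[parent_bucket_id as usize] = bucket;
    }

    /// Returns 0 for a parent bucket that was never prepared.
    fn doc_count(&self, parent_bucket_id: u32) -> u64 {
        self.parent_buckets
            .get(parent_bucket_id as usize)
            .map(|bucket| bucket.doc_count)
            .unwrap_or(0)
    }
}
```

Working on a local copy keeps the `Vec` unborrowed while other fields are used, at the cost of needing the explicit write-back at the end.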
@@ -626,6 +660,21 @@ impl SegmentAggregationCollector for SegmentFilterCollector {
        }
        Ok(())
    }
+
+    fn prepare_max_bucket(
+        &mut self,
+        max_bucket: BucketId,
+        _agg_data: &AggregationsSegmentCtx,
+    ) -> crate::Result<()> {
+        while self.parent_buckets.len() <= max_bucket as usize {
+            let bucket_id = self.bucket_id_provider.next_bucket_id();
+            self.parent_buckets.push(DocCount {
+                doc_count: 0,
+                bucket_id,
+            });
+        }
+        Ok(())
+    }
 }

 /// Intermediate result for filter aggregation
@@ -639,16 +688,14 @@ pub struct IntermediateFilterBucketResult {

 #[cfg(test)]
 mod tests {
-    use std::time::Instant;
-
    use serde_json::{json, Value};

    use super::*;
    use crate::aggregation::agg_req::Aggregations;
    use crate::aggregation::agg_result::AggregationResults;
    use crate::aggregation::{AggContextParams, AggregationCollector};
-    use crate::query::{AllQuery, QueryParser, TermQuery};
-    use crate::schema::{IndexRecordOption, Schema, Term, FAST, INDEXED, STORED, TEXT};
+    use crate::query::{AllQuery, TermQuery};
+    use crate::schema::{IndexRecordOption, Schema, Term, FAST, INDEXED, TEXT};
    use crate::{doc, Index, IndexWriter};

    // Test helper functions
@@ -729,12 +776,13 @@ mod tests {

        let schema = schema_builder.build();
        let index = Index::create_in_ram(schema);
-        let mut writer: IndexWriter = index.writer(50_000_000)?;
+        let mut writer: IndexWriter = index.writer_for_tests()?;

        writer.add_document(doc!(
            category => "electronics", brand => "apple",
            price => 999u64, rating => 4.5f64, in_stock => true
        ))?;
+        writer.commit()?;
        writer.add_document(doc!(
            category => "electronics", brand => "samsung",
            price => 799u64, rating => 4.2f64, in_stock => true
@@ -938,7 +986,7 @@ mod tests {
        let index = create_standard_test_index()?;
        let reader = index.reader()?;
        let searcher = reader.searcher();
-
+        assert_eq!(searcher.segment_readers().len(), 2);
        let agg = json!({
            "premium_electronics": {
                "filter": "category:electronics AND price:[800 TO *]",
@@ -1520,9 +1568,9 @@ mod tests {
        let searcher = reader.searcher();

        let agg = json!({
-            "test": {
-                "filter": deserialized,
-                "aggs": { "count": { "value_count": { "field": "brand" } } }
+                "test": {
+                    "filter": deserialized,
+                    "aggs": { "count": { "value_count": { "field": "brand" } } }
            }
        });

--- a/src/aggregation/bucket/histogram/histogram.rs
+++ b/src/aggregation/bucket/histogram/histogram.rs
@@ -1,6 +1,6 @@
 use std::cmp::Ordering;

-use columnar::{Column, ColumnBlockAccessor, ColumnType};
+use columnar::{Column, ColumnType};
 use rustc_hash::FxHashMap;
 use serde::{Deserialize, Serialize};
 use tantivy_bitpacker::minmax;
@@ -8,14 +8,14 @@ use tantivy_bitpacker::minmax;
 use crate::aggregation::agg_data::{
    build_segment_agg_collectors, AggRefNode, AggregationsSegmentCtx,
 };
-use crate::aggregation::agg_limits::MemoryConsumption;
 use crate::aggregation::agg_req::Aggregations;
 use crate::aggregation::agg_result::BucketEntry;
+use crate::aggregation::cached_sub_aggs::{CachedSubAggs, HighCardCachedSubAggs};
 use crate::aggregation::intermediate_agg_result::{
    IntermediateAggregationResult, IntermediateAggregationResults, IntermediateBucketResult,
    IntermediateHistogramBucketEntry,
 };
-use crate::aggregation::segment_agg_result::SegmentAggregationCollector;
+use crate::aggregation::segment_agg_result::{BucketIdProvider, SegmentAggregationCollector};
 use crate::aggregation::*;
 use crate::TantivyError;

@@ -26,13 +26,8 @@ pub struct HistogramAggReqData {
    pub accessor: Column<u64>,
    /// The field type of the fast field.
    pub field_type: ColumnType,
-    /// The column block accessor to access the fast field values.
-    pub column_block_accessor: ColumnBlockAccessor<u64>,
    /// The name of the aggregation.
    pub name: String,
-    /// The sub aggregation blueprint, used to create sub aggregations for each bucket.
-    /// Will be filled during initialization of the collector.
-    pub sub_aggregation_blueprint: Option<Box<dyn SegmentAggregationCollector>>,
    /// The histogram aggregation request.
    pub req: HistogramAggregation,
    /// True if this is a date_histogram aggregation.
@@ -257,18 +252,24 @@ impl HistogramBounds {
 pub(crate) struct SegmentHistogramBucketEntry {
    pub key: f64,
    pub doc_count: u64,
+    pub bucket_id: BucketId,
 }

 impl SegmentHistogramBucketEntry {
    pub(crate) fn into_intermediate_bucket_entry(
        self,
-        sub_aggregation: Option<Box<dyn SegmentAggregationCollector>>,
+        sub_aggregation: &mut Option<HighCardCachedSubAggs>,
        agg_data: &AggregationsSegmentCtx,
    ) -> crate::Result<IntermediateHistogramBucketEntry> {
        let mut sub_aggregation_res = IntermediateAggregationResults::default();
        if let Some(sub_aggregation) = sub_aggregation {
            sub_aggregation
-                .add_intermediate_aggregation_result(agg_data, &mut sub_aggregation_res)?;
+                .get_sub_agg_collector()
+                .add_intermediate_aggregation_result(
+                    agg_data,
+                    &mut sub_aggregation_res,
+                    self.bucket_id,
+                )?;
        }
        Ok(IntermediateHistogramBucketEntry {
            key: self.key,
@@ -278,27 +279,38 @@ impl SegmentHistogramBucketEntry {
    }
 }

+#[derive(Clone, Debug, Default)]
+struct HistogramBuckets {
+    pub buckets: FxHashMap<i64, SegmentHistogramBucketEntry>,
+}
+
 /// The collector puts values from the fast field into the correct buckets and does a conversion to
 /// the correct datatype.
-#[derive(Clone, Debug)]
+#[derive(Debug)]
 pub struct SegmentHistogramCollector {
    /// The buckets containing the aggregation data.
-    buckets: FxHashMap<i64, SegmentHistogramBucketEntry>,
-    sub_aggregations: FxHashMap<i64, Box<dyn SegmentAggregationCollector>>,
+    /// One set of histogram buckets per parent bucket ID.
+    parent_buckets: Vec<HistogramBuckets>,
+    sub_agg: Option<HighCardCachedSubAggs>,
    accessor_idx: usize,
+    bucket_id_provider: BucketIdProvider,
 }

 impl SegmentAggregationCollector for SegmentHistogramCollector {
    fn add_intermediate_aggregation_result(
-        self: Box<Self>,
+        &mut self,
        agg_data: &AggregationsSegmentCtx,
        results: &mut IntermediateAggregationResults,
+        parent_bucket_id: BucketId,
    ) -> crate::Result<()> {
        let name = agg_data
            .get_histogram_req_data(self.accessor_idx)
            .name
            .clone();
-        let bucket = self.into_intermediate_bucket_result(agg_data)?;
+        // TODO: avoid prepare_max_bucket here and handle empty buckets.
+        self.prepare_max_bucket(parent_bucket_id, agg_data)?;
+        let histogram = std::mem::take(&mut self.parent_buckets[parent_bucket_id as usize]);
+        let bucket = self.add_intermediate_bucket_result(agg_data, histogram)?;
        results.push(name, IntermediateAggregationResult::Bucket(bucket))?;

        Ok(())
@@ -307,44 +319,40 @@ impl SegmentAggregationCollector for SegmentHistogramCollector {
    #[inline]
    fn collect(
        &mut self,
-        doc: crate::DocId,
-        agg_data: &mut AggregationsSegmentCtx,
-    ) -> crate::Result<()> {
-        self.collect_block(&[doc], agg_data)
-    }
-
-    #[inline]
-    fn collect_block(
-        &mut self,
+        parent_bucket_id: BucketId,
        docs: &[crate::DocId],
        agg_data: &mut AggregationsSegmentCtx,
    ) -> crate::Result<()> {
-        let mut req = agg_data.take_histogram_req_data(self.accessor_idx);
+        let req = agg_data.take_histogram_req_data(self.accessor_idx);
        let mem_pre = self.get_memory_consumption();
+        let buckets = &mut self.parent_buckets[parent_bucket_id as usize].buckets;

        let bounds = req.bounds;
        let interval = req.req.interval;
        let offset = req.offset;
        let get_bucket_pos = |val| get_bucket_pos_f64(val, interval, offset) as i64;

-        req.column_block_accessor.fetch_block(docs, &req.accessor);
-        for (doc, val) in req
+        agg_data
+            .column_block_accessor
+            .fetch_block(docs, &req.accessor);
+        for (doc, val) in agg_data
            .column_block_accessor
            .iter_docid_vals(docs, &req.accessor)
        {
-            let val = f64_from_fastfield_u64(val, &req.field_type);
+            let val = f64_from_fastfield_u64(val, req.field_type);
            let bucket_pos = get_bucket_pos(val);
            if bounds.contains(val) {
-                let bucket = self.buckets.entry(bucket_pos).or_insert_with(|| {
+                let bucket = buckets.entry(bucket_pos).or_insert_with(|| {
                    let key = get_bucket_key_from_pos(bucket_pos as f64, interval, offset);
-                    SegmentHistogramBucketEntry { key, doc_count: 0 }
+                    SegmentHistogramBucketEntry {
+                        key,
+                        doc_count: 0,
+                        bucket_id: self.bucket_id_provider.next_bucket_id(),
+                    }
                });
                bucket.doc_count += 1;
-                if let Some(sub_aggregation_blueprint) = req.sub_aggregation_blueprint.as_ref() {
-                    self.sub_aggregations
-                        .entry(bucket_pos)
-                        .or_insert_with(|| sub_aggregation_blueprint.clone())
-                        .collect(doc, agg_data)?;
+                if let Some(sub_agg) = &mut self.sub_agg {
+                    sub_agg.push(bucket.bucket_id, doc);
                }
            }
        }
@@ -358,14 +366,30 @@ impl SegmentAggregationCollector for SegmentHistogramCollector {
                .add_memory_consumed(mem_delta as u64)?;
        }

+        if let Some(sub_agg) = &mut self.sub_agg {
+            sub_agg.check_flush_local(agg_data)?;
+        }
+
        Ok(())
    }

    fn flush(&mut self, agg_data: &mut AggregationsSegmentCtx) -> crate::Result<()> {
-        for sub_aggregation in self.sub_aggregations.values_mut() {
+        if let Some(sub_aggregation) = &mut self.sub_agg {
            sub_aggregation.flush(agg_data)?;
        }
+        Ok(())
+    }

+    fn prepare_max_bucket(
+        &mut self,
+        max_bucket: BucketId,
+        _agg_data: &AggregationsSegmentCtx,
+    ) -> crate::Result<()> {
+        while self.parent_buckets.len() <= max_bucket as usize {
+            self.parent_buckets.push(HistogramBuckets {
+                buckets: FxHashMap::default(),
+            });
+        }
        Ok(())
    }
 }
@@ -373,22 +397,19 @@ impl SegmentAggregationCollector for SegmentHistogramCollector {
 impl SegmentHistogramCollector {
    fn get_memory_consumption(&self) -> usize {
        let self_mem = std::mem::size_of::<Self>();
-        let sub_aggs_mem = self.sub_aggregations.memory_consumption();
-        let buckets_mem = self.buckets.memory_consumption();
-        self_mem + sub_aggs_mem + buckets_mem
+        let buckets_mem = self.parent_buckets.len() * std::mem::size_of::<HistogramBuckets>();
+        self_mem + buckets_mem
    }
    /// Converts the collector result into a intermediate bucket result.
-    pub fn into_intermediate_bucket_result(
-        self,
+    fn add_intermediate_bucket_result(
+        &mut self,
        agg_data: &AggregationsSegmentCtx,
+        histogram: HistogramBuckets,
    ) -> crate::Result<IntermediateBucketResult> {
-        let mut buckets = Vec::with_capacity(self.buckets.len());
+        let mut buckets = Vec::with_capacity(histogram.buckets.len());

-        for (bucket_pos, bucket) in self.buckets {
-            let bucket_res = bucket.into_intermediate_bucket_entry(
-                self.sub_aggregations.get(&bucket_pos).cloned(),
-                agg_data,
-            );
+        for bucket in histogram.buckets.into_values() {
+            let bucket_res = bucket.into_intermediate_bucket_entry(&mut self.sub_agg, agg_data);

            buckets.push(bucket_res?);
        }
@@ -408,7 +429,7 @@ impl SegmentHistogramCollector {
        agg_data: &mut AggregationsSegmentCtx,
        node: &AggRefNode,
    ) -> crate::Result<Self> {
-        let blueprint = if !node.children.is_empty() {
+        let sub_agg = if !node.children.is_empty() {
            Some(build_segment_agg_collectors(agg_data, &node.children)?)
        } else {
            None
@@ -423,13 +444,13 @@ impl SegmentHistogramCollector {
            max: f64::MAX,
        });
        req_data.offset = req_data.req.offset.unwrap_or(0.0);
-
-        req_data.sub_aggregation_blueprint = blueprint;
+        let sub_agg = sub_agg.map(CachedSubAggs::new);

        Ok(Self {
-            buckets: Default::default(),
-            sub_aggregations: Default::default(),
+            parent_buckets: Default::default(),
+            sub_agg,
            accessor_idx: node.idx_in_req_data,
+            bucket_id_provider: BucketIdProvider::default(),
        })
    }
 }
--- a/src/aggregation/bucket/range.rs
+++ b/src/aggregation/bucket/range.rs
@@ -1,18 +1,22 @@
 use std::fmt::Debug;
 use std::ops::Range;

-use columnar::{Column, ColumnBlockAccessor, ColumnType};
+use columnar::{Column, ColumnType};
 use rustc_hash::FxHashMap;
 use serde::{Deserialize, Serialize};

 use crate::aggregation::agg_data::{
    build_segment_agg_collectors, AggRefNode, AggregationsSegmentCtx,
 };
+use crate::aggregation::agg_limits::AggregationLimitsGuard;
+use crate::aggregation::cached_sub_aggs::{
+    CachedSubAggs, HighCardSubAggCache, LowCardCachedSubAggs, LowCardSubAggCache, SubAggCache,
+};
 use crate::aggregation::intermediate_agg_result::{
    IntermediateAggregationResult, IntermediateAggregationResults, IntermediateBucketResult,
    IntermediateRangeBucketEntry, IntermediateRangeBucketResult,
 };
-use crate::aggregation::segment_agg_result::SegmentAggregationCollector;
+use crate::aggregation::segment_agg_result::{BucketIdProvider, SegmentAggregationCollector};
 use crate::aggregation::*;
 use crate::TantivyError;

@@ -23,12 +27,12 @@ pub struct RangeAggReqData {
    pub accessor: Column<u64>,
    /// The type of the fast field.
    pub field_type: ColumnType,
-    /// The column block accessor to access the fast field values.
-    pub column_block_accessor: ColumnBlockAccessor<u64>,
    /// The range aggregation request.
    pub req: RangeAggregation,
    /// The name of the aggregation.
    pub name: String,
+    /// Whether this is a top-level aggregation.
+    pub is_top_level: bool,
 }

 impl RangeAggReqData {
@@ -151,19 +155,47 @@ pub(crate) struct SegmentRangeAndBucketEntry {

 /// The collector puts values from the fast field into the correct buckets and does a conversion to
 /// the correct datatype.
-#[derive(Clone, Debug)]
-pub struct SegmentRangeCollector {
+pub struct SegmentRangeCollector<C: SubAggCache> {
    /// The buckets containing the aggregation data.
-    buckets: Vec<SegmentRangeAndBucketEntry>,
+    /// One `Vec` of range buckets per parent bucket id.
+    parent_buckets: Vec<Vec<SegmentRangeAndBucketEntry>>,
    column_type: ColumnType,
    pub(crate) accessor_idx: usize,
+    sub_agg: Option<CachedSubAggs<C>>,
+    /// Bucket ids must be unique across all parent buckets, so we keep track of the
+    /// next available bucket id here.
+    /// This effectively flattens the bucket ids across all parent buckets.
+    /// Example with nested aggregations:
+    /// Term Agg -> Range aggregation -> Stats aggregation
+    /// The Term Agg creates 3 buckets ["INFO", "ERROR", "WARN"], each of which has a Range
+    /// aggregation with 4 buckets. The Range aggregation will create buckets with ids:
+    /// - INFO: 0,1,2,3
+    /// - ERROR: 4,5,6,7
+    /// - WARN: 8,9,10,11
+    ///
+    /// This allows the Stats aggregation to have unique bucket ids to refer to.
+    bucket_id_provider: BucketIdProvider,
+    limits: AggregationLimitsGuard,
 }

+impl<C: SubAggCache> Debug for SegmentRangeCollector<C> {
+    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+        f.debug_struct("SegmentRangeCollector")
+            .field("parent_buckets_len", &self.parent_buckets.len())
+            .field("column_type", &self.column_type)
+            .field("accessor_idx", &self.accessor_idx)
+            .field("has_sub_agg", &self.sub_agg.is_some())
+            .finish()
+    }
+}
+
+/// TODO: Bad naming, there's also SegmentRangeAndBucketEntry
 #[derive(Clone)]
 pub(crate) struct SegmentRangeBucketEntry {
    pub key: Key,
    pub doc_count: u64,
-    pub sub_aggregation: Option<Box<dyn SegmentAggregationCollector>>,
+    // pub sub_aggregation: Option<Box<dyn SegmentAggregationCollector>>,
+    pub bucket_id: BucketId,
    /// The from range of the bucket. Equals `f64::MIN` when `None`.
    pub from: Option<f64>,
    /// The to range of the bucket. Equals `f64::MAX` when `None`. Open interval, `to` is not
@@ -184,48 +216,50 @@ impl Debug for SegmentRangeBucketEntry {
 impl SegmentRangeBucketEntry {
    pub(crate) fn into_intermediate_bucket_entry(
        self,
-        agg_data: &AggregationsSegmentCtx,
    ) -> crate::Result<IntermediateRangeBucketEntry> {
-        let mut sub_aggregation_res = IntermediateAggregationResults::default();
-        if let Some(sub_aggregation) = self.sub_aggregation {
-            sub_aggregation
-                .add_intermediate_aggregation_result(agg_data, &mut sub_aggregation_res)?
-        } else {
-            Default::default()
-        };
+        let sub_aggregation = IntermediateAggregationResults::default();

        Ok(IntermediateRangeBucketEntry {
            key: self.key.into(),
            doc_count: self.doc_count,
-            sub_aggregation: sub_aggregation_res,
+            sub_aggregation_res: sub_aggregation,
            from: self.from,
            to: self.to,
        })
    }
 }

-impl SegmentAggregationCollector for SegmentRangeCollector {
+impl<C: SubAggCache> SegmentAggregationCollector for SegmentRangeCollector<C> {
    fn add_intermediate_aggregation_result(
-        self: Box<Self>,
+        &mut self,
        agg_data: &AggregationsSegmentCtx,
        results: &mut IntermediateAggregationResults,
+        parent_bucket_id: BucketId,
    ) -> crate::Result<()> {
+        self.prepare_max_bucket(parent_bucket_id, agg_data)?;
        let field_type = self.column_type;
        let name = agg_data
            .get_range_req_data(self.accessor_idx)
            .name
            .to_string();

-        let buckets: FxHashMap<SerializedKey, IntermediateRangeBucketEntry> = self
-            .buckets
+        let buckets = std::mem::take(&mut self.parent_buckets[parent_bucket_id as usize]);
+
+        let buckets: FxHashMap<SerializedKey, IntermediateRangeBucketEntry> = buckets
            .into_iter()
-            .map(move |range_bucket| {
-                Ok((
-                    range_to_string(&range_bucket.range, &field_type)?,
-                    range_bucket
-                        .bucket
-                        .into_intermediate_bucket_entry(agg_data)?,
-                ))
+            .map(|range_bucket| {
+                let bucket_id = range_bucket.bucket.bucket_id;
+                let mut agg = range_bucket.bucket.into_intermediate_bucket_entry()?;
+                if let Some(sub_aggregation) = &mut self.sub_agg {
+                    sub_aggregation
+                        .get_sub_agg_collector()
+                        .add_intermediate_aggregation_result(
+                            agg_data,
+                            &mut agg.sub_aggregation_res,
+                            bucket_id,
+                        )?;
+                }
+                Ok((range_to_string(&range_bucket.range, &field_type)?, agg))
            })
            .collect::<crate::Result<_>>()?;

@@ -242,73 +276,114 @@ impl SegmentAggregationCollector for SegmentRangeCollector {
    #[inline]
    fn collect(
        &mut self,
-        doc: crate::DocId,
-        agg_data: &mut AggregationsSegmentCtx,
-    ) -> crate::Result<()> {
-        self.collect_block(&[doc], agg_data)
-    }
-
-    #[inline]
-    fn collect_block(
-        &mut self,
+        parent_bucket_id: BucketId,
        docs: &[crate::DocId],
        agg_data: &mut AggregationsSegmentCtx,
    ) -> crate::Result<()> {
-        // Take request data to avoid borrow conflicts during sub-aggregation
-        let mut req = agg_data.take_range_req_data(self.accessor_idx);
+        let req = agg_data.take_range_req_data(self.accessor_idx);

-        req.column_block_accessor.fetch_block(docs, &req.accessor);
+        agg_data
+            .column_block_accessor
+            .fetch_block(docs, &req.accessor);

-        for (doc, val) in req
+        let buckets = &mut self.parent_buckets[parent_bucket_id as usize];
+
+        for (doc, val) in agg_data
            .column_block_accessor
            .iter_docid_vals(docs, &req.accessor)
        {
-            let bucket_pos = self.get_bucket_pos(val);
-            let bucket = &mut self.buckets[bucket_pos];
+            let bucket_pos = get_bucket_pos(val, buckets);
+            let bucket = &mut buckets[bucket_pos];
            bucket.bucket.doc_count += 1;
-            if let Some(sub_agg) = bucket.bucket.sub_aggregation.as_mut() {
-                sub_agg.collect(doc, agg_data)?;
+            if let Some(sub_agg) = self.sub_agg.as_mut() {
+                sub_agg.push(bucket.bucket.bucket_id, doc);
            }
        }

        agg_data.put_back_range_req_data(self.accessor_idx, req);
+        if let Some(sub_agg) = self.sub_agg.as_mut() {
+            sub_agg.check_flush_local(agg_data)?;
+        }

        Ok(())
    }

    fn flush(&mut self, agg_data: &mut AggregationsSegmentCtx) -> crate::Result<()> {
-        for bucket in self.buckets.iter_mut() {
-            if let Some(sub_agg) = bucket.bucket.sub_aggregation.as_mut() {
-                sub_agg.flush(agg_data)?;
-            }
+        if let Some(sub_agg) = self.sub_agg.as_mut() {
+            sub_agg.flush(agg_data)?;
        }
        Ok(())
    }
+
+    fn prepare_max_bucket(
+        &mut self,
+        max_bucket: BucketId,
+        agg_data: &AggregationsSegmentCtx,
+    ) -> crate::Result<()> {
+        while self.parent_buckets.len() <= max_bucket as usize {
+            let new_buckets = self.create_new_buckets(agg_data)?;
+            self.parent_buckets.push(new_buckets);
+        }
+
+        Ok(())
+    }
+}
+/// Build a concrete `SegmentRangeCollector` with either the low- or high-cardinality
+/// sub-aggregation cache, depending on the aggregation level and the number of ranges.
+pub(crate) fn build_segment_range_collector(
+    agg_data: &mut AggregationsSegmentCtx,
+    node: &AggRefNode,
+) -> crate::Result<Box<dyn SegmentAggregationCollector>> {
+    let accessor_idx = node.idx_in_req_data;
+    let req_data = agg_data.get_range_req_data(node.idx_in_req_data);
+    let field_type = req_data.field_type;
+
+    // TODO: A better metric than is_top_level would be the number of buckets expected.
+    // E.g. if the range agg is not top level, but the parent is a bucket agg with fewer than 10
+    // buckets, we are still in low cardinality territory.
+    let is_low_card = req_data.is_top_level && req_data.req.ranges.len() <= 64;
+
+    let sub_agg = if !node.children.is_empty() {
+        Some(build_segment_agg_collectors(agg_data, &node.children)?)
+    } else {
+        None
+    };
+
+    if is_low_card {
+        Ok(Box::new(SegmentRangeCollector::<LowCardSubAggCache> {
+            sub_agg: sub_agg.map(LowCardCachedSubAggs::new),
+            column_type: field_type,
+            accessor_idx,
+            parent_buckets: Vec::new(),
+            bucket_id_provider: BucketIdProvider::default(),
+            limits: agg_data.context.limits.clone(),
+        }))
+    } else {
+        Ok(Box::new(SegmentRangeCollector::<HighCardSubAggCache> {
+            sub_agg: sub_agg.map(CachedSubAggs::new),
+            column_type: field_type,
+            accessor_idx,
+            parent_buckets: Vec::new(),
+            bucket_id_provider: BucketIdProvider::default(),
+            limits: agg_data.context.limits.clone(),
+        }))
+    }
 }

-impl SegmentRangeCollector {
-    pub(crate) fn from_req_and_validate(
-        req_data: &mut AggregationsSegmentCtx,
-        node: &AggRefNode,
-    ) -> crate::Result<Self> {
-        let accessor_idx = node.idx_in_req_data;
-        let (field_type, ranges) = {
-            let req_view = req_data.get_range_req_data(node.idx_in_req_data);
-            (req_view.field_type, req_view.req.ranges.clone())
-        };
-
+impl<C: SubAggCache> SegmentRangeCollector<C> {
+    pub(crate) fn create_new_buckets(
+        &mut self,
+        agg_data: &AggregationsSegmentCtx,
+    ) -> crate::Result<Vec<SegmentRangeAndBucketEntry>> {
+        let field_type = self.column_type;
+        let req_data = agg_data.get_range_req_data(self.accessor_idx);
        // The range input on the request is f64.
        // We need to convert to u64 ranges, because we read the values as u64.
        // The mapping from the conversion is monotonic so ordering is preserved.
-        let sub_agg_prototype = if !node.children.is_empty() {
-            Some(build_segment_agg_collectors(req_data, &node.children)?)
-        } else {
-            None
-        };
-
-        let buckets: Vec<_> = extend_validate_ranges(&ranges, &field_type)?
+        let buckets: Vec<_> = extend_validate_ranges(&req_data.req.ranges, &field_type)?
            .iter()
            .map(|range| {
+                let bucket_id = self.bucket_id_provider.next_bucket_id();
                let key = range
                    .key
                    .clone()
@@ -317,20 +392,20 @@ impl SegmentRangeCollector {
                let to = if range.range.end == u64::MAX {
                    None
                } else {
-                    Some(f64_from_fastfield_u64(range.range.end, &field_type))
+                    Some(f64_from_fastfield_u64(range.range.end, field_type))
                };
                let from = if range.range.start == u64::MIN {
                    None
                } else {
-                    Some(f64_from_fastfield_u64(range.range.start, &field_type))
+                    Some(f64_from_fastfield_u64(range.range.start, field_type))
                };
-                let sub_aggregation = sub_agg_prototype.clone();
+                // let sub_aggregation = sub_agg_prototype.clone();

                Ok(SegmentRangeAndBucketEntry {
                    range: range.range.clone(),
                    bucket: SegmentRangeBucketEntry {
                        doc_count: 0,
-                        sub_aggregation,
+                        bucket_id,
                        key,
                        from,
                        to,
@@ -339,27 +414,20 @@ impl SegmentRangeCollector {
            })
            .collect::<crate::Result<_>>()?;

-        req_data.context.limits.add_memory_consumed(
+        self.limits.add_memory_consumed(
            buckets.len() as u64 * std::mem::size_of::<SegmentRangeAndBucketEntry>() as u64,
        )?;
-
-        Ok(SegmentRangeCollector {
-            buckets,
-            column_type: field_type,
-            accessor_idx,
-        })
-    }
-
-    #[inline]
-    fn get_bucket_pos(&self, val: u64) -> usize {
-        let pos = self
-            .buckets
-            .binary_search_by_key(&val, |probe| probe.range.start)
-            .unwrap_or_else(|pos| pos - 1);
-        debug_assert!(self.buckets[pos].range.contains(&val));
-        pos
+        Ok(buckets)
    }
 }
+#[inline]
+fn get_bucket_pos(val: u64, buckets: &[SegmentRangeAndBucketEntry]) -> usize {
+    let pos = buckets
+        .binary_search_by_key(&val, |probe| probe.range.start)
+        .unwrap_or_else(|pos| pos - 1);
+    debug_assert!(buckets[pos].range.contains(&val));
+    pos
+}

 /// Converts the user provided f64 range value to fast field value space.
 ///
@@ -456,7 +524,7 @@ pub(crate) fn range_to_string(
            let val = i64::from_u64(val);
            format_date(val)
        } else {
-            Ok(f64_from_fastfield_u64(val, field_type).to_string())
+            Ok(f64_from_fastfield_u64(val, *field_type).to_string())
        }
    };

@@ -486,7 +554,7 @@ mod tests {
    pub fn get_collector_from_ranges(
        ranges: Vec<RangeAggregationRange>,
        field_type: ColumnType,
-    ) -> SegmentRangeCollector {
+    ) -> SegmentRangeCollector<HighCardSubAggCache> {
        let req = RangeAggregation {
            field: "dummy".to_string(),
            ranges,
@@ -506,30 +574,33 @@ mod tests {
                let to = if range.range.end == u64::MAX {
                    None
                } else {
-                    Some(f64_from_fastfield_u64(range.range.end, &field_type))
+                    Some(f64_from_fastfield_u64(range.range.end, field_type))
                };
                let from = if range.range.start == u64::MIN {
                    None
                } else {
-                    Some(f64_from_fastfield_u64(range.range.start, &field_type))
+                    Some(f64_from_fastfield_u64(range.range.start, field_type))
                };
                SegmentRangeAndBucketEntry {
                    range: range.range.clone(),
                    bucket: SegmentRangeBucketEntry {
                        doc_count: 0,
-                        sub_aggregation: None,
                        key,
                        from,
                        to,
+                        bucket_id: 0,
                    },
                }
            })
            .collect();

        SegmentRangeCollector {
-            buckets,
+            parent_buckets: vec![buckets],
            column_type: field_type,
            accessor_idx: 0,
+            sub_agg: None,
+            bucket_id_provider: Default::default(),
+            limits: AggregationLimitsGuard::default(),
        }
    }

@@ -776,7 +847,7 @@ mod tests {
        let buckets = vec![(10f64..20f64).into(), (30f64..40f64).into()];
        let collector = get_collector_from_ranges(buckets, ColumnType::F64);

-        let buckets = collector.buckets;
+        let buckets = collector.parent_buckets[0].clone();
        assert_eq!(buckets[0].range.start, u64::MIN);
        assert_eq!(buckets[0].range.end, 10f64.to_u64());
        assert_eq!(buckets[1].range.start, 10f64.to_u64());
@@ -799,7 +870,7 @@ mod tests {
        ];
        let collector = get_collector_from_ranges(buckets, ColumnType::F64);

-        let buckets = collector.buckets;
+        let buckets = collector.parent_buckets[0].clone();
        assert_eq!(buckets[0].range.start, u64::MIN);
        assert_eq!(buckets[0].range.end, 10f64.to_u64());
        assert_eq!(buckets[1].range.start, 10f64.to_u64());
@@ -814,7 +885,7 @@ mod tests {
        let buckets = vec![(-10f64..-1f64).into()];
        let collector = get_collector_from_ranges(buckets, ColumnType::F64);

-        let buckets = collector.buckets;
+        let buckets = collector.parent_buckets[0].clone();
        assert_eq!(&buckets[0].bucket.key.to_string(), "*--10");
        assert_eq!(&buckets[buckets.len() - 1].bucket.key.to_string(), "-1-*");
    }
@@ -823,7 +894,7 @@ mod tests {
        let buckets = vec![(0f64..10f64).into()];
        let collector = get_collector_from_ranges(buckets, ColumnType::F64);

-        let buckets = collector.buckets;
+        let buckets = collector.parent_buckets[0].clone();
        assert_eq!(&buckets[0].bucket.key.to_string(), "*-0");
        assert_eq!(&buckets[buckets.len() - 1].bucket.key.to_string(), "10-*");
    }
@@ -832,7 +903,7 @@ mod tests {
    fn range_binary_search_test_u64() {
        let check_ranges = |ranges: Vec<RangeAggregationRange>| {
            let collector = get_collector_from_ranges(ranges, ColumnType::U64);
-            let search = |val: u64| collector.get_bucket_pos(val);
+            let search = |val: u64| get_bucket_pos(val, &collector.parent_buckets[0]);

            assert_eq!(search(u64::MIN), 0);
            assert_eq!(search(9), 0);
@@ -878,7 +949,7 @@ mod tests {
        let ranges = vec![(10.0..100.0).into()];

        let collector = get_collector_from_ranges(ranges, ColumnType::F64);
-        let search = |val: u64| collector.get_bucket_pos(val);
+        let search = |val: u64| get_bucket_pos(val, &collector.parent_buckets[0]);

        assert_eq!(search(u64::MIN), 0);
        assert_eq!(search(9f64.to_u64()), 0);
@@ -890,63 +961,3 @@ mod tests {
                                             // the max value
    }
 }
-
-#[cfg(all(test, feature = "unstable"))]
-mod bench {
-
-    use itertools::Itertools;
-    use rand::seq::SliceRandom;
-    use rand::thread_rng;
-
-    use super::*;
-    use crate::aggregation::bucket::range::tests::get_collector_from_ranges;
-
-    const TOTAL_DOCS: u64 = 1_000_000u64;
-    const NUM_DOCS: u64 = 50_000u64;
-
-    fn get_collector_with_buckets(num_buckets: u64, num_docs: u64) -> SegmentRangeCollector {
-        let bucket_size = num_docs / num_buckets;
-        let mut buckets: Vec<RangeAggregationRange> = vec![];
-        for i in 0..num_buckets {
-            let bucket_start = (i * bucket_size) as f64;
-            buckets.push((bucket_start..bucket_start + bucket_size as f64).into())
-        }
-
-        get_collector_from_ranges(buckets, ColumnType::U64)
-    }
-
-    fn get_rand_docs(total_docs: u64, num_docs_returned: u64) -> Vec<u64> {
-        let mut rng = thread_rng();
-
-        let all_docs = (0..total_docs - 1).collect_vec();
-        let mut vals = all_docs
-            .as_slice()
-            .choose_multiple(&mut rng, num_docs_returned as usize)
-            .cloned()
-            .collect_vec();
-        vals.sort();
-        vals
-    }
-
-    fn bench_range_binary_search(b: &mut test::Bencher, num_buckets: u64) {
-        let collector = get_collector_with_buckets(num_buckets, TOTAL_DOCS);
-        let vals = get_rand_docs(TOTAL_DOCS, NUM_DOCS);
-        b.iter(|| {
-            let mut bucket_pos = 0;
-            for val in &vals {
-                bucket_pos = collector.get_bucket_pos(*val);
-            }
-            bucket_pos
-        })
-    }
-
-    #[bench]
-    fn bench_range_100_buckets(b: &mut test::Bencher) {
-        bench_range_binary_search(b, 100)
-    }
-
-    #[bench]
-    fn bench_range_10_buckets(b: &mut test::Bencher) {
-        bench_range_binary_search(b, 10)
-    }
-}
--- a/src/aggregation/bucket/term_agg/mod.rs
+++ b/src/aggregation/bucket/term_agg/mod.rs
--- a/src/aggregation/bucket/term_agg/default_impl.rs
+++ b/src/aggregation/bucket/term_agg/default_impl.rs
@@ -1,196 +0,0 @@
-use std::fmt::Debug;
-
-use columnar::ColumnType;
-use rustc_hash::FxHashMap;
-
-use super::OrderTarget;
-use crate::aggregation::agg_data::{
-    build_segment_agg_collectors, AggRefNode, AggregationsSegmentCtx,
-};
-use crate::aggregation::agg_limits::MemoryConsumption;
-use crate::aggregation::bucket::get_agg_name_and_property;
-use crate::aggregation::intermediate_agg_result::{
-    IntermediateAggregationResult, IntermediateAggregationResults,
-};
-use crate::aggregation::segment_agg_result::SegmentAggregationCollector;
-use crate::TantivyError;
-
-#[derive(Clone, Debug, Default)]
-/// Container to store term_ids/or u64 values and their buckets.
-struct TermBuckets {
-    pub(crate) entries: FxHashMap<u64, u32>,
-    pub(crate) sub_aggs: FxHashMap<u64, Box<dyn SegmentAggregationCollector>>,
-}
-
-impl TermBuckets {
-    fn get_memory_consumption(&self) -> usize {
-        let sub_aggs_mem = self.sub_aggs.memory_consumption();
-        let buckets_mem = self.entries.memory_consumption();
-        sub_aggs_mem + buckets_mem
-    }
-
-    fn force_flush(&mut self, agg_data: &mut AggregationsSegmentCtx) -> crate::Result<()> {
-        for sub_aggregations in &mut self.sub_aggs.values_mut() {
-            sub_aggregations.as_mut().flush(agg_data)?;
-        }
-        Ok(())
-    }
-}
-
-/// The collector puts values from the fast field into the correct buckets and does a conversion to
-/// the correct datatype.
-#[derive(Clone, Debug)]
-pub struct SegmentTermCollector {
-    /// The buckets containing the aggregation data.
-    term_buckets: TermBuckets,
-    accessor_idx: usize,
-}
-
-impl SegmentAggregationCollector for SegmentTermCollector {
-    fn add_intermediate_aggregation_result(
-        self: Box<Self>,
-        agg_data: &AggregationsSegmentCtx,
-        results: &mut IntermediateAggregationResults,
-    ) -> crate::Result<()> {
-        let name = agg_data.get_term_req_data(self.accessor_idx).name.clone();
-
-        let entries: Vec<(u64, u32)> = self.term_buckets.entries.into_iter().collect();
-        let bucket = super::into_intermediate_bucket_result(
-            self.accessor_idx,
-            entries,
-            self.term_buckets.sub_aggs,
-            agg_data,
-        )?;
-        results.push(name, IntermediateAggregationResult::Bucket(bucket))?;
-
-        Ok(())
-    }
-
-    #[inline]
-    fn collect(
-        &mut self,
-        doc: crate::DocId,
-        agg_data: &mut AggregationsSegmentCtx,
-    ) -> crate::Result<()> {
-        self.collect_block(&[doc], agg_data)
-    }
-
-    #[inline]
-    fn collect_block(
-        &mut self,
-        docs: &[crate::DocId],
-        agg_data: &mut AggregationsSegmentCtx,
-    ) -> crate::Result<()> {
-        let mut req_data = agg_data.take_term_req_data(self.accessor_idx);
-
-        let mem_pre = self.get_memory_consumption();
-
-        if let Some(missing) = req_data.missing_value_for_accessor {
-            req_data.column_block_accessor.fetch_block_with_missing(
-                docs,
-                &req_data.accessor,
-                missing,
-            );
-        } else {
-            req_data
-                .column_block_accessor
-                .fetch_block(docs, &req_data.accessor);
-        }
-
-        for term_id in req_data.column_block_accessor.iter_vals() {
-            if let Some(allowed_bs) = req_data.allowed_term_ids.as_ref() {
-                if !allowed_bs.contains(term_id as u32) {
-                    continue;
-                }
-            }
-            let entry = self.term_buckets.entries.entry(term_id).or_default();
-            *entry += 1;
-        }
-        // has subagg
-        if let Some(blueprint) = req_data.sub_aggregation_blueprint.as_ref() {
-            for (doc, term_id) in req_data
-                .column_block_accessor
-                .iter_docid_vals(docs, &req_data.accessor)
-            {
-                if let Some(allowed_bs) = req_data.allowed_term_ids.as_ref() {
-                    if !allowed_bs.contains(term_id as u32) {
-                        continue;
-                    }
-                }
-                let sub_aggregations = self
-                    .term_buckets
-                    .sub_aggs
-                    .entry(term_id)
-                    .or_insert_with(|| blueprint.clone());
-                sub_aggregations.collect(doc, agg_data)?;
-            }
-        }
-
-        let mem_delta = self.get_memory_consumption() - mem_pre;
-        if mem_delta > 0 {
-            agg_data
-                .context
-                .limits
-                .add_memory_consumed(mem_delta as u64)?;
-        }
-        agg_data.put_back_term_req_data(self.accessor_idx, req_data);
-
-        Ok(())
-    }
-
-    fn flush(&mut self, agg_data: &mut AggregationsSegmentCtx) -> crate::Result<()> {
-        self.term_buckets.force_flush(agg_data)?;
-        Ok(())
-    }
-}
-
-impl SegmentTermCollector {
-    pub fn from_req_and_validate(
-        req_data: &mut AggregationsSegmentCtx,
-        node: &AggRefNode,
-    ) -> crate::Result<Self> {
-        let terms_req_data = req_data.get_term_req_data(node.idx_in_req_data);
-        let column_type = terms_req_data.column_type;
-        let accessor_idx = node.idx_in_req_data;
-        if column_type == ColumnType::Bytes {
-            return Err(TantivyError::InvalidArgument(format!(
-                "terms aggregation is not supported for column type {column_type:?}"
-            )));
-        }
-        let term_buckets = TermBuckets::default();
-
-        // Validate sub aggregation exists
-        if let OrderTarget::SubAggregation(sub_agg_name) = &terms_req_data.req.order.target {
-            let (agg_name, _agg_property) = get_agg_name_and_property(sub_agg_name);
-
-            node.get_sub_agg(agg_name, &req_data.per_request)
-                .ok_or_else(|| {
-                    TantivyError::InvalidArgument(format!(
-                        "could not find aggregation with name {agg_name} in metric \
-                         sub_aggregations"
-                    ))
-                })?;
-        }
-
-        let has_sub_aggregations = !node.children.is_empty();
-        let blueprint = if has_sub_aggregations {
-            let sub_aggregation = build_segment_agg_collectors(req_data, &node.children)?;
-            Some(sub_aggregation)
-        } else {
-            None
-        };
-        let terms_req_data = req_data.get_term_req_data_mut(node.idx_in_req_data);
-        terms_req_data.sub_aggregation_blueprint = blueprint;
-
-        Ok(SegmentTermCollector {
-            term_buckets,
-            accessor_idx,
-        })
-    }
-
-    fn get_memory_consumption(&self) -> usize {
-        let self_mem = std::mem::size_of::<Self>();
-        let term_buckets_mem = self.term_buckets.get_memory_consumption();
-        self_mem + term_buckets_mem
-    }
-}
--- a/src/aggregation/bucket/term_agg/low_cardinality_impl.rs
+++ b/src/aggregation/bucket/term_agg/low_cardinality_impl.rs
@@ -1,228 +0,0 @@
-use std::vec;
-
-use rustc_hash::FxHashMap;
-
-use crate::aggregation::agg_data::{
-    build_segment_agg_collectors, AggRefNode, AggregationsSegmentCtx,
-};
-use crate::aggregation::bucket::{get_agg_name_and_property, OrderTarget};
-use crate::aggregation::intermediate_agg_result::{
-    IntermediateAggregationResult, IntermediateAggregationResults,
-};
-use crate::aggregation::segment_agg_result::SegmentAggregationCollector;
-use crate::{DocId, TantivyError};
-
-const MAX_BATCH_SIZE: usize = 1_024;
-
-#[derive(Debug, Clone)]
-struct LowCardTermBuckets {
-    entries: Box<[u32]>,
-    sub_aggs: Vec<Box<dyn SegmentAggregationCollector>>,
-    doc_buffers: Box<[Vec<DocId>]>,
-}
-
-impl LowCardTermBuckets {
-    pub fn with_num_buckets(
-        num_buckets: usize,
-        sub_aggs_blueprint_opt: Option<&Box<dyn SegmentAggregationCollector>>,
-    ) -> Self {
-        let sub_aggs = sub_aggs_blueprint_opt
-            .as_ref()
-            .map(|blueprint| {
-                std::iter::repeat_with(|| blueprint.clone_box())
-                    .take(num_buckets)
-                    .collect::<Vec<_>>()
-            })
-            .unwrap_or_default();
-        Self {
-            entries: vec![0; num_buckets].into_boxed_slice(),
-            sub_aggs,
-            doc_buffers: std::iter::repeat_with(|| Vec::with_capacity(MAX_BATCH_SIZE))
-                .take(num_buckets)
-                .collect::<Vec<_>>()
-                .into_boxed_slice(),
-        }
-    }
-
-    fn get_memory_consumption(&self) -> usize {
-        std::mem::size_of::<Self>()
-            + self.entries.len() * std::mem::size_of::<u32>()
-            + self.doc_buffers.len()
-                * (std::mem::size_of::<Vec<DocId>>()
-                    + std::mem::size_of::<DocId>() * MAX_BATCH_SIZE)
-    }
-}
-
-#[derive(Debug, Clone)]
-pub struct LowCardSegmentTermCollector {
-    term_buckets: LowCardTermBuckets,
-    accessor_idx: usize,
-}
-
-impl LowCardSegmentTermCollector {
-    pub fn from_req_and_validate(
-        req_data: &mut AggregationsSegmentCtx,
-        node: &AggRefNode,
-    ) -> crate::Result<Self> {
-        let terms_req_data = req_data.get_term_req_data(node.idx_in_req_data);
-        let accessor_idx = node.idx_in_req_data;
-        let cardinality = terms_req_data
-            .accessor
-            .max_value()
-            .max(terms_req_data.missing_value_for_accessor.unwrap_or(0))
-            + 1;
-        assert!(cardinality <= super::LOW_CARDINALITY_THRESHOLD);
-
-        // Validate sub aggregation exists
-        if let OrderTarget::SubAggregation(sub_agg_name) = &terms_req_data.req.order.target {
-            let (agg_name, _agg_property) = get_agg_name_and_property(sub_agg_name);
-
-            node.get_sub_agg(agg_name, &req_data.per_request)
-                .ok_or_else(|| {
-                    TantivyError::InvalidArgument(format!(
-                        "could not find aggregation with name {agg_name} in metric \
-                         sub_aggregations"
-                    ))
-                })?;
-        }
-
-        let has_sub_aggregations = !node.children.is_empty();
-        let blueprint = if has_sub_aggregations {
-            let sub_aggregation = build_segment_agg_collectors(req_data, &node.children)?;
-            Some(sub_aggregation)
-        } else {
-            None
-        };
-        let terms_req_data = req_data.get_term_req_data_mut(node.idx_in_req_data);
-
-        let term_buckets =
-            LowCardTermBuckets::with_num_buckets(cardinality as usize, blueprint.as_ref());
-
-        terms_req_data.sub_aggregation_blueprint = blueprint;
-
-        Ok(LowCardSegmentTermCollector {
-            term_buckets,
-            accessor_idx,
-        })
-    }
-
-    fn get_memory_consumption(&self) -> usize {
-        let self_mem = std::mem::size_of::<Self>();
-        let term_buckets_mem = self.term_buckets.get_memory_consumption();
-        self_mem + term_buckets_mem
-    }
-}
-
-impl SegmentAggregationCollector for LowCardSegmentTermCollector {
-    fn add_intermediate_aggregation_result(
-        self: Box<Self>,
-        agg_data: &AggregationsSegmentCtx,
-        results: &mut IntermediateAggregationResults,
-    ) -> crate::Result<()> {
-        let name = agg_data.get_term_req_data(self.accessor_idx).name.clone();
-        let sub_aggs: FxHashMap<u64, Box<dyn SegmentAggregationCollector>> = self
-            .term_buckets
-            .sub_aggs
-            .into_iter()
-            .enumerate()
-            .filter(|(bucket_id, _sub_agg)| self.term_buckets.entries[*bucket_id] > 0)
-            .map(|(bucket_id, sub_agg)| (bucket_id as u64, sub_agg))
-            .collect();
-        let entries: Vec<(u64, u32)> = self
-            .term_buckets
-            .entries
-            .iter()
-            .enumerate()
-            .filter(|(_, count)| **count > 0)
-            .map(|(bucket_id, count)| (bucket_id as u64, *count))
-            .collect();
-
-        let bucket =
-            super::into_intermediate_bucket_result(self.accessor_idx, entries, sub_aggs, agg_data)?;
-        results.push(name, IntermediateAggregationResult::Bucket(bucket))?;
-        Ok(())
-    }
-
-    fn collect_block(
-        &mut self,
-        docs: &[crate::DocId],
-        agg_data: &mut AggregationsSegmentCtx,
-    ) -> crate::Result<()> {
-        if docs.len() > MAX_BATCH_SIZE {
-            for batch in docs.chunks(MAX_BATCH_SIZE) {
-                self.collect_block(batch, agg_data)?;
-            }
-        }
-
-        let mut req_data = agg_data.take_term_req_data(self.accessor_idx);
-
-        let mem_pre = self.get_memory_consumption();
-
-        if let Some(missing) = req_data.missing_value_for_accessor {
-            req_data.column_block_accessor.fetch_block_with_missing(
-                docs,
-                &req_data.accessor,
-                missing,
-            );
-        } else {
-            req_data
-                .column_block_accessor
-                .fetch_block(docs, &req_data.accessor);
-        }
-
-        // has subagg
-        if req_data.sub_aggregation_blueprint.is_some() {
-            for (doc, term_id) in req_data
-                .column_block_accessor
-                .iter_docid_vals(docs, &req_data.accessor)
-            {
-                if let Some(allowed_bs) = req_data.allowed_term_ids.as_ref() {
-                    if !allowed_bs.contains(term_id as u32) {
-                        continue;
-                    }
-                }
-                self.term_buckets.doc_buffers[term_id as usize].push(doc);
-            }
-            for (bucket_id, docs) in self.term_buckets.doc_buffers.iter_mut().enumerate() {
-                self.term_buckets.entries[bucket_id] += docs.len() as u32;
-                self.term_buckets.sub_aggs[bucket_id].collect_block(&docs[..], agg_data)?;
-                docs.clear();
-            }
-        } else {
-            for term_id in req_data.column_block_accessor.iter_vals() {
-                if let Some(allowed_bs) = req_data.allowed_term_ids.as_ref() {
-                    if !allowed_bs.contains(term_id as u32) {
-                        continue;
-                    }
-                }
-                self.term_buckets.entries[term_id as usize] += 1;
-            }
-        }
-
-        let mem_delta = self.get_memory_consumption() - mem_pre;
-        if mem_delta > 0 {
-            agg_data
-                .context
-                .limits
-                .add_memory_consumed(mem_delta as u64)?;
-        }
-        agg_data.put_back_term_req_data(self.accessor_idx, req_data);
-
-        Ok(())
-    }
-
-    fn collect(
-        &mut self,
-        doc: crate::DocId,
-        agg_data: &mut AggregationsSegmentCtx,
-    ) -> crate::Result<()> {
-        self.collect_block(&[doc], agg_data)
-    }
-
-    fn flush(&mut self, agg_data: &mut AggregationsSegmentCtx) -> crate::Result<()> {
-        for sub_aggregations in &mut self.term_buckets.sub_aggs.iter_mut() {
-            sub_aggregations.as_mut().flush(agg_data)?;
-        }
-        Ok(())
-    }
-}
--- a/src/aggregation/bucket/term_missing_agg.rs
+++ b/src/aggregation/bucket/term_missing_agg.rs
@@ -5,11 +5,13 @@ use crate::aggregation::agg_data::{
    build_segment_agg_collectors, AggRefNode, AggregationsSegmentCtx,
 };
 use crate::aggregation::bucket::term_agg::TermsAggregation;
+use crate::aggregation::cached_sub_aggs::{CachedSubAggs, HighCardCachedSubAggs};
 use crate::aggregation::intermediate_agg_result::{
    IntermediateAggregationResult, IntermediateAggregationResults, IntermediateBucketResult,
    IntermediateKey, IntermediateTermBucketEntry, IntermediateTermBucketResult,
 };
-use crate::aggregation::segment_agg_result::SegmentAggregationCollector;
+use crate::aggregation::segment_agg_result::{BucketIdProvider, SegmentAggregationCollector};
+use crate::aggregation::BucketId;

 /// Special aggregation to handle missing values for term aggregations.
 /// This missing aggregation will check multiple columns for existence.
@@ -35,41 +37,55 @@ impl MissingTermAggReqData {
    }
 }

-/// The specialized missing term aggregation.
 #[derive(Default, Debug, Clone)]
-pub struct TermMissingAgg {
+struct MissingCount {
    missing_count: u32,
+    bucket_id: BucketId,
+}
+
+/// The specialized missing term aggregation.
+#[derive(Default, Debug)]
+pub struct TermMissingAgg {
    accessor_idx: usize,
-    sub_agg: Option<Box<dyn SegmentAggregationCollector>>,
+    sub_agg: Option<HighCardCachedSubAggs>,
+    /// Indexed by parent bucket id; each entry holds the missing count for that bucket.
+    missing_count_per_bucket: Vec<MissingCount>,
+    bucket_id_provider: BucketIdProvider,
 }
 impl TermMissingAgg {
    pub(crate) fn new(
-        req_data: &mut AggregationsSegmentCtx,
+        agg_data: &mut AggregationsSegmentCtx,
        node: &AggRefNode,
    ) -> crate::Result<Self> {
        let has_sub_aggregations = !node.children.is_empty();
        let accessor_idx = node.idx_in_req_data;
        let sub_agg = if has_sub_aggregations {
-            let sub_aggregation = build_segment_agg_collectors(req_data, &node.children)?;
+            let sub_aggregation = build_segment_agg_collectors(agg_data, &node.children)?;
            Some(sub_aggregation)
        } else {
            None
        };

+        let sub_agg = sub_agg.map(CachedSubAggs::new);
+        let bucket_id_provider = BucketIdProvider::default();
+
        Ok(Self {
            accessor_idx,
            sub_agg,
-            ..Default::default()
+            missing_count_per_bucket: Vec::new(),
+            bucket_id_provider,
        })
    }
 }

 impl SegmentAggregationCollector for TermMissingAgg {
    fn add_intermediate_aggregation_result(
-        self: Box<Self>,
+        &mut self,
        agg_data: &AggregationsSegmentCtx,
        results: &mut IntermediateAggregationResults,
+        parent_bucket_id: BucketId,
    ) -> crate::Result<()> {
+        self.prepare_max_bucket(parent_bucket_id, agg_data)?;
        let req_data = agg_data.get_missing_term_req_data(self.accessor_idx);
        let term_agg = &req_data.req;
        let missing = term_agg
@@ -80,13 +96,16 @@ impl SegmentAggregationCollector for TermMissingAgg {
        let mut entries: FxHashMap<IntermediateKey, IntermediateTermBucketEntry> =
            Default::default();

+        let missing_count = &self.missing_count_per_bucket[parent_bucket_id as usize];
        let mut missing_entry = IntermediateTermBucketEntry {
-            doc_count: self.missing_count,
+            doc_count: missing_count.missing_count,
            sub_aggregation: Default::default(),
        };
-        if let Some(sub_agg) = self.sub_agg {
+        if let Some(sub_agg) = &mut self.sub_agg {
            let mut res = IntermediateAggregationResults::default();
-            sub_agg.add_intermediate_aggregation_result(agg_data, &mut res)?;
+            sub_agg
+                .get_sub_agg_collector()
+                .add_intermediate_aggregation_result(agg_data, &mut res, missing_count.bucket_id)?;
            missing_entry.sub_aggregation = res;
        }
        entries.insert(missing.into(), missing_entry);
@@ -109,30 +128,52 @@ impl SegmentAggregationCollector for TermMissingAgg {

    fn collect(
        &mut self,
-        doc: crate::DocId,
+        parent_bucket_id: BucketId,
+        docs: &[crate::DocId],
        agg_data: &mut AggregationsSegmentCtx,
    ) -> crate::Result<()> {
+        let bucket = &mut self.missing_count_per_bucket[parent_bucket_id as usize];
        let req_data = agg_data.get_missing_term_req_data(self.accessor_idx);
-        let has_value = req_data
-            .accessors
-            .iter()
-            .any(|(acc, _)| acc.index.has_value(doc));
-        if !has_value {
-            self.missing_count += 1;
-            if let Some(sub_agg) = self.sub_agg.as_mut() {
-                sub_agg.collect(doc, agg_data)?;
+
+        for doc in docs {
+            let doc = *doc;
+            let has_value = req_data
+                .accessors
+                .iter()
+                .any(|(acc, _)| acc.index.has_value(doc));
+            if !has_value {
+                bucket.missing_count += 1;
+
+                if let Some(sub_agg) = self.sub_agg.as_mut() {
+                    sub_agg.push(bucket.bucket_id, doc);
+                }
            }
        }
+
+        if let Some(sub_agg) = self.sub_agg.as_mut() {
+            sub_agg.check_flush_local(agg_data)?;
+        }
        Ok(())
    }

-    fn collect_block(
+    fn prepare_max_bucket(
        &mut self,
-        docs: &[crate::DocId],
-        agg_data: &mut AggregationsSegmentCtx,
+        max_bucket: BucketId,
+        _agg_data: &AggregationsSegmentCtx,
    ) -> crate::Result<()> {
-        for doc in docs {
-            self.collect(*doc, agg_data)?;
+        while self.missing_count_per_bucket.len() <= max_bucket as usize {
+            let bucket_id = self.bucket_id_provider.next_bucket_id();
+            self.missing_count_per_bucket.push(MissingCount {
+                missing_count: 0,
+                bucket_id,
+            });
+        }
+        Ok(())
+    }
+
+    fn flush(&mut self, agg_data: &mut AggregationsSegmentCtx) -> crate::Result<()> {
+        if let Some(sub_agg) = self.sub_agg.as_mut() {
+            sub_agg.flush(agg_data)?;
        }
        Ok(())
    }
--- a/src/aggregation/buf_collector.rs
+++ b/src/aggregation/buf_collector.rs
@@ -1,83 +0,0 @@
-use super::intermediate_agg_result::IntermediateAggregationResults;
-use super::segment_agg_result::SegmentAggregationCollector;
-use crate::aggregation::agg_data::AggregationsSegmentCtx;
-use crate::DocId;
-
-pub(crate) const DOC_BLOCK_SIZE: usize = 64;
-pub(crate) type DocBlock = [DocId; DOC_BLOCK_SIZE];
-
-/// BufAggregationCollector buffers documents before calling collect_block().
-#[derive(Clone)]
-pub(crate) struct BufAggregationCollector {
-    pub(crate) collector: Box<dyn SegmentAggregationCollector>,
-    staged_docs: DocBlock,
-    num_staged_docs: usize,
-}
-
-impl std::fmt::Debug for BufAggregationCollector {
-    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
-        f.debug_struct("SegmentAggregationResultsCollector")
-            .field("staged_docs", &&self.staged_docs[..self.num_staged_docs])
-            .field("num_staged_docs", &self.num_staged_docs)
-            .finish()
-    }
-}
-
-impl BufAggregationCollector {
-    pub fn new(collector: Box<dyn SegmentAggregationCollector>) -> Self {
-        Self {
-            collector,
-            num_staged_docs: 0,
-            staged_docs: [0; DOC_BLOCK_SIZE],
-        }
-    }
-}
-
-impl SegmentAggregationCollector for BufAggregationCollector {
-    #[inline]
-    fn add_intermediate_aggregation_result(
-        self: Box<Self>,
-        agg_data: &AggregationsSegmentCtx,
-        results: &mut IntermediateAggregationResults,
-    ) -> crate::Result<()> {
-        Box::new(self.collector).add_intermediate_aggregation_result(agg_data, results)
-    }
-
-    #[inline]
-    fn collect(
-        &mut self,
-        doc: crate::DocId,
-        agg_data: &mut AggregationsSegmentCtx,
-    ) -> crate::Result<()> {
-        self.staged_docs[self.num_staged_docs] = doc;
-        self.num_staged_docs += 1;
-        if self.num_staged_docs == self.staged_docs.len() {
-            self.collector
-                .collect_block(&self.staged_docs[..self.num_staged_docs], agg_data)?;
-            self.num_staged_docs = 0;
-        }
-        Ok(())
-    }
-
-    #[inline]
-    fn collect_block(
-        &mut self,
-        docs: &[crate::DocId],
-        agg_data: &mut AggregationsSegmentCtx,
-    ) -> crate::Result<()> {
-        self.collector.collect_block(docs, agg_data)?;
-
-        Ok(())
-    }
-
-    #[inline]
-    fn flush(&mut self, agg_data: &mut AggregationsSegmentCtx) -> crate::Result<()> {
-        self.collector
-            .collect_block(&self.staged_docs[..self.num_staged_docs], agg_data)?;
-        self.num_staged_docs = 0;
-
-        self.collector.flush(agg_data)?;
-
-        Ok(())
-    }
-}
--- a/src/aggregation/cached_sub_aggs.rs
+++ b/src/aggregation/cached_sub_aggs.rs
@@ -0,0 +1,245 @@
+use std::fmt::Debug;
+
+use super::segment_agg_result::SegmentAggregationCollector;
+use crate::aggregation::agg_data::AggregationsSegmentCtx;
+use crate::aggregation::bucket::MAX_NUM_TERMS_FOR_VEC;
+use crate::aggregation::BucketId;
+use crate::DocId;
+
+/// A cache for sub-aggregations, storing doc ids per bucket id.
+/// Depending on the cardinality of the parent aggregation, we use different
+/// storage strategies.
+///
+/// ## Low Cardinality
+/// Cardinality here refers to the number of unique flattened buckets that can be created
+/// by the parent aggregation.
+/// Flattened buckets are the result of combining all buckets per collector
+/// into a single list of buckets, where each bucket is identified by its BucketId.
+///
+/// ## Usage
+/// Since this is caching for sub-aggregations, it is only used by bucket
+/// aggregations.
+///
+/// TODO: consider using a more advanced data structure for high cardinality
+/// aggregations.
+/// In general, this data structure groups docs by bucket id.
+#[derive(Debug)]
+pub(crate) struct CachedSubAggs<C: SubAggCache> {
+    cache: C,
+    sub_agg_collector: Box<dyn SegmentAggregationCollector>,
+    num_docs: usize,
+}
+
+pub type LowCardCachedSubAggs = CachedSubAggs<LowCardSubAggCache>;
+pub type HighCardCachedSubAggs = CachedSubAggs<HighCardSubAggCache>;
+
+const FLUSH_THRESHOLD: usize = 2048;
+
+/// A trait for caching sub-aggregation doc ids per bucket id.
+/// Different implementations can be used depending on the cardinality
+/// of the parent aggregation.
+pub trait SubAggCache: Debug {
+    fn new() -> Self;
+    fn push(&mut self, bucket_id: BucketId, doc_id: DocId);
+    fn flush_local(
+        &mut self,
+        sub_agg: &mut Box<dyn SegmentAggregationCollector>,
+        agg_data: &mut AggregationsSegmentCtx,
+        force: bool,
+    ) -> crate::Result<()>;
+}
+
+impl<Backend: SubAggCache + Debug> CachedSubAggs<Backend> {
+    pub fn new(sub_agg: Box<dyn SegmentAggregationCollector>) -> Self {
+        Self {
+            cache: Backend::new(),
+            sub_agg_collector: sub_agg,
+            num_docs: 0,
+        }
+    }
+
+    pub fn get_sub_agg_collector(&mut self) -> &mut Box<dyn SegmentAggregationCollector> {
+        &mut self.sub_agg_collector
+    }
+
+    #[inline]
+    pub fn push(&mut self, bucket_id: BucketId, doc_id: DocId) {
+        self.cache.push(bucket_id, doc_id);
+        self.num_docs += 1;
+    }
+
+    /// Checks whether the number of cached documents has reached `FLUSH_THRESHOLD`.
+    /// If so, flushes the cache to the sub-aggregation collector.
+    pub fn check_flush_local(
+        &mut self,
+        agg_data: &mut AggregationsSegmentCtx,
+    ) -> crate::Result<()> {
+        if self.num_docs >= FLUSH_THRESHOLD {
+            self.cache
+                .flush_local(&mut self.sub_agg_collector, agg_data, false)?;
+            self.num_docs = 0;
+        }
+        Ok(())
+    }
+
+    /// Flushes any cached docs to the sub-aggregation collector, then flushes the collector itself.
+    pub fn flush(&mut self, agg_data: &mut AggregationsSegmentCtx) -> crate::Result<()> {
+        if self.num_docs != 0 {
+            self.cache
+                .flush_local(&mut self.sub_agg_collector, agg_data, true)?;
+            self.num_docs = 0;
+        }
+        self.sub_agg_collector.flush(agg_data)?;
+        Ok(())
+    }
+}
+
+/// Number of partitions for high cardinality sub-aggregation cache.
+const NUM_PARTITIONS: usize = 16;
+
+#[derive(Debug)]
+pub(crate) struct HighCardSubAggCache {
+    /// This partitioning does some cheap grouping on the bucket ids.
+    /// Bucket ids are dense, so if the cardinality is not detected as low
+    /// but there are just 16 bucket ids, each bucket id goes to its own partition.
+    ///
+    /// We want to keep this cheap, because high cardinality aggregations can have a lot of
+    /// buckets, and there may be nothing to group.
+    partitions: Box<[PartitionEntry; NUM_PARTITIONS]>,
+}
+
+impl HighCardSubAggCache {
+    #[inline]
+    fn clear(&mut self) {
+        for partition in self.partitions.iter_mut() {
+            partition.clear();
+        }
+    }
+}
+
+#[derive(Debug, Clone, Default)]
+struct PartitionEntry {
+    bucket_ids: Vec<BucketId>,
+    docs: Vec<DocId>,
+}
+
+impl PartitionEntry {
+    #[inline]
+    fn clear(&mut self) {
+        self.bucket_ids.clear();
+        self.docs.clear();
+    }
+}
+
+impl SubAggCache for HighCardSubAggCache {
+    fn new() -> Self {
+        Self {
+            partitions: Box::new(core::array::from_fn(|_| PartitionEntry::default())),
+        }
+    }
+
+    fn push(&mut self, bucket_id: BucketId, doc_id: DocId) {
+        let idx = bucket_id % NUM_PARTITIONS as u32;
+        let slot = &mut self.partitions[idx as usize];
+        slot.bucket_ids.push(bucket_id);
+        slot.docs.push(doc_id);
+    }
+
+    fn flush_local(
+        &mut self,
+        sub_agg: &mut Box<dyn SegmentAggregationCollector>,
+        agg_data: &mut AggregationsSegmentCtx,
+        _force: bool,
+    ) -> crate::Result<()> {
+        let mut max_bucket = 0u32;
+        for partition in self.partitions.iter() {
+            if let Some(&local_max) = partition.bucket_ids.iter().max() {
+                max_bucket = max_bucket.max(local_max);
+            }
+        }
+
+        sub_agg.prepare_max_bucket(max_bucket, agg_data)?;
+
+        for slot in self.partitions.iter() {
+            if !slot.bucket_ids.is_empty() {
+                // Reduce dynamic dispatch overhead by collecting a full partition in one call.
+                sub_agg.collect_multiple(&slot.bucket_ids, &slot.docs, agg_data)?;
+            }
+        }
+
+        self.clear();
+        Ok(())
+    }
+}
+
+#[derive(Debug)]
+pub(crate) struct LowCardSubAggCache {
+    /// Cache doc ids per bucket for sub-aggregations.
+    ///
+    /// The outer Vec is indexed by BucketId.
+    per_bucket_docs: Vec<Vec<DocId>>,
+}
+
+impl LowCardSubAggCache {
+    #[inline]
+    fn clear(&mut self) {
+        for v in &mut self.per_bucket_docs {
+            v.clear();
+        }
+    }
+}
+
+impl SubAggCache for LowCardSubAggCache {
+    fn new() -> Self {
+        Self {
+            per_bucket_docs: Vec::new(),
+        }
+    }
+
+    fn push(&mut self, bucket_id: BucketId, doc_id: DocId) {
+        let idx = bucket_id as usize;
+        if self.per_bucket_docs.len() <= idx {
+            self.per_bucket_docs.resize_with(idx + 1, Vec::new);
+        }
+        self.per_bucket_docs[idx].push(doc_id);
+    }
+
+    fn flush_local(
+        &mut self,
+        sub_agg: &mut Box<dyn SegmentAggregationCollector>,
+        agg_data: &mut AggregationsSegmentCtx,
+        force: bool,
+    ) -> crate::Result<()> {
+        // Pre-aggregated: call collect per bucket.
+        let max_bucket = (self.per_bucket_docs.len() as BucketId).saturating_sub(1);
+        sub_agg.prepare_max_bucket(max_bucket, agg_data)?;
+        // The threshold above which we flush buckets individually.
+        // Note: We must not get stuck in a situation where we hit FLUSH_THRESHOLD
+        // but never flush any buckets (except in the final flush).
+        let mut bucket_threshold = FLUSH_THRESHOLD / (self.per_bucket_docs.len().max(1) * 2);
+        const _: () = {
+            // The MAX_NUM_TERMS_FOR_VEC threshold is used for term aggregations.
+            // Note: Other aggregations may use other, flexible values, but the const value
+            // serves as an upper bound here (better than nothing).
+            let bucket_threshold_limit = FLUSH_THRESHOLD / (MAX_NUM_TERMS_FOR_VEC as usize * 2);
+            assert!(
+                bucket_threshold_limit > 0,
+                "Bucket threshold must be greater than 0"
+            );
+        };
+        if force {
+            bucket_threshold = 0;
+        }
+        for (bucket_id, docs) in self
+            .per_bucket_docs
+            .iter()
+            .enumerate()
+            .filter(|(_, docs)| docs.len() > bucket_threshold)
+        {
+            sub_agg.collect(bucket_id as BucketId, docs, agg_data)?;
+        }
+
+        self.clear();
+        Ok(())
+    }
+}
--- a/src/aggregation/collector.rs
+++ b/src/aggregation/collector.rs
@@ -1,9 +1,9 @@
 use super::agg_req::Aggregations;
 use super::agg_result::AggregationResults;
-use super::buf_collector::BufAggregationCollector;
+use super::cached_sub_aggs::LowCardCachedSubAggs;
 use super::intermediate_agg_result::IntermediateAggregationResults;
-use super::segment_agg_result::SegmentAggregationCollector;
 use super::AggContextParams;
+// The group buffering strategy is chosen explicitly by callers; no need to hash-group on the fly.
 use crate::aggregation::agg_data::{
    build_aggregations_data_from_req, build_segment_agg_collectors_root, AggregationsSegmentCtx,
 };
@@ -66,7 +66,7 @@ impl Collector for DistributedAggregationCollector {
    fn for_segment(
        &self,
        segment_local_id: crate::SegmentOrdinal,
-        reader: &crate::SegmentReader,
+        reader: &dyn SegmentReader,
    ) -> crate::Result<Self::Child> {
        AggregationSegmentCollector::from_agg_req_and_reader(
            &self.agg,
@@ -96,7 +96,7 @@ impl Collector for AggregationCollector {
    fn for_segment(
        &self,
        segment_local_id: crate::SegmentOrdinal,
-        reader: &crate::SegmentReader,
+        reader: &dyn SegmentReader,
    ) -> crate::Result<Self::Child> {
        AggregationSegmentCollector::from_agg_req_and_reader(
            &self.agg,
@@ -136,7 +136,7 @@ fn merge_fruits(
 /// `AggregationSegmentCollector` does the aggregation collection on a segment.
 pub struct AggregationSegmentCollector {
    aggs_with_accessor: AggregationsSegmentCtx,
-    agg_collector: BufAggregationCollector,
+    agg_collector: LowCardCachedSubAggs,
    error: Option<TantivyError>,
 }

@@ -145,14 +145,17 @@ impl AggregationSegmentCollector {
    /// reader. Also includes validation, e.g. checking field types and existence.
    pub fn from_agg_req_and_reader(
        agg: &Aggregations,
-        reader: &SegmentReader,
+        reader: &dyn SegmentReader,
        segment_ordinal: SegmentOrdinal,
        context: &AggContextParams,
    ) -> crate::Result<Self> {
        let mut agg_data =
            build_aggregations_data_from_req(agg, reader, segment_ordinal, context.clone())?;
-        let result =
-            BufAggregationCollector::new(build_segment_agg_collectors_root(&mut agg_data)?);
+        let mut result =
+            LowCardCachedSubAggs::new(build_segment_agg_collectors_root(&mut agg_data)?);
+        result
+            .get_sub_agg_collector()
+            .prepare_max_bucket(0, &agg_data)?; // prepare for bucket zero

        Ok(AggregationSegmentCollector {
            aggs_with_accessor: agg_data,
@@ -170,26 +173,31 @@ impl SegmentCollector for AggregationSegmentCollector {
        if self.error.is_some() {
            return;
        }
-        if let Err(err) = self
+        self.agg_collector.push(0, doc);
+        match self
            .agg_collector
-            .collect(doc, &mut self.aggs_with_accessor)
+            .check_flush_local(&mut self.aggs_with_accessor)
        {
-            self.error = Some(err);
+            Ok(_) => {}
+            Err(e) => {
+                self.error = Some(e);
+            }
        }
    }
-
-    /// The query pushes the documents to the collector via this method.
-    ///
-    /// Only valid for Collectors that ignore docs
    fn collect_block(&mut self, docs: &[DocId]) {
        if self.error.is_some() {
            return;
        }
-        if let Err(err) = self
-            .agg_collector
-            .collect_block(docs, &mut self.aggs_with_accessor)
-        {
-            self.error = Some(err);
+
+        match self.agg_collector.get_sub_agg_collector().collect(
+            0,
+            docs,
+            &mut self.aggs_with_accessor,
+        ) {
+            Ok(_) => {}
+            Err(e) => {
+                self.error = Some(e);
+            }
        }
    }

@@ -200,10 +208,13 @@ impl SegmentCollector for AggregationSegmentCollector {
        self.agg_collector.flush(&mut self.aggs_with_accessor)?;

        let mut sub_aggregation_res = IntermediateAggregationResults::default();
-        Box::new(self.agg_collector).add_intermediate_aggregation_result(
-            &self.aggs_with_accessor,
-            &mut sub_aggregation_res,
-        )?;
+        self.agg_collector
+            .get_sub_agg_collector()
+            .add_intermediate_aggregation_result(
+                &self.aggs_with_accessor,
+                &mut sub_aggregation_res,
+                0,
+            )?;

        Ok(sub_aggregation_res)
    }
--- a/src/aggregation/intermediate_agg_result.rs
+++ b/src/aggregation/intermediate_agg_result.rs
@@ -90,6 +90,19 @@ impl From<IntermediateKey> for Key {

 impl Eq for IntermediateKey {}

+impl std::fmt::Display for IntermediateKey {
+    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+        match self {
+            IntermediateKey::Str(val) => f.write_str(val),
+            IntermediateKey::F64(val) => f.write_str(&val.to_string()),
+            IntermediateKey::U64(val) => f.write_str(&val.to_string()),
+            IntermediateKey::I64(val) => f.write_str(&val.to_string()),
+            IntermediateKey::Bool(val) => f.write_str(&val.to_string()),
+            IntermediateKey::IpAddr(val) => f.write_str(&val.to_string()),
+        }
+    }
+}
+
 impl std::hash::Hash for IntermediateKey {
    fn hash<H: std::hash::Hasher>(&self, state: &mut H) {
        core::mem::discriminant(self).hash(state);
@@ -105,6 +118,21 @@ impl std::hash::Hash for IntermediateKey {
 }

 impl IntermediateAggregationResults {
+    /// Returns a reference to the intermediate aggregation result for the given key.
+    pub fn get(&self, key: &str) -> Option<&IntermediateAggregationResult> {
+        self.aggs_res.get(key)
+    }
+
+    /// Removes and returns the intermediate aggregation result for the given key.
+    pub fn remove(&mut self, key: &str) -> Option<IntermediateAggregationResult> {
+        self.aggs_res.remove(key)
+    }
+
+    /// Returns an iterator over the keys in the intermediate aggregation results.
+    pub fn keys(&self) -> impl Iterator<Item = &String> {
+        self.aggs_res.keys()
+    }
+
    /// Add a result
    pub fn push(&mut self, key: String, value: IntermediateAggregationResult) -> crate::Result<()> {
        let entry = self.aggs_res.entry(key);
@@ -639,6 +667,21 @@ pub struct IntermediateTermBucketResult {
 }

 impl IntermediateTermBucketResult {
+    /// Returns a reference to the map of bucket entries keyed by [`IntermediateKey`].
+    pub fn entries(&self) -> &FxHashMap<IntermediateKey, IntermediateTermBucketEntry> {
+        &self.entries
+    }
+
+    /// Returns the count of documents not included in the returned buckets.
+    pub fn sum_other_doc_count(&self) -> u64 {
+        self.sum_other_doc_count
+    }
+
+    /// Returns the upper bound of the error on document counts in the returned buckets.
+    pub fn doc_count_error_upper_bound(&self) -> u64 {
+        self.doc_count_error_upper_bound
+    }
+
    pub(crate) fn into_final_result(
        self,
        req: &TermsAggregation,
@@ -792,7 +835,7 @@ pub struct IntermediateRangeBucketEntry {
    /// The number of documents in the bucket.
    pub doc_count: u64,
    /// The sub_aggregation in this bucket.
-    pub sub_aggregation: IntermediateAggregationResults,
+    pub sub_aggregation_res: IntermediateAggregationResults,
    /// The from range of the bucket. Equals `f64::MIN` when `None`.
    pub from: Option<f64>,
    /// The to range of the bucket. Equals `f64::MAX` when `None`.
@@ -811,7 +854,7 @@ impl IntermediateRangeBucketEntry {
            key: self.key.into(),
            doc_count: self.doc_count,
            sub_aggregation: self
-                .sub_aggregation
+                .sub_aggregation_res
                .into_final_result_internal(req, limits)?,
            to: self.to,
            from: self.from,
@@ -820,7 +863,7 @@ impl IntermediateRangeBucketEntry {
        };

        // If we have a date type on the histogram buckets, we add the `key_as_string` field as
-        // rfc339
+        // rfc3339
        if column_type == Some(ColumnType::DateTime) {
            if let Some(val) = range_bucket_entry.to {
                let key_as_string = format_date(val as i64)?;
@@ -857,7 +900,8 @@ impl MergeFruits for IntermediateTermBucketEntry {
 impl MergeFruits for IntermediateRangeBucketEntry {
    fn merge_fruits(&mut self, other: IntermediateRangeBucketEntry) -> crate::Result<()> {
        self.doc_count += other.doc_count;
-        self.sub_aggregation.merge_fruits(other.sub_aggregation)?;
+        self.sub_aggregation_res
+            .merge_fruits(other.sub_aggregation_res)?;
        Ok(())
    }
 }
@@ -887,7 +931,7 @@ mod tests {
                IntermediateRangeBucketEntry {
                    key: IntermediateKey::Str(key.to_string()),
                    doc_count: *doc_count,
-                    sub_aggregation: Default::default(),
+                    sub_aggregation_res: Default::default(),
                    from: None,
                    to: None,
                },
@@ -920,7 +964,7 @@ mod tests {
                    doc_count: *doc_count,
                    from: None,
                    to: None,
-                    sub_aggregation: get_sub_test_tree(&[(
+                    sub_aggregation_res: get_sub_test_tree(&[(
                        sub_aggregation_key.to_string(),
                        *sub_aggregation_count,
                    )]),
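The `Display` impl added for `IntermediateKey` in this file formats every non-string variant through `to_string()`, which builds a temporary `String` before writing it out. The following is a minimal sketch of the same formatting done with `write!`, which writes straight into the formatter. `Key` is a hypothetical stand-in that only mirrors the variant names from the diff, and the `Ipv6Addr` payload of the `IpAddr` variant is an assumption.

```rust
use std::fmt;
use std::net::Ipv6Addr;

/// Hypothetical stand-in mirroring the variants of `IntermediateKey`.
/// The payload types (in particular `Ipv6Addr`) are assumptions.
pub enum Key {
    Str(String),
    F64(f64),
    U64(u64),
    I64(i64),
    Bool(bool),
    IpAddr(Ipv6Addr),
}

impl fmt::Display for Key {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // Strings can be written as-is.
            Key::Str(val) => f.write_str(val),
            // `write!` formats directly into `f`, without an intermediate `String`.
            Key::F64(val) => write!(f, "{val}"),
            Key::U64(val) => write!(f, "{val}"),
            Key::I64(val) => write!(f, "{val}"),
            Key::Bool(val) => write!(f, "{val}"),
            Key::IpAddr(val) => write!(f, "{val}"),
        }
    }
}
```

Because `Display` provides `to_string()` for free, callers can still write `Key::U64(42).to_string()` when they do need an owned string.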
--- a/src/aggregation/metric/average.rs
+++ b/src/aggregation/metric/average.rs
@@ -52,11 +52,15 @@ pub struct IntermediateAverage {

 impl IntermediateAverage {
    /// Creates a new [`IntermediateAverage`] instance from a [`SegmentStatsCollector`].
-    pub(crate) fn from_collector(collector: SegmentStatsCollector) -> Self {
-        Self {
-            stats: collector.stats,
-        }
+    pub(crate) fn from_stats(stats: IntermediateStats) -> Self {
+        Self { stats }
    }
+
+    /// Returns a reference to the underlying [`IntermediateStats`].
+    pub fn stats(&self) -> &IntermediateStats {
+        &self.stats
+    }
+
    /// Merges the other intermediate result into self.
    pub fn merge_fruits(&mut self, other: IntermediateAverage) {
        self.stats.merge_fruits(other.stats);
--- a/src/aggregation/metric/cardinality.rs
+++ b/src/aggregation/metric/cardinality.rs
@@ -2,7 +2,7 @@ use std::collections::hash_map::DefaultHasher;
 use std::hash::{BuildHasher, Hasher};

 use columnar::column_values::CompactSpaceU64Accessor;
-use columnar::{Column, ColumnBlockAccessor, ColumnType, Dictionary, StrColumn};
+use columnar::{Column, ColumnType, Dictionary, StrColumn};
 use common::f64_to_u64;
 use hyperloglogplus::{HyperLogLog, HyperLogLogPlus};
 use rustc_hash::FxHashSet;
@@ -106,8 +106,6 @@ pub struct CardinalityAggReqData {
    pub str_dict_column: Option<StrColumn>,
    /// The missing value normalized to the internal u64 representation of the field type.
    pub missing_value_for_accessor: Option<u64>,
-    /// The column block accessor to access the fast field values.
-    pub(crate) column_block_accessor: ColumnBlockAccessor<u64>,
    /// The name of the aggregation.
    pub name: String,
    /// The aggregation request.
@@ -135,45 +133,34 @@ impl CardinalityAggregationReq {
    }
 }

-#[derive(Clone, Debug, PartialEq)]
+#[derive(Clone, Debug)]
 pub(crate) struct SegmentCardinalityCollector {
-    cardinality: CardinalityCollector,
-    entries: FxHashSet<u64>,
+    buckets: Vec<SegmentCardinalityCollectorBucket>,
    accessor_idx: usize,
+    /// The column accessor to access the fast field values.
+    accessor: Column<u64>,
+    /// The column_type of the field.
+    column_type: ColumnType,
+    /// The missing value normalized to the internal u64 representation of the field type.
+    missing_value_for_accessor: Option<u64>,
 }

-impl SegmentCardinalityCollector {
-    pub fn from_req(column_type: ColumnType, accessor_idx: usize) -> Self {
+#[derive(Clone, Debug, PartialEq, Default)]
+pub(crate) struct SegmentCardinalityCollectorBucket {
+    cardinality: CardinalityCollector,
+    entries: FxHashSet<u64>,
+}
+impl SegmentCardinalityCollectorBucket {
+    pub fn new(column_type: ColumnType) -> Self {
        Self {
            cardinality: CardinalityCollector::new(column_type as u8),
-            entries: Default::default(),
-            accessor_idx,
+            entries: FxHashSet::default(),
        }
    }
-
-    fn fetch_block_with_field(
-        &mut self,
-        docs: &[crate::DocId],
-        agg_data: &mut CardinalityAggReqData,
-    ) {
-        if let Some(missing) = agg_data.missing_value_for_accessor {
-            agg_data.column_block_accessor.fetch_block_with_missing(
-                docs,
-                &agg_data.accessor,
-                missing,
-            );
-        } else {
-            agg_data
-                .column_block_accessor
-                .fetch_block(docs, &agg_data.accessor);
-        }
-    }
-
    fn into_intermediate_metric_result(
        mut self,
-        agg_data: &AggregationsSegmentCtx,
+        req_data: &CardinalityAggReqData,
    ) -> crate::Result<IntermediateMetricResult> {
-        let req_data = &agg_data.get_cardinality_req_data(self.accessor_idx);
        if req_data.column_type == ColumnType::Str {
            let fallback_dict = Dictionary::empty();
            let dict = req_data
@@ -194,6 +181,7 @@ impl SegmentCardinalityCollector {
                    term_ids.push(term_ord as u32);
                }
            }
+
            term_ids.sort_unstable();
            dict.sorted_ords_to_term_cb(term_ids.iter().map(|term| *term as u64), |term| {
                self.cardinality.sketch.insert_any(&term);
@@ -227,16 +215,49 @@ impl SegmentCardinalityCollector {
    }
 }

+impl SegmentCardinalityCollector {
+    pub fn from_req(
+        column_type: ColumnType,
+        accessor_idx: usize,
+        accessor: Column<u64>,
+        missing_value_for_accessor: Option<u64>,
+    ) -> Self {
+        Self {
+            buckets: vec![SegmentCardinalityCollectorBucket::new(column_type); 1],
+            column_type,
+            accessor_idx,
+            accessor,
+            missing_value_for_accessor,
+        }
+    }
+
+    fn fetch_block_with_field(
+        &mut self,
+        docs: &[crate::DocId],
+        agg_data: &mut AggregationsSegmentCtx,
+    ) {
+        agg_data.column_block_accessor.fetch_block_with_missing(
+            docs,
+            &self.accessor,
+            self.missing_value_for_accessor,
+        );
+    }
+}
+
 impl SegmentAggregationCollector for SegmentCardinalityCollector {
    fn add_intermediate_aggregation_result(
-        self: Box<Self>,
+        &mut self,
        agg_data: &AggregationsSegmentCtx,
        results: &mut IntermediateAggregationResults,
+        parent_bucket_id: BucketId,
    ) -> crate::Result<()> {
+        self.prepare_max_bucket(parent_bucket_id, agg_data)?;
        let req_data = &agg_data.get_cardinality_req_data(self.accessor_idx);
        let name = req_data.name.to_string();
+        // Take the bucket out of `buckets`, leaving a default (empty) one in its place.
+        let bucket = std::mem::take(&mut self.buckets[parent_bucket_id as usize]);

-        let intermediate_result = self.into_intermediate_metric_result(agg_data)?;
+        let intermediate_result = bucket.into_intermediate_metric_result(req_data)?;
        results.push(
            name,
            IntermediateAggregationResult::Metric(intermediate_result),
@@ -247,27 +268,20 @@ impl SegmentAggregationCollector for SegmentCardinalityCollector {

    fn collect(
        &mut self,
-        doc: crate::DocId,
-        agg_data: &mut AggregationsSegmentCtx,
-    ) -> crate::Result<()> {
-        self.collect_block(&[doc], agg_data)
-    }
-
-    fn collect_block(
-        &mut self,
+        parent_bucket_id: BucketId,
        docs: &[crate::DocId],
        agg_data: &mut AggregationsSegmentCtx,
    ) -> crate::Result<()> {
-        let req_data = agg_data.get_cardinality_req_data_mut(self.accessor_idx);
-        self.fetch_block_with_field(docs, req_data);
+        self.fetch_block_with_field(docs, agg_data);
+        let bucket = &mut self.buckets[parent_bucket_id as usize];

-        let col_block_accessor = &req_data.column_block_accessor;
-        if req_data.column_type == ColumnType::Str {
+        let col_block_accessor = &agg_data.column_block_accessor;
+        if self.column_type == ColumnType::Str {
            for term_ord in col_block_accessor.iter_vals() {
-                self.entries.insert(term_ord);
+                bucket.entries.insert(term_ord);
            }
-        } else if req_data.column_type == ColumnType::IpAddr {
-            let compact_space_accessor = req_data
+        } else if self.column_type == ColumnType::IpAddr {
+            let compact_space_accessor = self
                .accessor
                .values
                .clone()
@@ -282,16 +296,29 @@ impl SegmentAggregationCollector for SegmentCardinalityCollector {
                })?;
            for val in col_block_accessor.iter_vals() {
                let val: u128 = compact_space_accessor.compact_to_u128(val as u32);
-                self.cardinality.sketch.insert_any(&val);
+                bucket.cardinality.sketch.insert_any(&val);
            }
        } else {
            for val in col_block_accessor.iter_vals() {
-                self.cardinality.sketch.insert_any(&val);
+                bucket.cardinality.sketch.insert_any(&val);
            }
        }

        Ok(())
    }
+
+    fn prepare_max_bucket(
+        &mut self,
+        max_bucket: BucketId,
+        _agg_data: &AggregationsSegmentCtx,
+    ) -> crate::Result<()> {
+        if max_bucket as usize >= self.buckets.len() {
+            self.buckets.resize_with(max_bucket as usize + 1, || {
+                SegmentCardinalityCollectorBucket::new(self.column_type)
+            });
+        }
+        Ok(())
+    }
 }

 #[derive(Clone, Debug, Serialize, Deserialize)]
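The cardinality collector above now keeps one state per parent bucket: `prepare_max_bucket` grows the vector so the bucket id is a valid index, `collect` writes into `buckets[parent_bucket_id]`, and the result is moved out with `std::mem::take`. A minimal, standalone sketch of that pattern follows. `BucketState` and `PerBucketCollector` are hypothetical names, not tantivy types, and `u32` as the bucket id type is an assumption.

```rust
/// Hypothetical per-bucket accumulator (not a tantivy type).
#[derive(Debug, Default, PartialEq)]
pub struct BucketState {
    pub values: Vec<u64>,
}

/// Hypothetical collector holding one `BucketState` per parent bucket id.
pub struct PerBucketCollector {
    pub buckets: Vec<BucketState>,
}

impl PerBucketCollector {
    /// Starts with a single bucket, for the root (bucket id 0).
    pub fn new() -> Self {
        Self {
            buckets: vec![BucketState::default()],
        }
    }

    /// Grows `buckets` so that `max_bucket` is a valid index.
    pub fn prepare_max_bucket(&mut self, max_bucket: u32) {
        if max_bucket as usize >= self.buckets.len() {
            self.buckets
                .resize_with(max_bucket as usize + 1, BucketState::default);
        }
    }

    /// Records values for one parent bucket. The caller is expected to
    /// have called `prepare_max_bucket` first; otherwise this panics.
    pub fn collect(&mut self, parent_bucket_id: u32, vals: &[u64]) {
        self.buckets[parent_bucket_id as usize]
            .values
            .extend_from_slice(vals);
    }

    /// Moves the state of one bucket out, leaving a default one behind.
    pub fn take_result(&mut self, parent_bucket_id: u32) -> BucketState {
        self.prepare_max_bucket(parent_bucket_id);
        std::mem::take(&mut self.buckets[parent_bucket_id as usize])
    }
}
```

One consequence of `std::mem::take` is that the slot left behind is built by `Default`, not by the constructor used in `prepare_max_bucket`. In this sketch the two are identical, but they can differ when the constructor takes parameters.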
--- a/src/aggregation/metric/count.rs
+++ b/src/aggregation/metric/count.rs
@@ -52,10 +52,8 @@ pub struct IntermediateCount {

 impl IntermediateCount {
    /// Creates a new [`IntermediateCount`] instance from a [`SegmentStatsCollector`].
-    pub(crate) fn from_collector(collector: SegmentStatsCollector) -> Self {
-        Self {
-            stats: collector.stats,
-        }
+    pub(crate) fn from_stats(stats: IntermediateStats) -> Self {
+        Self { stats }
    }
    /// Merges the other intermediate result into self.
    pub fn merge_fruits(&mut self, other: IntermediateCount) {
--- a/src/aggregation/metric/extended_stats.rs
+++ b/src/aggregation/metric/extended_stats.rs
@@ -8,10 +8,9 @@ use crate::aggregation::agg_data::AggregationsSegmentCtx;
 use crate::aggregation::intermediate_agg_result::{
    IntermediateAggregationResult, IntermediateAggregationResults, IntermediateMetricResult,
 };
-use crate::aggregation::metric::MetricAggReqData;
 use crate::aggregation::segment_agg_result::SegmentAggregationCollector;
 use crate::aggregation::*;
-use crate::{DocId, TantivyError};
+use crate::TantivyError;

 /// A multi-value metric aggregation that computes a collection of extended statistics
 /// on numeric values that are extracted
@@ -62,7 +61,7 @@ impl ExtendedStatsAggregation {

 /// Extended stats contains a collection of statistics
 /// they extends stats adding variance, standard deviation
-/// and bound informations
+/// and bound information
 #[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
 pub struct ExtendedStats {
    /// The number of documents.
@@ -318,51 +317,28 @@ impl IntermediateExtendedStats {
    }
 }

-#[derive(Clone, Debug, PartialEq)]
+#[derive(Clone, Debug)]
 pub(crate) struct SegmentExtendedStatsCollector {
+    name: String,
    missing: Option<u64>,
    field_type: ColumnType,
-    pub(crate) extended_stats: IntermediateExtendedStats,
-    pub(crate) accessor_idx: usize,
-    val_cache: Vec<u64>,
+    accessor: columnar::Column<u64>,
+    buckets: Vec<IntermediateExtendedStats>,
+    sigma: Option<f64>,
 }

 impl SegmentExtendedStatsCollector {
-    pub fn from_req(
-        field_type: ColumnType,
-        sigma: Option<f64>,
-        accessor_idx: usize,
-        missing: Option<f64>,
-    ) -> Self {
-        let missing = missing.and_then(|val| f64_to_fastfield_u64(val, &field_type));
+    pub fn from_req(req: &MetricAggReqData, sigma: Option<f64>) -> Self {
+        let missing = req
+            .missing
+            .and_then(|val| f64_to_fastfield_u64(val, &req.field_type));
        Self {
-            field_type,
-            extended_stats: IntermediateExtendedStats::with_sigma(sigma),
-            accessor_idx,
+            name: req.name.clone(),
+            field_type: req.field_type,
+            accessor: req.accessor.clone(),
            missing,
-            val_cache: Default::default(),
-        }
-    }
-    #[inline]
-    pub(crate) fn collect_block_with_field(
-        &mut self,
-        docs: &[DocId],
-        req_data: &mut MetricAggReqData,
-    ) {
-        if let Some(missing) = self.missing.as_ref() {
-            req_data.column_block_accessor.fetch_block_with_missing(
-                docs,
-                &req_data.accessor,
-                *missing,
-            );
-        } else {
-            req_data
-                .column_block_accessor
-                .fetch_block(docs, &req_data.accessor);
-        }
-        for val in req_data.column_block_accessor.iter_vals() {
-            let val1 = f64_from_fastfield_u64(val, &self.field_type);
-            self.extended_stats.collect(val1);
+            buckets: vec![IntermediateExtendedStats::with_sigma(sigma); 16],
+            sigma,
        }
    }
 }
@@ -370,15 +346,18 @@ impl SegmentExtendedStatsCollector {
 impl SegmentAggregationCollector for SegmentExtendedStatsCollector {
    #[inline]
    fn add_intermediate_aggregation_result(
-        self: Box<Self>,
+        &mut self,
        agg_data: &AggregationsSegmentCtx,
        results: &mut IntermediateAggregationResults,
+        parent_bucket_id: BucketId,
    ) -> crate::Result<()> {
-        let name = agg_data.get_metric_req_data(self.accessor_idx).name.clone();
+        let name = self.name.clone();
+        self.prepare_max_bucket(parent_bucket_id, agg_data)?;
+        let extended_stats = std::mem::take(&mut self.buckets[parent_bucket_id as usize]);
        results.push(
            name,
            IntermediateAggregationResult::Metric(IntermediateMetricResult::ExtendedStats(
-                self.extended_stats,
+                extended_stats,
            )),
        )?;

@@ -388,39 +367,36 @@ impl SegmentAggregationCollector for SegmentExtendedStatsCollector {
    #[inline]
    fn collect(
        &mut self,
-        doc: crate::DocId,
+        parent_bucket_id: BucketId,
+        docs: &[crate::DocId],
        agg_data: &mut AggregationsSegmentCtx,
    ) -> crate::Result<()> {
-        let req_data = agg_data.get_metric_req_data(self.accessor_idx);
-        if let Some(missing) = self.missing {
-            let mut has_val = false;
-            for val in req_data.accessor.values_for_doc(doc) {
-                let val1 = f64_from_fastfield_u64(val, &self.field_type);
-                self.extended_stats.collect(val1);
-                has_val = true;
-            }
-            if !has_val {
-                self.extended_stats
-                    .collect(f64_from_fastfield_u64(missing, &self.field_type));
-            }
-        } else {
-            for val in req_data.accessor.values_for_doc(doc) {
-                let val1 = f64_from_fastfield_u64(val, &self.field_type);
-                self.extended_stats.collect(val1);
-            }
+        let mut extended_stats = self.buckets[parent_bucket_id as usize].clone();
+
+        agg_data
+            .column_block_accessor
+            .fetch_block_with_missing(docs, &self.accessor, self.missing);
+        for val in agg_data.column_block_accessor.iter_vals() {
+            let val1 = f64_from_fastfield_u64(val, self.field_type);
+            extended_stats.collect(val1);
        }

+        // Store the updated stats back into the bucket.
+        self.buckets[parent_bucket_id as usize] = extended_stats;
+
        Ok(())
    }

-    #[inline]
-    fn collect_block(
+    fn prepare_max_bucket(
        &mut self,
-        docs: &[crate::DocId],
-        agg_data: &mut AggregationsSegmentCtx,
+        max_bucket: BucketId,
+        _agg_data: &AggregationsSegmentCtx,
    ) -> crate::Result<()> {
-        let req_data = agg_data.get_metric_req_data_mut(self.accessor_idx);
-        self.collect_block_with_field(docs, req_data);
+        if self.buckets.len() <= max_bucket as usize {
+            self.buckets.resize_with(max_bucket as usize + 1, || {
+                IntermediateExtendedStats::with_sigma(self.sigma)
+            });
+        }
        Ok(())
    }
 }
--- a/src/aggregation/metric/max.rs
+++ b/src/aggregation/metric/max.rs
@@ -52,10 +52,8 @@ pub struct IntermediateMax {

 impl IntermediateMax {
    /// Creates a new [`IntermediateMax`] instance from a [`SegmentStatsCollector`].
-    pub(crate) fn from_collector(collector: SegmentStatsCollector) -> Self {
-        Self {
-            stats: collector.stats,
-        }
+    pub(crate) fn from_stats(stats: IntermediateStats) -> Self {
+        Self { stats }
    }
    /// Merges the other intermediate result into self.
    pub fn merge_fruits(&mut self, other: IntermediateMax) {
--- a/src/aggregation/metric/min.rs
+++ b/src/aggregation/metric/min.rs
@@ -52,10 +52,8 @@ pub struct IntermediateMin {

 impl IntermediateMin {
    /// Creates a new [`IntermediateMin`] instance from a [`SegmentStatsCollector`].
-    pub(crate) fn from_collector(collector: SegmentStatsCollector) -> Self {
-        Self {
-            stats: collector.stats,
-        }
+    pub(crate) fn from_stats(stats: IntermediateStats) -> Self {
+        Self { stats }
    }
    /// Merges the other intermediate result into self.
    pub fn merge_fruits(&mut self, other: IntermediateMin) {
--- a/src/aggregation/metric/mod.rs
+++ b/src/aggregation/metric/mod.rs
@@ -31,7 +31,7 @@ use std::collections::HashMap;

 pub use average::*;
 pub use cardinality::*;
-use columnar::{Column, ColumnBlockAccessor, ColumnType};
+use columnar::{Column, ColumnType};
 pub use count::*;
 pub use extended_stats::*;
 pub use max::*;
@@ -55,8 +55,6 @@ pub struct MetricAggReqData {
    pub field_type: ColumnType,
    /// The missing value normalized to the internal u64 representation of the field type.
    pub missing_u64: Option<u64>,
-    /// The column block accessor to access the fast field values.
-    pub column_block_accessor: ColumnBlockAccessor<u64>,
    /// The column accessor to access the fast field values.
    pub accessor: Column<u64>,
    /// Used when converting to intermediate result
@@ -109,8 +107,11 @@ pub enum PercentileValues {
 #[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
 /// The entry when requesting percentiles with keyed: false
 pub struct PercentileValuesVecEntry {
-    key: f64,
-    value: f64,
+    /// The percentile.
+    pub key: f64,
+
+    /// The value at that percentile.
+    pub value: f64,
 }

 /// Single-metric aggregations use this common result structure.
--- a/src/aggregation/metric/percentiles.rs
+++ b/src/aggregation/metric/percentiles.rs
@@ -7,10 +7,9 @@ use crate::aggregation::agg_data::AggregationsSegmentCtx;
 use crate::aggregation::intermediate_agg_result::{
    IntermediateAggregationResult, IntermediateAggregationResults, IntermediateMetricResult,
 };
-use crate::aggregation::metric::MetricAggReqData;
 use crate::aggregation::segment_agg_result::SegmentAggregationCollector;
 use crate::aggregation::*;
-use crate::{DocId, TantivyError};
+use crate::TantivyError;

 /// # Percentiles
 ///
@@ -131,10 +130,16 @@ impl PercentilesAggregationReq {
    }
 }

-#[derive(Clone, Debug, PartialEq)]
+#[derive(Clone, Debug)]
 pub(crate) struct SegmentPercentilesCollector {
-    pub(crate) percentiles: PercentilesCollector,
+    pub(crate) buckets: Vec<PercentilesCollector>,
    pub(crate) accessor_idx: usize,
+    /// The type of the field.
+    pub field_type: ColumnType,
+    /// The missing value normalized to the internal u64 representation of the field type.
+    pub missing_u64: Option<u64>,
+    /// The column accessor to access the fast field values.
+    pub accessor: Column<u64>,
 }

 #[derive(Clone, Serialize, Deserialize)]
@@ -229,33 +234,18 @@ impl PercentilesCollector {
 }

 impl SegmentPercentilesCollector {
-    pub fn from_req_and_validate(accessor_idx: usize) -> crate::Result<Self> {
-        Ok(Self {
-            percentiles: PercentilesCollector::new(),
+    pub fn from_req_and_validate(
+        field_type: ColumnType,
+        missing_u64: Option<u64>,
+        accessor: Column<u64>,
+        accessor_idx: usize,
+    ) -> Self {
+        Self {
+            buckets: Vec::with_capacity(64),
+            field_type,
+            missing_u64,
+            accessor,
            accessor_idx,
-        })
-    }
-    #[inline]
-    pub(crate) fn collect_block_with_field(
-        &mut self,
-        docs: &[DocId],
-        req_data: &mut MetricAggReqData,
-    ) {
-        if let Some(missing) = req_data.missing_u64.as_ref() {
-            req_data.column_block_accessor.fetch_block_with_missing(
-                docs,
-                &req_data.accessor,
-                *missing,
-            );
-        } else {
-            req_data
-                .column_block_accessor
-                .fetch_block(docs, &req_data.accessor);
-        }
-
-        for val in req_data.column_block_accessor.iter_vals() {
-            let val1 = f64_from_fastfield_u64(val, &req_data.field_type);
-            self.percentiles.collect(val1);
        }
    }
 }
@@ -263,12 +253,18 @@ impl SegmentPercentilesCollector {
 impl SegmentAggregationCollector for SegmentPercentilesCollector {
    #[inline]
    fn add_intermediate_aggregation_result(
-        self: Box<Self>,
+        &mut self,
        agg_data: &AggregationsSegmentCtx,
        results: &mut IntermediateAggregationResults,
+        parent_bucket_id: BucketId,
    ) -> crate::Result<()> {
        let name = agg_data.get_metric_req_data(self.accessor_idx).name.clone();
-        let intermediate_metric_result = IntermediateMetricResult::Percentiles(self.percentiles);
+        self.prepare_max_bucket(parent_bucket_id, agg_data)?;
+        // Swap collector with an empty one to avoid cloning
+        let percentiles_collector = std::mem::take(&mut self.buckets[parent_bucket_id as usize]);
+
+        let intermediate_metric_result =
+            IntermediateMetricResult::Percentiles(percentiles_collector);

        results.push(
            name,
@@ -281,40 +277,33 @@ impl SegmentAggregationCollector for SegmentPercentilesCollector {
    #[inline]
    fn collect(
        &mut self,
-        doc: crate::DocId,
+        parent_bucket_id: BucketId,
+        docs: &[crate::DocId],
        agg_data: &mut AggregationsSegmentCtx,
    ) -> crate::Result<()> {
-        let req_data = agg_data.get_metric_req_data(self.accessor_idx);
+        let percentiles = &mut self.buckets[parent_bucket_id as usize];
+        agg_data.column_block_accessor.fetch_block_with_missing(
+            docs,
+            &self.accessor,
+            self.missing_u64,
+        );

-        if let Some(missing) = req_data.missing_u64 {
-            let mut has_val = false;
-            for val in req_data.accessor.values_for_doc(doc) {
-                let val1 = f64_from_fastfield_u64(val, &req_data.field_type);
-                self.percentiles.collect(val1);
-                has_val = true;
-            }
-            if !has_val {
-                self.percentiles
-                    .collect(f64_from_fastfield_u64(missing, &req_data.field_type));
-            }
-        } else {
-            for val in req_data.accessor.values_for_doc(doc) {
-                let val1 = f64_from_fastfield_u64(val, &req_data.field_type);
-                self.percentiles.collect(val1);
-            }
+        for val in agg_data.column_block_accessor.iter_vals() {
+            let val1 = f64_from_fastfield_u64(val, self.field_type);
+            percentiles.collect(val1);
        }

        Ok(())
    }

-    #[inline]
-    fn collect_block(
+    fn prepare_max_bucket(
        &mut self,
-        docs: &[crate::DocId],
-        agg_data: &mut AggregationsSegmentCtx,
+        max_bucket: BucketId,
+        _agg_data: &AggregationsSegmentCtx,
    ) -> crate::Result<()> {
-        let req_data = agg_data.get_metric_req_data_mut(self.accessor_idx);
-        self.collect_block_with_field(docs, req_data);
+        while self.buckets.len() <= max_bucket as usize {
+            self.buckets.push(PercentilesCollector::new());
+        }
        Ok(())
    }
 }
--- a/src/aggregation/metric/stats.rs
+++ b/src/aggregation/metric/stats.rs
@@ -1,5 +1,6 @@
 use std::fmt::Debug;

+use columnar::{Column, ColumnType};
 use serde::{Deserialize, Serialize};

 use super::*;
@@ -7,10 +8,9 @@ use crate::aggregation::agg_data::AggregationsSegmentCtx;
 use crate::aggregation::intermediate_agg_result::{
    IntermediateAggregationResult, IntermediateAggregationResults, IntermediateMetricResult,
 };
-use crate::aggregation::metric::MetricAggReqData;
 use crate::aggregation::segment_agg_result::SegmentAggregationCollector;
 use crate::aggregation::*;
-use crate::{DocId, TantivyError};
+use crate::TantivyError;

 /// A multi-value metric aggregation that computes a collection of statistics on numeric values that
 /// are extracted from the aggregated documents.
@@ -83,7 +83,7 @@ impl Stats {

 /// Intermediate result of the stats aggregation that can be combined with other intermediate
 /// results.
-#[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
+#[derive(Clone, Copy, Debug, PartialEq, Serialize, Deserialize)]
 pub struct IntermediateStats {
    /// The number of extracted values.
    pub(crate) count: u64,
@@ -110,6 +110,16 @@ impl Default for IntermediateStats {
 }

 impl IntermediateStats {
+    /// Returns the number of values collected.
+    pub fn count(&self) -> u64 {
+        self.count
+    }
+
+    /// Returns the sum of all values collected.
+    pub fn sum(&self) -> f64 {
+        self.sum
+    }
+
    /// Merges the other stats intermediate result into self.
    pub fn merge_fruits(&mut self, other: IntermediateStats) {
        self.count += other.count;
@@ -187,75 +197,75 @@ pub enum StatsType {
    Percentiles,
 }

+fn create_collector<const TYPE_ID: u8>(
+    req: &MetricAggReqData,
+) -> Box<dyn SegmentAggregationCollector> {
+    Box::new(SegmentStatsCollector::<TYPE_ID> {
+        name: req.name.clone(),
+        collecting_for: req.collecting_for,
+        is_number_or_date_type: req.is_number_or_date_type,
+        missing_u64: req.missing_u64,
+        accessor: req.accessor.clone(),
+        buckets: vec![IntermediateStats::default()],
+    })
+}
+
+/// Builds a `SegmentStatsCollector` specialized (via const generic) for the column type of `req`.
+pub(crate) fn build_segment_stats_collector(
+    req: &MetricAggReqData,
+) -> crate::Result<Box<dyn SegmentAggregationCollector>> {
+    match req.field_type {
+        ColumnType::I64 => Ok(create_collector::<{ ColumnType::I64 as u8 }>(req)),
+        ColumnType::U64 => Ok(create_collector::<{ ColumnType::U64 as u8 }>(req)),
+        ColumnType::F64 => Ok(create_collector::<{ ColumnType::F64 as u8 }>(req)),
+        ColumnType::Bool => Ok(create_collector::<{ ColumnType::Bool as u8 }>(req)),
+        ColumnType::DateTime => Ok(create_collector::<{ ColumnType::DateTime as u8 }>(req)),
+        ColumnType::Bytes => Ok(create_collector::<{ ColumnType::Bytes as u8 }>(req)),
+        ColumnType::Str => Ok(create_collector::<{ ColumnType::Str as u8 }>(req)),
+        ColumnType::IpAddr => Ok(create_collector::<{ ColumnType::IpAddr as u8 }>(req)),
+    }
+}
+
+#[repr(C)]
 #[derive(Clone, Debug)]
-pub(crate) struct SegmentStatsCollector {
-    pub(crate) stats: IntermediateStats,
-    pub(crate) accessor_idx: usize,
+pub(crate) struct SegmentStatsCollector<const COLUMN_TYPE_ID: u8> {
+    pub(crate) missing_u64: Option<u64>,
+    pub(crate) accessor: Column<u64>,
+    pub(crate) is_number_or_date_type: bool,
+    pub(crate) buckets: Vec<IntermediateStats>,
+    pub(crate) name: String,
+    pub(crate) collecting_for: StatsType,
 }

-impl SegmentStatsCollector {
-    pub fn from_req(accessor_idx: usize) -> Self {
-        Self {
-            stats: IntermediateStats::default(),
-            accessor_idx,
-        }
-    }
-    #[inline]
-    pub(crate) fn collect_block_with_field(
-        &mut self,
-        docs: &[DocId],
-        req_data: &mut MetricAggReqData,
-    ) {
-        if let Some(missing) = req_data.missing_u64.as_ref() {
-            req_data.column_block_accessor.fetch_block_with_missing(
-                docs,
-                &req_data.accessor,
-                *missing,
-            );
-        } else {
-            req_data
-                .column_block_accessor
-                .fetch_block(docs, &req_data.accessor);
-        }
-        if req_data.is_number_or_date_type {
-            for val in req_data.column_block_accessor.iter_vals() {
-                let val1 = f64_from_fastfield_u64(val, &req_data.field_type);
-                self.stats.collect(val1);
-            }
-        } else {
-            for _val in req_data.column_block_accessor.iter_vals() {
-                // we ignore the value and simply record that we got something
-                self.stats.collect(0.0);
-            }
-        }
-    }
-}
-
-impl SegmentAggregationCollector for SegmentStatsCollector {
+impl<const COLUMN_TYPE_ID: u8> SegmentAggregationCollector
+    for SegmentStatsCollector<COLUMN_TYPE_ID>
+{
    #[inline]
    fn add_intermediate_aggregation_result(
-        self: Box<Self>,
+        &mut self,
        agg_data: &AggregationsSegmentCtx,
        results: &mut IntermediateAggregationResults,
+        parent_bucket_id: BucketId,
    ) -> crate::Result<()> {
-        let req = agg_data.get_metric_req_data(self.accessor_idx);
-        let name = req.name.clone();
+        let name = self.name.clone();

-        let intermediate_metric_result = match req.collecting_for {
+        self.prepare_max_bucket(parent_bucket_id, agg_data)?;
+        let stats = self.buckets[parent_bucket_id as usize];
+        let intermediate_metric_result = match self.collecting_for {
            StatsType::Average => {
-                IntermediateMetricResult::Average(IntermediateAverage::from_collector(*self))
+                IntermediateMetricResult::Average(IntermediateAverage::from_stats(stats))
            }
            StatsType::Count => {
-                IntermediateMetricResult::Count(IntermediateCount::from_collector(*self))
+                IntermediateMetricResult::Count(IntermediateCount::from_stats(stats))
            }
-            StatsType::Max => IntermediateMetricResult::Max(IntermediateMax::from_collector(*self)),
-            StatsType::Min => IntermediateMetricResult::Min(IntermediateMin::from_collector(*self)),
-            StatsType::Stats => IntermediateMetricResult::Stats(self.stats),
-            StatsType::Sum => IntermediateMetricResult::Sum(IntermediateSum::from_collector(*self)),
+            StatsType::Max => IntermediateMetricResult::Max(IntermediateMax::from_stats(stats)),
+            StatsType::Min => IntermediateMetricResult::Min(IntermediateMin::from_stats(stats)),
+            StatsType::Stats => IntermediateMetricResult::Stats(stats),
+            StatsType::Sum => IntermediateMetricResult::Sum(IntermediateSum::from_stats(stats)),
            _ => {
                return Err(TantivyError::InvalidArgument(format!(
                    "Unsupported stats type for stats aggregation: {:?}",
-                    req.collecting_for
+                    self.collecting_for
                )))
            }
        };
@@ -271,41 +281,67 @@ impl SegmentAggregationCollector for SegmentStatsCollector {
    #[inline]
    fn collect(
        &mut self,
-        doc: crate::DocId,
-        agg_data: &mut AggregationsSegmentCtx,
-    ) -> crate::Result<()> {
-        let req_data = agg_data.get_metric_req_data(self.accessor_idx);
-        if let Some(missing) = req_data.missing_u64 {
-            let mut has_val = false;
-            for val in req_data.accessor.values_for_doc(doc) {
-                let val1 = f64_from_fastfield_u64(val, &req_data.field_type);
-                self.stats.collect(val1);
-                has_val = true;
-            }
-            if !has_val {
-                self.stats
-                    .collect(f64_from_fastfield_u64(missing, &req_data.field_type));
-            }
-        } else {
-            for val in req_data.accessor.values_for_doc(doc) {
-                let val1 = f64_from_fastfield_u64(val, &req_data.field_type);
-                self.stats.collect(val1);
-            }
-        }
-
-        Ok(())
-    }
-
-    #[inline]
-    fn collect_block(
-        &mut self,
+        parent_bucket_id: BucketId,
        docs: &[crate::DocId],
        agg_data: &mut AggregationsSegmentCtx,
    ) -> crate::Result<()> {
-        let req_data = agg_data.get_metric_req_data_mut(self.accessor_idx);
-        self.collect_block_with_field(docs, req_data);
+        // TODO: remove once we fetch all values for all bucket ids in one go
+        if docs.len() == 1 && self.missing_u64.is_none() {
+            collect_stats::<COLUMN_TYPE_ID>(
+                &mut self.buckets[parent_bucket_id as usize],
+                self.accessor.values_for_doc(docs[0]),
+                self.is_number_or_date_type,
+            )?;
+
+            return Ok(());
+        }
+        agg_data.column_block_accessor.fetch_block_with_missing(
+            docs,
+            &self.accessor,
+            self.missing_u64,
+        );
+        collect_stats::<COLUMN_TYPE_ID>(
+            &mut self.buckets[parent_bucket_id as usize],
+            agg_data.column_block_accessor.iter_vals(),
+            self.is_number_or_date_type,
+        )?;
+
        Ok(())
    }
+
+    fn prepare_max_bucket(
+        &mut self,
+        max_bucket: BucketId,
+        _agg_data: &AggregationsSegmentCtx,
+    ) -> crate::Result<()> {
+        let required_buckets = (max_bucket as usize) + 1;
+        if self.buckets.len() < required_buckets {
+            self.buckets
+                .resize_with(required_buckets, IntermediateStats::default);
+        }
+        Ok(())
+    }
+}
+
+#[inline]
+fn collect_stats<const COLUMN_TYPE_ID: u8>(
+    stats: &mut IntermediateStats,
+    vals: impl Iterator<Item = u64>,
+    is_number_or_date_type: bool,
+) -> crate::Result<()> {
+    if is_number_or_date_type {
+        for val in vals {
+            let val1 = convert_to_f64::<COLUMN_TYPE_ID>(val);
+            stats.collect(val1);
+        }
+    } else {
+        for _val in vals {
+            // we ignore the value and simply record that we got something
+            stats.collect(0.0);
+        }
+    }
+
+    Ok(())
 }

 #[cfg(test)]
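
Note on the const-generic dispatch used in the stats collector above: the runtime `ColumnType` is matched once, and each arm instantiates code that is monomorphized for a single `COLUMN_TYPE_ID`, so the per-value conversion carries no runtime type check. The sketch below illustrates the pattern in isolation. `Kind`, `decode`, `decode_dyn`, `i64_to_u64` and `f64_to_u64` are hypothetical names, the discriminants are illustrative, and the order-preserving encodings are assumptions about how fast-field values are mapped to `u64`, not a copy of the crate's code.

```rust
/// Hypothetical stand-in for `ColumnType`; the discriminants are illustrative only.
#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(u8)]
pub enum Kind {
    I64 = 0,
    U64 = 1,
    F64 = 2,
}

const SIGN_BIT: u64 = 1 << 63;

/// Assumed order-preserving i64 -> u64 encoding: flip the sign bit.
pub fn i64_to_u64(val: i64) -> u64 {
    (val as u64) ^ SIGN_BIT
}

/// Assumed order-preserving f64 -> u64 encoding.
pub fn f64_to_u64(val: f64) -> u64 {
    let bits = val.to_bits();
    if bits & SIGN_BIT == 0 {
        bits ^ SIGN_BIT
    } else {
        !bits
    }
}

/// Decodes a stored `u64` into an `f64`. `KIND` is a compile-time constant,
/// so each instantiation keeps only the branch that applies to it.
pub fn decode<const KIND: u8>(val: u64) -> f64 {
    if KIND == Kind::U64 as u8 {
        val as f64
    } else if KIND == Kind::I64 as u8 {
        ((val ^ SIGN_BIT) as i64) as f64
    } else if KIND == Kind::F64 as u8 {
        if val & SIGN_BIT != 0 {
            f64::from_bits(val ^ SIGN_BIT)
        } else {
            f64::from_bits(!val)
        }
    } else {
        panic!("kind id {} cannot be decoded to f64", KIND)
    }
}

/// Runtime entry point: match once, then call the specialized instantiation.
pub fn decode_dyn(val: u64, kind: Kind) -> f64 {
    match kind {
        Kind::I64 => decode::<{ Kind::I64 as u8 }>(val),
        Kind::U64 => decode::<{ Kind::U64 as u8 }>(val),
        Kind::F64 => decode::<{ Kind::F64 as u8 }>(val),
    }
}
```

A collector that is itself generic over the id can call `decode::<KIND>` in its hot loop, while code that only has a runtime value goes through `decode_dyn`.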
--- a/src/aggregation/metric/sum.rs
+++ b/src/aggregation/metric/sum.rs
@@ -52,10 +52,8 @@ pub struct IntermediateSum {

 impl IntermediateSum {
    /// Creates a new [`IntermediateSum`] instance from a [`SegmentStatsCollector`].
-    pub(crate) fn from_collector(collector: SegmentStatsCollector) -> Self {
-        Self {
-            stats: collector.stats,
-        }
+    pub(crate) fn from_stats(stats: IntermediateStats) -> Self {
+        Self { stats }
    }
    /// Merges the other intermediate result into self.
    pub fn merge_fruits(&mut self, other: IntermediateSum) {
--- a/src/aggregation/metric/top_hits.rs
+++ b/src/aggregation/metric/top_hits.rs
@@ -15,11 +15,11 @@ use crate::aggregation::intermediate_agg_result::{
    IntermediateAggregationResult, IntermediateMetricResult,
 };
 use crate::aggregation::segment_agg_result::SegmentAggregationCollector;
-use crate::aggregation::AggregationError;
+use crate::aggregation::{AggregationError, BucketId};
+use crate::collector::sort_key::ReverseComparator;
 use crate::collector::TopNComputer;
 use crate::schema::OwnedValue;
 use crate::{DocAddress, DocId, SegmentOrdinal};
-// duplicate import removed; already imported above

 /// Contains all information required by the TopHitsSegmentCollector to perform the
 /// top_hits aggregation on a segment.
@@ -458,7 +458,7 @@ impl Eq for DocSortValuesAndFields {}
 #[derive(Clone, Serialize, Deserialize, Debug)]
 pub struct TopHitsTopNComputer {
    req: TopHitsAggregationReq,
-    top_n: TopNComputer<DocSortValuesAndFields, DocAddress, false>,
+    top_n: TopNComputer<DocSortValuesAndFields, DocAddress, ReverseComparator>,
 }

 impl std::cmp::PartialEq for TopHitsTopNComputer {
@@ -471,7 +471,10 @@ impl TopHitsTopNComputer {
    /// Create a new TopHitsCollector
    pub fn new(req: &TopHitsAggregationReq) -> Self {
        Self {
-            top_n: TopNComputer::new(req.size + req.from.unwrap_or(0)),
+            top_n: TopNComputer::new_with_comparator(
+                req.size + req.from.unwrap_or(0),
+                ReverseComparator,
+            ),
            req: req.clone(),
        }
    }
@@ -482,7 +485,7 @@ impl TopHitsTopNComputer {

    pub(crate) fn merge_fruits(&mut self, other_fruit: Self) -> crate::Result<()> {
        for doc in other_fruit.top_n.into_vec() {
-            self.collect(doc.feature, doc.doc);
+            self.collect(doc.sort_key, doc.doc);
        }
        Ok(())
    }
@@ -494,9 +497,9 @@ impl TopHitsTopNComputer {
            .into_sorted_vec()
            .into_iter()
            .map(|doc| TopHitsVecEntry {
-                sort: doc.feature.sorts.iter().map(|f| f.value).collect(),
+                sort: doc.sort_key.sorts.iter().map(|f| f.value).collect(),
                doc_value_fields: doc
-                    .feature
+                    .sort_key
                    .doc_value_fields
                    .into_iter()
                    .map(|(k, v)| (k, v.into()))
@@ -517,7 +520,8 @@ impl TopHitsTopNComputer {
 pub(crate) struct TopHitsSegmentCollector {
    segment_ordinal: SegmentOrdinal,
    accessor_idx: usize,
-    top_n: TopNComputer<Vec<DocValueAndOrder>, DocAddress, false>,
+    buckets: Vec<TopNComputer<Vec<DocValueAndOrder>, DocAddress, ReverseComparator>>,
+    num_hits: usize,
 }

 impl TopHitsSegmentCollector {
@@ -526,25 +530,35 @@ impl TopHitsSegmentCollector {
        accessor_idx: usize,
        segment_ordinal: SegmentOrdinal,
    ) -> Self {
+        let num_hits = req.size + req.from.unwrap_or(0);
        Self {
-            top_n: TopNComputer::new(req.size + req.from.unwrap_or(0)),
+            num_hits,
            segment_ordinal,
            accessor_idx,
+            buckets: vec![TopNComputer::new_with_comparator(num_hits, ReverseComparator); 1],
        }
    }
-    fn into_top_hits_collector(
-        self,
+    fn get_top_hits_computer(
+        &mut self,
+        parent_bucket_id: BucketId,
        value_accessors: &HashMap<String, Vec<DynamicColumn>>,
        req: &TopHitsAggregationReq,
    ) -> TopHitsTopNComputer {
+        if parent_bucket_id as usize >= self.buckets.len() {
+            return TopHitsTopNComputer::new(req);
+        }
+        let top_n = std::mem::replace(
+            &mut self.buckets[parent_bucket_id as usize],
+            TopNComputer::new(0),
+        );
        let mut top_hits_computer = TopHitsTopNComputer::new(req);
-        let top_results = self.top_n.into_vec();
+        let top_results = top_n.into_vec();

        for res in top_results {
            let doc_value_fields = req.get_document_field_data(value_accessors, res.doc.doc_id);
            top_hits_computer.collect(
                DocSortValuesAndFields {
-                    sorts: res.feature,
+                    sorts: res.sort_key,
                    doc_value_fields,
                },
                res.doc,
@@ -553,54 +567,24 @@ impl TopHitsSegmentCollector {

        top_hits_computer
    }
-
-    /// TODO add a specialized variant for a single sort field
-    fn collect_with(
-        &mut self,
-        doc_id: crate::DocId,
-        req: &TopHitsAggregationReq,
-        accessors: &[(Column<u64>, ColumnType)],
-    ) -> crate::Result<()> {
-        let sorts: Vec<DocValueAndOrder> = req
-            .sort
-            .iter()
-            .enumerate()
-            .map(|(idx, KeyOrder { order, .. })| {
-                let order = *order;
-                let value = accessors
-                    .get(idx)
-                    .expect("could not find field in accessors")
-                    .0
-                    .values_for_doc(doc_id)
-                    .next();
-                DocValueAndOrder { value, order }
-            })
-            .collect();
-
-        self.top_n.push(
-            sorts,
-            DocAddress {
-                segment_ord: self.segment_ordinal,
-                doc_id,
-            },
-        );
-        Ok(())
-    }
 }

 impl SegmentAggregationCollector for TopHitsSegmentCollector {
    fn add_intermediate_aggregation_result(
-        self: Box<Self>,
+        &mut self,
        agg_data: &AggregationsSegmentCtx,
        results: &mut crate::aggregation::intermediate_agg_result::IntermediateAggregationResults,
+        parent_bucket_id: BucketId,
    ) -> crate::Result<()> {
        let req_data = agg_data.get_top_hits_req_data(self.accessor_idx);

        let value_accessors = &req_data.value_accessors;

-        let intermediate_result = IntermediateMetricResult::TopHits(
-            self.into_top_hits_collector(value_accessors, &req_data.req),
-        );
+        let intermediate_result = IntermediateMetricResult::TopHits(self.get_top_hits_computer(
+            parent_bucket_id,
+            value_accessors,
+            &req_data.req,
+        ));
        results.push(
            req_data.name.to_string(),
            IntermediateAggregationResult::Metric(intermediate_result),
@@ -610,26 +594,56 @@ impl SegmentAggregationCollector for TopHitsSegmentCollector {
    /// TODO: Consider a caching layer to reduce the call overhead
    fn collect(
        &mut self,
-        doc_id: crate::DocId,
-        agg_data: &mut AggregationsSegmentCtx,
-    ) -> crate::Result<()> {
-        let req_data = agg_data.get_top_hits_req_data(self.accessor_idx);
-        self.collect_with(doc_id, &req_data.req, &req_data.accessors)?;
-        Ok(())
-    }
-
-    fn collect_block(
-        &mut self,
+        parent_bucket_id: BucketId,
        docs: &[crate::DocId],
        agg_data: &mut AggregationsSegmentCtx,
    ) -> crate::Result<()> {
+        let top_n = &mut self.buckets[parent_bucket_id as usize];
        let req_data = agg_data.get_top_hits_req_data(self.accessor_idx);
-        // TODO: Consider getting fields with the column block accessor.
-        for doc in docs {
-            self.collect_with(*doc, &req_data.req, &req_data.accessors)?;
+        let req = &req_data.req;
+        let accessors = &req_data.accessors;
+        for &doc_id in docs {
+            // TODO: this is terrible, a new vec is allocated for every doc
+            // We can fetch blocks instead
+            // We don't need to store the order for every value
+            let sorts: Vec<DocValueAndOrder> = req
+                .sort
+                .iter()
+                .enumerate()
+                .map(|(idx, KeyOrder { order, .. })| {
+                    let order = *order;
+                    let value = accessors
+                        .get(idx)
+                        .expect("could not find field in accessors")
+                        .0
+                        .values_for_doc(doc_id)
+                        .next();
+                    DocValueAndOrder { value, order }
+                })
+                .collect();
+
+            top_n.push(
+                sorts,
+                DocAddress {
+                    segment_ord: self.segment_ordinal,
+                    doc_id,
+                },
+            );
        }
        Ok(())
    }
+
+    fn prepare_max_bucket(
+        &mut self,
+        max_bucket: BucketId,
+        _agg_data: &AggregationsSegmentCtx,
+    ) -> crate::Result<()> {
+        self.buckets.resize(
+            (max_bucket as usize) + 1,
+            TopNComputer::new_with_comparator(self.num_hits, ReverseComparator),
+        );
+        Ok(())
+    }
 }

 #[cfg(test)]
@@ -645,6 +659,7 @@ mod tests {
    use crate::aggregation::bucket::tests::get_test_index_from_docs;
    use crate::aggregation::tests::get_test_index_from_values;
    use crate::aggregation::AggregationCollector;
+    use crate::collector::sort_key::ReverseComparator;
    use crate::collector::ComparableDoc;
    use crate::query::AllQuery;
    use crate::schema::OwnedValue;
@@ -660,7 +675,7 @@ mod tests {

    fn collector_with_capacity(capacity: usize) -> super::TopHitsTopNComputer {
        super::TopHitsTopNComputer {
-            top_n: super::TopNComputer::new(capacity),
+            top_n: super::TopNComputer::new_with_comparator(capacity, ReverseComparator),
            req: Default::default(),
        }
    }
@@ -744,7 +759,7 @@ mod tests {
                    ],
                    "from": 0,
                }
-        }
+            }
        }))
        .unwrap();

@@ -774,12 +789,12 @@ mod tests {
    #[test]
    fn test_top_hits_collector_single_feature() -> crate::Result<()> {
        let docs = vec![
-            ComparableDoc::<_, _, false> {
+            ComparableDoc::<_, _> {
                doc: crate::DocAddress {
                    segment_ord: 0,
                    doc_id: 0,
                },
-                feature: DocSortValuesAndFields {
+                sort_key: DocSortValuesAndFields {
                    sorts: vec![DocValueAndOrder {
                        value: Some(1),
                        order: Order::Asc,
@@ -792,7 +807,7 @@ mod tests {
                    segment_ord: 0,
                    doc_id: 2,
                },
-                feature: DocSortValuesAndFields {
+                sort_key: DocSortValuesAndFields {
                    sorts: vec![DocValueAndOrder {
                        value: Some(3),
                        order: Order::Asc,
@@ -805,7 +820,7 @@ mod tests {
                    segment_ord: 0,
                    doc_id: 1,
                },
-                feature: DocSortValuesAndFields {
+                sort_key: DocSortValuesAndFields {
                    sorts: vec![DocValueAndOrder {
                        value: Some(5),
                        order: Order::Asc,
@@ -817,7 +832,7 @@ mod tests {

        let mut collector = collector_with_capacity(3);
        for doc in docs.clone() {
-            collector.collect(doc.feature, doc.doc);
+            collector.collect(doc.sort_key, doc.doc);
        }

        let res = collector.into_final_result();
@@ -827,15 +842,15 @@ mod tests {
            super::TopHitsMetricResult {
                hits: vec![
                    super::TopHitsVecEntry {
-                        sort: vec![docs[0].feature.sorts[0].value],
+                        sort: vec![docs[0].sort_key.sorts[0].value],
                        doc_value_fields: Default::default(),
                    },
                    super::TopHitsVecEntry {
-                        sort: vec![docs[1].feature.sorts[0].value],
+                        sort: vec![docs[1].sort_key.sorts[0].value],
                        doc_value_fields: Default::default(),
                    },
                    super::TopHitsVecEntry {
-                        sort: vec![docs[2].feature.sorts[0].value],
+                        sort: vec![docs[2].sort_key.sorts[0].value],
                        doc_value_fields: Default::default(),
                    },
                ]
@@ -873,7 +888,7 @@ mod tests {
                        "mixed.*",
                    ],
                }
-        }
+            }
        }))?;

        let collector = AggregationCollector::from_aggs(d, Default::default());
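
Note on `get_top_hits_computer` above: because the trait method now takes `&mut self` rather than `Box<Self>`, a per-bucket value cannot be moved out of the `Vec` directly, so the diff swaps it out with `std::mem::replace` and leaves a placeholder behind. A minimal sketch of that pattern, assuming a plain `Vec<u32>` of doc ids per bucket in place of the real `TopNComputer` (`PerBucketHits` and its methods are hypothetical names):

```rust
/// Hypothetical per-bucket storage; each bucket holds the doc ids it has seen.
pub struct PerBucketHits {
    buckets: Vec<Vec<u32>>,
}

impl PerBucketHits {
    pub fn new(num_buckets: usize) -> Self {
        Self {
            buckets: vec![Vec::new(); num_buckets],
        }
    }

    pub fn push(&mut self, bucket: u32, doc: u32) {
        self.buckets[bucket as usize].push(doc);
    }

    /// Takes ownership of one bucket's content through `&mut self`.
    /// An empty placeholder is left in the slot; an unknown bucket yields
    /// an empty result instead of panicking.
    pub fn take_bucket(&mut self, bucket: u32) -> Vec<u32> {
        match self.buckets.get_mut(bucket as usize) {
            Some(slot) => std::mem::replace(slot, Vec::new()),
            None => Vec::new(),
        }
    }
}
```

One consequence of this design is that a bucket can be harvested only once: a second call for the same bucket id sees the placeholder, not the original data.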
--- a/src/aggregation/mod.rs
+++ b/src/aggregation/mod.rs
@@ -133,7 +133,7 @@ mod agg_limits;
 pub mod agg_req;
 pub mod agg_result;
 pub mod bucket;
-mod buf_collector;
+pub(crate) mod cached_sub_aggs;
 mod collector;
 mod date;
 mod error;
@@ -162,6 +162,19 @@ use serde::{Deserialize, Deserializer, Serialize};

 use crate::tokenizer::TokenizerManager;

+/// A bucket id is a dense identifier for a bucket within an aggregation.
+/// It is used to index into a `Vec` that holds per-bucket data.
+///
+/// For example, in a terms aggregation, each unique term is assigned an incremental `BucketId`.
+/// This `BucketId` is forwarded to sub-aggregations to identify the parent bucket.
+///
+/// This makes it possible to have a single AggregationCollector instance per aggregation
+/// that can handle multiple buckets efficiently.
+///
+/// The API to call sub-aggregations is therefore a &[(BucketId, &[DocId])].
+/// For that we'll need a buffer. One Vec per bucket aggregation is needed.
+pub type BucketId = u32;
+
 /// Context parameters for aggregation execution
 ///
 /// This struct holds shared resources needed during aggregation execution:
@@ -335,19 +348,37 @@ impl Display for Key {
    }
 }

+pub(crate) fn convert_to_f64<const COLUMN_TYPE_ID: u8>(val: u64) -> f64 {
+    if COLUMN_TYPE_ID == ColumnType::U64 as u8 {
+        val as f64
+    } else if COLUMN_TYPE_ID == ColumnType::I64 as u8
+        || COLUMN_TYPE_ID == ColumnType::DateTime as u8
+    {
+        i64::from_u64(val) as f64
+    } else if COLUMN_TYPE_ID == ColumnType::F64 as u8 {
+        f64::from_u64(val)
+    } else if COLUMN_TYPE_ID == ColumnType::Bool as u8 {
+        val as f64
+    } else {
+        panic!(
+            "ColumnType ID {} cannot be converted to f64 metric",
+            COLUMN_TYPE_ID
+        )
+    }
+}
+
 /// Inverse of `to_fastfield_u64`. Used to convert to `f64` for metrics.
 ///
 /// # Panics
 /// Only `u64`, `f64`, `date`, and `i64` are supported.
-pub(crate) fn f64_from_fastfield_u64(val: u64, field_type: &ColumnType) -> f64 {
+pub(crate) fn f64_from_fastfield_u64(val: u64, field_type: ColumnType) -> f64 {
    match field_type {
-        ColumnType::U64 => val as f64,
-        ColumnType::I64 | ColumnType::DateTime => i64::from_u64(val) as f64,
-        ColumnType::F64 => f64::from_u64(val),
-        ColumnType::Bool => val as f64,
-        _ => {
-            panic!("unexpected type {field_type:?}. This should not happen")
-        }
+        ColumnType::U64 => convert_to_f64::<{ ColumnType::U64 as u8 }>(val),
+        ColumnType::I64 => convert_to_f64::<{ ColumnType::I64 as u8 }>(val),
+        ColumnType::F64 => convert_to_f64::<{ ColumnType::F64 as u8 }>(val),
+        ColumnType::Bool => convert_to_f64::<{ ColumnType::Bool as u8 }>(val),
+        ColumnType::DateTime => convert_to_f64::<{ ColumnType::DateTime as u8 }>(val),
+        _ => panic!("unexpected type {field_type:?}. This should not happen"),
    }
 }

--- a/src/aggregation/segment_agg_result.rs
+++ b/src/aggregation/segment_agg_result.rs
@@ -8,30 +8,69 @@ use std::fmt::Debug;
 pub(crate) use super::agg_limits::AggregationLimitsGuard;
 use super::intermediate_agg_result::IntermediateAggregationResults;
 use crate::aggregation::agg_data::AggregationsSegmentCtx;
+use crate::aggregation::BucketId;
+
+/// Monotonically increasing provider of BucketIds.
+#[derive(Debug, Clone, Default)]
+pub struct BucketIdProvider(u32);
+impl BucketIdProvider {
+    /// Get the next BucketId.
+    pub fn next_bucket_id(&mut self) -> BucketId {
+        let bucket_id = self.0;
+        self.0 += 1;
+        bucket_id
+    }
+}

 /// A SegmentAggregationCollector is used to collect aggregation results.
-pub trait SegmentAggregationCollector: CollectorClone + Debug {
+pub trait SegmentAggregationCollector: Debug {
    fn add_intermediate_aggregation_result(
-        self: Box<Self>,
+        &mut self,
        agg_data: &AggregationsSegmentCtx,
        results: &mut IntermediateAggregationResults,
+        parent_bucket_id: BucketId,
    ) -> crate::Result<()>;

-    #[inline]
+    /// Note: The caller needs to call `prepare_max_bucket` before calling `collect`.
    fn collect(
        &mut self,
-        doc: crate::DocId,
-        agg_data: &mut AggregationsSegmentCtx,
-    ) -> crate::Result<()> {
-        self.collect_block(&[doc], agg_data)
-    }
-
-    fn collect_block(
-        &mut self,
+        parent_bucket_id: BucketId,
        docs: &[crate::DocId],
        agg_data: &mut AggregationsSegmentCtx,
    ) -> crate::Result<()>;

+    /// Collect docs for multiple buckets in one call.
+    /// Minimizes dynamic dispatch overhead when collecting many buckets.
+    ///
+    /// Note: The caller needs to call `prepare_max_bucket` before calling `collect_multiple`.
+    fn collect_multiple(
+        &mut self,
+        bucket_ids: &[BucketId],
+        docs: &[crate::DocId],
+        agg_data: &mut AggregationsSegmentCtx,
+    ) -> crate::Result<()> {
+        debug_assert_eq!(bucket_ids.len(), docs.len());
+        let mut start = 0;
+        while start < bucket_ids.len() {
+            let bucket_id = bucket_ids[start];
+            let mut end = start + 1;
+            while end < bucket_ids.len() && bucket_ids[end] == bucket_id {
+                end += 1;
+            }
+            self.collect(bucket_id, &docs[start..end], agg_data)?;
+            start = end;
+        }
+        Ok(())
+    }
+
+    /// Prepares the collector for bucket ids up to and including `max_bucket`.
+    /// This lets per-bucket allocation happen ahead of collection.
+    fn prepare_max_bucket(
+        &mut self,
+        max_bucket: BucketId,
+        agg_data: &AggregationsSegmentCtx,
+    ) -> crate::Result<()>;
+
    /// Finalize method. Some Aggregator collect blocks of docs before calling `collect_block`.
    /// This method ensures those staged docs will be collected.
    fn flush(&mut self, _agg_data: &mut AggregationsSegmentCtx) -> crate::Result<()> {
@@ -39,26 +78,7 @@ pub trait SegmentAggregationCollector: CollectorClone + Debug {
    }
 }

-/// A helper trait to enable cloning of Box<dyn SegmentAggregationCollector>
-pub trait CollectorClone {
-    fn clone_box(&self) -> Box<dyn SegmentAggregationCollector>;
-}
-
-impl<T> CollectorClone for T
-where T: 'static + SegmentAggregationCollector + Clone
-{
-    fn clone_box(&self) -> Box<dyn SegmentAggregationCollector> {
-        Box::new(self.clone())
-    }
-}
-
-impl Clone for Box<dyn SegmentAggregationCollector> {
-    fn clone(&self) -> Box<dyn SegmentAggregationCollector> {
-        self.clone_box()
-    }
-}
-
-#[derive(Clone, Default)]
+#[derive(Default)]
 /// The GenericSegmentAggregationResultsCollector is the generic version of the collector, which
 /// can handle arbitrary complexity of  sub-aggregations. Ideally we never have to pick this one
 /// and can provide specialized versions instead, that remove some of its overhead.
@@ -76,12 +96,13 @@ impl Debug for GenericSegmentAggregationResultsCollector {

 impl SegmentAggregationCollector for GenericSegmentAggregationResultsCollector {
    fn add_intermediate_aggregation_result(
-        self: Box<Self>,
+        &mut self,
        agg_data: &AggregationsSegmentCtx,
        results: &mut IntermediateAggregationResults,
+        parent_bucket_id: BucketId,
    ) -> crate::Result<()> {
-        for agg in self.aggs {
-            agg.add_intermediate_aggregation_result(agg_data, results)?;
+        for agg in &mut self.aggs {
+            agg.add_intermediate_aggregation_result(agg_data, results, parent_bucket_id)?;
        }

        Ok(())
@@ -89,23 +110,13 @@ impl SegmentAggregationCollector for GenericSegmentAggregationResultsCollector {

    fn collect(
        &mut self,
-        doc: crate::DocId,
-        agg_data: &mut AggregationsSegmentCtx,
-    ) -> crate::Result<()> {
-        self.collect_block(&[doc], agg_data)?;
-
-        Ok(())
-    }
-
-    fn collect_block(
-        &mut self,
+        parent_bucket_id: BucketId,
        docs: &[crate::DocId],
        agg_data: &mut AggregationsSegmentCtx,
    ) -> crate::Result<()> {
        for collector in &mut self.aggs {
-            collector.collect_block(docs, agg_data)?;
+            collector.collect(parent_bucket_id, docs, agg_data)?;
        }
-
        Ok(())
    }

@@ -115,4 +126,15 @@ impl SegmentAggregationCollector for GenericSegmentAggregationResultsCollector {
        }
        Ok(())
    }
+
+    fn prepare_max_bucket(
+        &mut self,
+        max_bucket: BucketId,
+        agg_data: &AggregationsSegmentCtx,
+    ) -> crate::Result<()> {
+        for collector in &mut self.aggs {
+            collector.prepare_max_bucket(max_bucket, agg_data)?;
+        }
+        Ok(())
+    }
 }
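
Note on the `prepare_max_bucket` / `collect` / `collect_multiple` contract introduced above: a collector keeps one slot per `BucketId`, the caller sizes the slots first, and `collect_multiple` splits its input into runs of equal consecutive bucket ids so that `collect` is called once per run. The sketch below reproduces that grouping loop on a toy collector that only counts documents. `DocCountPerBucket` and its return value (the number of `collect` calls) are hypothetical and exist only for illustration.

```rust
pub type BucketId = u32;
pub type DocId = u32;

/// Hypothetical collector that counts documents per parent bucket.
#[derive(Debug, Default)]
pub struct DocCountPerBucket {
    counts: Vec<u64>,
}

impl DocCountPerBucket {
    /// Grows the per-bucket storage so that `max_bucket` is a valid index.
    /// It never shrinks, so earlier buckets keep their state.
    pub fn prepare_max_bucket(&mut self, max_bucket: BucketId) {
        let required = max_bucket as usize + 1;
        if self.counts.len() < required {
            self.counts.resize(required, 0);
        }
    }

    /// Collects a block of docs that all belong to the same parent bucket.
    pub fn collect(&mut self, bucket_id: BucketId, docs: &[DocId]) {
        self.counts[bucket_id as usize] += docs.len() as u64;
    }

    /// Splits `docs` into runs of equal consecutive bucket ids and calls
    /// `collect` once per run. Returns the number of `collect` calls made.
    pub fn collect_multiple(&mut self, bucket_ids: &[BucketId], docs: &[DocId]) -> usize {
        debug_assert_eq!(bucket_ids.len(), docs.len());
        let mut num_calls = 0;
        let mut start = 0;
        while start < bucket_ids.len() {
            let bucket_id = bucket_ids[start];
            let mut end = start + 1;
            while end < bucket_ids.len() && bucket_ids[end] == bucket_id {
                end += 1;
            }
            self.collect(bucket_id, &docs[start..end]);
            num_calls += 1;
            start = end;
        }
        num_calls
    }

    pub fn counts(&self) -> &[u64] {
        &self.counts
    }
}
```

The grouping only merges adjacent entries, so input that is sorted or clustered by bucket id results in the fewest `collect` calls.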
--- a/src/codec/mod.rs
+++ b/src/codec/mod.rs
@@ -0,0 +1,170 @@
+/// Codec specific to postings data.
+pub mod postings;
+
+/// Standard tantivy codec. This is the codec you use by default.
+pub mod standard;
+
+use std::sync::Arc;
+
+pub use standard::StandardCodec;
+
+use crate::codec::postings::PostingsCodec;
+use crate::directory::Directory;
+use crate::fastfield::AliveBitSet;
+use crate::query::score_combiner::DoNothingCombiner;
+use crate::query::term_query::TermScorer;
+use crate::query::{box_scorer, BufferedUnionScorer, Scorer, SumCombiner};
+use crate::schema::Schema;
+use crate::{DocId, Score, SegmentMeta, SegmentReader, TantivySegmentReader};
+
+/// A codec describes how data is laid out on disk.
+///
+/// For the moment, only the postings codec can be customized.
+pub trait Codec: Clone + std::fmt::Debug + Send + Sync + 'static {
+    /// The specific postings type used by this codec.
+    type PostingsCodec: PostingsCodec;
+
+    /// ID of the codec. It should be unique to your codec.
+    /// Make it human-readable, descriptive, short and unique.
+    const ID: &'static str;
+
+    /// Load codec based on the codec configuration.
+    fn from_json_props(json_value: &serde_json::Value) -> crate::Result<Self>;
+
+    /// Get codec configuration.
+    fn to_json_props(&self) -> serde_json::Value;
+
+    /// Returns the postings codec.
+    fn postings_codec(&self) -> &Self::PostingsCodec;
+
+    /// Loads postings using the codec's concrete postings type.
+    fn load_postings_typed(
+        &self,
+        reader: &dyn crate::index::InvertedIndexReader,
+        term_info: &crate::postings::TermInfo,
+        option: crate::schema::IndexRecordOption,
+    ) -> std::io::Result<<Self::PostingsCodec as crate::codec::postings::PostingsCodec>::Postings>
+    {
+        let postings_data = reader.read_raw_postings_data(term_info, option)?;
+        self.postings_codec()
+            .load_postings(term_info.doc_freq, postings_data)
+    }
+
+    /// Opens a segment reader using this codec.
+    ///
+    /// Override this if your codec uses a custom segment reader implementation.
+    fn open_segment_reader(
+        &self,
+        directory: &dyn Directory,
+        segment_meta: &SegmentMeta,
+        schema: Schema,
+        custom_bitset: Option<AliveBitSet>,
+    ) -> crate::Result<Arc<dyn SegmentReader>> {
+        let codec: Arc<dyn ObjectSafeCodec> = Arc::new(self.clone());
+        let reader = TantivySegmentReader::open_with_custom_alive_set_from_directory(
+            directory,
+            segment_meta,
+            schema,
+            codec,
+            custom_bitset,
+        )?;
+        Ok(Arc::new(reader))
+    }
+}
+
+/// Object-safe codec is a Codec that can be used in a trait object.
+///
+/// The point of it is to offer a way to use a codec without a proliferation of generics.
+pub trait ObjectSafeCodec: 'static + Send + Sync {
+    /// Performs a for_each_pruning operation on the given scorer.
+    ///
+    /// The function will go through matching documents and call the callback
+    /// function for all docs with a score exceeding the threshold.
+    ///
+    /// The callback itself returns a larger threshold value,
+    /// which replaces the current threshold for the rest of the traversal.
+    ///
+    /// If the codec and the scorer allow it, this function can rely on
+    /// optimizations like the block-max wand.
+    fn for_each_pruning(
+        &self,
+        threshold: Score,
+        scorer: Box<dyn Scorer>,
+        callback: &mut dyn FnMut(DocId, Score) -> Score,
+    );
+
+    /// Builds a union scorer, possibly specialized if
+    /// all scorers are `TermScorer`s over this codec's postings type.
+    fn build_union_scorer_with_sum_combiner(
+        &self,
+        scorers: Vec<Box<dyn Scorer>>,
+        num_docs: DocId,
+        score_combiner_type: SumOrDoNothingCombiner,
+    ) -> Box<dyn Scorer>;
+}
+
+impl<TCodec: Codec> ObjectSafeCodec for TCodec {
+    fn build_union_scorer_with_sum_combiner(
+        &self,
+        scorers: Vec<Box<dyn Scorer>>,
+        num_docs: DocId,
+        sum_or_do_nothing_combiner: SumOrDoNothingCombiner,
+    ) -> Box<dyn Scorer> {
+        if !scorers.iter().all(|scorer| {
+            scorer.is::<TermScorer<<<Self as Codec>::PostingsCodec as PostingsCodec>::Postings>>()
+        }) {
+            return box_scorer(BufferedUnionScorer::build(
+                scorers,
+                SumCombiner::default,
+                num_docs,
+            ));
+        }
+        let specialized_scorers: Vec<
+            TermScorer<<<Self as Codec>::PostingsCodec as PostingsCodec>::Postings>,
+        > = scorers
+            .into_iter()
+            .map(|scorer| {
+                *scorer.downcast::<TermScorer<_>>().ok().expect(
+                    "Downcast failed despite the fact we already checked the type was correct",
+                )
+            })
+            .collect();
+        match sum_or_do_nothing_combiner {
+            SumOrDoNothingCombiner::Sum => box_scorer(BufferedUnionScorer::build(
+                specialized_scorers,
+                SumCombiner::default,
+                num_docs,
+            )),
+            SumOrDoNothingCombiner::DoNothing => box_scorer(BufferedUnionScorer::build(
+                specialized_scorers,
+                DoNothingCombiner::default,
+                num_docs,
+            )),
+        }
+    }
+
+    fn for_each_pruning(
+        &self,
+        threshold: Score,
+        scorer: Box<dyn Scorer>,
+        callback: &mut dyn FnMut(DocId, Score) -> Score,
+    ) {
+        let accelerated_for_each_pruning_res =
+            <TCodec as Codec>::PostingsCodec::try_accelerated_for_each_pruning(
+                threshold, scorer, callback,
+            );
+        if let Err(mut scorer) = accelerated_for_each_pruning_res {
+            // No acceleration available. We need to do things manually.
+            scorer.for_each_pruning(threshold, callback);
+        }
+    }
+}
+
+/// SumCombiner or DoNothingCombiner
+#[derive(Copy, Clone)]
+pub enum SumOrDoNothingCombiner {
+    /// Sum scores together
+    Sum,
+    /// Do not track any score.
+    DoNothing,
+}
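Review note: the blanket `impl<TCodec: Codec> ObjectSafeCodec for TCodec` above pairs a generic trait with an object-safe facade, and uses a runtime type check (`scorer.is::<...>()`) to decide whether a specialized path applies. The following is a minimal, self-contained sketch of that pattern only. All names in it (`ToyCodec`, `ToyPostings`, `describe`, `accepts`, `demo_object_safe`) are hypothetical and are not part of this change set, and `std::any::Any` stands in for the downcasting support the real `Scorer` trait provides.

```rust
use std::any::Any;
use std::sync::Arc;

/// Hypothetical generic codec: the associated type prevents `dyn Codec`.
trait Codec: 'static + Send + Sync {
    type Postings: 'static;
    fn name(&self) -> &'static str;
}

/// Object-safe facade: no generics or associated types in its signatures.
trait ObjectSafeCodec: 'static + Send + Sync {
    fn describe(&self) -> String;
    /// Returns true if `postings` has the codec's own postings type.
    fn accepts(&self, postings: &dyn Any) -> bool;
}

/// Blanket impl: every `Codec` automatically gets the object-safe facade.
impl<T: Codec> ObjectSafeCodec for T {
    fn describe(&self) -> String {
        format!("codec:{}", self.name())
    }

    fn accepts(&self, postings: &dyn Any) -> bool {
        // Runtime type check, similar in spirit to `scorer.is::<TermScorer<_>>()`.
        postings.is::<T::Postings>()
    }
}

struct ToyPostings;
struct ToyCodec;

impl Codec for ToyCodec {
    type Postings = ToyPostings;
    fn name(&self) -> &'static str {
        "toy"
    }
}

fn demo_object_safe() -> (String, bool, bool) {
    // Callers hold a trait object and never name the concrete codec type.
    let codec: Arc<dyn ObjectSafeCodec> = Arc::new(ToyCodec);
    let same_type = codec.accepts(&ToyPostings);
    let other_type = codec.accepts(&42u32);
    (codec.describe(), same_type, other_type)
}
```

The design trade-off is the usual one: the facade removes generic parameters from callers at the cost of a dynamic dispatch and a type check per call, which is why the specialized path is only taken when every scorer passes the check.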
--- a/src/query/boolean_query/block_wand.rs
+++ b/src/query/boolean_query/block_wand.rs
@@ -1,5 +1,6 @@
 use std::ops::{Deref, DerefMut};

+use crate::codec::postings::PostingsWithBlockMax;
 use crate::query::term_query::TermScorer;
 use crate::query::Scorer;
 use crate::{DocId, DocSet, Score, TERMINATED};
@@ -13,8 +14,8 @@ use crate::{DocId, DocSet, Score, TERMINATED};
 /// We always have `before_pivot_len` < `pivot_len`.
 ///
 /// `None` is returned if we establish that no document can exceed the threshold.
-fn find_pivot_doc(
-    term_scorers: &[TermScorerWithMaxScore],
+fn find_pivot_doc<TPostings: PostingsWithBlockMax>(
+    term_scorers: &[TermScorerWithMaxScore<TPostings>],
    threshold: Score,
 ) -> Option<(usize, usize, DocId)> {
    let mut max_score = 0.0;
@@ -46,8 +47,8 @@ fn find_pivot_doc(
 /// the next doc candidate defined by the min of `last_doc_in_block + 1` for
 /// scorer in scorers[..pivot_len] and `scorer.doc()` for scorer in scorers[pivot_len..].
 /// Note: before and after calling this method, scorers need to be sorted by their `.doc()`.
-fn block_max_was_too_low_advance_one_scorer(
-    scorers: &mut [TermScorerWithMaxScore],
+fn block_max_was_too_low_advance_one_scorer<TPostings: PostingsWithBlockMax>(
+    scorers: &mut [TermScorerWithMaxScore<TPostings>],
    pivot_len: usize,
 ) {
    debug_assert!(is_sorted(scorers.iter().map(|scorer| scorer.doc())));
@@ -82,7 +83,10 @@ fn block_max_was_too_low_advance_one_scorer(
 // Given a list of term_scorers and a `ord` and assuming that `term_scorers[ord]` is sorted
 // except term_scorers[ord] that might be in advance compared to its ranks,
 // bubble up term_scorers[ord] in order to restore the ordering.
-fn restore_ordering(term_scorers: &mut [TermScorerWithMaxScore], ord: usize) {
+fn restore_ordering<TPostings: PostingsWithBlockMax>(
+    term_scorers: &mut [TermScorerWithMaxScore<TPostings>],
+    ord: usize,
+) {
    let doc = term_scorers[ord].doc();
    for i in ord + 1..term_scorers.len() {
        if term_scorers[i].doc() >= doc {
@@ -97,9 +101,10 @@ fn restore_ordering(term_scorers: &mut [TermScorerWithMaxScore], ord: usize) {
 // If this works, return true.
 // If this fails (ie: one of the term_scorer does not contain `pivot_doc` and seek goes past the
 // pivot), reorder the term_scorers to ensure the list is still sorted and returns `false`.
-// If a term_scorer reach TERMINATED in the process return false remove the term_scorer and return.
-fn align_scorers(
-    term_scorers: &mut Vec<TermScorerWithMaxScore>,
+// If a term_scorer reaches TERMINATED in the process, remove the term_scorer and
+// return false.
+fn align_scorers<TPostings: PostingsWithBlockMax>(
+    term_scorers: &mut Vec<TermScorerWithMaxScore<TPostings>>,
    pivot_doc: DocId,
    before_pivot_len: usize,
 ) -> bool {
@@ -126,7 +131,10 @@ fn align_scorers(
 // Assumes terms_scorers[..pivot_len] are positioned on the same doc (pivot_doc).
 // Advance term_scorers[..pivot_len] and out of these removes the terminated scores.
 // Restores the ordering of term_scorers.
-fn advance_all_scorers_on_pivot(term_scorers: &mut Vec<TermScorerWithMaxScore>, pivot_len: usize) {
+fn advance_all_scorers_on_pivot<TPostings: PostingsWithBlockMax>(
+    term_scorers: &mut Vec<TermScorerWithMaxScore<TPostings>>,
+    pivot_len: usize,
+) {
    for term_scorer in &mut term_scorers[..pivot_len] {
        term_scorer.advance();
    }
@@ -145,12 +153,12 @@ fn advance_all_scorers_on_pivot(term_scorers: &mut Vec<TermScorerWithMaxScore>,
 /// Implements the WAND (Weak AND) algorithm for dynamic pruning
 /// described in the paper "Faster Top-k Document Retrieval Using Block-Max Indexes".
 /// Link: <http://engineering.nyu.edu/~suel/papers/bmw.pdf>
-pub fn block_wand(
-    mut scorers: Vec<TermScorer>,
+pub fn block_wand<TPostings: PostingsWithBlockMax>(
+    mut scorers: Vec<TermScorer<TPostings>>,
    mut threshold: Score,
    callback: &mut dyn FnMut(u32, Score) -> Score,
 ) {
-    let mut scorers: Vec<TermScorerWithMaxScore> = scorers
+    let mut scorers: Vec<TermScorerWithMaxScore<TPostings>> = scorers
        .iter_mut()
        .map(TermScorerWithMaxScore::from)
        .collect();
@@ -166,10 +174,7 @@ pub fn block_wand(

        let block_max_score_upperbound: Score = scorers[..pivot_len]
            .iter_mut()
-            .map(|scorer| {
-                scorer.seek_block(pivot_doc);
-                scorer.block_max_score()
-            })
+            .map(|scorer| scorer.seek_block_max(pivot_doc))
            .sum();

        // Beware after shallow advance, skip readers can be in advance compared to
@@ -220,21 +225,22 @@ pub fn block_wand(
 ///   - On a block, advance until the end and execute `callback` when the doc score is greater or
 ///     equal to the `threshold`.
 pub fn block_wand_single_scorer(
-    mut scorer: TermScorer,
+    mut scorer: TermScorer<impl PostingsWithBlockMax>,
    mut threshold: Score,
    callback: &mut dyn FnMut(u32, Score) -> Score,
 ) {
    let mut doc = scorer.doc();
+    let mut block_max_score = scorer.seek_block_max(doc);
    loop {
        // We position the scorer on a block that can reach
        // the threshold.
-        while scorer.block_max_score() < threshold {
+        while block_max_score < threshold {
            let last_doc_in_block = scorer.last_doc_in_block();
            if last_doc_in_block == TERMINATED {
                return;
            }
            doc = last_doc_in_block + 1;
-            scorer.seek_block(doc);
+            block_max_score = scorer.seek_block_max(doc);
        }
        // Seek will effectively load that block.
        doc = scorer.seek(doc);
@@ -256,31 +262,33 @@ pub fn block_wand_single_scorer(
            }
        }
        doc += 1;
-        scorer.seek_block(doc);
+        block_max_score = scorer.seek_block_max(doc);
    }
 }

-struct TermScorerWithMaxScore<'a> {
-    scorer: &'a mut TermScorer,
+struct TermScorerWithMaxScore<'a, TPostings: PostingsWithBlockMax> {
+    scorer: &'a mut TermScorer<TPostings>,
    max_score: Score,
 }

-impl<'a> From<&'a mut TermScorer> for TermScorerWithMaxScore<'a> {
-    fn from(scorer: &'a mut TermScorer) -> Self {
+impl<'a, TPostings: PostingsWithBlockMax> From<&'a mut TermScorer<TPostings>>
+    for TermScorerWithMaxScore<'a, TPostings>
+{
+    fn from(scorer: &'a mut TermScorer<TPostings>) -> Self {
        let max_score = scorer.max_score();
        TermScorerWithMaxScore { scorer, max_score }
    }
 }

-impl Deref for TermScorerWithMaxScore<'_> {
-    type Target = TermScorer;
+impl<TPostings: PostingsWithBlockMax> Deref for TermScorerWithMaxScore<'_, TPostings> {
+    type Target = TermScorer<TPostings>;

    fn deref(&self) -> &Self::Target {
        self.scorer
    }
 }

-impl DerefMut for TermScorerWithMaxScore<'_> {
+impl<TPostings: PostingsWithBlockMax> DerefMut for TermScorerWithMaxScore<'_, TPostings> {
    fn deref_mut(&mut self) -> &mut Self::Target {
        self.scorer
    }
@@ -483,7 +491,7 @@ mod tests {
            let checkpoints_for_each_pruning =
                compute_checkpoints_for_each_pruning(term_scorers.clone(), top_k);
            let checkpoints_manual =
-                compute_checkpoints_manual(term_scorers.clone(), top_k, 100_000);
+                compute_checkpoints_manual(term_scorers.clone(), top_k, max_doc as u32);
            assert_eq!(checkpoints_for_each_pruning.len(), checkpoints_manual.len());
            for (&(left_doc, left_score), &(right_doc, right_score)) in checkpoints_for_each_pruning
                .iter()
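Review note: `block_wand_single_scorer` above skips whole blocks whose block-max score cannot reach the threshold, and lets the callback raise the threshold as better documents are found. The sketch below illustrates that idea on in-memory data only. It is an assumption-laden toy, not the tantivy implementation: `Block`, `block`, `for_each_pruning_blocks` and `demo_pruning` are hypothetical names, scores are precomputed instead of derived from BM25, and there is no skip list or lazy block loading.

```rust
/// Hypothetical in-memory block: doc ids, their scores, and the block maximum.
struct Block {
    docs: Vec<u32>,
    scores: Vec<f32>,
    max_score: f32,
}

fn block(docs: &[u32], scores: &[f32]) -> Block {
    let max_score = scores.iter().copied().fold(f32::MIN, f32::max);
    Block {
        docs: docs.to_vec(),
        scores: scores.to_vec(),
        max_score,
    }
}

/// Calls `callback` for every doc whose score exceeds the current threshold.
/// The callback returns the new threshold. Returns how many blocks were scanned.
fn for_each_pruning_blocks(
    blocks: &[Block],
    mut threshold: f32,
    callback: &mut dyn FnMut(u32, f32) -> f32,
) -> usize {
    let mut blocks_scanned = 0;
    for block in blocks {
        // The block maximum is an upper bound: if it is below the threshold,
        // no document in the block can qualify, so the block is skipped.
        if block.max_score < threshold {
            continue;
        }
        blocks_scanned += 1;
        for (&doc, &score) in block.docs.iter().zip(&block.scores) {
            if score > threshold {
                threshold = callback(doc, score);
            }
        }
    }
    blocks_scanned
}

fn demo_pruning() -> (Vec<u32>, usize) {
    let blocks = vec![
        block(&[0, 1], &[1.0, 3.0]),
        block(&[2, 3], &[0.5, 2.0]),
        block(&[4, 5], &[4.0, 1.0]),
    ];
    let mut hits = Vec::new();
    // Top-1 collection: the threshold becomes the best score seen so far.
    let scanned = for_each_pruning_blocks(&blocks, 0.0, &mut |doc: u32, score: f32| -> f32 {
        hits.push(doc);
        score
    });
    (hits, scanned)
}
```

With this input the middle block is never scanned, because its maximum (2.0) is below the threshold (3.0) reached after the first block.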
--- a/src/codec/postings/mod.rs
+++ b/src/codec/postings/mod.rs
@@ -0,0 +1,75 @@
+/// Block-max WAND algorithm.
+pub mod block_wand;
+use std::io;
+
+use common::OwnedBytes;
+
+use crate::fieldnorm::FieldNormReader;
+use crate::postings::Postings;
+use crate::query::{Bm25Weight, Scorer};
+use crate::schema::IndexRecordOption;
+use crate::{DocId, Score};
+
+/// Postings codec (read path).
+pub trait PostingsCodec: Send + Sync + 'static {
+    /// Postings type for the postings codec.
+    type Postings: Postings + Clone;
+
+    /// Load postings from raw bytes and metadata.
+    fn load_postings(
+        &self,
+        doc_freq: u32,
+        postings_data: RawPostingsData,
+    ) -> io::Result<Self::Postings>;
+
+    /// If your codec supports ways to accelerate `for_each_pruning`, this is
+    /// where you should implement them.
+    ///
+    /// Returning `Err(scorer)` without mutating the scorer or calling the callback function
+    /// is never "wrong". It just leaves the responsibility to the caller to call a fallback
+    /// implementation on the scorer.
+    ///
+    /// If your codec supports BlockMax-Wand, you just need to have your
+    /// postings implement `PostingsWithBlockMax` and copy what is done in the
+    /// `StandardPostingsCodec` to enable it.
+    fn try_accelerated_for_each_pruning(
+        _threshold: Score,
+        scorer: Box<dyn Scorer>,
+        _callback: &mut dyn FnMut(DocId, Score) -> Score,
+    ) -> Result<(), Box<dyn Scorer>> {
+        Err(scorer)
+    }
+}
+
+/// Raw postings bytes and metadata read from storage.
+#[derive(Debug, Clone)]
+pub struct RawPostingsData {
+    /// Raw postings bytes for the term.
+    pub postings_data: OwnedBytes,
+    /// Raw positions bytes for the term, if positions are available.
+    pub positions_data: Option<OwnedBytes>,
+    /// Record option of the indexed field.
+    pub record_option: IndexRecordOption,
+    /// Effective record option after downgrading to the indexed field capability.
+    pub effective_option: IndexRecordOption,
+}
+
+/// A light complement interface to Postings to allow block-max wand acceleration.
+pub trait PostingsWithBlockMax: Postings {
+    /// Moves the postings to the block containing `target_doc` and returns
+    /// an upper bound of the score for documents in the block.
+    ///
+    /// Warning: calling this method may leave the postings in an invalid state.
+    /// Callers are required to call `seek` before calling any other
+    /// `Postings` method (like `doc`, `advance`, etc.).
+    fn seek_block_max(
+        &mut self,
+        target_doc: crate::DocId,
+        fieldnorm_reader: &FieldNormReader,
+        similarity_weight: &Bm25Weight,
+    ) -> Score;
+
+    /// Returns the last document in the current block (or `TERMINATED` if this
+    /// is the last block).
+    fn last_doc_in_block(&self) -> crate::DocId;
+}
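Review note: `try_accelerated_for_each_pruning` above uses `Result<(), Box<dyn Scorer>>` so that a codec which declines to accelerate can hand the untouched scorer back, and the caller then runs the generic path. A minimal sketch of that ownership hand-back follows. The traits here are simplified stand-ins with the same names as in the diff but much smaller signatures, and `VecScorer`, `PlainCodec`, `FastCodec`, `for_each` and `demo_fallback` are hypothetical.

```rust
/// Simplified stand-in scorer: it only knows how to list its matching docs.
trait Scorer {
    fn docs(&mut self) -> Vec<u32>;
}

struct VecScorer(Vec<u32>);

impl Scorer for VecScorer {
    fn docs(&mut self) -> Vec<u32> {
        self.0.clone()
    }
}

/// Simplified codec trait: the default implementation declines to accelerate
/// and returns the scorer to the caller without touching it.
trait PostingsCodec {
    fn try_accelerated(
        scorer: Box<dyn Scorer>,
        _callback: &mut dyn FnMut(u32),
    ) -> Result<(), Box<dyn Scorer>> {
        Err(scorer)
    }
}

/// Relies on the default: never accelerates.
struct PlainCodec;
impl PostingsCodec for PlainCodec {}

/// Overrides the default and consumes the scorer itself.
struct FastCodec;
impl PostingsCodec for FastCodec {
    fn try_accelerated(
        mut scorer: Box<dyn Scorer>,
        callback: &mut dyn FnMut(u32),
    ) -> Result<(), Box<dyn Scorer>> {
        for doc in scorer.docs() {
            callback(doc);
        }
        Ok(())
    }
}

/// Tries the codec-specific path first, then falls back to the generic one.
fn for_each<C: PostingsCodec>(
    scorer: Box<dyn Scorer>,
    callback: &mut dyn FnMut(u32),
) -> &'static str {
    match C::try_accelerated(scorer, callback) {
        Ok(()) => "accelerated",
        Err(mut scorer) => {
            // Ownership of the scorer came back in `Err`, so it can still be used.
            for doc in scorer.docs() {
                callback(doc);
            }
            "fallback"
        }
    }
}

fn demo_fallback() -> (&'static str, &'static str, Vec<u32>) {
    let mut seen = Vec::new();
    let plain = for_each::<PlainCodec>(Box::new(VecScorer(vec![1, 4])), &mut |doc: u32| {
        seen.push(doc)
    });
    let fast = for_each::<FastCodec>(Box::new(VecScorer(vec![7])), &mut |doc: u32| {
        seen.push(doc)
    });
    (plain, fast, seen)
}
```

Returning the boxed scorer in the error variant avoids cloning it and makes the contract explicit: an implementation that returns `Err` must not have advanced the scorer or invoked the callback.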
--- a/src/codec/standard/mod.rs
+++ b/src/codec/standard/mod.rs
@@ -0,0 +1,35 @@
+use serde::{Deserialize, Serialize};
+
+use crate::codec::standard::postings::StandardPostingsCodec;
+use crate::codec::Codec;
+
+/// Tantivy's default postings codec.
+pub mod postings;
+
+/// Tantivy's default codec.
+#[derive(Debug, Default, Clone, Serialize, Deserialize)]
+pub struct StandardCodec;
+
+impl Codec for StandardCodec {
+    type PostingsCodec = StandardPostingsCodec;
+
+    const ID: &'static str = "tantivy-default";
+
+    fn from_json_props(json_value: &serde_json::Value) -> crate::Result<Self> {
+        if !json_value.is_null() {
+            return Err(crate::TantivyError::InvalidArgument(format!(
+                "Unexpected codec properties for the StandardCodec: expected null, got {}",
+                json_value
+            )));
+        }
+        Ok(StandardCodec)
+    }
+
+    fn to_json_props(&self) -> serde_json::Value {
+        serde_json::Value::Null
+    }
+
+    fn postings_codec(&self) -> &Self::PostingsCodec {
+        &StandardPostingsCodec
+    }
+}
--- a/src/codec/standard/postings/block_segment_postings.rs
+++ b/src/codec/standard/postings/block_segment_postings.rs
@@ -1,28 +1,19 @@
 use std::io;

-use common::VInt;
+use common::{OwnedBytes, VInt};

-use crate::directory::{FileSlice, OwnedBytes};
+use crate::codec::standard::postings::FreqReadingOption;
 use crate::fieldnorm::FieldNormReader;
-use crate::postings::compression::{BlockDecoder, VIntDecoder, COMPRESSION_BLOCK_SIZE};
-use crate::postings::{BlockInfo, FreqReadingOption, SkipReader};
+use crate::postings::compression::{BlockDecoder, VIntDecoder as _, COMPRESSION_BLOCK_SIZE};
+use crate::postings::skip::{BlockInfo, SkipReader};
 use crate::query::Bm25Weight;
 use crate::schema::IndexRecordOption;
 use crate::{DocId, Score, TERMINATED};

-fn max_score<I: Iterator<Item = Score>>(mut it: I) -> Option<Score> {
-    it.next().map(|first| it.fold(first, Score::max))
-}
-
 /// `BlockSegmentPostings` is a cursor iterating over blocks
 /// of documents.
-///
-/// # Warning
-///
-/// While it is useful for some very specific high-performance
-/// use cases, you should prefer using `SegmentPostings` for most usage.
 #[derive(Clone)]
-pub struct BlockSegmentPostings {
+pub(crate) struct BlockSegmentPostings {
    pub(crate) doc_decoder: BlockDecoder,
    block_loaded: bool,
    freq_decoder: BlockDecoder,
@@ -88,7 +79,7 @@ fn split_into_skips_and_postings(
 }

 impl BlockSegmentPostings {
-    /// Opens a `BlockSegmentPostings`.
+    /// Opens a `BlockSegmentPostings`.
    /// `doc_freq` is the number of documents in the posting list.
    /// `record_option` represents the amount of data available according to the schema.
    /// `requested_option` is the amount of data requested by the user.
@@ -96,11 +87,10 @@ impl BlockSegmentPostings {
    /// term frequency blocks.
    pub(crate) fn open(
        doc_freq: u32,
-        data: FileSlice,
+        bytes: OwnedBytes,
        mut record_option: IndexRecordOption,
        requested_option: IndexRecordOption,
    ) -> io::Result<BlockSegmentPostings> {
-        let bytes = data.read_bytes()?;
        let (skip_data_opt, postings_data) = split_into_skips_and_postings(doc_freq, bytes)?;
        let skip_reader = match skip_data_opt {
            Some(skip_data) => {
@@ -138,6 +128,86 @@ impl BlockSegmentPostings {
        block_segment_postings.load_block();
        Ok(block_segment_postings)
    }
+}
+
+fn max_score<I: Iterator<Item = Score>>(mut it: I) -> Option<Score> {
+    it.next().map(|first| it.fold(first, Score::max))
+}
+
+impl BlockSegmentPostings {
+    /// Returns the overall number of documents in the block postings.
+    /// It does not take into account whether documents are deleted or not.
+    ///
+    /// This `doc_freq` is simply the sum of the lengths of all of the
+    /// blocks, and it does not take into account deleted documents.
+    pub fn doc_freq(&self) -> u32 {
+        self.doc_freq
+    }
+
+    /// Returns the array of docs in the current block.
+    ///
+    /// Before the first call to `.advance()`, the block
+    /// returned by `.docs()` is empty.
+    #[inline]
+    pub fn docs(&self) -> &[DocId] {
+        debug_assert!(self.block_loaded);
+        self.doc_decoder.output_array()
+    }
+
+    /// Return the document at index `idx` of the block.
+    #[inline]
+    pub fn doc(&self, idx: usize) -> u32 {
+        self.doc_decoder.output(idx)
+    }
+
+    /// Return the array of `term freq` in the block.
+    #[inline]
+    pub fn freqs(&self) -> &[u32] {
+        debug_assert!(self.block_loaded);
+        self.freq_decoder.output_array()
+    }
+
+    /// Return the frequency at index `idx` of the block.
+    #[inline]
+    pub fn freq(&self, idx: usize) -> u32 {
+        debug_assert!(self.block_loaded);
+        self.freq_decoder.output(idx)
+    }
+
+    /// Position on a block that may contain `target_doc`.
+    ///
+    /// If all docs are smaller than target, the block loaded may be empty,
+    /// or be the last, incomplete VInt block.
+    pub fn seek(&mut self, target_doc: DocId) -> usize {
+        // Move to the block that might contain our document.
+        self.seek_block_without_loading(target_doc);
+        self.load_block();
+
+        // At this point we are on the block that might contain our document.
+        let doc = self.doc_decoder.seek_within_block(target_doc);
+
+        // The last block is not full and padded with TERMINATED,
+        // so we are guaranteed to have at least one value (real or padding)
+        // that is >= target_doc.
+        debug_assert!(doc < COMPRESSION_BLOCK_SIZE);
+
+        // `doc` is now the first element >= `target_doc`.
+        // If all docs are smaller than target, the current block is incomplete and padded
+        // with TERMINATED. After the search, the cursor points to the first TERMINATED.
+        doc
+    }
+
+    pub fn position_offset(&self) -> u64 {
+        self.skip_reader.position_offset()
+    }
+
+    /// Advance to the next block.
+    pub fn advance(&mut self) {
+        self.skip_reader.advance();
+        self.block_loaded = false;
+        self.block_max_score_cache = None;
+        self.load_block();
+    }

    /// Returns the block_max_score for the current block.
    /// It does not require the block to be loaded. For instance, it is ok to call this method
@@ -160,7 +230,7 @@ impl BlockSegmentPostings {
        }
        // this is the last block of the segment posting list.
        // If it is actually loaded, we can compute block max manually.
-        if self.block_is_loaded() {
+        if self.block_loaded {
            let docs = self.doc_decoder.output_array().iter().cloned();
            let freqs = self.freq_decoder.output_array().iter().cloned();
            let bm25_scores = docs.zip(freqs).map(|(doc, term_freq)| {
@@ -177,112 +247,25 @@ impl BlockSegmentPostings {
        // We do not cache it however, so that it gets computed when once block is loaded.
        bm25_weight.max_score()
    }
+}

-    pub(crate) fn freq_reading_option(&self) -> FreqReadingOption {
-        self.freq_reading_option
-    }
-
-    // Resets the block segment postings on another position
-    // in the postings file.
-    //
-    // This is useful for enumerating through a list of terms,
-    // and consuming the associated posting lists while avoiding
-    // reallocating a `BlockSegmentPostings`.
-    //
-    // # Warning
-    //
-    // This does not reset the positions list.
-    pub(crate) fn reset(&mut self, doc_freq: u32, postings_data: OwnedBytes) -> io::Result<()> {
-        let (skip_data_opt, postings_data) =
-            split_into_skips_and_postings(doc_freq, postings_data)?;
-        self.data = postings_data;
-        self.block_max_score_cache = None;
-        self.block_loaded = false;
-        if let Some(skip_data) = skip_data_opt {
-            self.skip_reader.reset(skip_data, doc_freq);
-        } else {
-            self.skip_reader.reset(OwnedBytes::empty(), doc_freq);
+impl BlockSegmentPostings {
+    /// Returns an empty segment postings object
+    pub fn empty() -> BlockSegmentPostings {
+        BlockSegmentPostings {
+            doc_decoder: BlockDecoder::with_val(TERMINATED),
+            block_loaded: true,
+            freq_decoder: BlockDecoder::with_val(1),
+            freq_reading_option: FreqReadingOption::NoFreq,
+            block_max_score_cache: None,
+            doc_freq: 0,
+            data: OwnedBytes::empty(),
+            skip_reader: SkipReader::new(OwnedBytes::empty(), 0, IndexRecordOption::Basic),
        }
-        self.doc_freq = doc_freq;
-        self.load_block();
-        Ok(())
    }

-    /// Returns the overall number of documents in the block postings.
-    /// It does not take in account whether documents are deleted or not.
-    ///
-    /// This `doc_freq` is simply the sum of the length of all of the blocks
-    /// length, and it does not take in account deleted documents.
-    pub fn doc_freq(&self) -> u32 {
-        self.doc_freq
-    }
-
-    /// Returns the array of docs in the current block.
-    ///
-    /// Before the first call to `.advance()`, the block
-    /// returned by `.docs()` is empty.
-    #[inline]
-    pub fn docs(&self) -> &[DocId] {
-        debug_assert!(self.block_is_loaded());
-        self.doc_decoder.output_array()
-    }
-
-    /// Return the document at index `idx` of the block.
-    #[inline]
-    pub fn doc(&self, idx: usize) -> u32 {
-        self.doc_decoder.output(idx)
-    }
-
-    /// Return the array of `term freq` in the block.
-    #[inline]
-    pub fn freqs(&self) -> &[u32] {
-        debug_assert!(self.block_is_loaded());
-        self.freq_decoder.output_array()
-    }
-
-    /// Return the frequency at index `idx` of the block.
-    #[inline]
-    pub fn freq(&self, idx: usize) -> u32 {
-        debug_assert!(self.block_is_loaded());
-        self.freq_decoder.output(idx)
-    }
-
-    /// Returns the length of the current block.
-    ///
-    /// All blocks have a length of `NUM_DOCS_PER_BLOCK`,
-    /// except the last block that may have a length
-    /// of any number between 1 and `NUM_DOCS_PER_BLOCK - 1`
-    #[inline]
-    pub fn block_len(&self) -> usize {
-        debug_assert!(self.block_is_loaded());
-        self.doc_decoder.output_len
-    }
-
-    /// Position on a block that may contains `target_doc`.
-    ///
-    /// If all docs are smaller than target, the block loaded may be empty,
-    /// or be the last an incomplete VInt block.
-    pub fn seek(&mut self, target_doc: DocId) -> usize {
-        // Move to the block that might contain our document.
-        self.seek_block(target_doc);
-        self.load_block();
-
-        // At this point we are on the block that might contain our document.
-        let doc = self.doc_decoder.seek_within_block(target_doc);
-
-        // The last block is not full and padded with TERMINATED,
-        // so we are guaranteed to have at least one value (real or padding)
-        // that is >= target_doc.
-        debug_assert!(doc < COMPRESSION_BLOCK_SIZE);
-
-        // `doc` is now the first element >= `target_doc`.
-        // If all docs are smaller than target, the current block is incomplete and padded
-        // with TERMINATED. After the search, the cursor points to the first TERMINATED.
-        doc
-    }
-
-    pub(crate) fn position_offset(&self) -> u64 {
-        self.skip_reader.position_offset()
+    pub(crate) fn skip_reader(&self) -> &SkipReader {
+        &self.skip_reader
    }

    /// Dangerous API! This calls seeks the next block on the skip list,
@@ -291,22 +274,18 @@ impl BlockSegmentPostings {
    /// `.load_block()` needs to be called manually afterwards.
    /// If all docs are smaller than target, the block loaded may be empty,
    /// or be the last an incomplete VInt block.
-    pub(crate) fn seek_block(&mut self, target_doc: DocId) {
+    pub(crate) fn seek_block_without_loading(&mut self, target_doc: DocId) {
        if self.skip_reader.seek(target_doc) {
            self.block_max_score_cache = None;
            self.block_loaded = false;
        }
    }

-    pub(crate) fn block_is_loaded(&self) -> bool {
-        self.block_loaded
-    }
-
    pub(crate) fn load_block(&mut self) {
-        let offset = self.skip_reader.byte_offset();
-        if self.block_is_loaded() {
+        if self.block_loaded {
            return;
        }
+        let offset = self.skip_reader.byte_offset();
        match self.skip_reader.block_info() {
            BlockInfo::BitPacked {
                doc_num_bits,
@@ -351,68 +330,39 @@ impl BlockSegmentPostings {
        }
        self.block_loaded = true;
    }
-
-    /// Advance to the next block.
-    pub fn advance(&mut self) {
-        self.skip_reader.advance();
-        self.block_loaded = false;
-        self.block_max_score_cache = None;
-        self.load_block();
-    }
-
-    /// Returns an empty segment postings object
-    pub fn empty() -> BlockSegmentPostings {
-        BlockSegmentPostings {
-            doc_decoder: BlockDecoder::with_val(TERMINATED),
-            block_loaded: true,
-            freq_decoder: BlockDecoder::with_val(1),
-            freq_reading_option: FreqReadingOption::NoFreq,
-            block_max_score_cache: None,
-            doc_freq: 0,
-            data: OwnedBytes::empty(),
-            skip_reader: SkipReader::new(OwnedBytes::empty(), 0, IndexRecordOption::Basic),
-        }
-    }
-
-    pub(crate) fn skip_reader(&self) -> &SkipReader {
-        &self.skip_reader
-    }
 }

 #[cfg(test)]
 mod tests {
-    use common::HasLen;
+    use common::OwnedBytes;

    use super::BlockSegmentPostings;
+    use crate::codec::standard::postings::segment_postings::SegmentPostings;
    use crate::docset::{DocSet, TERMINATED};
-    use crate::index::Index;
    use crate::postings::compression::COMPRESSION_BLOCK_SIZE;
-    use crate::postings::postings::Postings;
-    use crate::postings::SegmentPostings;
-    use crate::schema::{IndexRecordOption, Schema, Term, INDEXED};
-    use crate::DocId;
+    use crate::postings::serializer::PostingsSerializer;
+    use crate::schema::IndexRecordOption;

-    #[test]
-    fn test_empty_segment_postings() {
-        let mut postings = SegmentPostings::empty();
-        assert_eq!(postings.doc(), TERMINATED);
-        assert_eq!(postings.advance(), TERMINATED);
-        assert_eq!(postings.advance(), TERMINATED);
-        assert_eq!(postings.doc_freq(), 0);
-        assert_eq!(postings.len(), 0);
-    }
-
-    #[test]
-    fn test_empty_postings_doc_returns_terminated() {
-        let mut postings = SegmentPostings::empty();
-        assert_eq!(postings.doc(), TERMINATED);
-        assert_eq!(postings.advance(), TERMINATED);
-    }
-
-    #[test]
-    fn test_empty_postings_doc_term_freq_returns_0() {
-        let postings = SegmentPostings::empty();
-        assert_eq!(postings.term_freq(), 1);
+    #[cfg(test)]
+    fn build_block_postings(docs: &[u32]) -> BlockSegmentPostings {
+        let doc_freq = docs.len() as u32;
+        let mut postings_serializer =
+            PostingsSerializer::new(1.0f32, IndexRecordOption::Basic, None);
+        postings_serializer.new_term(docs.len() as u32, false);
+        for doc in docs {
+            postings_serializer.write_doc(*doc, 1u32);
+        }
+        let mut buffer: Vec<u8> = Vec::new();
+        postings_serializer
+            .close_term(doc_freq, &mut buffer)
+            .unwrap();
+        BlockSegmentPostings::open(
+            doc_freq,
+            OwnedBytes::new(buffer),
+            IndexRecordOption::Basic,
+            IndexRecordOption::Basic,
+        )
+        .unwrap()
    }

    #[test]
@@ -427,7 +377,7 @@ mod tests {

    #[test]
    fn test_block_segment_postings() -> crate::Result<()> {
-        let mut block_segments = build_block_postings(&(0..100_000).collect::<Vec<u32>>())?;
+        let mut block_segments = build_block_postings(&(0..100_000).collect::<Vec<u32>>());
        let mut offset: u32 = 0u32;
        // checking that the `doc_freq` is correct
        assert_eq!(block_segments.doc_freq(), 100_000);
@@ -452,7 +402,7 @@ mod tests {
        doc_ids.push(129);
        doc_ids.push(130);
        {
-            let block_segments = build_block_postings(&doc_ids)?;
+            let block_segments = build_block_postings(&doc_ids);
            let mut docset = SegmentPostings::from_block_postings(block_segments, None);
            assert_eq!(docset.seek(128), 129);
            assert_eq!(docset.doc(), 129);
@@ -461,7 +411,7 @@ mod tests {
            assert_eq!(docset.advance(), TERMINATED);
        }
        {
-            let block_segments = build_block_postings(&doc_ids).unwrap();
+            let block_segments = build_block_postings(&doc_ids);
            let mut docset = SegmentPostings::from_block_postings(block_segments, None);
            assert_eq!(docset.seek(129), 129);
            assert_eq!(docset.doc(), 129);
@@ -470,7 +420,7 @@ mod tests {
            assert_eq!(docset.advance(), TERMINATED);
        }
        {
-            let block_segments = build_block_postings(&doc_ids)?;
+            let block_segments = build_block_postings(&doc_ids);
            let mut docset = SegmentPostings::from_block_postings(block_segments, None);
            assert_eq!(docset.doc(), 0);
            assert_eq!(docset.seek(131), TERMINATED);
@@ -479,38 +429,13 @@ mod tests {
        Ok(())
    }

-    fn build_block_postings(docs: &[DocId]) -> crate::Result<BlockSegmentPostings> {
-        let mut schema_builder = Schema::builder();
-        let int_field = schema_builder.add_u64_field("id", INDEXED);
-        let schema = schema_builder.build();
-        let index = Index::create_in_ram(schema);
-        let mut index_writer = index.writer_for_tests()?;
-        let mut last_doc = 0u32;
-        for &doc in docs {
-            for _ in last_doc..doc {
-                index_writer.add_document(doc!(int_field=>1u64))?;
-            }
-            index_writer.add_document(doc!(int_field=>0u64))?;
-            last_doc = doc + 1;
-        }
-        index_writer.commit()?;
-        let searcher = index.reader()?.searcher();
-        let segment_reader = searcher.segment_reader(0);
-        let inverted_index = segment_reader.inverted_index(int_field).unwrap();
-        let term = Term::from_field_u64(int_field, 0u64);
-        let term_info = inverted_index.get_term_info(&term)?.unwrap();
-        let block_postings = inverted_index
-            .read_block_postings_from_terminfo(&term_info, IndexRecordOption::Basic)?;
-        Ok(block_postings)
-    }
-
    #[test]
    fn test_block_segment_postings_seek() -> crate::Result<()> {
-        let mut docs = vec![0];
+        let mut docs = Vec::new();
        for i in 0..1300 {
            docs.push((i * i / 100) + i);
        }
-        let mut block_postings = build_block_postings(&docs[..])?;
+        let mut block_postings = build_block_postings(&docs[..]);
        for i in &[0, 424, 10000] {
            block_postings.seek(*i);
            let docs = block_postings.docs();
@@ -521,40 +446,4 @@ mod tests {
        assert_eq!(block_postings.doc(COMPRESSION_BLOCK_SIZE - 1), TERMINATED);
        Ok(())
    }
-
-    #[test]
-    fn test_reset_block_segment_postings() -> crate::Result<()> {
-        let mut schema_builder = Schema::builder();
-        let int_field = schema_builder.add_u64_field("id", INDEXED);
-        let schema = schema_builder.build();
-        let index = Index::create_in_ram(schema);
-        let mut index_writer = index.writer_for_tests()?;
-        // create two postings list, one containing even number,
-        // the other containing odd numbers.
-        for i in 0..6 {
-            let doc = doc!(int_field=> (i % 2) as u64);
-            index_writer.add_document(doc)?;
-        }
-        index_writer.commit()?;
-        let searcher = index.reader()?.searcher();
-        let segment_reader = searcher.segment_reader(0);
-
-        let mut block_segments;
-        {
-            let term = Term::from_field_u64(int_field, 0u64);
-            let inverted_index = segment_reader.inverted_index(int_field)?;
-            let term_info = inverted_index.get_term_info(&term)?.unwrap();
-            block_segments = inverted_index
-                .read_block_postings_from_terminfo(&term_info, IndexRecordOption::Basic)?;
-        }
-        assert_eq!(block_segments.docs(), &[0, 2, 4]);
-        {
-            let term = Term::from_field_u64(int_field, 1u64);
-            let inverted_index = segment_reader.inverted_index(int_field)?;
-            let term_info = inverted_index.get_term_info(&term)?.unwrap();
-            inverted_index.reset_block_postings_from_terminfo(&term_info, &mut block_segments)?;
-        }
-        assert_eq!(block_segments.docs(), &[1, 3, 5]);
-        Ok(())
-    }
 }
--- a/src/codec/standard/postings/mod.rs
+++ b/src/codec/standard/postings/mod.rs
@@ -0,0 +1,171 @@
+use std::io;
+
+use common::BitSet;
+
+use crate::codec::postings::block_wand::{block_wand, block_wand_single_scorer};
+use crate::codec::postings::{PostingsCodec, RawPostingsData};
+use crate::codec::standard::postings::block_segment_postings::BlockSegmentPostings;
+pub use crate::codec::standard::postings::segment_postings::SegmentPostings;
+use crate::positions::PositionReader;
+use crate::query::term_query::TermScorer;
+use crate::query::{BufferedUnionScorer, Scorer, SumCombiner};
+use crate::{DocSet as _, Score, TERMINATED};
+
+mod block_segment_postings;
+mod segment_postings;
+
+pub use segment_postings::SegmentPostings as StandardPostings;
+
+/// The default postings codec for tantivy.
+pub struct StandardPostingsCodec;
+
+#[expect(clippy::enum_variant_names)]
+#[derive(Debug, PartialEq, Clone, Copy, Eq)]
+pub(crate) enum FreqReadingOption {
+    NoFreq,
+    SkipFreq,
+    ReadFreq,
+}
+
+impl PostingsCodec for StandardPostingsCodec {
+    type Postings = SegmentPostings;
+
+    fn load_postings(
+        &self,
+        doc_freq: u32,
+        postings_data: RawPostingsData,
+    ) -> io::Result<Self::Postings> {
+        load_postings_from_raw_data(doc_freq, postings_data)
+    }
+
+    fn try_accelerated_for_each_pruning(
+        mut threshold: Score,
+        mut scorer: Box<dyn Scorer>,
+        callback: &mut dyn FnMut(crate::DocId, Score) -> Score,
+    ) -> Result<(), Box<dyn Scorer>> {
+        scorer = match scorer.downcast::<TermScorer<Self::Postings>>() {
+            Ok(term_scorer) => {
+                block_wand_single_scorer(*term_scorer, threshold, callback);
+                return Ok(());
+            }
+            Err(scorer) => scorer,
+        };
+        let mut union_scorer =
+            scorer.downcast::<BufferedUnionScorer<TermScorer<Self::Postings>, SumCombiner>>()?;
+        let doc = union_scorer.doc();
+        if doc == TERMINATED {
+            return Ok(());
+        }
+        let score = union_scorer.score();
+        if score > threshold {
+            threshold = callback(doc, score);
+        }
+        let scorers: Vec<TermScorer<Self::Postings>> = union_scorer.into_scorers();
+        block_wand(scorers, threshold, callback);
+        Ok(())
+    }
+}
+pub(crate) fn load_postings_from_raw_data(
+    doc_freq: u32,
+    postings_data: RawPostingsData,
+) -> io::Result<SegmentPostings> {
+    let RawPostingsData {
+        postings_data,
+        positions_data: positions_data_opt,
+        record_option,
+        effective_option,
+    } = postings_data;
+    let requested_option = effective_option;
+    let block_segment_postings =
+        BlockSegmentPostings::open(doc_freq, postings_data, record_option, requested_option)?;
+    let position_reader = positions_data_opt.map(PositionReader::open).transpose()?;
+    Ok(SegmentPostings::from_block_postings(
+        block_segment_postings,
+        position_reader,
+    ))
+}
+
+pub(crate) fn fill_bitset_from_raw_data(
+    doc_freq: u32,
+    postings_data: RawPostingsData,
+    doc_bitset: &mut BitSet,
+) -> io::Result<()> {
+    let RawPostingsData {
+        postings_data,
+        record_option,
+        effective_option,
+        ..
+    } = postings_data;
+    let mut block_postings =
+        BlockSegmentPostings::open(doc_freq, postings_data, record_option, effective_option)?;
+    loop {
+        let docs = block_postings.docs();
+        if docs.is_empty() {
+            break;
+        }
+        for &doc in docs {
+            doc_bitset.insert(doc);
+        }
+        block_postings.advance();
+    }
+    Ok(())
+}
+
+#[cfg(test)]
+mod tests {
+    use common::OwnedBytes;
+
+    use super::*;
+    use crate::postings::serializer::PostingsSerializer;
+    use crate::postings::Postings as _;
+    use crate::schema::IndexRecordOption;
+
+    fn test_segment_postings_tf_aux(num_docs: u32, include_term_freq: bool) -> SegmentPostings {
+        let mut postings_serializer =
+            PostingsSerializer::new(1.0f32, IndexRecordOption::WithFreqs, None);
+        let mut buffer = Vec::new();
+        postings_serializer.new_term(num_docs, include_term_freq);
+        for i in 0..num_docs {
+            postings_serializer.write_doc(i, 2);
+        }
+        postings_serializer
+            .close_term(num_docs, &mut buffer)
+            .unwrap();
+        load_postings_from_raw_data(
+            num_docs,
+            RawPostingsData {
+                postings_data: OwnedBytes::new(buffer),
+                positions_data: None,
+                record_option: IndexRecordOption::WithFreqs,
+                effective_option: IndexRecordOption::WithFreqs,
+            },
+        )
+        .unwrap()
+    }
+
+    #[test]
+    fn test_segment_postings_small_block_with_and_without_freq() {
+        let small_block_without_term_freq = test_segment_postings_tf_aux(1, false);
+        assert!(!small_block_without_term_freq.has_freq());
+        assert_eq!(small_block_without_term_freq.doc(), 0);
+        assert_eq!(small_block_without_term_freq.term_freq(), 1);
+
+        let small_block_with_term_freq = test_segment_postings_tf_aux(1, true);
+        assert!(small_block_with_term_freq.has_freq());
+        assert_eq!(small_block_with_term_freq.doc(), 0);
+        assert_eq!(small_block_with_term_freq.term_freq(), 2);
+    }
+
+    #[test]
+    fn test_segment_postings_large_block_with_and_without_freq() {
+        let large_block_without_term_freq = test_segment_postings_tf_aux(128, false);
+        assert!(!large_block_without_term_freq.has_freq());
+        assert_eq!(large_block_without_term_freq.doc(), 0);
+        assert_eq!(large_block_without_term_freq.term_freq(), 1);
+
+        let large_block_with_term_freq = test_segment_postings_tf_aux(128, true);
+        assert!(large_block_with_term_freq.has_freq());
+        assert_eq!(large_block_with_term_freq.doc(), 0);
+        assert_eq!(large_block_with_term_freq.term_freq(), 2);
+    }
+}
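The `try_accelerated_for_each_pruning` method above hands the boxed scorer back through `Err(..)` when it cannot be downcast, so the caller can fall back to the generic path without losing ownership. The following is a minimal, self-contained sketch of that "downcast or give it back" pattern using only `std::any::Any`. The `Scorer` trait, `TermLikeScorer`, `OtherScorer`, `downcast_scorer` and `try_accelerated` are hypothetical stand-ins invented for illustration; they are not tantivy's types, and tantivy's own `downcast` call is not reproduced here.

```rust
use std::any::Any;

// Toy stand-in for a boxed scorer trait (hypothetical, not tantivy's `Scorer`).
trait Scorer: Any {
    fn score(&self) -> f32;
    fn as_any(&self) -> &dyn Any;
    fn into_any(self: Box<Self>) -> Box<dyn Any>;
}

// Hypothetical scorer type for which a fast path exists.
struct TermLikeScorer {
    score: f32,
}

// Hypothetical scorer type without a fast path.
struct OtherScorer;

impl Scorer for TermLikeScorer {
    fn score(&self) -> f32 {
        self.score
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn into_any(self: Box<Self>) -> Box<dyn Any> {
        self
    }
}

impl Scorer for OtherScorer {
    fn score(&self) -> f32 {
        0.25
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn into_any(self: Box<Self>) -> Box<dyn Any> {
        self
    }
}

// Returns the concrete scorer on success, or the untouched box on failure.
fn downcast_scorer<T: Scorer>(scorer: Box<dyn Scorer>) -> Result<Box<T>, Box<dyn Scorer>> {
    if scorer.as_any().is::<T>() {
        match scorer.into_any().downcast::<T>() {
            Ok(concrete) => Ok(concrete),
            Err(_) => unreachable!("type was checked with `is::<T>()`"),
        }
    } else {
        Err(scorer)
    }
}

// The fast path runs only for `TermLikeScorer`; any other scorer is returned
// to the caller through `Err`, mirroring the `Result<(), Box<dyn Scorer>>` shape.
fn try_accelerated(scorer: Box<dyn Scorer>) -> Result<f32, Box<dyn Scorer>> {
    let term_scorer = downcast_scorer::<TermLikeScorer>(scorer)?;
    // Stand-in for the specialized work (doubling is arbitrary).
    Ok(term_scorer.score() * 2.0)
}
```

Returning the box in the error variant is what lets several downcast attempts be chained, as the diff does for the single term scorer and then the buffered union scorer.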
--- a/src/codec/standard/postings/segment_postings.rs
+++ b/src/codec/standard/postings/segment_postings.rs
@@ -1,11 +1,14 @@
-use common::HasLen;
+use common::BitSet;

+use super::BlockSegmentPostings;
+use crate::codec::postings::PostingsWithBlockMax;
 use crate::docset::DocSet;
-use crate::fastfield::AliveBitSet;
+use crate::fieldnorm::FieldNormReader;
 use crate::positions::PositionReader;
 use crate::postings::compression::COMPRESSION_BLOCK_SIZE;
-use crate::postings::{BlockSegmentPostings, Postings};
-use crate::{DocId, TERMINATED};
+use crate::postings::{DocFreq, Postings};
+use crate::query::Bm25Weight;
+use crate::{DocId, Score};

 /// `SegmentPostings` represents the inverted list or postings associated with
 /// a term in a `Segment`.
@@ -29,31 +32,6 @@ impl SegmentPostings {
        }
    }

-    /// Compute the number of non-deleted documents.
-    ///
-    /// This method will clone and scan through the posting lists.
-    /// (this is a rather expensive operation).
-    pub fn doc_freq_given_deletes(&self, alive_bitset: &AliveBitSet) -> u32 {
-        let mut docset = self.clone();
-        let mut doc_freq = 0;
-        loop {
-            let doc = docset.doc();
-            if doc == TERMINATED {
-                return doc_freq;
-            }
-            if alive_bitset.is_alive(doc) {
-                doc_freq += 1u32;
-            }
-            docset.advance();
-        }
-    }
-
-    /// Returns the overall number of documents in the block postings.
-    /// It does not take in account whether documents are deleted or not.
-    pub fn doc_freq(&self) -> u32 {
-        self.block_cursor.doc_freq()
-    }
-
    /// Creates a segment postings object with the given documents
    /// and no frequency encoded.
    ///
@@ -64,24 +42,26 @@ impl SegmentPostings {
    /// buffer with the serialized data.
    #[cfg(test)]
    pub fn create_from_docs(docs: &[u32]) -> SegmentPostings {
-        use crate::directory::FileSlice;
-        use crate::postings::serializer::PostingsSerializer;
+        use common::OwnedBytes;
+
        use crate::schema::IndexRecordOption;
        let mut buffer = Vec::new();
        {
+            use crate::postings::serializer::PostingsSerializer;
+
            let mut postings_serializer =
-                PostingsSerializer::new(&mut buffer, 0.0, IndexRecordOption::Basic, None);
+                PostingsSerializer::new(0.0, IndexRecordOption::Basic, None);
            postings_serializer.new_term(docs.len() as u32, false);
            for &doc in docs {
                postings_serializer.write_doc(doc, 1u32);
            }
            postings_serializer
-                .close_term(docs.len() as u32)
+                .close_term(docs.len() as u32, &mut buffer)
                .expect("In memory Serialization should never fail.");
        }
        let block_segment_postings = BlockSegmentPostings::open(
            docs.len() as u32,
-            FileSlice::from(buffer),
+            OwnedBytes::new(buffer),
            IndexRecordOption::Basic,
            IndexRecordOption::Basic,
        )
@@ -95,7 +75,8 @@ impl SegmentPostings {
        doc_and_tfs: &[(u32, u32)],
        fieldnorms: Option<&[u32]>,
    ) -> SegmentPostings {
-        use crate::directory::FileSlice;
+        use common::OwnedBytes;
+
        use crate::fieldnorm::FieldNormReader;
        use crate::postings::serializer::PostingsSerializer;
        use crate::schema::IndexRecordOption;
@@ -115,7 +96,6 @@ impl SegmentPostings {
            })
            .unwrap_or(0.0);
        let mut postings_serializer = PostingsSerializer::new(
-            &mut buffer,
            average_field_norm,
            IndexRecordOption::WithFreqs,
            fieldnorm_reader,
@@ -125,11 +105,11 @@ impl SegmentPostings {
            postings_serializer.write_doc(doc, tf);
        }
        postings_serializer
-            .close_term(doc_and_tfs.len() as u32)
+            .close_term(doc_and_tfs.len() as u32, &mut buffer)
            .unwrap();
        let block_segment_postings = BlockSegmentPostings::open(
            doc_and_tfs.len() as u32,
-            FileSlice::from(buffer),
+            OwnedBytes::new(buffer),
            IndexRecordOption::WithFreqs,
            IndexRecordOption::WithFreqs,
        )
@@ -159,7 +139,6 @@ impl DocSet for SegmentPostings {
    // next needs to be called a first time to point to the correct element.
    #[inline]
    fn advance(&mut self) -> DocId {
-        debug_assert!(self.block_cursor.block_is_loaded());
        if self.cur == COMPRESSION_BLOCK_SIZE - 1 {
            self.cur = 0;
            self.block_cursor.advance();
@@ -169,12 +148,20 @@ impl DocSet for SegmentPostings {
        self.doc()
    }

+    #[inline]
    fn seek(&mut self, target: DocId) -> DocId {
        debug_assert!(self.doc() <= target);
        if self.doc() >= target {
            return self.doc();
        }

+        // As an optimization, if the block is already loaded, we can
+        // cheaply check the next doc.
+        self.cur = (self.cur + 1).min(COMPRESSION_BLOCK_SIZE - 1);
+        if self.doc() >= target {
+            return self.doc();
+        }
+
        // Delegate block-local search to BlockSegmentPostings::seek, which returns
        // the in-block index of the first doc >= target.
        self.cur = self.block_cursor.seek(target);
@@ -190,13 +177,31 @@ impl DocSet for SegmentPostings {
    }

    fn size_hint(&self) -> u32 {
-        self.len() as u32
+        self.doc_freq().into()
    }
-}

-impl HasLen for SegmentPostings {
-    fn len(&self) -> usize {
-        self.block_cursor.doc_freq() as usize
+    fn fill_bitset(&mut self, bitset: &mut BitSet) {
+        let bitset_max_value: DocId = bitset.max_value();
+        loop {
+            let docs = self.block_cursor.docs();
+            let Some(&last_doc) = docs.last() else {
+                break;
+            };
+            if last_doc < bitset_max_value {
+                // All docs are within the range of the bitset
+                for &doc in docs {
+                    bitset.insert(doc);
+                }
+            } else {
+                for &doc in docs {
+                    if doc < bitset_max_value {
+                        bitset.insert(doc);
+                    }
+                }
+                break;
+            }
+            self.block_cursor.advance();
+        }
    }
 }

@@ -222,6 +227,13 @@ impl Postings for SegmentPostings {
        self.block_cursor.freq(self.cur)
    }

+    /// Returns the overall number of documents in the block postings.
+    /// It does not take into account whether documents are deleted or not.
+    #[inline(always)]
+    fn doc_freq(&self) -> DocFreq {
+        DocFreq::Exact(self.block_cursor.doc_freq())
+    }
+
    fn append_positions_with_offset(&mut self, offset: u32, output: &mut Vec<u32>) {
        let term_freq = self.term_freq();
        let prev_len = output.len();
@@ -245,24 +257,44 @@ impl Postings for SegmentPostings {
            }
        }
    }
+
+    fn has_freq(&self) -> bool {
+        !self.block_cursor.freqs().is_empty()
+    }
+}
+
+impl PostingsWithBlockMax for SegmentPostings {
+    #[inline]
+    fn seek_block_max(
+        &mut self,
+        target_doc: crate::DocId,
+        fieldnorm_reader: &FieldNormReader,
+        similarity_weight: &Bm25Weight,
+    ) -> Score {
+        self.block_cursor.seek_block_without_loading(target_doc);
+        self.block_cursor
+            .block_max_score(fieldnorm_reader, similarity_weight)
+    }
+
+    #[inline]
+    fn last_doc_in_block(&self) -> crate::DocId {
+        self.block_cursor.skip_reader().last_doc_in_block()
+    }
 }

 #[cfg(test)]
 mod tests {
-
-    use common::HasLen;
-
    use super::SegmentPostings;
    use crate::docset::{DocSet, TERMINATED};
-    use crate::fastfield::AliveBitSet;
-    use crate::postings::postings::Postings;
+    use crate::postings::Postings;

    #[test]
    fn test_empty_segment_postings() {
        let mut postings = SegmentPostings::empty();
+        assert_eq!(postings.doc(), TERMINATED);
        assert_eq!(postings.advance(), TERMINATED);
        assert_eq!(postings.advance(), TERMINATED);
-        assert_eq!(postings.len(), 0);
+        assert_eq!(postings.doc_freq(), crate::postings::DocFreq::Exact(0));
    }

    #[test]
@@ -277,15 +309,4 @@ mod tests {
        let postings = SegmentPostings::empty();
        assert_eq!(postings.term_freq(), 1);
    }
-
-    #[test]
-    fn test_doc_freq() {
-        let docs = SegmentPostings::create_from_docs(&[0, 2, 10]);
-        assert_eq!(docs.doc_freq(), 3);
-        let alive_bitset = AliveBitSet::for_test_from_deleted_docs(&[2], 12);
-        assert_eq!(docs.doc_freq_given_deletes(&alive_bitset), 2);
-        let all_deleted =
-            AliveBitSet::for_test_from_deleted_docs(&[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], 12);
-        assert_eq!(docs.doc_freq_given_deletes(&all_deleted), 0);
-    }
 }
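The `seek` change in this file first peeks at the next doc before falling back to the more expensive block-level search. A simplified sketch of that idea over a flat sorted list is shown here. `SortedDocs` is a hypothetical toy type, the `TERMINATED` constant is redefined locally as an assumption, and the binary search stands in for `BlockSegmentPostings::seek`; the real implementation works on compressed blocks and is not reproduced.

```rust
// Sentinel returned once the cursor runs past the last doc (toy constant,
// assumed here for illustration).
const TERMINATED: u32 = i32::MAX as u32;

// Hypothetical cursor over a sorted, deduplicated list of doc ids.
struct SortedDocs {
    docs: Vec<u32>,
    cur: usize,
}

impl SortedDocs {
    fn new(docs: Vec<u32>) -> Self {
        SortedDocs { docs, cur: 0 }
    }

    // Current doc, or TERMINATED when the cursor is exhausted.
    fn doc(&self) -> u32 {
        self.docs.get(self.cur).copied().unwrap_or(TERMINATED)
    }

    // Moves to the first doc >= target and returns it.
    fn seek(&mut self, target: u32) -> u32 {
        if self.doc() >= target {
            return self.doc();
        }
        // Cheap check: intersections often need only the very next doc.
        self.cur = (self.cur + 1).min(self.docs.len());
        if self.doc() >= target {
            return self.doc();
        }
        // Fallback: binary search in the remainder of the list.
        let offset = self.docs[self.cur..].partition_point(|&doc| doc < target);
        self.cur += offset;
        self.doc()
    }
}
```

With docs `[1, 3, 7, 20, 50]`, `seek(3)` is answered by the cheap check alone, while `seek(40)` falls through to the search.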
--- a/src/collector/count_collector.rs
+++ b/src/collector/count_collector.rs
@@ -43,7 +43,7 @@ impl Collector for Count {
    fn for_segment(
        &self,
        _: SegmentOrdinal,
-        _: &SegmentReader,
+        _: &dyn SegmentReader,
    ) -> crate::Result<SegmentCountCollector> {
        Ok(SegmentCountCollector::default())
    }
--- a/src/collector/custom_score_top_collector.rs
+++ b/src/collector/custom_score_top_collector.rs
@@ -1,121 +0,0 @@
-use crate::collector::top_collector::{TopCollector, TopSegmentCollector};
-use crate::collector::{Collector, SegmentCollector};
-use crate::{DocAddress, DocId, Score, SegmentReader};
-
-pub(crate) struct CustomScoreTopCollector<TCustomScorer, TScore = Score> {
-    custom_scorer: TCustomScorer,
-    collector: TopCollector<TScore>,
-}
-
-impl<TCustomScorer, TScore> CustomScoreTopCollector<TCustomScorer, TScore>
-where TScore: Clone + PartialOrd
-{
-    pub(crate) fn new(
-        custom_scorer: TCustomScorer,
-        collector: TopCollector<TScore>,
-    ) -> CustomScoreTopCollector<TCustomScorer, TScore> {
-        CustomScoreTopCollector {
-            custom_scorer,
-            collector,
-        }
-    }
-}
-
-/// A custom segment scorer makes it possible to define any kind of score
-/// for a given document belonging to a specific segment.
-///
-/// It is the segment local version of the [`CustomScorer`].
-pub trait CustomSegmentScorer<TScore>: 'static {
-    /// Computes the score of a specific `doc`.
-    fn score(&mut self, doc: DocId) -> TScore;
-}
-
-/// `CustomScorer` makes it possible to define any kind of score.
-///
-/// The `CustomerScorer` itself does not make much of the computation itself.
-/// Instead, it helps constructing `Self::Child` instances that will compute
-/// the score at a segment scale.
-pub trait CustomScorer<TScore>: Sync {
-    /// Type of the associated [`CustomSegmentScorer`].
-    type Child: CustomSegmentScorer<TScore>;
-    /// Builds a child scorer for a specific segment. The child scorer is associated with
-    /// a specific segment.
-    fn segment_scorer(&self, segment_reader: &SegmentReader) -> crate::Result<Self::Child>;
-}
-
-impl<TCustomScorer, TScore> Collector for CustomScoreTopCollector<TCustomScorer, TScore>
-where
-    TCustomScorer: CustomScorer<TScore> + Send + Sync,
-    TScore: 'static + PartialOrd + Clone + Send + Sync,
-{
-    type Fruit = Vec<(TScore, DocAddress)>;
-
-    type Child = CustomScoreTopSegmentCollector<TCustomScorer::Child, TScore>;
-
-    fn for_segment(
-        &self,
-        segment_local_id: u32,
-        segment_reader: &SegmentReader,
-    ) -> crate::Result<Self::Child> {
-        let segment_collector = self.collector.for_segment(segment_local_id, segment_reader);
-        let segment_scorer = self.custom_scorer.segment_scorer(segment_reader)?;
-        Ok(CustomScoreTopSegmentCollector {
-            segment_collector,
-            segment_scorer,
-        })
-    }
-
-    fn requires_scoring(&self) -> bool {
-        false
-    }
-
-    fn merge_fruits(&self, segment_fruits: Vec<Self::Fruit>) -> crate::Result<Self::Fruit> {
-        self.collector.merge_fruits(segment_fruits)
-    }
-}
-
-pub struct CustomScoreTopSegmentCollector<T, TScore>
-where
-    TScore: 'static + PartialOrd + Clone + Send + Sync + Sized,
-    T: CustomSegmentScorer<TScore>,
-{
-    segment_collector: TopSegmentCollector<TScore>,
-    segment_scorer: T,
-}
-
-impl<T, TScore> SegmentCollector for CustomScoreTopSegmentCollector<T, TScore>
-where
-    TScore: 'static + PartialOrd + Clone + Send + Sync,
-    T: 'static + CustomSegmentScorer<TScore>,
-{
-    type Fruit = Vec<(TScore, DocAddress)>;
-
-    fn collect(&mut self, doc: DocId, _score: Score) {
-        let score = self.segment_scorer.score(doc);
-        self.segment_collector.collect(doc, score);
-    }
-
-    fn harvest(self) -> Vec<(TScore, DocAddress)> {
-        self.segment_collector.harvest()
-    }
-}
-
-impl<F, TScore, T> CustomScorer<TScore> for F
-where
-    F: 'static + Send + Sync + Fn(&SegmentReader) -> T,
-    T: CustomSegmentScorer<TScore>,
-{
-    type Child = T;
-
-    fn segment_scorer(&self, segment_reader: &SegmentReader) -> crate::Result<Self::Child> {
-        Ok((self)(segment_reader))
-    }
-}
-
-impl<F, TScore> CustomSegmentScorer<TScore> for F
-where F: 'static + FnMut(DocId) -> TScore
-{
-    fn score(&mut self, doc: DocId) -> TScore {
-        (self)(doc)
-    }
-}
Author	SHA1	Message	Date
Pascal Seitz	0bdec77410	pub method on Term	2026-02-24 13:31:51 +01:00
Pascal Seitz	1a1c29c785	allow Searcher to be constructed without index	2026-02-17 17:56:33 +01:00
Pascal Seitz	8a16afa2f1	add to_json->Value method	2026-02-17 13:21:23 +01:00
Pascal Seitz	e841cebba4	convert StoreReader to trait this will remove the DocumentDeserialize (maybe added later in a different form)	2026-02-16 17:33:49 +01:00
Pascal Seitz	05f255b757	add async methods for quickwit	2026-02-16 10:32:32 +01:00
Pascal Seitz	e6318e1591	add comments, remove fieldnorms	2026-02-12 14:13:33 +01:00
Pascal Seitz	70bb97231b	remove fieldnorms_readers	2026-02-11 19:40:45 +01:00
Paul Masurel	6038455761	First stab at tantivy's codec. Convert SegmentReader, InvertedIndexReader and posting lists to traits. Add special functions to push down certain performance methods to keep them strictly typed. We rely on an ObjectSafeCodec contraption to avoid the proliferation of generics. That object's point is to make sure we can build TermScorer with a concrete codec-specific type before reboxing it (same thing for PhraseScorer). Fix performance regression: fix incorrect scorer cast for buffered union block wand.	2026-02-11 15:11:29 +01:00
PSeitz	57fe659fff	make serializer pub (#2835). Some changes on the posting list serializer to make it usable in other contexts. Improve errors. Signed-off-by: Pascal Seitz <pascal.seitz@gmail.com>	2026-02-11 14:37:42 +01:00
trinity-1686a	5562ce6037	Merge pull request #2818 from Darkheir/fix/query_grammar_regex_between_parentheses	2026-02-11 11:39:58 +01:00
Metin Dumandag	09b6ececa7	Export fields of the PercentileValuesVecEntry (#2833) Otherwise, there is no way to access these fields when not using the json serialized form of the aggregation results. This simple data struct is part of the public api, so its fields should be accessible as well.	2026-02-11 11:31:07 +01:00
Moe	8018016e46	feat: add fast field support for Bytes type (#100 ) (#2830 ) ## What Enable range queries and TopN sorting on `Bytes` fast fields, bringing them to parity with `Str` fields. ## Why `BytesColumn` uses the same dictionary encoding as `StrColumn` internally, but range queries and TopN sorting were explicitly disabled for `Bytes`. This prevented use cases like storing lexicographically sortable binary data (e.g., arbitrary-precision decimals) that need efficient range filtering. ## How 1. Enable range queries for Bytes - Changed `is_type_valid_for_fastfield_range_query()` to return `true` for `Type::Bytes` 2. Add BytesColumn handling in scorer - Added a branch in `FastFieldRangeWeight::scorer()` to handle bytes fields using dictionary ordinal lookup (mirrors the existing `StrColumn` logic) 3. Add SortByBytes - New sort key computer for TopN queries on bytes columns ## Tests - `test_bytes_field_ff_range_query` - Tests inclusive/exclusive bounds and unbounded ranges - `test_sort_by_bytes_asc` / `test_sort_by_bytes_desc` - Tests lexicographic ordering in both directions	2026-02-11 11:26:18 +01:00
trinity-1686a	6bf185dc3f	Merge pull request #2829 from quickwit-oss/cong.xie/add-intermediate-accessors	2026-02-10 17:07:24 +01:00
cong.xie	bb141abe22	feat(aggregation): add keys() accessor to IntermediateAggregationResults	2026-02-09 15:38:35 -05:00
cong.xie	f1c29ba972	resolve conflict	2026-02-06 14:23:11 -05:00
cong.xie	ae0554a6a5	feat(aggregation): add public accessors for intermediate aggregation results Add accessor methods to allow external crates to read intermediate aggregation results without accessing pub(crate) fields: - IntermediateAggregationResults: get(), remove() - IntermediateTermBucketResult: entries(), sum_other_doc_count(), doc_count_error_upper_bound() - IntermediateAverage: stats() - IntermediateStats: count(), sum() - IntermediateKey: Display impl for string conversion	2026-02-06 11:12:20 -05:00
cong.xie	0d7abe5d23	feat(aggregation): add public accessors for intermediate aggregation results Add accessor methods to allow external crates to read intermediate aggregation results without accessing pub(crate) fields: - IntermediateAggregationResults: get(), get_mut(), remove() - IntermediateTermBucketResult: entries(), sum_other_doc_count(), doc_count_error_upper_bound() - IntermediateAverage: stats() - IntermediateStats: count(), sum() - IntermediateKey: Display impl for string conversion	2026-02-06 10:28:59 -05:00
PSeitz	28db952131	Add regex search and merge segments benchmark (#2826) * add merge_segments benchmark * add regex search bench	2026-02-02 17:28:02 +01:00
PSeitz	98ebbf922d	faster exclude queries (#2825) * faster exclude queries Faster exclude queries with multiple terms. Changes `Exclude` to be able to exclude multiple DocSets, instead of putting the docsets into a union. Use `seek_danger` in `Exclude`. closes #2822 * replace unwrap with match	2026-01-30 17:06:41 +01:00
Paul Masurel	4a89e74597	Fix rfc3339 typos and add Claude Code skills (#2823) Closes #2817	2026-01-30 12:00:28 +01:00
Alex Lazar	4d99e51e50	Bump oneshot to 0.1.13 per dependabot (#2821)	2026-01-30 11:42:01 +01:00
Darkheir	a55e4069e4	feat(query-grammar): Apply PR review suggestions Signed-off-by: Darkheir <raphael.cohen@sekoia.io>	2026-01-28 14:13:55 +01:00
Darkheir	1fd30c62be	fix(query-grammar): Fix regexes between parentheses Signed-off-by: Darkheir <raphael.cohen@sekoia.io>	2026-01-28 10:37:51 +01:00
trinity-1686a	9b619998bd	Merge pull request #2816 from evance-br/fix-closing-paren-elastic-range	2026-01-27 17:00:08 +01:00
Evance Soumaoro	765c448945	uncomment commented code when testing	2026-01-27 13:19:41 +00:00
Evance Soumaoro	943594ebaa	uncomment commented code when testing	2026-01-27 13:08:38 +00:00
Evance Soumaoro	df17daae0d	fix closing parenthesis error on elastic range queries for lenient parser	2026-01-27 13:01:14 +00:00
Paul Masurel	0ae94baef5	Remove temp file (#2815) Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>	2026-01-27 09:22:11 +01:00
Paul Masurel	3f448ecf79	Bugfix on intersection. (#2812) The intersection algorithm made it possible for .seek(..) to be called with values lower than the current doc id, breaking the DocSet contract. The fix removes the optimization that caused left.seek(..) to be replaced by a simpler left.advance(..). Simply doing so led to a performance regression. I therefore integrated that idea within SegmentPostings.seek. We now attempt to check the next doc systematically on seek, PROVIDED the block is already loaded. Closes #2811 Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>	2026-01-27 09:21:09 +01:00
Paul Masurel	b86caeefe2	Major bugfix in intersection. A bug was added with the `seek_into_the_danger_zone()` optimization (spotted and fixed by Stu). The contract says `seek_into_the_danger_zone` returns true if the target doc is part of the docset. The blanket implementation goes like this: `let current_doc = self.doc(); if current_doc < target { self.seek(target); } self.doc() == target`. So it will return true if target is TERMINATED, whereas TERMINATED does not really belong to the docset. The fix tries to clarify the contracts and fixes the intersection algorithm. We observe a small but across-the-board improvement in intersection performance. --------- Co-authored-by: Stu Hood <stuhood@gmail.com> Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>	2026-01-23 18:44:10 +01:00
ChangRui-Ryan	abf1e64f4d	add benchmark for string search and get (#2795)	2026-01-19 11:50:41 +01:00
trinity-1686a	12977bc7c4	upgrade some dependencies (#2802) including rand, which had a few breaking changes	2026-01-14 10:19:09 +01:00
trinity-1686a	0c94eb94c3	Merge pull request #2799 from jollygreenlaser/lru	2026-01-13 22:47:35 +01:00
Paul Masurel	c92e831dde	Minor refactoring in PostingsSerializer (#2801) Removes the Write generics argument in PostingsSerializer. This removes useless generic. Prepares the path for codecs. Removes one useless CountingWrite layer. etc. Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>	2026-01-12 13:53:43 +01:00
Alex Lazar	947c0d5f40	Bump lru to 0.16.3 per dependabot	2026-01-09 23:25:51 -08:00
Paul Masurel	d904630e6a	Bumped bitpacking version (#2797) Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>	2026-01-08 15:50:22 +01:00
PSeitz-dd	65b5a1a306	one collector per agg request instead per bucket (#2759 ) * improve bench * add more tests for new collection type * one collector per agg request instead per bucket In this refactoring a collector knows in which bucket of the parent their data is in. This allows to convert the previous approach of one collector per bucket to one collector per request. low card bucket optimization * reduce dynamic dispatch, faster term agg * use radix map, fix prepare_max_bucket use paged term map in term agg use special no sub agg term map impl * specialize columntype in stats * remove stacktrace bloat, use &mut helper increase cache to 2048 * cleanup remove clone move data in term req, single doc opt for stats * add comment * share column block accessor * simplify fetch block in column_block_accessor * split subaggcache into two trait impls * move partitions to heap * fix name, add comment --------- Co-authored-by: Pascal Seitz <pascal.seitz@gmail.com>	2026-01-06 11:50:55 +01:00
ChangRui-Ryan	db2ecc6057	fix Column.first method parameter type (#2792)	2026-01-05 10:03:01 +01:00
Paul Masurel	77505c3d03	Making stemming optional. (#2791) Fixed code and CI to run on no default features. Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>	2026-01-02 12:40:42 +01:00
PSeitz	735c588f4f	fix union performance regression (#2790) * add inlines * fix union performance regression Remove unwrap from hotpath generates better assembly. closes #2788	2026-01-02 12:06:51 +01:00
PSeitz	242a1531bf	fix flaky test (#2784) Signed-off-by: Pascal Seitz <pascal.seitz@gmail.com>	2026-01-02 11:30:51 +01:00
trinity-1686a	6443b63177	document 1bit hole and some queries supporting running with just fastfield (#2779) * add small doc on some queries using fast field when not indexed * document 1 unused bit in skiplist	2026-01-02 10:32:37 +01:00
Stu Hood	4987495ee4	Add an erased `SortKeyComputer` to sort on types which are not known until runtime (#2770 ) * Remove PartialOrd bound on compared values. * Fix declared `SortKey` type of `impl<..> SortKeyComputer for (HeadSortKeyComputer, TailSortKeyComputer)` * Add a SortByOwnedValue implementation to provide a type-erased column. * Add support for comparing mismatched `OwnedValue` types. * Support JSON columns. * Refer to https://github.com/quickwit-oss/tantivy/issues/2776 * Rename to `SortByErasedType`. * Comment on transitivity. Co-authored-by: Paul Masurel <paul@quickwit.io> * Fix clippy warnings in new code. --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2026-01-02 10:28:47 +01:00
Paul Masurel	b11605f045	Addressing clippy comments (#2789) Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>	2025-12-31 18:02:00 +01:00
ChangRui-Ryan	75d7989cc6	add benchmark for boolean query with range sub query (#2787)	2025-12-31 12:00:53 +01:00
PSeitz	923f0508f2	seek_exact + cost based intersection (#2538) * seek_exact + cost based intersection Adds `seek_exact` and `cost` to `DocSet` for a more efficient intersection. Unlike `seek`, `seek_exact` does not require the DocSet to advance to the next hit, if the target does not exist. `cost` allows addressing the different DocSet types and their cost model and is used to determine the DocSet that drives the intersection. E.g. fast field range queries may do a full scan. Phrase queries load the positions to check if we have a hit. They both have a higher cost than their size_hint would suggest. Improves `size_hint` estimation for intersection and union, by having an estimation based on random distribution with a co-location factor. Refactor range query benchmark. Closes #2531 Future Work Implement `seek_exact` for BufferedUnionScorer and RangeDocSet (fast field range queries) Evaluate replacing `seek` with `seek_exact` to reduce code complexity * Apply suggestions from code review Co-authored-by: Paul Masurel <paul@quickwit.io> * add API contract verification * impl seek_exact on union * rename seek_exact * add mixed AND OR test, fix buffered_union * Add a proptest of BooleanQuery. (#2690) * fix build * Increase the document count. * fix merge conflict * fix debug assert * Fix compilation errors after rebase - Remove duplicate proptest_boolean_query module - Remove duplicate cost() method implementations - Fix TopDocs API usage (add .order_by_score()) - Remove duplicate imports - Remove unused variable assignments --------- Co-authored-by: Paul Masurel <paul@quickwit.io> Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com> Co-authored-by: Stu Hood <stuhood@gmail.com>	2025-12-30 14:43:25 +01:00
ChangRui-Ryan	e0b62e00ac	optimize RangeDocSet for non-overlapping query ranges (#2783 )	2025-12-29 16:55:28 +01:00
Stu Hood	ce97beb86f	Add support for natural-order-with-none-highest in `TopDocs::order_by` (#2780 ) * Add `ComparatorEnum::NaturalNoneHigher`. * Fix comments.	2025-12-23 09:22:20 +01:00
Stu Hood	c0f21a45ae	Use a strict comparison in TopNComputer (#2777 ) * Remove `(Partial)Ord` from `ComparableDoc`, and unify comparison between `TopNComputer` and `Comparator`. * Doc cleanups. * Require Ord for `ComparableDoc`. * Semantics are actually _ascending_ DocId order. * Adjust docs again for ascending DocId order. * minor change --------- Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>	2025-12-18 12:13:23 +01:00
Moe	73657dff77	fix: fixed integer overflow in ExpUnrolledLinkedList for large datasets (#2735 ) * Fixed the overflow issue. * Fixed lint issues. * Applied PR fixes. * Fixed a lint issue.	2025-12-16 22:57:12 +01:00
Moe	e3c9be1f92	fix: boolean query incorrectly dropping documents when AllScorer is present (#2760 ) * Fixed the range issue. * Fixed the second all scorer issue * Improved docs + tests * Improved code. * Fixed lint issues. * Improved tests + logic based on PR comments. * Fixed lint issues. * Increase the document count. * Improved the prop-tests * Expand the index size, and remove unused parameter. --------- Co-authored-by: Stu Hood <stuhood@gmail.com>	2025-12-16 22:52:02 +01:00
Ming	ba61ed6ef3	fix: vint buffer can overflow (#2778 ) * fix vint overflow * comment	2025-12-16 22:50:41 +01:00
trinity-1686a	d0e1600135	fix bug with minimum_should_match and AllScorer (#2774 )	2025-12-14 10:10:45 +01:00
PSeitz-dd	e9020d17d4	fix coverage (#2769 )	2025-12-11 11:35:58 +01:00
PSeitz-dd	5ba0031f7d	move rand_distr to dev_dep (#2772 )	2025-12-11 18:23:50 +08:00
Philippe Noël	22dde8f9ae	chore: Make some delete-related functions public (#46 ) (#2766 ) Co-authored-by: Ming <ming.ying.nyc@gmail.com>	2025-12-11 01:22:15 +01:00
Philippe Noël	14cc24614e	Make DeleteMeta pub (#2765 ) Co-authored-by: Ming Ying <ming.ying.nyc@gmail.com>	2025-12-11 00:11:03 +01:00
Philippe Noël	8a1079b2dc	expose AddOperation and with_max_doc (#7 ) (#2762 ) Co-authored-by: Ming <ming.ying.nyc@gmail.com>	2025-12-11 00:10:42 +01:00
Philippe Noël	794ff1ffc9	chore: Make `Language` hashable (#79 ) (#2763 ) Co-authored-by: Ming <ming.ying.nyc@gmail.com>	2025-12-10 15:38:43 +01:00
PSeitz-dd	c6912ce89a	Handle JSON fields and columnar in space_usage (#2761 ) return field names in space_usage instead of `Field` more detailed info for columns	2025-12-10 20:33:33 +08:00
PSeitz	618e3bd11b	Term and IndexingTerm cleanup (#2750 ) * refactor term * add deprecated functions --------- Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>	2025-12-05 09:48:40 +08:00
PSeitz	b2f99c6217	add term->histogram benchmark (#2758 ) * add term->histogram benchmark * add more term aggs --------- Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>	2025-12-04 02:29:37 +01:00
PSeitz	76de5bab6f	fix unsafe warnings (#2757 )	2025-12-03 20:15:21 +08:00
rustmailer	b7eb31162b	docs: add usage example to README (#2743 )	2025-12-02 21:56:57 +01:00
Paul Masurel	63c66005db	Lazy scorers (#2726 ) * Refactoring of the score tweaker into `SortKeyComputer`s to unlock two features. - Allow lazy evaluation of score. As soon as we identified that a doc won't reach the topK threshold, we can stop the evaluation. - Allow for a different segment level score, segment level score and their conversion. This PR breaks public API, but fixing code is straightforward. * Bumping tantivy version --------- Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>	2025-12-01 15:38:57 +01:00
Paul Masurel	7d513a44c5	Added some benchmark for top K by a fast field (#2754 ) Also removed query parsing from the bench code. Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>	2025-12-01 14:58:29 +01:00
Stu Hood	ca87fcd454	Implement `collect_block` for `Collector`s which wrap other `Collector`s (#2727 ) * Implement `collect_block` for tuple Collectors, and for MultiCollector. * Two more.	2025-12-01 12:26:29 +01:00
Ang	08a92675dc	Fix typos again (#2753 ) Found via `codespell -S benches,stopwords.rs -L womens,parth,abd,childs,ond,ser,ue,mot,hel,atleast,pris,claus,allo`	2025-12-01 12:15:41 +01:00
Raphaël Cohen	f7f4b354d6	fix: Handle phrase prefixed with star (#2751 ) Signed-off-by: Darkheir <raphael.cohen@sekoia.io>	2025-12-01 11:43:25 +01:00
Paul Masurel	25d44fcec8	Revert "remove unused columnar api (#2742 )" (#2748 ) * Revert "remove unused columnar api (#2742)" This reverts commit `8725594d47`. * Clippy comment + removing fill_vals --------- Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>	2025-11-26 17:44:02 +01:00
PSeitz-dd	842fe9295f	split Term in Term and IndexingTerm (#2744 ) * split Term in Term and IndexingTerm * add append_json_path to JsonTermSerializer	2025-11-26 16:48:59 +01:00
Paul Masurel	f88b7200b2	Optimization when posting list are saturated. (#2745 ) * Optimization when posting lists are saturated. If a posting list's doc freq is the segment reader's max_doc, and if scoring does not matter, we can replace it with an AllScorer. In turn, in a boolean query, we can dismiss all scorers and empty scorers, to accelerate the request. * Added range query optimization * CR comment * CR comments * CR comment --------- Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>	2025-11-26 15:50:57 +01:00
PSeitz-dd	8725594d47	remove unused columnar api (#2742 )	2025-11-21 18:07:25 +01:00
PSeitz	43a784671a	clippy (#2741 ) Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>	2025-11-21 18:07:03 +01:00
Paul Masurel	c363bbd23d	Optimize term aggregation with low cardinality + some refactoring (#2740 ) This introduces an optimization of top-level term aggregations on fields with low cardinality. We then use a Vec as the underlying map. In addition, we buffer subaggregations. --------- Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com> Co-authored-by: Paul Masurel <paul@quickwit.io>	2025-11-21 14:46:29 +01:00