rebased on main

First stab at tantivy's codec
For the moment, this only covers the postings codec. On the write side, positions are not included yet.

Implementation details:

On the write side, we use static typing. Many types are now generic over the codec, with a default codec type so that client projects should not break too much.

On the read side, we rely on an ObjectSafeCodec contraption to avoid a proliferation of generics. Its purpose is to let us build a TermScorer with a concrete, codec-specific type before reboxing it (same thing for PhraseScorer).
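
To make the two halves of the design easier to picture, here is a minimal, self-contained sketch. It is not the code in this PR: only the names ObjectSafeCodec and TermScorer come from the description above, and every trait, method, and struct shape below (`Codec`, `Postings`, `PostingsImpl`, `StandardCodec`, `Writer`, `Scorer`, `demo`) is a hypothetical stand-in chosen for illustration. It shows a write-side type that is generic over the codec with a default type parameter, and a read-side object-safe trait that builds a scorer with the concrete postings type before boxing it.

```rust
// Hypothetical stand-ins; the real traits in the PR will differ.

/// A cursor over doc ids (illustrative only).
trait Postings {
    fn next_doc(&mut self) -> Option<u32>;
}

/// Statically typed codec. The associated type means this trait
/// cannot be used directly as `dyn Codec` without naming the type.
trait Codec: 'static {
    type PostingsImpl: Postings + 'static;
    fn name(&self) -> &'static str;
    fn open_postings(&self, docs: Vec<u32>) -> Self::PostingsImpl;
}

struct StandardCodec;

struct VecPostings {
    docs: std::vec::IntoIter<u32>,
}

impl Postings for VecPostings {
    fn next_doc(&mut self) -> Option<u32> {
        self.docs.next()
    }
}

impl Codec for StandardCodec {
    type PostingsImpl = VecPostings;
    fn name(&self) -> &'static str {
        "standard"
    }
    fn open_postings(&self, docs: Vec<u32>) -> VecPostings {
        VecPostings { docs: docs.into_iter() }
    }
}

// Write side: generic over the codec, with a default type parameter
// so that code spelling the type as plain `Writer` keeps compiling.
struct Writer<C: Codec = StandardCodec> {
    codec: C,
    docs: Vec<u32>,
}

impl<C: Codec> Writer<C> {
    fn with_codec(codec: C) -> Self {
        Writer { codec, docs: Vec::new() }
    }
    fn add_doc(&mut self, doc: u32) {
        self.docs.push(doc);
    }
    fn finish(self) -> (C, Vec<u32>) {
        (self.codec, self.docs)
    }
}

// Read side: scorers are handed around as trait objects.
trait Scorer {
    fn count(&mut self) -> usize;
}

/// Generic over the concrete postings type, so the inner loop is
/// monomorphized even though callers only see `Box<dyn Scorer>`.
struct TermScorer<P: Postings> {
    postings: P,
}

impl<P: Postings> Scorer for TermScorer<P> {
    fn count(&mut self) -> usize {
        let mut n = 0;
        while self.postings.next_doc().is_some() {
            n += 1;
        }
        n
    }
}

/// Object-safe facade: no associated types or generics in its methods.
trait ObjectSafeCodec {
    fn codec_name(&self) -> &'static str;
    fn term_scorer(&self, docs: Vec<u32>) -> Box<dyn Scorer>;
}

// Blanket impl: the scorer is built with the codec-specific postings
// type first, and only then boxed.
impl<C: Codec> ObjectSafeCodec for C {
    fn codec_name(&self) -> &'static str {
        self.name()
    }
    fn term_scorer(&self, docs: Vec<u32>) -> Box<dyn Scorer> {
        let scorer: TermScorer<C::PostingsImpl> = TermScorer {
            postings: self.open_postings(docs),
        };
        Box::new(scorer)
    }
}

/// Writes three docs with the default codec, then reads them back
/// through the object-safe facade and counts them.
fn demo() -> usize {
    let mut writer: Writer = Writer::with_codec(StandardCodec);
    writer.add_doc(1);
    writer.add_doc(5);
    writer.add_doc(9);
    let (codec, docs) = writer.finish();
    let codec: Box<dyn ObjectSafeCodec> = Box::new(codec);
    let mut scorer = codec.term_scorer(docs);
    scorer.count()
}
```

In this sketch the read path pays for one dynamic dispatch when the scorer is created, while the per-document loop inside `TermScorer` stays statically dispatched, which is presumably the motivation for building the scorer before reboxing it.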
2026-06-08 11:30:41 +00:00 · 2026-06-08 11:22:50 +02:00 · 2026-06-08 11:17:54 +02:00 · 2026-06-08 10:55:54 +02:00 · 2026-06-08 10:49:04 +02:00 · 2026-06-08 10:48:48 +02:00
151 changed files with 12613 additions and 2749 deletions
--- a/.claude/skills/update-changelog/SKILL.md
+++ b/.claude/skills/update-changelog/SKILL.md
@@ -0,0 +1,87 @@
+---
+name: update-changelog
+description: Update CHANGELOG.md with merged PRs since the last changelog update, categorized by type
+---
+
+# Update Changelog
+
+This skill updates CHANGELOG.md with merged PRs that aren't already listed.
+
+## Step 1: Determine the changelog scope
+
+Read `CHANGELOG.md` to identify the current unreleased version section at the top (e.g., `Tantivy 0.26 (Unreleased)`).
+
+Collect all PR numbers already mentioned in the unreleased section by extracting `#NNNN` references.
+
+## Step 2: Find merged PRs not yet in the changelog
+
+Use `gh` to list recently merged PRs from the upstream repo:
+
+```bash
+gh pr list --repo quickwit-oss/tantivy --state merged --limit 100 --json number,title,author,labels,mergedAt
+```
+
+Filter out any PRs whose number already appears in the unreleased section of the changelog.
+
+## Step 3: Consolidate related PRs
+
+Before categorizing, group PRs that belong to the same logical change. This is critical for producing a clean changelog. Use PR descriptions, titles, cross-references, and the files touched to identify relationships.
+
+**Merge follow-up PRs into the original:**
+- If a PR is a bugfix, refinement, or follow-up to another PR in the same unreleased cycle, combine them into a single changelog entry with multiple `[#N](url)` links.
+- Also consolidate PRs that touch the same feature area even if not explicitly linked — e.g., a PR fixing an edge case in a new API should be folded into the entry for the PR that introduced that API.
+
+**Filter out bugfixes on unreleased features:**
+- If a bugfix PR fixes something introduced by another PR in the **same unreleased version**, it must NOT appear as a separate Bugfixes entry. Instead, silently fold it into the original feature/improvement entry. The changelog should describe the final shipped state, not the development history.
+- To detect this: check if the bugfix PR references or reverts changes from another PR in the same release cycle, or if it touches code that was newly added (not present in the previous release).
+
+## Step 4: Review the actual code diff
+
+**Do not rely on PR titles or descriptions alone.** For every candidate PR, run `gh pr diff <number> --repo quickwit-oss/tantivy` and read the actual changes. PR titles are often misleading — the diff is the source of truth.
+
+**What to look for in the diff:**
+- Does it change observable behavior, public API surface, or performance characteristics?
+- Is the change something a user of the library would notice or need to know about?
+- Could the change break existing code (API changes, removed features)?
+
+**Skip PRs where the diff reveals the change is not meaningful enough for the changelog** — e.g., cosmetic renames, trivial visibility tweaks, test-only changes, etc.
+
+## Step 5: Categorize each PR group
+
+For each PR (or consolidated group) that survived the diff review, determine its category:
+
+- **Bugfixes** — fixes to behavior that existed in the **previous release**. NOT fixes to features introduced in this release cycle.
+- **Features/Improvements** — new features, API additions, new options, improvements that change user-facing behavior or add new capabilities.
+- **Performance** — optimizations, speed improvements, memory reductions. **If a PR adds new API whose primary purpose is enabling a performance optimization, categorize it as Performance, not Features.** The deciding question is: does a user benefit from this because of new functionality, or because things got faster/leaner? For example, a new trait method that exists solely to enable cheaper intersection ordering is Performance, not a Feature.
+
+If a PR doesn't clearly fit any category (e.g., CI-only changes, internal refactors with no user-facing impact, dependency bumps with no behavior change), skip it — not everything belongs in the changelog.
+
+When unclear, use your best judgment or ask the user.
+
+## Step 6: Format entries
+
+Each entry must follow this exact format:
+
+```
+- Description [#NUMBER](https://github.com/quickwit-oss/tantivy/pull/NUMBER)(@author)
+```
+
+Rules:
+- The description should be concise and describe the user-facing change (not the implementation). Describe the final shipped state, not the incremental development steps.
+- Use sub-categories with bold headers when multiple entries relate to the same area (e.g., `- **Aggregation**` with indented entries beneath). Follow the existing grouping style in the changelog.
+- Author is the GitHub username from the PR, prefixed with `@`. For consolidated entries, include all contributing authors.
+- For consolidated PRs, list all PR links in a single entry: `[#100](url) [#110](url)` (see existing entries for examples).
+
+## Step 7: Present changes to the user
+
+Show the user the proposed changelog entries grouped by category **before** editing the file. Ask for confirmation or adjustments.
+
+## Step 8: Update CHANGELOG.md
+
+Insert the new entries into the appropriate sections of the unreleased version block. If a section doesn't exist yet, create it following the order: Bugfixes, Features/Improvements, Performance.
+
+Append new entries at the end of each section (before the next section header or version header).
+
+## Step 9: Verify
+
+Read back the updated unreleased section and display it to the user for final review.
--- a/.github/dependabot.yml
+++ b/.github/dependabot.yml
@@ -6,6 +6,8 @@ updates:
    interval: daily
    time: "20:00"
  open-pull-requests-limit: 10
+  cooldown:
+    default-days: 2

 - package-ecosystem: "github-actions"
  directory: "/"
@@ -13,3 +15,5 @@ updates:
    interval: daily
    time: "20:00"
  open-pull-requests-limit: 10
+  cooldown:
+    default-days: 2
--- a/.github/workflows/coverage.yml
+++ b/.github/workflows/coverage.yml
@@ -4,6 +4,9 @@ on:
  push:
    branches: [main]

+permissions:
+  contents: read
+
 # Ensures that we cancel running jobs for the same PR / same workflow.
 concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
@@ -12,16 +15,20 @@ concurrency:
 jobs:
  coverage:
    runs-on: ubuntu-latest
+
+    permissions:
+      contents: read
+
    steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6.0.3
      - name: Install Rust
        run: rustup toolchain install nightly-2025-12-01 --profile minimal --component llvm-tools-preview
-      - uses: Swatinem/rust-cache@v2
-      - uses: taiki-e/install-action@cargo-llvm-cov
+      - uses: Swatinem/rust-cache@c19371144df3bb44fab255c43d04cbc2ab54d1c4 # v2.9.1
+      - uses: taiki-e/install-action@e4b3a0453201addddc06d3a72db90326aad87084 # cargo-llvm-cov
      - name: Generate code coverage
        run: cargo +nightly-2025-12-01 llvm-cov --all-features --workspace --doctests --lcov --output-path lcov.info
      - name: Upload coverage to Codecov
-        uses: codecov/codecov-action@v3
+        uses: codecov/codecov-action@57e3a136b779b570ffcdbf80b3bdc90e7fab3de2 # v6.0.0
        continue-on-error: true
        with:
          token: ${{ secrets.CODECOV_TOKEN }} # not required for public repos
--- a/.github/workflows/long_running.yml
+++ b/.github/workflows/long_running.yml
@@ -8,6 +8,9 @@ env:
  CARGO_TERM_COLOR: always
  NUM_FUNCTIONAL_TEST_ITERATIONS: 20000

+permissions:
+  contents: read
+
 # Ensures that we cancel running jobs for the same PR / same workflow.
 concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
@@ -18,10 +21,13 @@ jobs:

    runs-on: ubuntu-latest

+    permissions:
+      contents: read
+
    steps:
-    - uses: actions/checkout@v4
+    - uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6.0.3
    - name: Install stable
-      uses: actions-rs/toolchain@v1
+      uses: actions-rs/toolchain@16499b5e05bf2e26879000db0c1d13f7e13fa3af # v1.0.7
      with:
          toolchain: stable
          profile: minimal
--- a/.github/workflows/scorecard.yml
+++ b/.github/workflows/scorecard.yml
@@ -0,0 +1,49 @@
+name: OpenSSF Scorecard
+
+on:
+  schedule:
+    - cron: '0 0 * * 0'
+  push:
+    branches:
+      - main
+
+permissions:
+  contents: read
+
+jobs:
+  analysis:
+    name: Scorecards analysis
+    runs-on: ubuntu-latest
+    permissions:
+      # Needed to upload the results to code-scanning dashboard.
+      security-events: write
+      # Needed to publish results
+      id-token: write
+
+    steps:
+      - name: 'Checkout code'
+        uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6.0.3
+        with:
+          persist-credentials: false
+
+      - name: 'Run analysis'
+        uses: ossf/scorecard-action@4eaacf0543bb3f2c246792bd56e8cdeffafb205a # v2.4.3
+        with:
+          results_file: results.sarif
+          results_format: sarif
+          repo_token: ${{ secrets.GITHUB_TOKEN }}
+          publish_results: true
+
+      # Upload the results as artifacts.
+      - name: 'Upload artifact'
+        uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
+        with:
+          name: SARIF file
+          path: results.sarif
+          retention-days: 5
+
+      # Upload the results to GitHub's code scanning dashboard.
+      - name: 'Upload to code-scanning'
+        uses: github/codeql-action/upload-sarif@87557b9c84dde89fdd9b10e88954ac2f4248e463 # v4.36.1
+        with:
+          sarif_file: results.sarif
--- a/.github/workflows/test.yml
+++ b/.github/workflows/test.yml
@@ -9,6 +9,9 @@ on:
 env:
  CARGO_TERM_COLOR: always

+permissions:
+  contents: read
+
 # Ensures that we cancel running jobs for the same PR / same workflow.
 concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
@@ -19,23 +22,27 @@ jobs:

    runs-on: ubuntu-latest

+    permissions:
+      contents: read
+      checks: write
+
    steps:
-    - uses: actions/checkout@v4
+    - uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6.0.3

    - name: Install nightly
-      uses: actions-rs/toolchain@v1
+      uses: actions-rs/toolchain@16499b5e05bf2e26879000db0c1d13f7e13fa3af # v1.0.7
      with:
            toolchain: nightly
            profile: minimal
            components: rustfmt
    - name: Install stable
-      uses: actions-rs/toolchain@v1
+      uses: actions-rs/toolchain@16499b5e05bf2e26879000db0c1d13f7e13fa3af # v1.0.7
      with:
            toolchain: stable
            profile: minimal
            components: clippy

-    - uses: Swatinem/rust-cache@v2
+    - uses: Swatinem/rust-cache@c19371144df3bb44fab255c43d04cbc2ab54d1c4 # v2.9.1

    - name: Check Formatting
      run: cargo +nightly fmt --all -- --check
@@ -47,7 +54,7 @@ jobs:
    - name: Check Bench Compilation
      run: cargo +nightly bench --no-run --profile=dev --all-features

-    - uses: actions-rs/clippy-check@v1
+    - uses: actions-rs/clippy-check@b5b5f21f4797c02da247df37026fcd0a5024aa4d # v1.0.7
      with:
        toolchain: stable
        token: ${{ secrets.GITHUB_TOKEN }}
@@ -57,6 +64,9 @@ jobs:

    runs-on: ubuntu-latest

+    permissions:
+      contents: read
+
    strategy:
      matrix:
        features:
@@ -67,17 +77,17 @@ jobs:
    name: test-${{ matrix.features.label}}

    steps:
-    - uses: actions/checkout@v4
+    - uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6.0.3

    - name: Install stable
-      uses: actions-rs/toolchain@v1
+      uses: actions-rs/toolchain@16499b5e05bf2e26879000db0c1d13f7e13fa3af # v1.0.7
      with:
            toolchain: stable
            profile: minimal
            override: true

-    - uses: taiki-e/install-action@nextest
-    - uses: Swatinem/rust-cache@v2
+    - uses: taiki-e/install-action@56cc9adf3a3e2c23eafb56e8acaf9d0373cb845a # nextest
+    - uses: Swatinem/rust-cache@c19371144df3bb44fab255c43d04cbc2ab54d1c4 # v2.9.1

    - name: Run tests
      run: |
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,3 +1,58 @@
+Tantivy 0.26.1
+================================
+
+## Performance
+- Fix quadratic runtime in nested term and composite aggregations: memory accounting scanned all parent buckets on every collect instead of just the current parent (@PSeitz @fulmicoton)
+
+Tantivy 0.26 (Unreleased)
+================================
+
+## Bugfixes
+- Align float query coercion during search with the columnar coercion rules [#2692](https://github.com/quickwit-oss/tantivy/pull/2692)(@fulmicoton)
+- Fix lenient elastic range queries with trailing closing parentheses [#2816](https://github.com/quickwit-oss/tantivy/pull/2816)(@evance-br)
+- Fix intersection `seek()` advancing below current doc id [#2812](https://github.com/quickwit-oss/tantivy/pull/2812)(@fulmicoton)
+- Fix phrase query prefixed with `*` [#2751](https://github.com/quickwit-oss/tantivy/pull/2751)(@Darkheir)
+- Fix `vint` buffer overflow during index creation [#2778](https://github.com/quickwit-oss/tantivy/pull/2778)(@rebasedming)
+- Fix integer overflow in `ExpUnrolledLinkedList` for large datasets [#2735](https://github.com/quickwit-oss/tantivy/pull/2735)(@mdashti)
+- Fix integer overflow in segment sorting and merge policy truncation [#2846](https://github.com/quickwit-oss/tantivy/pull/2846)(@anaslimem)
+- Fix merging of intermediate aggregation results [#2719](https://github.com/quickwit-oss/tantivy/pull/2719)(@PSeitz)
+- Deduplicate doc counts in term aggregation for multi-valued fields [#2854](https://github.com/quickwit-oss/tantivy/pull/2854)(@nuri-yoo)
+
+## Features/Improvements
+- **Aggregation**
+    - Add filter aggregation [#2711](https://github.com/quickwit-oss/tantivy/pull/2711)(@mdashti)
+    - Add include/exclude filtering for term aggregations [#2717](https://github.com/quickwit-oss/tantivy/pull/2717)(@PSeitz)
+    - Add public accessors for intermediate aggregation results [#2829](https://github.com/quickwit-oss/tantivy/pull/2829)(@congx4)
+    - Replace HyperLogLog++ with Apache DataSketches HLL for cardinality aggregation [#2837](https://github.com/quickwit-oss/tantivy/pull/2837) [#2842](https://github.com/quickwit-oss/tantivy/pull/2842)(@congx4)
+    - Add composite aggregation [#2856](https://github.com/quickwit-oss/tantivy/pull/2856)(@fulmicoton)
+- **Fast Fields**
+    - Add fast field fallback for `TermQuery` when the field is not indexed [#2693](https://github.com/quickwit-oss/tantivy/pull/2693)(@PSeitz-dd)
+    - Add fast field support for `Bytes` values [#2830](https://github.com/quickwit-oss/tantivy/pull/2830)(@mdashti)
+- **Query Parser**
+    - Add support for regexes in the query grammar [#2677](https://github.com/quickwit-oss/tantivy/pull/2677) [#2818](https://github.com/quickwit-oss/tantivy/pull/2818)(@Darkheir)
+    - Deduplicate queries in query parser [#2698](https://github.com/quickwit-oss/tantivy/pull/2698)(@PSeitz-dd)
+- Add erased `SortKeyComputer` for sorting on column types unknown until runtime [#2770](https://github.com/quickwit-oss/tantivy/pull/2770) [#2790](https://github.com/quickwit-oss/tantivy/pull/2790)(@stuhood @PSeitz)
+- Add natural-order-with-none-highest support in `TopDocs::order_by` [#2780](https://github.com/quickwit-oss/tantivy/pull/2780)(@stuhood)
+- Move stemming behind `stemmer` feature flag [#2791](https://github.com/quickwit-oss/tantivy/pull/2791)(@fulmicoton)
+- Make `DeleteMeta`, `AddOperation`, `advance_deletes`, `with_max_doc`, `serializer` module, and `delete_queue` public [#2762](https://github.com/quickwit-oss/tantivy/pull/2762) [#2765](https://github.com/quickwit-oss/tantivy/pull/2765) [#2766](https://github.com/quickwit-oss/tantivy/pull/2766) [#2835](https://github.com/quickwit-oss/tantivy/pull/2835)(@philippemnoel @PSeitz)
+- Make `Language` hashable [#2763](https://github.com/quickwit-oss/tantivy/pull/2763)(@philippemnoel)
+- Improve `space_usage` reporting for JSON fields and columnar data [#2761](https://github.com/quickwit-oss/tantivy/pull/2761)(@PSeitz-dd)
+- Split `Term` into `Term` and `IndexingTerm` [#2744](https://github.com/quickwit-oss/tantivy/pull/2744) [#2750](https://github.com/quickwit-oss/tantivy/pull/2750)(@PSeitz-dd @PSeitz)
+
+## Performance
+- **Aggregation**
+    - Large speed up and memory reduction for nested high cardinality aggregations by using one collector per request instead of one per bucket, and adding `PagedTermMap` for faster medium cardinality term aggregations [#2715](https://github.com/quickwit-oss/tantivy/pull/2715) [#2759](https://github.com/quickwit-oss/tantivy/pull/2759)(@PSeitz @PSeitz-dd)
+    - Optimize low-cardinality term aggregations by using a `Vec` instead of a `HashMap` [#2740](https://github.com/quickwit-oss/tantivy/pull/2740)(@fulmicoton-dd)
+- Optimize `ExistsQuery` for a high number of dynamic columns [#2694](https://github.com/quickwit-oss/tantivy/pull/2694)(@PSeitz-dd)
+- Add lazy scorers to stop score evaluation early when a doc won't reach the top-K threshold [#2726](https://github.com/quickwit-oss/tantivy/pull/2726) [#2777](https://github.com/quickwit-oss/tantivy/pull/2777)(@fulmicoton @stuhood)
+- Add `DocSet::cost()` and use it to order scorers in intersections [#2707](https://github.com/quickwit-oss/tantivy/pull/2707)(@PSeitz)
+- Add `collect_block` support for collector wrappers [#2727](https://github.com/quickwit-oss/tantivy/pull/2727)(@stuhood)
+- Optimize saturated posting lists by replacing them with `AllScorer` in boolean queries [#2745](https://github.com/quickwit-oss/tantivy/pull/2745) [#2760](https://github.com/quickwit-oss/tantivy/pull/2760) [#2774](https://github.com/quickwit-oss/tantivy/pull/2774)(@fulmicoton @mdashti @trinity-1686a)
+- Add `seek_danger` on `DocSet` for more efficient intersections [#2538](https://github.com/quickwit-oss/tantivy/pull/2538) [#2810](https://github.com/quickwit-oss/tantivy/pull/2810)(@PSeitz @stuhood @fulmicoton)
+- Skip column traversal in `RangeDocSet` when query range does not overlap with column bounds [#2783](https://github.com/quickwit-oss/tantivy/pull/2783)(@ChangRui-Ryan)
+- Speed up exclude queries by supporting multiple excluded `DocSet`s without intermediate union [#2825](https://github.com/quickwit-oss/tantivy/pull/2825)(@PSeitz)
+- Improve union performance for non-score unions with `fill_buffer` and optimized `TinySet` [#2863](https://github.com/quickwit-oss/tantivy/pull/2863)(@PSeitz)
+
 Tantivy 0.25
 ================================

--- a/Cargo.toml
+++ b/Cargo.toml
@@ -11,7 +11,7 @@ repository = "https://github.com/quickwit-oss/tantivy"
 readme = "README.md"
 keywords = ["search", "information", "retrieval"]
 edition = "2021"
-rust-version = "1.85"
+rust-version = "1.86"
 exclude = ["benches/*.json", "benches/*.txt"]

 [dependencies]
@@ -27,7 +27,7 @@ regex = { version = "1.5.5", default-features = false, features = [
 aho-corasick = "1.0"
 tantivy-fst = "0.5"
 memmap2 = { version = "0.9.0", optional = true }
-lz4_flex = { version = "0.12", default-features = false, optional = true }
+lz4_flex = { version = "0.13", default-features = false, optional = true }
 zstd = { version = "0.13", optional = true, default-features = false }
 tempfile = { version = "3.12.0", optional = true }
 log = "0.4.16"
@@ -47,7 +47,7 @@ rustc-hash = "2.0.0"
 thiserror = "2.0.1"
 htmlescape = "0.3.1"
 fail = { version = "0.5.0", optional = true }
-time = { version = "0.3.35", features = ["serde-well-known"] }
+time = { version = "0.3.47", features = ["serde-well-known"] }
 smallvec = "1.8.0"
 rayon = "1.5.2"
 lru = "0.16.3"
@@ -57,15 +57,15 @@ measure_time = "0.9.0"
 arc-swap = "1.5.0"
 bon = "3.3.1"

-columnar = { version = "0.6", path = "./columnar", package = "tantivy-columnar" }
-sstable = { version = "0.6", path = "./sstable", package = "tantivy-sstable", optional = true }
-stacker = { version = "0.6", path = "./stacker", package = "tantivy-stacker" }
-query-grammar = { version = "0.25.0", path = "./query-grammar", package = "tantivy-query-grammar" }
-tantivy-bitpacker = { version = "0.9", path = "./bitpacker" }
-common = { version = "0.10", path = "./common/", package = "tantivy-common" }
-tokenizer-api = { version = "0.6", path = "./tokenizer-api", package = "tantivy-tokenizer-api" }
-sketches-ddsketch = { version = "0.3.0", features = ["use_serde"] }
-hyperloglogplus = { version = "0.4.1", features = ["const-loop"] }
+columnar = { version = "0.7", path = "./columnar", package = "tantivy-columnar" }
+sstable = { version = "0.7", path = "./sstable", package = "tantivy-sstable", optional = true }
+stacker = { version = "0.7", path = "./stacker", package = "tantivy-stacker" }
+query-grammar = { version = "0.26.0", path = "./query-grammar", package = "tantivy-query-grammar" }
+tantivy-bitpacker = { version = "0.10", path = "./bitpacker" }
+common = { version = "0.11", path = "./common/", package = "tantivy-common" }
+tokenizer-api = { version = "0.7", path = "./tokenizer-api", package = "tantivy-tokenizer-api" }
+sketches-ddsketch = { version = "0.4", features = ["use_serde"] }
+datasketches = { version = "0.3.0", features = ["hll"] }
 futures-util = { version = "0.3.28", optional = true }
 futures-channel = { version = "0.3.28", optional = true }
 fnv = "1.0.7"
@@ -75,7 +75,7 @@ typetag = "0.2.21"
 winapi = "0.3.9"

 [dev-dependencies]
-binggan = "0.14.2"
+binggan = "0.17.0"
 rand = "0.9"
 maplit = "1.0.2"
 matches = "0.1.9"
@@ -86,13 +86,13 @@ futures = "0.3.21"
 paste = "1.0.11"
 more-asserts = "0.3.1"
 rand_distr = "0.5"
-time = { version = "0.3.10", features = ["serde-well-known", "macros"] }
+time = { version = "0.3.47", features = ["serde-well-known", "macros"] }
 postcard = { version = "1.0.4", features = [
    "use-std",
 ], default-features = false }

 [target.'cfg(not(windows))'.dev-dependencies]
-criterion = { version = "0.5", default-features = false }
+criterion = { version = "0.8", default-features = false }

 [dev-dependencies.fail]
 version = "0.5.0"
@@ -202,3 +202,10 @@ harness = false
 name = "regex_all_terms"
 harness = false

+[[bench]]
+name = "query_parser_nested"
+harness = false
+
+[[bench]]
+name = "intersection_bench"
+harness = false
--- a/README.md
+++ b/README.md
@@ -1,6 +1,7 @@
 [![Docs](https://docs.rs/tantivy/badge.svg)](https://docs.rs/crate/tantivy/)
 [![Build Status](https://github.com/quickwit-oss/tantivy/actions/workflows/test.yml/badge.svg)](https://github.com/quickwit-oss/tantivy/actions/workflows/test.yml)
 [![codecov](https://codecov.io/gh/quickwit-oss/tantivy/branch/main/graph/badge.svg)](https://codecov.io/gh/quickwit-oss/tantivy)
+[![OpenSSF Scorecard](https://api.scorecard.dev/projects/github.com/quickwit-oss/tantivy/badge)](https://scorecard.dev/viewer/?uri=github.com/quickwit-oss/tantivy)
 [![Join the chat at https://discord.gg/MT27AG5EVE](https://shields.io/discord/908281611840282624?label=chat%20on%20discord)](https://discord.gg/MT27AG5EVE)
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
 [![Crates.io](https://img.shields.io/crates/v/tantivy.svg)](https://crates.io/crates/tantivy)
--- a/benches/agg_bench.rs
+++ b/benches/agg_bench.rs
@@ -10,7 +10,7 @@ use tantivy::aggregation::agg_req::Aggregations;
 use tantivy::aggregation::AggregationCollector;
 use tantivy::query::{AllQuery, TermQuery};
 use tantivy::schema::{IndexRecordOption, Schema, TextFieldIndexing, FAST, STRING};
-use tantivy::{doc, Index, Term};
+use tantivy::{doc, DateTime, Index, Term};

 #[global_allocator]
 pub static GLOBAL: &PeakMemAlloc<std::alloc::System> = &INSTRUMENTED_SYSTEM;
@@ -63,6 +63,8 @@ fn bench_agg(mut group: InputGroup<Index>) {
    register!(group, terms_all_unique_with_avg_sub_agg);
    register!(group, terms_many_with_avg_sub_agg);
    register!(group, terms_status_with_avg_sub_agg);
+    register!(group, terms_status_with_terms_zipf_1000_sub_agg);
+    register!(group, terms_zipf_1000_with_terms_status_sub_agg);
    register!(group, terms_status_with_histogram);
    register!(group, terms_zipf_1000);
    register!(group, terms_zipf_1000_with_histogram);
@@ -70,8 +72,19 @@ fn bench_agg(mut group: InputGroup<Index>) {

    register!(group, terms_many_json_mixed_type_with_avg_sub_agg);

+    register!(group, composite_term_many_page_1000);
+    register!(group, composite_term_many_page_1000_with_avg_sub_agg);
+    register!(group, composite_term_few);
+    register!(group, composite_histogram);
+    register!(group, composite_histogram_calendar);
+
    register!(group, cardinality_agg);
+    register!(group, cardinality_agg_high_card);
+    register!(group, cardinality_agg_low_card);
    register!(group, terms_status_with_cardinality_agg);
+    register!(group, terms_100_buckets_with_cardinality_agg);
+    register!(group, terms_many_with_single_term_order_by_card);
+    register!(group, terms_many_with_single_term_2_order_by_card);

    register!(group, range_agg);
    register!(group, range_agg_with_avg_sub_agg);
@@ -159,10 +172,52 @@ fn cardinality_agg(index: &Index) {
    });
    execute_agg(index, agg_req);
 }
+// Full-scan cardinality on a near-1M-cardinality string field.
+// Hits the dense (PagedBitset) path: every doc has a unique term,
+// so the bucket promotes from FxHashSet shortly into the scan.
+fn cardinality_agg_high_card(index: &Index) {
+    let agg_req = json!({
+        "cardinality": {
+            "cardinality": {
+                "field": "text_all_unique_terms"
+            },
+        }
+    });
+    execute_agg(index, agg_req);
+}
+// Full-scan cardinality on a tiny-cardinality string field (7 distinct
+// values). Stays on the FxHashSet path — the promotion threshold is
+// never crossed. Validates no regression on the sparse path.
+fn cardinality_agg_low_card(index: &Index) {
+    let agg_req = json!({
+        "cardinality": {
+            "cardinality": {
+                "field": "text_few_terms_status"
+            },
+        }
+    });
+    execute_agg(index, agg_req);
+}
 fn terms_status_with_cardinality_agg(index: &Index) {
    let agg_req = json!({
        "my_texts": {
            "terms": { "field": "text_few_terms_status" },
+            "aggs": {
+                "cardinality": {
+                    "cardinality": {
+                        "field": "text_few_terms_status"
+                    },
+                }
+            }
+        },
+    });
+    execute_agg(index, agg_req);
+}
+
+fn terms_100_buckets_with_cardinality_agg(index: &Index) {
+    let agg_req = json!({
+        "my_texts": {
+            "terms": { "field": "text_1000_terms_zipf", "size": 100 },
            "aggs": {
                "cardinality": {
                    "cardinality": {
@@ -175,6 +230,58 @@ fn terms_status_with_cardinality_agg(index: &Index) {
    execute_agg(index, agg_req);
 }

+fn terms_many_with_single_term_order_by_card(index: &Index) {
+    let agg_req = json!({
+        "my_texts": {
+            "terms": { "field": "text_many_terms" },
+            "aggs": {
+                "nested_terms": {
+                    "terms": {
+                        "field": "single_term",
+                        "order": { "cardinality": "desc" }
+                    },
+                    "aggs": {
+                        "cardinality": {
+                            "cardinality": { "field": "text_few_terms" }
+                        }
+                    }
+                }
+            }
+        },
+    });
+    execute_agg(index, agg_req);
+}
+
+// Two-level terms ordered by cardinality at each level: a high-card outer terms
+// (text_many_terms) ordered by a cardinality sub-agg, with a nested terms on
+// single_term also ordered by a cardinality sub-agg, plus an avg.
+fn terms_many_with_single_term_2_order_by_card(index: &Index) {
+    let agg_req = json!({
+        "by_ip": {
+            "terms": {
+                "field": "text_many_terms",
+                "order": { "card_few_terms": "desc" }
+            },
+            "aggs": {
+                "card_few_terms": {
+                    "cardinality": { "field": "text_few_terms" }
+                },
+                "nested_terms": {
+                    "terms": {
+                        "field": "single_term",
+                        "order": { "distinct_path2": "desc" }
+                    },
+                    "aggs": {
+                        "avg_botscore": { "avg": { "field": "score" } },
+                        "distinct_path2": { "cardinality": { "field": "text_few_terms" } }
+                    }
+                }
+            }
+        }
+    });
+    execute_agg(index, agg_req);
+}
+
 fn terms_7(index: &Index) {
    let agg_req = json!({
        "my_texts": { "terms": { "field": "text_few_terms_status" } },
@@ -247,6 +354,30 @@ fn terms_all_unique_with_avg_sub_agg(index: &Index) {
    });
    execute_agg(index, agg_req);
 }
+fn terms_status_with_terms_zipf_1000_sub_agg(index: &Index) {
+    let agg_req = json!({
+        "my_texts": {
+            "terms": { "field": "text_few_terms_status" },
+            "aggs": {
+                "nested_terms": { "terms": { "field": "text_1000_terms_zipf" } }
+            }
+        }
+    });
+    execute_agg(index, agg_req);
+}
+
+fn terms_zipf_1000_with_terms_status_sub_agg(index: &Index) {
+    let agg_req = json!({
+        "my_texts": {
+            "terms": { "field": "text_1000_terms_zipf" },
+            "aggs": {
+                "nested_terms": { "terms": { "field": "text_few_terms_status" } }
+            }
+        }
+    });
+    execute_agg(index, agg_req);
+}
+
 fn terms_status_with_histogram(index: &Index) {
    let agg_req = json!({
        "my_texts": {
@@ -314,6 +445,75 @@ fn terms_many_json_mixed_type_with_avg_sub_agg(index: &Index) {
    execute_agg(index, agg_req);
 }

+fn composite_term_few(index: &Index) {
+    let agg_req = json!({
+        "my_ctf": {
+            "composite": {
+                "sources": [
+                    { "text_few_terms": { "terms": { "field": "text_few_terms" } } }
+                ],
+                "size": 1000
+            }
+        },
+    });
+    execute_agg(index, agg_req);
+}
+fn composite_term_many_page_1000(index: &Index) {
+    let agg_req = json!({
+        "my_ctmp1000": {
+            "composite": {
+                "sources": [
+                    { "text_many_terms": { "terms": { "field": "text_many_terms" } } }
+                ],
+                "size": 1000
+            }
+        },
+    });
+    execute_agg(index, agg_req);
+}
+fn composite_term_many_page_1000_with_avg_sub_agg(index: &Index) {
+    let agg_req = json!({
+        "my_ctmp1000wasa": {
+            "composite": {
+                "sources": [
+                    { "text_many_terms": { "terms": { "field": "text_many_terms" } } }
+                ],
+                "size": 1000,
+            },
+            "aggs": {
+                "average_f64": { "avg": { "field": "score_f64" } }
+            }
+        },
+    });
+    execute_agg(index, agg_req);
+}
+fn composite_histogram(index: &Index) {
+    let agg_req = json!({
+        "my_ch": {
+            "composite": {
+                "sources": [
+                    { "f64_histogram": { "histogram": { "field": "score_f64", "interval": 1 } } }
+                ],
+                "size": 1000
+            }
+        },
+    });
+    execute_agg(index, agg_req);
+}
+fn composite_histogram_calendar(index: &Index) {
+    let agg_req = json!({
+        "my_chc": {
+            "composite": {
+                "sources": [
+                    { "time_histogram": { "date_histogram": { "field": "timestamp", "calendar_interval": "month" } } }
+                ],
+                "size": 1000
+            }
+        },
+    });
+    execute_agg(index, agg_req);
+}
+
 fn execute_agg(index: &Index, agg_req: serde_json::Value) {
    let agg_req: Aggregations = serde_json::from_value(agg_req).unwrap();
    let collector = get_collector(agg_req);
@@ -491,11 +691,13 @@ fn get_test_index_bench(cardinality: Cardinality) -> tantivy::Result<Index> {
            TextFieldIndexing::default().set_index_option(IndexRecordOption::WithFreqs),
        )
        .set_stored();
-    let text_field = schema_builder.add_text_field("text", text_fieldtype);
+    let text_field = schema_builder.add_text_field("text", text_fieldtype.clone());
+    let single_term = schema_builder.add_text_field("single_term", FAST);
    let json_field = schema_builder.add_json_field("json", FAST);
    let text_field_all_unique_terms =
        schema_builder.add_text_field("text_all_unique_terms", STRING | FAST);
    let text_field_many_terms = schema_builder.add_text_field("text_many_terms", STRING | FAST);
+    let text_field_few_terms = schema_builder.add_text_field("text_few_terms", STRING | FAST);
    let text_field_few_terms_status =
        schema_builder.add_text_field("text_few_terms_status", STRING | FAST);
    let text_field_1000_terms_zipf =
@@ -504,6 +706,7 @@ fn get_test_index_bench(cardinality: Cardinality) -> tantivy::Result<Index> {
    let score_field = schema_builder.add_u64_field("score", score_fieldtype.clone());
    let score_field_f64 = schema_builder.add_f64_field("score_f64", score_fieldtype.clone());
    let score_field_i64 = schema_builder.add_i64_field("score_i64", score_fieldtype);
+    let date_field = schema_builder.add_date_field("timestamp", FAST);
    // use tmp dir
    let index = if reuse_index {
        Index::create_in_dir("agg_bench", schema_builder.build())?
@@ -523,6 +726,7 @@ fn get_test_index_bench(cardinality: Cardinality) -> tantivy::Result<Index> {
    let log_level_distribution =
        WeightedIndex::new(status_field_data.iter().map(|item| item.1)).unwrap();

+    let few_terms_data = ["INFO", "ERROR", "WARN", "DEBUG"];
    let lg_norm = rand_distr::LogNormal::new(2.996f64, 0.979f64).unwrap();

    let many_terms_data = (0..150_000)
@@ -552,12 +756,16 @@ fn get_test_index_bench(cardinality: Cardinality) -> tantivy::Result<Index> {
            index_writer.add_document(doc!(
                json_field => json!({"mixed_type": 10.0}),
                json_field => json!({"mixed_type": 10.0}),
+                single_term => "single_term",
+                single_term => "single_term",
                text_field => "cool",
                text_field => "cool",
                text_field_all_unique_terms => "cool",
                text_field_all_unique_terms => "coolo",
                text_field_many_terms => "cool",
                text_field_many_terms => "cool",
+                text_field_few_terms => "cool",
+                text_field_few_terms => "cool",
                text_field_few_terms_status => log_level_sample_a,
                text_field_few_terms_status => log_level_sample_b,
                text_field_1000_terms_zipf => term_1000_a.as_str(),
@@ -584,15 +792,18 @@ fn get_test_index_bench(cardinality: Cardinality) -> tantivy::Result<Index> {
                json!({"mixed_type": many_terms_data.choose(&mut rng).unwrap().to_string()})
            };
            index_writer.add_document(doc!(
+                single_term => "single_term",
                text_field => "cool",
                json_field => json,
                text_field_all_unique_terms => format!("unique_term_{}", rng.random::<u64>()),
                text_field_many_terms => many_terms_data.choose(&mut rng).unwrap().to_string(),
+                text_field_few_terms => few_terms_data.choose(&mut rng).unwrap().to_string(),
                text_field_few_terms_status => status_field_data[log_level_distribution.sample(&mut rng)].0,
                text_field_1000_terms_zipf => terms_1000[zipf_1000.sample(&mut rng) as usize - 1].as_str(),
                score_field => val as u64,
                score_field_f64 => lg_norm.sample(&mut rng),
                score_field_i64 => val as i64,
+                date_field => DateTime::from_timestamp_millis((val * 1_000_000.) as i64),
            ))?;
            if cardinality == Cardinality::OptionalSparse {
                for _ in 0..20 {
--- a/benches/and_or_queries.rs
+++ b/benches/and_or_queries.rs
@@ -22,7 +22,7 @@ use rand::rngs::StdRng;
 use rand::SeedableRng;
 use tantivy::collector::sort_key::SortByStaticFastValue;
 use tantivy::collector::{Collector, Count, TopDocs};
-use tantivy::query::{Query, QueryParser};
+use tantivy::query::QueryParser;
 use tantivy::schema::{Schema, FAST, TEXT};
 use tantivy::{doc, Index, Order, ReloadPolicy, Searcher};

@@ -38,7 +38,7 @@ struct BenchIndex {
 /// return two BenchIndex views:
 /// - single_field: QueryParser defaults to only "body"
 /// - multi_field:  QueryParser defaults to ["title", "body"]
-fn build_shared_indices(num_docs: usize, p_a: f32, p_b: f32, p_c: f32) -> (BenchIndex, BenchIndex) {
+fn build_index(num_docs: usize, terms: &[(&str, f32)]) -> (BenchIndex, BenchIndex) {
    // Unified schema (two text fields)
    let mut schema_builder = Schema::builder();
    let f_title = schema_builder.add_text_field("title", TEXT);
@@ -55,32 +55,17 @@ fn build_shared_indices(num_docs: usize, p_a: f32, p_b: f32, p_c: f32) -> (Bench
    {
        let mut writer = index.writer_with_num_threads(1, 500_000_000).unwrap();
        for _ in 0..num_docs {
-            let has_a = rng.random_bool(p_a as f64);
-            let has_b = rng.random_bool(p_b as f64);
-            let has_c = rng.random_bool(p_c as f64);
            let score = rng.random_range(0u64..100u64);
            let score2 = rng.random_range(0u64..100_000u64);
            let mut title_tokens: Vec<&str> = Vec::new();
            let mut body_tokens: Vec<&str> = Vec::new();
-            if has_a {
-                if rng.random_bool(0.1) {
-                    title_tokens.push("a");
-                } else {
-                    body_tokens.push("a");
-                }
-            }
-            if has_b {
-                if rng.random_bool(0.1) {
-                    title_tokens.push("b");
-                } else {
-                    body_tokens.push("b");
-                }
-            }
-            if has_c {
-                if rng.random_bool(0.1) {
-                    title_tokens.push("c");
-                } else {
-                    body_tokens.push("c");
+            for &(tok, prob) in terms {
+                if rng.random_bool(prob as f64) {
+                    if rng.random_bool(0.1) {
+                        title_tokens.push(tok);
+                    } else {
+                        body_tokens.push(tok);
+                    }
                }
            }
            if title_tokens.is_empty() && body_tokens.is_empty() {
@@ -110,59 +95,97 @@ fn build_shared_indices(num_docs: usize, p_a: f32, p_b: f32, p_c: f32) -> (Bench
    let qp_single = QueryParser::for_index(&index, vec![f_body]);
    let qp_multi = QueryParser::for_index(&index, vec![f_title, f_body]);

-    let single_view = BenchIndex {
+    let only_title = BenchIndex {
        index: index.clone(),
        searcher: searcher.clone(),
        query_parser: qp_single,
    };
-    let multi_view = BenchIndex {
+    let title_and_body = BenchIndex {
        index,
        searcher,
        query_parser: qp_multi,
    };
-    (single_view, multi_view)
+    (only_title, title_and_body)
+}
+
+fn format_pct(p: f32) -> String {
+    let pct = (p as f64) * 100.0;
+    let rounded = (pct * 1_000_000.0).round() / 1_000_000.0;
+    if rounded.fract() <= 0.001 {
+        format!("{}%", rounded as u64)
+    } else {
+        format!("{}%", rounded)
+    }
+}
+
+fn query_label(query_str: &str, term_pcts: &[(&str, String)]) -> String {
+    let mut label = query_str.to_string();
+    for (term, pct) in term_pcts {
+        label = label.replace(term, pct);
+    }
+    label.replace(' ', "_")
 }

 fn main() {
-    // Prepare corpora with varying selectivity. Build one index per corpus
-    // and derive two views (single-field vs multi-field) from it.
-    let scenarios = vec![
+    // Terms with varying selectivity, ordered from rarest to most common.
+    // With 1M docs, we expect:
+    // a: 0.01% (100), b: 1% (10k), c: 5% (50k), d: 15% (150k), e: 30% (300k)
+    let num_docs = 1_000_000;
+    let terms: &[(&str, f32)] = &[
+        ("a", 0.0001),
+        ("b", 0.01),
+        ("c", 0.05),
+        ("d", 0.15),
+        ("e", 0.30),
+    ];
+
+    let queries: &[(&str, &[&str])] = &[
        (
-            "N=1M, p(a)=5%, p(b)=1%, p(c)=15%".to_string(),
-            1_000_000,
-            0.05,
-            0.01,
-            0.15,
+            "only_union",
+            &["c OR b", "c OR b OR d", "c OR e", "e OR a"] as &[&str],
        ),
        (
-            "N=1M, p(a)=1%, p(b)=1%, p(c)=15%".to_string(),
-            1_000_000,
-            0.01,
-            0.01,
-            0.15,
+            "only_intersection",
+            &["+c +b", "+c +b +d", "+c +e", "+e +a"] as &[&str],
+        ),
+        (
+            "union_intersection",
+            &["+c +(b OR d)", "+e +(c OR a)", "+(c OR b) +(d OR e)"] as &[&str],
        ),
    ];

-    let queries = &["a", "+a +b", "+a +b +c", "a OR b", "a OR b OR c"];
-
    let mut runner = BenchRunner::new();
-    for (label, n, pa, pb, pc) in scenarios {
-        let (single_view, multi_view) = build_shared_indices(n, pa, pb, pc);
+    let (only_title, title_and_body) = build_index(num_docs, terms);
+    let term_pcts: Vec<(&str, String)> = terms
+        .iter()
+        .map(|&(term, p)| (term, format_pct(p)))
+        .collect();

-        for (view_name, bench_index) in [("single_field", single_view), ("multi_field", multi_view)]
-        {
-            // Single-field group: default field is body only
-            let mut group = runner.new_group();
-            group.set_name(format!("{} — {}", view_name, label));
-            for query_str in queries {
+    for (view_name, bench_index) in [
+        ("single_field", only_title),
+        ("multi_field", title_and_body),
+    ] {
+        for (category_name, category_queries) in queries {
+            for query_str in *category_queries {
+                let mut group = runner.new_group();
+                let query_label = query_label(query_str, &term_pcts);
+                group.set_name(format!("{}_{}_{}", view_name, category_name, query_label));
                add_bench_task(&mut group, &bench_index, query_str, Count, "count");
                add_bench_task(
                    &mut group,
                    &bench_index,
                    query_str,
                    TopDocs::with_limit(10).order_by_score(),
-                    "top10",
+                    "top10_inv_idx",
                );
+                add_bench_task(
+                    &mut group,
+                    &bench_index,
+                    query_str,
+                    (Count, TopDocs::with_limit(10).order_by_score()),
+                    "count+top10",
+                );
+
                add_bench_task(
                    &mut group,
                    &bench_index,
@@ -180,39 +203,47 @@ fn main() {
                    )),
                    "top10_by_2ff",
                );
+
+                group.run();
            }
-            group.run();
        }
    }
 }

+trait FruitCount {
+    fn count(&self) -> usize;
+}
+
+impl FruitCount for usize {
+    fn count(&self) -> usize {
+        *self
+    }
+}
+
+impl<T> FruitCount for Vec<T> {
+    fn count(&self) -> usize {
+        self.len()
+    }
+}
+
+impl<A: FruitCount, B> FruitCount for (A, B) {
+    fn count(&self) -> usize {
+        self.0.count()
+    }
+}
+
 fn add_bench_task<C: Collector + 'static>(
    bench_group: &mut BenchGroup,
    bench_index: &BenchIndex,
    query_str: &str,
    collector: C,
    collector_name: &str,
-) {
-    let task_name = format!("{}_{}", query_str.replace(" ", "_"), collector_name);
+) where
+    C::Fruit: FruitCount,
+{
    let query = bench_index.query_parser.parse_query(query_str).unwrap();
-    let search_task = SearchTask {
-        searcher: bench_index.searcher.clone(),
-        collector,
-        query,
-    };
-    bench_group.register(task_name, move |_| black_box(search_task.run()));
-}
-
-struct SearchTask<C: Collector> {
-    searcher: Searcher,
-    collector: C,
-    query: Box<dyn Query>,
-}
-
-impl<C: Collector> SearchTask<C> {
-    #[inline(never)]
-    pub fn run(&self) -> usize {
-        self.searcher.search(&self.query, &self.collector).unwrap();
-        1
-    }
+    let searcher = bench_index.searcher.clone();
+    bench_group.register(collector_name.to_string(), move |_| {
+        black_box(searcher.search(&query, &collector).unwrap().count())
+    });
 }
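The group names in `benches/and_or_queries.rs` are built by substituting each term in the query string with its document-frequency percentage. The sketch below reproduces `format_pct` and `query_label` from the diff as a standalone program so the resulting labels can be inspected; the `term_pcts` helper and the sample queries in `main` are assumptions added for illustration only.

```rust
/// Render a probability as a percentage label (copied from the diff).
/// The `fract() <= 0.001` check absorbs f32 -> f64 conversion noise,
/// so that 0.15f32 is labelled "15%" and not "15.000001%".
fn format_pct(p: f32) -> String {
    let pct = (p as f64) * 100.0;
    let rounded = (pct * 1_000_000.0).round() / 1_000_000.0;
    if rounded.fract() <= 0.001 {
        format!("{}%", rounded as u64)
    } else {
        format!("{}%", rounded)
    }
}

/// Replace every term by its percentage, then make the label whitespace-free
/// (copied from the diff).
fn query_label(query_str: &str, term_pcts: &[(&str, String)]) -> String {
    let mut label = query_str.to_string();
    for (term, pct) in term_pcts {
        label = label.replace(term, pct);
    }
    label.replace(' ', "_")
}

/// Hypothetical helper: the same term table as in the benchmark's `main`.
fn term_pcts() -> Vec<(&'static str, String)> {
    let terms: &[(&str, f32)] = &[
        ("a", 0.0001),
        ("b", 0.01),
        ("c", 0.05),
        ("d", 0.15),
        ("e", 0.30),
    ];
    terms.iter().map(|&(term, p)| (term, format_pct(p))).collect()
}

fn main() {
    let pcts = term_pcts();
    for query in ["c OR b", "+e +a", "+c +(b OR d)"] {
        println!("{query} -> {}", query_label(query, &pcts));
    }
}
```

Note that the plain `str::replace` substitution only works because the query strings contain no lowercase `a` to `e` outside the term names (the `OR` operator is uppercase, and the percentage strings contain no letters). A term table with longer or overlapping names would likely need token-aware substitution.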
--- a/benches/intersection_bench.rs
+++ b/benches/intersection_bench.rs
@@ -0,0 +1,149 @@
+// Benchmarks top-K intersection of term scorers (block_wand_intersection).
+//
+// What's measured:
+// - Conjunctive queries (+aaa +bbb, +aaa +bbb +ccc) with top-10 by score
+// - Varying doc-frequency balance between terms (balanced, skewed, very skewed)
+// - Realistic term frequencies (geometric distribution, mostly low)
+// - 1M-doc single segment
+//
+// Run with: cargo bench --bench intersection_bench
+
+use binggan::{black_box, BenchRunner};
+use rand::prelude::*;
+use rand::rngs::StdRng;
+use rand::SeedableRng;
+use tantivy::collector::TopDocs;
+use tantivy::query::QueryParser;
+use tantivy::schema::{Schema, TEXT};
+use tantivy::{doc, Index, ReloadPolicy, Searcher};
+
+const NUM_DOCS: usize = 1_000_000;
+
+struct BenchIndex {
+    searcher: Searcher,
+    query_parser: QueryParser,
+}
+
+/// Generate a term frequency from a truncated geometric distribution.
+/// Most values are 1, a few are 2-3, rarely higher; the result is capped at 10.
+/// p is the probability of stopping at each step, so P(tf = 1) = p:
+/// higher p → more weight on tf=1.
+fn random_term_freq(rng: &mut StdRng, p: f64) -> u32 {
+    let mut tf = 1u32;
+    while tf < 10 && rng.random_bool(1.0 - p) {
+        tf += 1;
+    }
+    tf
+}
+
+/// Build an index with three terms (aaa, bbb, ccc) with the given doc-frequency probabilities.
+/// Each term occurrence has a realistic term frequency (geometric distribution).
+/// Field length is padded with filler tokens to create varied fieldnorms.
+fn build_index(p_a: f64, p_b: f64, p_c: f64) -> BenchIndex {
+    let mut schema_builder = Schema::builder();
+    let body = schema_builder.add_text_field("body", TEXT);
+    let schema = schema_builder.build();
+    let index = Index::create_in_ram(schema);
+
+    let mut rng = StdRng::from_seed([42u8; 32]);
+
+    {
+        let mut writer = index.writer_with_num_threads(1, 500_000_000).unwrap();
+        for _ in 0..NUM_DOCS {
+            let mut tokens: Vec<String> = Vec::new();
+
+            if rng.random_bool(p_a) {
+                let tf = random_term_freq(&mut rng, 0.7);
+                for _ in 0..tf {
+                    tokens.push("aaa".to_string());
+                }
+            }
+            if rng.random_bool(p_b) {
+                let tf = random_term_freq(&mut rng, 0.7);
+                for _ in 0..tf {
+                    tokens.push("bbb".to_string());
+                }
+            }
+            if rng.random_bool(p_c) {
+                let tf = random_term_freq(&mut rng, 0.7);
+                for _ in 0..tf {
+                    tokens.push("ccc".to_string());
+                }
+            }
+
+            // Pad with filler to create varied field lengths (5 to 29 filler tokens; the range end is exclusive).
+            let filler_count = rng.random_range(5u32..30u32);
+            for _ in 0..filler_count {
+                tokens.push("filler".to_string());
+            }
+
+            let text = tokens.join(" ");
+            writer.add_document(doc!(body => text)).unwrap();
+        }
+        writer.commit().unwrap();
+    }
+
+    let reader = index
+        .reader_builder()
+        .reload_policy(ReloadPolicy::Manual)
+        .try_into()
+        .unwrap();
+    let searcher = reader.searcher();
+    let query_parser = QueryParser::for_index(&index, vec![body]);
+
+    BenchIndex {
+        searcher,
+        query_parser,
+    }
+}
+
+fn main() {
+    // Scenarios: (label, p_a, p_b, p_c)
+    //
+    // "balanced":    all terms ~10% → intersection ~1% of docs
+    // "skewed":      one common (50%), one rare (2%) → intersection ~1%
+    // "very_skewed": one very common (80%), one very rare (0.5%) → intersection ~0.4%
+    // "three_balanced": three terms ~20% each → intersection ~0.8%
+    // "three_skewed":   50% / 10% / 2% → intersection ~0.1%
+    let scenarios: Vec<(&str, f64, f64, f64)> = vec![
+        ("balanced_10%_10%", 0.10, 0.10, 0.0),
+        ("skewed_50%_2%", 0.50, 0.02, 0.0),
+        ("very_skewed_80%_0.5%", 0.80, 0.005, 0.0),
+        ("three_balanced_20%_20%_20%", 0.20, 0.20, 0.20),
+        ("three_skewed_50%_10%_2%", 0.50, 0.10, 0.02),
+    ];
+
+    let mut runner = BenchRunner::new();
+
+    for (label, p_a, p_b, p_c) in &scenarios {
+        let bench_index = build_index(*p_a, *p_b, *p_c);
+
+        let mut group = runner.new_group();
+        group.set_name(format!("intersection — {label}"));
+
+        // Two-term intersection
+        if *p_a > 0.0 && *p_b > 0.0 {
+            let query_str = "+aaa +bbb";
+            let query = bench_index.query_parser.parse_query(query_str).unwrap();
+            let searcher = bench_index.searcher.clone();
+            group.register(format!("{query_str} top10"), move |_| {
+                let collector = TopDocs::with_limit(10).order_by_score();
+                black_box(searcher.search(&query, &collector).unwrap());
+                1usize
+            });
+        }
+
+        // Three-term intersection
+        if *p_c > 0.0 {
+            let query_str = "+aaa +bbb +ccc";
+            let query = bench_index.query_parser.parse_query(query_str).unwrap();
+            let searcher = bench_index.searcher.clone();
+            group.register(format!("{query_str} top10"), move |_| {
+                let collector = TopDocs::with_limit(10).order_by_score();
+                black_box(searcher.search(&query, &collector).unwrap());
+                1usize
+            });
+        }
+
+        group.run();
+    }
+}
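The intersection sizes quoted in the scenario comments of `benches/intersection_bench.rs` follow from multiplying the per-term document probabilities, because `build_index` draws each term independently. A minimal sketch of that arithmetic, assuming independence; the `expected_intersection` helper is hypothetical and not part of the benchmark:

```rust
/// Expected number of documents containing every term, assuming each term
/// occurs independently with the given probability (hypothetical helper).
fn expected_intersection(num_docs: usize, probs: &[f64]) -> u64 {
    let p_all: f64 = probs.iter().product();
    (p_all * num_docs as f64).round() as u64
}

fn main() {
    const NUM_DOCS: usize = 1_000_000;
    // (label, probabilities of the terms that appear in the query)
    let scenarios: [(&str, &[f64]); 5] = [
        ("balanced", &[0.10, 0.10]),
        ("skewed", &[0.50, 0.02]),
        ("very_skewed", &[0.80, 0.005]),
        ("three_balanced", &[0.20, 0.20, 0.20]),
        ("three_skewed", &[0.50, 0.10, 0.02]),
    ];
    for (label, probs) in scenarios {
        let docs = expected_intersection(NUM_DOCS, probs);
        println!("{label}: ~{docs} matching docs");
    }
}
```

These are expectations only; the counts in the generated index vary with the seeded RNG, and the three-term scenarios also run the two-term `+aaa +bbb` query, whose intersection is correspondingly larger.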
--- a/benches/query_parser_nested.rs
+++ b/benches/query_parser_nested.rs
@@ -0,0 +1,35 @@
+// Benchmark for the query grammar parsing deeply nested queries.
+//
+// Regression guard for https://github.com/quickwit-oss/tantivy/issues/2498:
+// at depth 20/21 the old parser took 0.87 s / 1.72 s respectively because
+// `ast()` retried `occur_leaf` on backtrack, giving O(2^n) time. With the
+// fix parsing is linear and completes in microseconds.
+//
+// Run with: `cargo bench --bench query_parser_nested`.
+
+use binggan::{black_box, BenchRunner};
+use tantivy::query_grammar::parse_query;
+
+fn nested_query(depth: usize, leading_plus: bool) -> String {
+    let leading = "(".repeat(depth);
+    let trailing = ")".repeat(depth);
+    let prefix = if leading_plus { "+" } else { "" };
+    format!("{prefix}{leading}title:test{trailing}")
+}
+
+fn main() {
+    let mut runner = BenchRunner::new();
+
+    for depth in [20, 21] {
+        for leading_plus in [false, true] {
+            let query = nested_query(depth, leading_plus);
+            let label = format!(
+                "parse_nested_depth_{depth}_{}",
+                if leading_plus { "plus" } else { "plain" },
+            );
+            runner.bench_function(&label, move |_| {
+                black_box(parse_query(black_box(&query)).unwrap());
+            });
+        }
+    }
+}
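To make the shape of the benchmarked input concrete, the sketch below reuses `nested_query` from `benches/query_parser_nested.rs` on small depths. It depends only on the standard library; the depths printed in `main` are chosen for readability and are not the ones the benchmark uses.

```rust
/// Build `depth` nested parentheses around a single term query
/// (copied from the diff).
fn nested_query(depth: usize, leading_plus: bool) -> String {
    let leading = "(".repeat(depth);
    let trailing = ")".repeat(depth);
    let prefix = if leading_plus { "+" } else { "" };
    format!("{prefix}{leading}title:test{trailing}")
}

fn main() {
    for depth in [1, 2, 3] {
        for leading_plus in [false, true] {
            println!("{}", nested_query(depth, leading_plus));
        }
    }
}
```

The query length grows linearly with `depth` (two characters per level), so the exponential cost described in the header comment came from the parser's backtracking and not from the size of the input.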
--- a/benches/str_search_and_get.rs
+++ b/benches/str_search_and_get.rs
@@ -45,7 +45,7 @@ fn build_shared_indices(num_docs: usize, distribution: &str) -> BenchIndex {
        match distribution {
            "dense_random" => {
                for _doc_id in 0..num_docs {
-                    let suffix = rng.gen_range(0u64..1000u64);
+                    let suffix = rng.random_range(0u64..1000u64);
                    let str_val = format!("str_{:03}", suffix);

                    writer
@@ -71,7 +71,7 @@ fn build_shared_indices(num_docs: usize, distribution: &str) -> BenchIndex {
            }
            "sparse_random" => {
                for _doc_id in 0..num_docs {
-                    let suffix = rng.gen_range(0u64..1000000u64);
+                    let suffix = rng.random_range(0u64..1000000u64);
                    let str_val = format!("str_{:07}", suffix);

                    writer
--- a/bitpacker/Cargo.toml
+++ b/bitpacker/Cargo.toml
@@ -1,6 +1,6 @@
 [package]
 name = "tantivy-bitpacker"
-version = "0.9.0"
+version = "0.10.0"
 edition = "2024"
 authors = ["Paul Masurel <paul.masurel@gmail.com>"]
 license = "MIT"
@@ -18,5 +18,10 @@ homepage = "https://github.com/quickwit-oss/tantivy"
 bitpacking = { version = "0.9.2", default-features = false, features = ["bitpacker1x"] }

 [dev-dependencies]
+binggan = "0.17.0"
 rand = "0.9"
 proptest = "1"
+
+[[bench]]
+name = "bench"
+harness = false
--- a/bitpacker/benches/bench.rs
+++ b/bitpacker/benches/bench.rs
@@ -1,65 +1,110 @@
-#![feature(test)]
+use std::cell::RefCell;

-extern crate test;
+use binggan::{BenchRunner, black_box};
+use rand::rng;
+use rand::seq::IteratorRandom;
+use tantivy_bitpacker::{BitPacker, BitUnpacker, BlockedBitpacker};

-#[cfg(test)]
-mod tests {
-    use rand::rng;
-    use rand::seq::IteratorRandom;
-    use tantivy_bitpacker::{BitPacker, BitUnpacker, BlockedBitpacker};
-    use test::Bencher;
+fn create_bitpacked_data(bit_width: u8, num_els: u32) -> Vec<u8> {
+    let mut bitpacker = BitPacker::new();
+    let mut buffer = Vec::new();
+    for _ in 0..num_els {
+        bitpacker.write(0u64, bit_width, &mut buffer).unwrap();
+        bitpacker.flush(&mut buffer).unwrap();
+    }
+    buffer
+}

-    #[inline(never)]
-    fn create_bitpacked_data(bit_width: u8, num_els: u32) -> Vec<u8> {
-        let mut bitpacker = BitPacker::new();
-        let mut buffer = Vec::new();
-        for _ in 0..num_els {
-            // the values do not matter.
-            bitpacker.write(0u64, bit_width, &mut buffer).unwrap();
-            bitpacker.flush(&mut buffer).unwrap();
+const N: usize = 100_000;
+const MAX_VAL: u64 = 1_000;
+const BIT_WIDTH: u8 = 10; // 2^10 = 1024 > MAX_VAL
+
+fn create_packed_data() -> (BitUnpacker, Vec<u8>) {
+    let mut bitpacker = BitPacker::new();
+    let mut data = Vec::new();
+    for i in 0..N as u64 {
+        let val = i * MAX_VAL / N as u64;
+        bitpacker.write(val, BIT_WIDTH, &mut data).unwrap();
+    }
+    bitpacker.close(&mut data).unwrap();
+    (BitUnpacker::new(BIT_WIDTH), data)
+}
+
+fn bench_bitpacking() {
+    let mut runner = BenchRunner::new();
+    let bit_width = 3;
+    let num_els = 1_000_000u32;
+    let bit_unpacker = BitUnpacker::new(bit_width);
+    let data = create_bitpacked_data(bit_width, num_els);
+    let idxs: Vec<u32> = (0..num_els).choose_multiple(&mut rng(), 100_000);
+    runner.bench_function("bitpacking_read", move |_| {
+        let mut out = 0u64;
+        for &idx in &idxs {
+            out = out.wrapping_add(bit_unpacker.get(idx, &data[..]));
        }
-        buffer
-    }
+        black_box(out);
+    });
+}

-    #[bench]
-    fn bench_bitpacking_read(b: &mut Bencher) {
-        let bit_width = 3;
-        let num_els = 1_000_000u32;
-        let bit_unpacker = BitUnpacker::new(bit_width);
-        let data = create_bitpacked_data(bit_width, num_els);
-        let idxs: Vec<u32> = (0..num_els).choose_multiple(&mut rng(), 100_000);
-        b.iter(|| {
-            let mut out = 0u64;
-            for &idx in &idxs {
-                out = out.wrapping_add(bit_unpacker.get(idx, &data[..]));
-            }
-            out
-        });
+fn bench_blocked_bitpacker() {
+    let mut runner = BenchRunner::new();
+    let mut blocked_bitpacker = BlockedBitpacker::new();
+    for val in 0..=21500 {
+        blocked_bitpacker.add(val * val);
    }
-
-    #[bench]
-    fn bench_blockedbitp_read(b: &mut Bencher) {
+    runner.bench_function("blockedbitp_read", move |_| {
+        let mut out = 0u64;
+        for val in 0..=21500 {
+            out = out.wrapping_add(blocked_bitpacker.get(val));
+        }
+        black_box(out);
+    });
+    runner.bench_function("blockedbitp_create", |_| {
        let mut blocked_bitpacker = BlockedBitpacker::new();
        for val in 0..=21500 {
            blocked_bitpacker.add(val * val);
        }
-        b.iter(|| {
-            let mut out = 0u64;
-            for val in 0..=21500 {
-                out = out.wrapping_add(blocked_bitpacker.get(val));
-            }
-            out
-        });
-    }
-
-    #[bench]
-    fn bench_blockedbitp_create(b: &mut Bencher) {
-        b.iter(|| {
-            let mut blocked_bitpacker = BlockedBitpacker::new();
-            for val in 0..=21500 {
-                blocked_bitpacker.add(val * val);
-            }
-            blocked_bitpacker
-        });
-    }
+        black_box(blocked_bitpacker);
+    });
+}
+
+fn bench_filter_vec() {
+    let mut runner = BenchRunner::new();
+
+    let (unpacker, data) = create_packed_data();
+    let positions = RefCell::new(Vec::with_capacity(N));
+    runner.bench_function("filter_vec_dense", move |_| {
+        unpacker.get_ids_for_value_range(
+            250..=750,
+            0..N as u32,
+            &data,
+            &mut positions.borrow_mut(),
+        );
+        black_box(positions.borrow().len());
+    });
+
+    let (unpacker, data) = create_packed_data();
+    let positions = RefCell::new(Vec::with_capacity(N));
+    runner.bench_function("filter_vec_sparse", move |_| {
+        unpacker.get_ids_for_value_range(0..=50, 0..N as u32, &data, &mut positions.borrow_mut());
+        black_box(positions.borrow().len());
+    });
+
+    let (unpacker, data) = create_packed_data();
+    let positions = RefCell::new(Vec::with_capacity(N));
+    runner.bench_function("filter_vec_full", move |_| {
+        unpacker.get_ids_for_value_range(
+            0..=MAX_VAL,
+            0..N as u32,
+            &data,
+            &mut positions.borrow_mut(),
+        );
+        black_box(positions.borrow().len());
+    });
+}
+
+fn main() {
+    bench_bitpacking();
+    bench_blocked_bitpacker();
+    bench_filter_vec();
 }
--- a/bitpacker/src/filter_vec/mod.rs
+++ b/bitpacker/src/filter_vec/mod.rs
@@ -1,8 +1,17 @@
+#[cfg(all(target_arch = "aarch64", not(target_vendor = "apple")))]
+use std::arch::is_aarch64_feature_detected;
 use std::ops::RangeInclusive;

 #[cfg(target_arch = "x86_64")]
 mod avx2;

+#[cfg(target_arch = "aarch64")]
+mod neon;
+
+// SVE intrinsics are not exposed on aarch64-apple-darwin.
+#[cfg(all(target_arch = "aarch64", not(target_vendor = "apple")))]
+mod sve;
+
 mod scalar;

 #[derive(Clone, Copy, Eq, PartialEq, Debug)]
@@ -10,6 +19,10 @@ mod scalar;
 enum FilterImplPerInstructionSet {
    #[cfg(target_arch = "x86_64")]
    AVX2 = 0u8,
+    #[cfg(all(target_arch = "aarch64", not(target_vendor = "apple")))]
+    SVE = 3u8,
+    #[cfg(target_arch = "aarch64")]
+    Neon = 2u8,
    Scalar = 1u8,
 }

@@ -19,29 +32,57 @@ impl FilterImplPerInstructionSet {
        match *self {
            #[cfg(target_arch = "x86_64")]
            FilterImplPerInstructionSet::AVX2 => is_x86_feature_detected!("avx2"),
+            #[cfg(all(target_arch = "aarch64", not(target_vendor = "apple")))]
+            FilterImplPerInstructionSet::SVE => is_aarch64_feature_detected!("sve"),
+            // NEON is a mandatory feature on aarch64, so it is always available.
+            #[cfg(target_arch = "aarch64")]
+            FilterImplPerInstructionSet::Neon => true,
            FilterImplPerInstructionSet::Scalar => true,
        }
    }
 }

-// List of available implementation in preferred order.
+// List of available implementations in preferred order.
 #[cfg(target_arch = "x86_64")]
 const IMPLS: [FilterImplPerInstructionSet; 2] = [
    FilterImplPerInstructionSet::AVX2,
    FilterImplPerInstructionSet::Scalar,
 ];

-#[cfg(not(target_arch = "x86_64"))]
+// Non-Apple aarch64: try SVE, NEON, Scalar.
+#[cfg(all(target_arch = "aarch64", not(target_vendor = "apple")))]
+const IMPLS: [FilterImplPerInstructionSet; 3] = [
+    FilterImplPerInstructionSet::SVE,
+    FilterImplPerInstructionSet::Neon,
+    FilterImplPerInstructionSet::Scalar,
+];
+
+// Apple aarch64 (M-series): SVE not available; use NEON or Scalar.
+#[cfg(all(target_arch = "aarch64", target_vendor = "apple"))]
+const IMPLS: [FilterImplPerInstructionSet; 2] = [
+    FilterImplPerInstructionSet::Neon,
+    FilterImplPerInstructionSet::Scalar,
+];
+
+#[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
 const IMPLS: [FilterImplPerInstructionSet; 1] = [FilterImplPerInstructionSet::Scalar];

 impl FilterImplPerInstructionSet {
    #[inline]
-    #[allow(unused_variables)] // on non-x86_64, code is unused.
+    #[allow(unused_variables)]
    fn from(code: u8) -> FilterImplPerInstructionSet {
        #[cfg(target_arch = "x86_64")]
        if code == FilterImplPerInstructionSet::AVX2 as u8 {
            return FilterImplPerInstructionSet::AVX2;
        }
+        #[cfg(all(target_arch = "aarch64", not(target_vendor = "apple")))]
+        if code == FilterImplPerInstructionSet::SVE as u8 {
+            return FilterImplPerInstructionSet::SVE;
+        }
+        #[cfg(target_arch = "aarch64")]
+        if code == FilterImplPerInstructionSet::Neon as u8 {
+            return FilterImplPerInstructionSet::Neon;
+        }
        FilterImplPerInstructionSet::Scalar
    }

@@ -50,6 +91,13 @@ impl FilterImplPerInstructionSet {
        match self {
            #[cfg(target_arch = "x86_64")]
            FilterImplPerInstructionSet::AVX2 => avx2::filter_vec_in_place(range, offset, output),
+            #[cfg(all(target_arch = "aarch64", not(target_vendor = "apple")))]
+            // SAFETY: SVE availability was verified by is_available() before selecting this impl.
+            FilterImplPerInstructionSet::SVE => unsafe {
+                sve::filter_vec_in_place(range, offset, output)
+            },
+            #[cfg(target_arch = "aarch64")]
+            FilterImplPerInstructionSet::Neon => neon::filter_vec_in_place(range, offset, output),
            FilterImplPerInstructionSet::Scalar => {
                scalar::filter_vec_in_place(range, offset, output)
            }
@@ -57,6 +105,12 @@ impl FilterImplPerInstructionSet {
    }
 }

+fn available_impls() -> impl Iterator<Item = FilterImplPerInstructionSet> {
+    IMPLS
+        .into_iter()
+        .filter(FilterImplPerInstructionSet::is_available)
+}
+
 #[inline]
 fn get_best_available_instruction_set() -> FilterImplPerInstructionSet {
    use std::sync::atomic::{AtomicU8, Ordering};
@@ -64,10 +118,7 @@ fn get_best_available_instruction_set() -> FilterImplPerInstructionSet {
    let instruction_set_byte: u8 = INSTRUCTION_SET_BYTE.load(Ordering::Relaxed);
    if instruction_set_byte == u8::MAX {
        // Let's initialize the instruction set and cache it.
-        let instruction_set = IMPLS
-            .into_iter()
-            .find(FilterImplPerInstructionSet::is_available)
-            .unwrap();
+        let instruction_set = available_impls().next().unwrap();
        INSTRUCTION_SET_BYTE.store(instruction_set as u8, Ordering::Relaxed);
        return instruction_set;
    }
@@ -80,12 +131,12 @@ pub fn filter_vec_in_place(range: RangeInclusive<u32>, offset: u32, output: &mut

 #[cfg(test)]
 mod tests {
+    use proptest::strategy::Strategy;
+
    use super::*;

    #[test]
    fn test_get_best_available_instruction_set() {
-        // This does not test much unfortunately.
-        // We just make sure the function returns without crashing and returns the same result.
        let instruction_set = get_best_available_instruction_set();
        assert_eq!(get_best_available_instruction_set(), instruction_set);
    }
@@ -102,6 +153,31 @@ mod tests {
        }
    }

+    #[cfg(all(target_arch = "aarch64", not(target_vendor = "apple")))]
+    #[test]
+    fn test_instruction_set_to_code_from_code() {
+        for instruction_set in [
+            FilterImplPerInstructionSet::SVE,
+            FilterImplPerInstructionSet::Neon,
+            FilterImplPerInstructionSet::Scalar,
+        ] {
+            let code = instruction_set as u8;
+            assert_eq!(instruction_set, FilterImplPerInstructionSet::from(code));
+        }
+    }
+
+    #[cfg(all(target_arch = "aarch64", target_vendor = "apple"))]
+    #[test]
+    fn test_instruction_set_to_code_from_code() {
+        for instruction_set in [
+            FilterImplPerInstructionSet::Neon,
+            FilterImplPerInstructionSet::Scalar,
+        ] {
+            let code = instruction_set as u8;
+            assert_eq!(instruction_set, FilterImplPerInstructionSet::from(code));
+        }
+    }
+
    fn test_filter_impl_empty_aux(filter_impl: FilterImplPerInstructionSet) {
        let mut output = vec![];
        filter_impl.filter_vec_in_place(0..=u32::MAX, 0, &mut output);
@@ -126,11 +202,20 @@ mod tests {
        assert_eq!(&output, &[1, 3, 4, 5, 6, 7, 8]);
    }

+    fn test_filter_impl_empty_range_aux(filter_impl: FilterImplPerInstructionSet) {
+        // start > end: RangeInclusive::contains always returns false; output must be empty.
+        // The SVE path's wrapping_sub would otherwise produce a huge range_width.
+        let mut output = vec![3, 2, 1, 5, 11, 2, 5, 10, 2];
+        filter_impl.filter_vec_in_place(10..=5, 0, &mut output);
+        assert_eq!(&output, &[]);
+    }
+
    fn test_filter_impl_test_suite(filter_impl: FilterImplPerInstructionSet) {
        test_filter_impl_empty_aux(filter_impl);
        test_filter_impl_simple_aux(filter_impl);
        test_filter_impl_simple_aux_shifted(filter_impl);
        test_filter_impl_simple_outside_i32_range(filter_impl);
+        test_filter_impl_empty_range_aux(filter_impl);
    }

    #[test]
@@ -141,25 +226,60 @@ mod tests {
        }
    }

+    #[test]
+    #[cfg(all(target_arch = "aarch64", not(target_vendor = "apple")))]
+    fn test_filter_implementation_sve() {
+        if FilterImplPerInstructionSet::SVE.is_available() {
+            test_filter_impl_test_suite(FilterImplPerInstructionSet::SVE);
+        }
+    }
+
+    #[test]
+    #[cfg(target_arch = "aarch64")]
+    fn test_filter_implementation_neon() {
+        test_filter_impl_test_suite(FilterImplPerInstructionSet::Neon);
+    }
+
    #[test]
    fn test_filter_implementation_scalar() {
        test_filter_impl_test_suite(FilterImplPerInstructionSet::Scalar);
    }

-    #[cfg(target_arch = "x86_64")]
+    fn max_val_strategy() -> impl proptest::strategy::Strategy<Value = u32> {
+        proptest::prop_oneof![
+            0u32..10u32,
+            255u32..258u32,
+            proptest::prelude::Just(1u32 << 25),
+            proptest::prelude::Just(u32::MAX - 1),
+            proptest::prelude::Just(u32::MAX),
+        ]
+    }
+
+    fn vals_strategy() -> impl proptest::strategy::Strategy<Value = Vec<u32>> {
+        proptest::prop_oneof![
+            proptest::collection::vec(proptest::prelude::any::<u32>(), 0..300),
+            max_val_strategy()
+                .prop_flat_map(|max_val| { proptest::collection::vec(0..=max_val, 0..300) })
+        ]
+    }
+
    proptest::proptest! {
        #[test]
-        fn test_filter_compare_scalar_and_avx2_impl_proptest(
-            start in proptest::prelude::any::<u32>(),
-            end in proptest::prelude::any::<u32>(),
+        fn test_filter_compare_scalar_and_impls_impl_proptest(
+            start in 0u32..400u32,
+            end in 0u32..400u32,
            offset in 0u32..2u32,
-            mut vals in proptest::collection::vec(0..u32::MAX, 0..30)) {
-            if FilterImplPerInstructionSet::AVX2.is_available() {
-                let mut vals_clone = vals.clone();
-                FilterImplPerInstructionSet::AVX2.filter_vec_in_place(start..=end, offset, &mut vals);
-                FilterImplPerInstructionSet::Scalar.filter_vec_in_place(start..=end, offset, &mut vals_clone);
-                assert_eq!(&vals, &vals_clone);
-            }
+            vals in vals_strategy()) {
+                for implementation in available_impls() {
+                    if implementation == FilterImplPerInstructionSet::Scalar {
+                        continue;
+                    }
+                    let mut impl_output = vals.clone();
+                    let mut scalar_output = vals.clone();
+                    implementation.filter_vec_in_place(start..=end, offset, &mut impl_output);
+                    FilterImplPerInstructionSet::Scalar.filter_vec_in_place(start..=end, offset, &mut scalar_output);
+                    assert_eq!(&impl_output, &scalar_output);
+                }
       }
    }
 }
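Review note on the dispatch logic in this file: the selection pattern (walk the implementations in priority order, take the first available one, cache its code in an `AtomicU8` with `u8::MAX` as the "not initialized" sentinel) can be sketched in portable Rust. This is a minimal sketch, not the crate's code: the enum `Impl`, its variants, `from_code`, `CACHE` and `best_available` are hypothetical names, and `Wide` is hard-coded as unavailable to stand in for a runtime CPU feature check.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

// Hypothetical stand-in for FilterImplPerInstructionSet.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
enum Impl {
    Scalar = 0,
    Wide = 1,
}

impl Impl {
    // Assumption: a real implementation would query the CPU here.
    fn is_available(&self) -> bool {
        match self {
            Impl::Scalar => true,
            Impl::Wide => false,
        }
    }

    // Unknown codes fall back to the scalar implementation.
    fn from_code(code: u8) -> Impl {
        if code == Impl::Wide as u8 {
            return Impl::Wide;
        }
        Impl::Scalar
    }
}

// Ordered from most to least preferred; Scalar comes last and is always available.
const IMPLS: [Impl; 2] = [Impl::Wide, Impl::Scalar];

// u8::MAX means "not initialized yet".
static CACHE: AtomicU8 = AtomicU8::new(u8::MAX);

fn best_available() -> Impl {
    let code = CACHE.load(Ordering::Relaxed);
    if code == u8::MAX {
        let chosen = IMPLS
            .iter()
            .copied()
            .find(|implementation| implementation.is_available())
            .expect("the scalar implementation is always available");
        CACHE.store(chosen as u8, Ordering::Relaxed);
        return chosen;
    }
    Impl::from_code(code)
}
```

Relaxed ordering is enough in this sketch because every thread that races on initialization computes the same value.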
--- a/bitpacker/src/filter_vec/neon.rs
+++ b/bitpacker/src/filter_vec/neon.rs
@@ -0,0 +1,118 @@
+use std::arch::aarch64::*;
+use std::ops::RangeInclusive;
+
+const NUM_LANES: usize = 4;
+
+// Compacts matching lanes to the front using a byte-level shuffle.
+// `mask` is a 4-bit value: bit k=1 means lane k should appear in the output.
+#[inline]
+#[target_feature(enable = "neon")]
+unsafe fn compact(data: uint32x4_t, mask: u8) -> uint32x4_t {
+    unsafe {
+        // SAFETY: mask is always in [0, 15] by construction (max sum of [1,2,4,8]).
+        // BYTE_SHUFFLE_TABLE has 16 entries, so this is always in bounds.
+        let shuffle = BYTE_SHUFFLE_TABLE.get_unchecked(mask as usize);
+        let shuffle_vec = vld1q_u8(shuffle.as_ptr());
+        vreinterpretq_u32_u8(vqtbl1q_u8(vreinterpretq_u8_u32(data), shuffle_vec))
+    }
+}
+
+// Safe (not unsafe) because NEON is mandatory on aarch64: no runtime feature check needed.
+#[inline(never)]
+pub fn filter_vec_in_place(range: RangeInclusive<u32>, offset: u32, output: &mut Vec<u32>) {
+    let num_words = output.len() / NUM_LANES;
+    let mut output_len = unsafe {
+        filter_vec_neon_aux(
+            output.as_ptr(),
+            range.clone(),
+            output.as_mut_ptr(),
+            offset,
+            num_words,
+        )
+    };
+    let remainder_start = num_words * NUM_LANES;
+    for i in remainder_start..output.len() {
+        let val = output[i];
+        output[output_len] = offset + i as u32;
+        output_len += if range.contains(&val) { 1 } else { 0 };
+    }
+    output.truncate(output_len);
+}
+
+#[target_feature(enable = "neon")]
+unsafe fn filter_vec_neon_aux(
+    input: *const u32,
+    range: RangeInclusive<u32>,
+    output: *mut u32,
+    offset: u32,
+    num_words: usize,
+) -> usize {
+    unsafe {
+        let mut input = input;
+        let mut output_tail = output;
+        let range_start_simd = vdupq_n_u32(*range.start());
+        let range_end_simd = vdupq_n_u32(*range.end());
+        let mut ids = vld1q_u32([offset, offset + 1, offset + 2, offset + 3].as_ptr());
+        let shift = vdupq_n_u32(NUM_LANES as u32);
+        let bit_weights = vld1q_u32([1u32, 2, 4, 8].as_ptr());
+
+        for _ in 0..num_words {
+            let word = vld1q_u32(input);
+
+            // Unsigned compares: CMHS (compare higher or same) tests `word >= start`
+            // and `end >= word`. ANDing both gives the inside-range mask directly,
+            // which is cheaper than computing `outside` and then negating.
+            let ge_start = vcgeq_u32(word, range_start_simd);
+            let le_end = vcleq_u32(word, range_end_simd);
+            // inside[k] = 0xFFFFFFFF if val[k] is in range, 0 otherwise.
+            let inside = vandq_u32(ge_start, le_end);
+
+            // Build the 4-bit mask: AND bit_weights with the inside lane mask, so each
+            // inside lane contributes its bit_weight (1, 2, 4, or 8). Summing yields the
+            // 4-bit mask in one addv.
+            let inside_bits = vandq_u32(bit_weights, inside);
+            let mask = vaddvq_u32(inside_bits) as u8;
+            // mask is mathematically bounded: max value is 1+2+4+8=15 (all lanes match)
+            debug_assert!(mask <= 15, "mask must fit in 4 bits: {}", mask);
+
+            // Count of matching lanes = popcount(mask). The count is derived directly from
+            // the mask, so no separate SIMD reduction over the lanes is needed.
+            let added_len = mask.count_ones() as usize;
+
+            // Safe because mask is guaranteed to be in [0, 15]
+            let filtered_ids = compact(ids, mask);
+            vst1q_u32(output_tail, filtered_ids);
+            output_tail = output_tail.add(added_len);
+            ids = vaddq_u32(ids, shift);
+            input = input.add(NUM_LANES);
+        }
+
+        output_tail.offset_from(output) as usize
+    }
+}
+
+// Byte shuffle patterns to compact matching lanes to the front of the vector.
+// Index is a 4-bit mask: bit k=1 means lane k (bytes 4k..4k+3) is in-range.
+// The j-th set bit determines which input lane goes to output position j.
+const BYTE_SHUFFLE_TABLE: [[u8; 16]; 16] = [
+    [
+        16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
+    ], // 0b0000: none
+    [0, 1, 2, 3, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16], // 0b0001: lane 0
+    [4, 5, 6, 7, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16], // 0b0010: lane 1
+    [0, 1, 2, 3, 4, 5, 6, 7, 16, 16, 16, 16, 16, 16, 16, 16],     // 0b0011: lanes 0,1
+    [8, 9, 10, 11, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16], // 0b0100: lane 2
+    [0, 1, 2, 3, 8, 9, 10, 11, 16, 16, 16, 16, 16, 16, 16, 16],   // 0b0101: lanes 0,2
+    [4, 5, 6, 7, 8, 9, 10, 11, 16, 16, 16, 16, 16, 16, 16, 16],   // 0b0110: lanes 1,2
+    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 16, 16, 16, 16],       // 0b0111: lanes 0,1,2
+    [
+        12, 13, 14, 15, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16,
+    ], // 0b1000: lane 3
+    [0, 1, 2, 3, 12, 13, 14, 15, 16, 16, 16, 16, 16, 16, 16, 16], // 0b1001: lanes 0,3
+    [4, 5, 6, 7, 12, 13, 14, 15, 16, 16, 16, 16, 16, 16, 16, 16], // 0b1010: lanes 1,3
+    [0, 1, 2, 3, 4, 5, 6, 7, 12, 13, 14, 15, 16, 16, 16, 16],     // 0b1011: lanes 0,1,3
+    [8, 9, 10, 11, 12, 13, 14, 15, 16, 16, 16, 16, 16, 16, 16, 16], // 0b1100: lanes 2,3
+    [0, 1, 2, 3, 8, 9, 10, 11, 12, 13, 14, 15, 16, 16, 16, 16],   // 0b1101: lanes 0,2,3
+    [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 16, 16, 16],   // 0b1110: lanes 1,2,3
+    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],       // 0b1111: all lanes
+];
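Review note on neon.rs: the mask-and-compact scheme can be modelled in scalar Rust, which may help when checking the shuffle table by hand. This is a sketch under stated assumptions, not the NEON code: `lane_mask`, `compact_lanes` and `filter_vec_in_place_model` are hypothetical names, and the zero fill in `compact_lanes` mirrors the fact that a TBL lookup returns 0 for an out-of-range index such as 16.

```rust
use std::ops::RangeInclusive;

const NUM_LANES: usize = 4;

// Bit k of the mask is set when lane k holds an in-range value.
fn lane_mask(word: [u32; NUM_LANES], range: &RangeInclusive<u32>) -> u8 {
    let mut mask = 0u8;
    for (lane, val) in word.iter().enumerate() {
        if range.contains(val) {
            mask |= 1 << lane;
        }
    }
    mask
}

// Scalar stand-in for the table shuffle: selected lanes move to the front,
// the remaining slots are zero.
fn compact_lanes(ids: [u32; NUM_LANES], mask: u8) -> [u32; NUM_LANES] {
    let mut out = [0u32; NUM_LANES];
    let mut write = 0;
    for lane in 0..NUM_LANES {
        if mask & (1 << lane) != 0 {
            out[write] = ids[lane];
            write += 1;
        }
    }
    out
}

fn filter_vec_in_place_model(range: RangeInclusive<u32>, offset: u32, output: &mut Vec<u32>) {
    let num_words = output.len() / NUM_LANES;
    let mut output_len = 0;
    for word_idx in 0..num_words {
        let base = word_idx * NUM_LANES;
        let word = [output[base], output[base + 1], output[base + 2], output[base + 3]];
        let first_id = offset + base as u32;
        let ids = [first_id, first_id + 1, first_id + 2, first_id + 3];
        let mask = lane_mask(word, &range);
        // Full 4-lane store. Only the first popcount(mask) slots are kept; since
        // output_len <= base, the store never touches words that are still unread.
        output[output_len..output_len + NUM_LANES].copy_from_slice(&compact_lanes(ids, mask));
        output_len += mask.count_ones() as usize;
    }
    // Scalar tail for the last len % 4 values.
    for i in num_words * NUM_LANES..output.len() {
        let val = output[i];
        if range.contains(&val) {
            output[output_len] = offset + i as u32;
            output_len += 1;
        }
    }
    output.truncate(output_len);
}
```

Filtering `[3, 2, 1, 5, 11, 2, 5, 10, 2]` with the range `2..=5` and offset 0 keeps the positions `[0, 1, 3, 5, 6, 8]`.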
--- a/bitpacker/src/filter_vec/sve.rs
+++ b/bitpacker/src/filter_vec/sve.rs
@@ -0,0 +1,260 @@
+use std::ops::RangeInclusive;
+
+// SVE vector length (in u32 lanes) is not a compile-time constant; query at runtime.
+// Safe to call only when SVE is confirmed available via is_aarch64_feature_detected!("sve").
+#[target_feature(enable = "sve")]
+unsafe fn num_lanes() -> usize {
+    let vl: usize;
+    unsafe {
+        core::arch::asm!(
+            "cntw {vl}",
+            vl = out(reg) vl,
+            options(nostack, nomem, preserves_flags),
+        );
+    }
+    vl
+}
+
+// SAFETY: caller must ensure SVE is available (checked via is_aarch64_feature_detected!("sve")).
+// Unlike NEON, SVE is optional on aarch64 and not guaranteed by the target architecture.
+pub unsafe fn filter_vec_in_place(range: RangeInclusive<u32>, offset: u32, output: &mut Vec<u32>) {
+    if range.start() > range.end() {
+        output.clear();
+        return;
+    }
+    let vl = unsafe { num_lanes() };
+    let num_words = output.len() / vl;
+    let range_start = *range.start();
+    // Unsigned subtraction trick: val ∈ [lo, hi] ↔ (val - lo) ≤ᵤ (hi - lo).
+    // Values below lo wrap around to large u32, so the single unsigned ≤ excludes them.
+    let range_width = range.end().wrapping_sub(range_start);
+    let mut output_len = unsafe {
+        filter_vec_sve_aux(
+            output.as_ptr(),
+            range_start,
+            range_width,
+            output.as_mut_ptr(),
+            offset,
+            num_words,
+            vl,
+        )
+    };
+    let remainder_start = num_words * vl;
+    for i in remainder_start..output.len() {
+        let val = output[i];
+        output[output_len] = offset + i as u32;
+        output_len += if range.contains(&val) { 1 } else { 0 };
+    }
+    output.truncate(output_len);
+}
+
+// Register allocation for the asm! blocks:
+//   z0        ids_a (index vector for first half of each pair, advances by step2 each iter)
+//   z1        range_width broadcast
+//   z2        range_start broadcast
+//   z3        step2 broadcast (2 * vl)
+//   z4        ids_b (index vector for second half, = ids_a + step, advances by step2)
+//   z5        scratch: loaded word_a, then compacted_a
+//   z6        scratch: loaded word_b, then compacted_b
+//   p0        all-true predicate (ptrue p0.s)
+//   p1        in-range mask for word_a
+//   p2        in-range mask for word_b
+#[target_feature(enable = "sve")]
+unsafe fn filter_vec_sve_aux(
+    input: *const u32,
+    range_start: u32,
+    range_width: u32,
+    output: *mut u32,
+    offset: u32,
+    num_words: usize,
+    vl: usize,
+) -> usize {
+    let num_pairs = num_words / 2;
+    let mut input_ptr = input;
+    let mut output_tail = output;
+
+    if num_pairs > 0 {
+        unsafe {
+            // We rely on asm! because the SVE intrinsics are not available in stable Rust.
+            // The assembly below was generated by nightly rustc from the intrinsics version
+            // kept as a comment at the bottom of this file.
+            core::arch::asm!(
+                // --- Setup ---
+                // All-true predicate for 32-bit lanes.
+                "ptrue p0.s",
+                // ids_a = [offset, offset+1, offset+2, ...]
+                "index z0.s, {offset:w}, #1",
+                // Broadcast scalars into SVE vectors.
+                "mov z1.s, {range_width:w}",
+                "mov z2.s, {range_start:w}",
+                // vl_gpr = number of 32-bit lanes (cntw).
+                "cntw {vl_gpr}",
+                // step2_bytes will first hold 2*vl (for the step2 vector), then 2*VL in bytes.
+                "lsl {step2_bytes}, {vl_gpr}, #1",
+                // z4 = step = [vl, vl, ...]; will become ids_b after the add below.
+                "mov z4.s, {vl_gpr:w}",
+                // z3 = step2 = [2*vl, 2*vl, ...], used to advance both id vectors each iter.
+                "mov z3.s, {step2_bytes:w}",
+                // Repurpose step2_bytes to hold the byte stride for advancing the input pointer
+                // by two full SVE vectors per iteration.
+                "rdvl {step2_bytes}, #2",
+                // ids_b = ids_a + step = [offset+vl, offset+vl+1, ...]
+                "add z4.s, z0.s, z4.s",
+
+                // --- Main loop: process two SVE vectors (ids_a and ids_b) per iteration ---
+                "0:",
+                // Load two consecutive SVE vectors from input.
+                "ld1w {{z5.s}}, p0/z, [{input}]",
+                "ld1w {{z6.s}}, p0/z, [{input}, #1, mul vl]",
+                // Advance input pointer by 2 * VL bytes.
+                "add {input}, {input}, {step2_bytes}",
+                // Subtract range_start so the in-range check becomes a single unsigned ≤ compare.
+                "sub z5.s, z5.s, z2.s",
+                "sub z6.s, z6.s, z2.s",
+                // in_range: shifted value ≤ range_width  (unsigned, so values below lo also fail).
+                "cmphs p1.s, p0/z, z1.s, z5.s",
+                "cmphs p2.s, p0/z, z1.s, z6.s",
+                // Count matching lanes; both cntp calls have independent inputs for OOO parallelism.
+                "cntp {cnt_a}, p0, p1.s",
+                "compact z5.s, p1, z0.s",
+                "compact z6.s, p2, z4.s",
+                "cntp {cnt_b}, p0, p2.s",
+                // Advance id vectors for the next iteration.
+                "add z0.s, z0.s, z3.s",
+                "add z4.s, z4.s, z3.s",
+                // Store compacted ids. Only the first cnt_a / cnt_b slots are valid; the rest
+                // will be overwritten by subsequent iterations before the final truncate.
+                "str z5, [{out}]",
+                "st1w {{z6.s}}, p0, [{out}, {cnt_a}, lsl #2]",
+                "add {out}, {out}, {cnt_a}, lsl #2",
+                "add {out}, {out}, {cnt_b}, lsl #2",
+                "subs {pairs}, {pairs}, #1",
+                "b.ne 0b",
+
+                // --- Operands ---
+                input       = inout(reg) input_ptr,
+                out         = inout(reg) output_tail,
+                pairs       = inout(reg) num_pairs => _,
+                offset      = in(reg) offset,
+                range_start = in(reg) range_start,
+                range_width = in(reg) range_width,
+                vl_gpr      = out(reg) _,
+                step2_bytes = out(reg) _,
+                cnt_a       = out(reg) _,
+                cnt_b       = out(reg) _,
+                out("p0") _, out("p1") _, out("p2") _,
+                out("v0") _, out("v1") _, out("v2") _, out("v3") _,
+                out("v4") _, out("v5") _, out("v6") _,
+                options(nostack),
+            );
+        }
+    }
+
+    // Handle an odd trailing vector.
+    if num_words % 2 == 1 {
+        // ids_a for the odd word starts at offset + num_pairs * 2 * vl.
+        // input_ptr was advanced by the main loop and now points at the odd word.
+        let odd_offset =
+            offset.wrapping_add((num_pairs as u32).wrapping_mul(2).wrapping_mul(vl as u32));
+        unsafe {
+            core::arch::asm!(
+                "ptrue p0.s",
+                "index z0.s, {odd_offset:w}, #1",
+                "mov z1.s, {range_width:w}",
+                "mov z2.s, {range_start:w}",
+                "ld1w {{z3.s}}, p0/z, [{input}]",
+                "sub z3.s, z3.s, z2.s",
+                "cmphs p1.s, p0/z, z1.s, z3.s",
+                "cntp {cnt}, p0, p1.s",
+                "compact z0.s, p1, z0.s",
+                "str z0, [{out}]",
+                "add {out}, {out}, {cnt}, lsl #2",
+                odd_offset  = in(reg) odd_offset,
+                range_width = in(reg) range_width,
+                range_start = in(reg) range_start,
+                input       = in(reg) input_ptr,
+                out         = inout(reg) output_tail,
+                cnt         = out(reg) _,
+                out("p0") _, out("p1") _,
+                out("v0") _, out("v1") _, out("v2") _, out("v3") _,
+                options(nostack),
+            );
+        }
+    }
+
+    unsafe { output_tail.offset_from(output) as usize }
+}
+
+// Reference SVE implementation written with intrinsics.
+//
+// #[target_feature(enable = "sve")]
+// unsafe fn filter_vec_sve_aux(
+//     input: *const u32,
+//     range_start: u32,
+//     range_width: u32,
+//     output: *mut u32,
+//     offset: u32,
+//     num_words: usize,
+//     vl: usize,
+// ) -> usize {
+//     unsafe {
+//         let all_true = svptrue_b32();
+//         let range_start_simd = svdup_n_u32(range_start);
+//         let range_width_simd = svdup_n_u32(range_width);
+//         // ids_a covers [offset .. offset+vl), ids_b covers the next vl ids.
+//         // Keeping them separate breaks the loop-carried dependency through ids so
+//         // both compact/cntp chains are fully independent within each unrolled body.
+//         let mut ids_a = svindex_u32(offset, 1);
+//         let step = svdup_n_u32(vl as u32);
+//         let step2 = svdup_n_u32(2 * vl as u32);
+//         let mut ids_b = svadd_u32_x(all_true, ids_a, step);
+
+//         let mut input = input;
+//         let mut output_tail = output;
+
+//         // Unrolled ×2: both cntp calls have independent inputs and execute in parallel.
+//         // The two output_tail updates are sequential but together cost 4+1+1=6 cy per
+//         // pair vs 5+5=10 cy for two scalar iterations, breaking the cntp latency chain.
+//         let num_pairs = num_words / 2;
+//         for _ in 0..num_pairs {
+//             let word_a = svld1_u32(all_true, input);
+//             let word_b = svld1_u32(all_true, input.add(vl));
+
+//             let shifted_a = svsub_u32_x(all_true, word_a, range_start_simd);
+//             let shifted_b = svsub_u32_x(all_true, word_b, range_start_simd);
+
+//             let in_range_a = svcmple_u32(all_true, shifted_a, range_width_simd);
+//             let in_range_b = svcmple_u32(all_true, shifted_b, range_width_simd);
+
+//             let compacted_a = svcompact_u32(in_range_a, ids_a);
+//             let compacted_b = svcompact_u32(in_range_b, ids_b);
+//             // cntp_a and cntp_b have independent inputs: OOO engine issues them in parallel.
+//             let added_len_a = svcntp_b32(all_true, in_range_a) as usize;
+//             let added_len_b = svcntp_b32(all_true, in_range_b) as usize;
+
+//             // Write the full vector — only the first added_len slots are valid.
+//             // Subsequent iterations overwrite the trailing zeros before truncate.
+//             svst1_u32(all_true, output_tail, compacted_a);
+//             output_tail = output_tail.add(added_len_a);
+//             svst1_u32(all_true, output_tail, compacted_b);
+//             output_tail = output_tail.add(added_len_b);
+
+//             ids_a = svadd_u32_x(all_true, ids_a, step2);
+//             ids_b = svadd_u32_x(all_true, ids_b, step2);
+//             input = input.add(2 * vl);
+//         }
+
+//         // Handle an odd trailing word.
+//         if num_words % 2 == 1 {
+//             let word = svld1_u32(all_true, input);
+//             let shifted = svsub_u32_x(all_true, word, range_start_simd);
+//             let in_range = svcmple_u32(all_true, shifted, range_width_simd);
+//             let added_len = svcntp_b32(all_true, in_range) as usize;
+//             let compacted_ids = svcompact_u32(in_range, ids_a);
+//             svst1_u32(all_true, output_tail, compacted_ids);
+//             output_tail = output_tail.add(added_len);
+//         }
+
+//         output_tail.offset_from(output) as usize
+//     }
+// }
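Review note on sve.rs: the unsigned subtraction trick used for the range check, and the reason for the early return when `start > end`, can be checked with a few lines of scalar Rust. This is an illustrative sketch; `in_range_wrapping` is a hypothetical helper, not a function from this PR.

```rust
// val is in [lo, hi] exactly when (val - lo) <= (hi - lo) in wrapping u32 arithmetic.
// Precondition: lo <= hi. Values below lo wrap to large numbers and fail the compare.
fn in_range_wrapping(val: u32, lo: u32, hi: u32) -> bool {
    val.wrapping_sub(lo) <= hi.wrapping_sub(lo)
}

// Same check with the empty-range guard that the SVE entry point applies first.
fn in_range_guarded(val: u32, lo: u32, hi: u32) -> bool {
    if lo > hi {
        return false;
    }
    in_range_wrapping(val, lo, hi)
}
```

Without the guard, an inverted range such as `10..=5` gives a width of `5 - 10`, which wraps to `u32::MAX - 4`, so a value like 3 would be wrongly accepted even though `RangeInclusive::contains` rejects it.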
--- a/columnar/Cargo.toml
+++ b/columnar/Cargo.toml
@@ -1,6 +1,6 @@
 [package]
 name = "tantivy-columnar"
-version = "0.6.0"
+version = "0.7.0"
 edition = "2024"
 license = "MIT"
 homepage = "https://github.com/quickwit-oss/tantivy"
@@ -12,10 +12,10 @@ categories = ["database-implementations", "data-structures", "compression"]
 itertools = "0.14.0"
 fastdivide = "0.4.0"

-stacker = { version= "0.6", path = "../stacker", package="tantivy-stacker"}
-sstable = { version= "0.6", path = "../sstable", package = "tantivy-sstable" }
-common = { version= "0.10", path = "../common", package = "tantivy-common" }
-tantivy-bitpacker = { version= "0.9", path = "../bitpacker/" }
+stacker = { version= "0.7", path = "../stacker", package="tantivy-stacker"}
+sstable = { version= "0.7", path = "../sstable", package = "tantivy-sstable" }
+common = { version= "0.11", path = "../common", package = "tantivy-common" }
+tantivy-bitpacker = { version= "0.10", path = "../bitpacker/" }
 serde = "1.0.152"
 downcast-rs = "2.0.1"

@@ -23,7 +23,7 @@ downcast-rs = "2.0.1"
 proptest = "1"
 more-asserts = "0.3.1"
 rand = "0.9"
-binggan = "0.14.0"
+binggan = "0.17.0"

 [[bench]]
 name = "bench_merge"
--- a/columnar/src/block_accessor.rs
+++ b/columnar/src/block_accessor.rs
@@ -33,14 +33,14 @@ impl<T: PartialOrd + Copy + std::fmt::Debug + Send + Sync + 'static + Default>
        &mut self,
        docs: &[u32],
        accessor: &Column<T>,
-        missing: Option<T>,
+        missing_opt: Option<T>,
    ) {
        self.fetch_block(docs, accessor);
        // no missing values
        if accessor.index.get_cardinality().is_full() {
            return;
        }
-        let Some(missing) = missing else {
+        let Some(missing) = missing_opt else {
            return;
        };

@@ -58,6 +58,78 @@ impl<T: PartialOrd + Copy + std::fmt::Debug + Send + Sync + 'static + Default>
        }
    }

+    /// Like `fetch_block_with_missing`, but deduplicates (doc_id, value) pairs
+    /// so that each unique value per document is returned only once.
+    ///
+    /// This is necessary for correct document counting in aggregations,
+    /// where multi-valued fields can produce duplicate entries that inflate counts.
+    #[inline]
+    pub fn fetch_block_with_missing_unique_per_doc(
+        &mut self,
+        docs: &[u32],
+        accessor: &Column<T>,
+        missing: Option<T>,
+    ) where
+        T: Ord,
+    {
+        self.fetch_block_with_missing(docs, accessor, missing);
+        if accessor.index.get_cardinality().is_multivalue() {
+            self.dedup_docid_val_pairs();
+        }
+    }
+
+    /// Removes duplicate (doc_id, value) pairs from the caches.
+    ///
+    /// After `fetch_block`, entries are sorted by doc_id, but values within
+    /// the same doc may not be sorted (e.g. `(0,1), (0,2), (0,1)`).
+    /// We group consecutive entries by doc_id and sort the values of every group with more
+    /// than 2 elements (in a group of 2, duplicates are already adjacent), then dedup adjacent pairs.
+    ///
+    /// Skips entirely if no doc_id appears more than once in the block.
+    fn dedup_docid_val_pairs(&mut self)
+    where T: Ord {
+        if self.docid_cache.len() <= 1 {
+            return;
+        }
+
+        // Quick check: if no consecutive doc_ids are equal, no dedup needed.
+        let has_multivalue = self.docid_cache.windows(2).any(|w| w[0] == w[1]);
+        if !has_multivalue {
+            return;
+        }
+
+        // Sort values within each doc_id group so duplicates become adjacent.
+        let mut start = 0;
+        while start < self.docid_cache.len() {
+            let doc = self.docid_cache[start];
+            let mut end = start + 1;
+            while end < self.docid_cache.len() && self.docid_cache[end] == doc {
+                end += 1;
+            }
+            if end - start > 2 {
+                self.val_cache[start..end].sort();
+            }
+            start = end;
+        }
+
+        // Now duplicates are adjacent — deduplicate in place.
+        let mut write = 0;
+        for read in 1..self.docid_cache.len() {
+            if self.docid_cache[read] != self.docid_cache[write]
+                || self.val_cache[read] != self.val_cache[write]
+            {
+                write += 1;
+                if write != read {
+                    self.docid_cache[write] = self.docid_cache[read];
+                    self.val_cache[write] = self.val_cache[read];
+                }
+            }
+        }
+        let new_len = write + 1;
+        self.docid_cache.truncate(new_len);
+        self.val_cache.truncate(new_len);
+    }
+
    #[inline]
    pub fn iter_vals(&self) -> impl Iterator<Item = T> + '_ {
        self.val_cache.iter().cloned()
@@ -119,6 +191,7 @@ where F: FnMut(u32) {
 }

 #[cfg(test)]
+#[allow(clippy::field_reassign_with_default)]
 mod tests {
    use super::*;

@@ -163,4 +236,56 @@ mod tests {

        assert_eq!(missing_docs, vec![1, 2, 3, 4, 5]);
    }
+
+    #[test]
+    fn test_dedup_docid_val_pairs_consecutive() {
+        let mut accessor = ColumnBlockAccessor::<u64>::default();
+        accessor.docid_cache = vec![0, 0, 2, 3];
+        accessor.val_cache = vec![10, 10, 10, 10];
+        accessor.dedup_docid_val_pairs();
+        assert_eq!(accessor.docid_cache, vec![0, 2, 3]);
+        assert_eq!(accessor.val_cache, vec![10, 10, 10]);
+    }
+
+    #[test]
+    fn test_dedup_docid_val_pairs_non_consecutive() {
+        // (0,1), (0,2), (0,1) — duplicate value not adjacent
+        let mut accessor = ColumnBlockAccessor::<u64>::default();
+        accessor.docid_cache = vec![0, 0, 0];
+        accessor.val_cache = vec![1, 2, 1];
+        accessor.dedup_docid_val_pairs();
+        assert_eq!(accessor.docid_cache, vec![0, 0]);
+        assert_eq!(accessor.val_cache, vec![1, 2]);
+    }
+
+    #[test]
+    fn test_dedup_docid_val_pairs_multi_doc() {
+        // doc 0: values [3, 1, 3], doc 1: values [5, 5]
+        let mut accessor = ColumnBlockAccessor::<u64>::default();
+        accessor.docid_cache = vec![0, 0, 0, 1, 1];
+        accessor.val_cache = vec![3, 1, 3, 5, 5];
+        accessor.dedup_docid_val_pairs();
+        assert_eq!(accessor.docid_cache, vec![0, 0, 1]);
+        assert_eq!(accessor.val_cache, vec![1, 3, 5]);
+    }
+
+    #[test]
+    fn test_dedup_docid_val_pairs_no_duplicates() {
+        let mut accessor = ColumnBlockAccessor::<u64>::default();
+        accessor.docid_cache = vec![0, 0, 1];
+        accessor.val_cache = vec![1, 2, 3];
+        accessor.dedup_docid_val_pairs();
+        assert_eq!(accessor.docid_cache, vec![0, 0, 1]);
+        assert_eq!(accessor.val_cache, vec![1, 2, 3]);
+    }
+
+    #[test]
+    fn test_dedup_docid_val_pairs_single_element() {
+        let mut accessor = ColumnBlockAccessor::<u64>::default();
+        accessor.docid_cache = vec![0];
+        accessor.val_cache = vec![1];
+        accessor.dedup_docid_val_pairs();
+        assert_eq!(accessor.docid_cache, vec![0]);
+        assert_eq!(accessor.val_cache, vec![1]);
+    }
 }
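Review note on block_accessor.rs: the deduplication step can be exercised outside `ColumnBlockAccessor` with two parallel vectors. This is a simplified sketch under stated assumptions: `dedup_pairs` is a hypothetical free function, values are fixed to `u64`, and every group is sorted unconditionally, without the early exit and the "more than 2 elements" shortcut of the PR.

```rust
// Removes duplicate (doc_id, value) pairs from two parallel vectors.
// Assumes doc_ids is sorted, so equal doc ids form consecutive runs.
fn dedup_pairs(doc_ids: &mut Vec<u32>, vals: &mut Vec<u64>) {
    assert_eq!(doc_ids.len(), vals.len());
    if doc_ids.len() <= 1 {
        return;
    }
    // Sort the values inside each run of equal doc ids so duplicates become adjacent.
    let mut start = 0;
    while start < doc_ids.len() {
        let doc = doc_ids[start];
        let mut end = start + 1;
        while end < doc_ids.len() && doc_ids[end] == doc {
            end += 1;
        }
        vals[start..end].sort_unstable();
        start = end;
    }
    // Compact in place: keep a pair only if it differs from the last kept pair.
    let mut write = 0;
    for read in 1..doc_ids.len() {
        if doc_ids[read] != doc_ids[write] || vals[read] != vals[write] {
            write += 1;
            doc_ids[write] = doc_ids[read];
            vals[write] = vals[read];
        }
    }
    doc_ids.truncate(write + 1);
    vals.truncate(write + 1);
}
```

One consequence worth noting for callers: values within a document come back in sorted order, not in their original order.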
--- a/columnar/src/column_values/mod.rs
+++ b/columnar/src/column_values/mod.rs
@@ -31,7 +31,7 @@ pub use u64_based::{
    serialize_and_load_u64_based_column_values, serialize_u64_based_column_values,
 };
 pub use u128_based::{
-    CompactSpaceU64Accessor, open_u128_as_compact_u64, open_u128_mapped,
+    CompactHit, CompactSpaceU64Accessor, open_u128_as_compact_u64, open_u128_mapped,
    serialize_column_values_u128,
 };
 pub use vec_column::VecColumn;
--- a/columnar/src/column_values/u128_based/compact_space/mod.rs
+++ b/columnar/src/column_values/u128_based/compact_space/mod.rs
@@ -292,6 +292,19 @@ impl BinarySerializable for IPCodecParams {
    }
 }

+/// Represents the result of looking up a u128 value in the compact space.
+///
+/// If a value is outside the compact space, the next compact value is returned.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub enum CompactHit {
+    /// The value exists in the compact space
+    Exact(u32),
+    /// The value does not exist in the compact space, but the next higher value does
+    Next(u32),
+    /// The value is greater than the maximum compact value
+    AfterLast,
+}
+
 /// Exposes the compact space compressed values as u64.
 ///
 /// This allows faster access to the values, as u64 is faster to work with than u128.
@@ -309,6 +322,11 @@ impl CompactSpaceU64Accessor {
    pub fn compact_to_u128(&self, compact: u32) -> u128 {
        self.0.compact_to_u128(compact)
    }
+
+    /// Looks up a u128 value, falling back to the next higher compact value if it is absent.
+    pub fn u128_to_next_compact(&self, value: u128) -> CompactHit {
+        self.0.u128_to_next_compact(value)
+    }
 }

 impl ColumnValues<u64> for CompactSpaceU64Accessor {
@@ -441,6 +459,21 @@ impl CompactSpaceDecompressor {
        self.params.compact_space.u128_to_compact(value)
    }

+    /// Looks up a u128 value, falling back to the next higher compact value if it is absent.
+    pub fn u128_to_next_compact(&self, value: u128) -> CompactHit {
+        match self.u128_to_compact(value) {
+            Ok(compact) => CompactHit::Exact(compact),
+            Err(pos) => {
+                if pos >= self.params.compact_space.ranges_mapping.len() {
+                    CompactHit::AfterLast
+                } else {
+                    let next_range = &self.params.compact_space.ranges_mapping[pos];
+                    CompactHit::Next(next_range.compact_start)
+                }
+            }
+        }
+    }
+
    fn compact_to_u128(&self, compact: u32) -> u128 {
        self.params.compact_space.compact_to_u128(compact)
    }
@@ -823,6 +856,41 @@ mod tests {
        let _data = test_aux_vals(vals);
    }

+    #[test]
+    fn test_u128_to_next_compact() {
+        let vals = &[100u128, 200u128, 1_000_000_000u128, 1_000_000_100u128];
+        let mut data = test_aux_vals(vals);
+
+        let _header = U128Header::deserialize(&mut data);
+        let decomp = CompactSpaceDecompressor::open(data).unwrap();
+
+        // Test value that's already in a range
+        let compact_100 = decomp.u128_to_compact(100).unwrap();
+        assert_eq!(
+            decomp.u128_to_next_compact(100),
+            CompactHit::Exact(compact_100)
+        );
+
+        // Test value between two ranges
+        let compact_million = decomp.u128_to_compact(1_000_000_000).unwrap();
+        assert_eq!(
+            decomp.u128_to_next_compact(250),
+            CompactHit::Next(compact_million)
+        );
+
+        // Test value before the first range
+        assert_eq!(
+            decomp.u128_to_next_compact(50),
+            CompactHit::Next(compact_100)
+        );
+
+        // Test value after the last range
+        assert_eq!(
+            decomp.u128_to_next_compact(10_000_000_000),
+            CompactHit::AfterLast
+        );
+    }
+
    use proptest::prelude::*;

    fn num_strategy() -> impl Strategy<Value = u128> {
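
Reviewer note on the `compact_space/mod.rs` changes above: the three-way outcome of `u128_to_next_compact` can be illustrated in isolation. The following is only a minimal sketch, not the code from this diff. `RangeMapping`, `sample_ranges`, and the free function are hypothetical stand-ins, and the sketch assumes the ranges are sorted and disjoint, with compact values numbered contiguously inside each range.

```rust
use std::ops::RangeInclusive;

/// Mirrors the shape of the `CompactHit` enum added in the diff.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CompactHit {
    Exact(u32),
    Next(u32),
    AfterLast,
}

/// Hypothetical stand-in for one entry of a sorted range mapping.
struct RangeMapping {
    value_range: RangeInclusive<u128>,
    compact_start: u32,
}

/// Illustrative data: two disjoint, sorted ranges (values are assumptions).
fn sample_ranges() -> Vec<RangeMapping> {
    vec![
        RangeMapping { value_range: 100..=200, compact_start: 0 },
        RangeMapping { value_range: 1_000_000_000..=1_000_000_100, compact_start: 101 },
    ]
}

/// Looks up `value`, falling back to the start of the next range when it
/// lies in a gap, or to `AfterLast` when it is beyond every range.
fn u128_to_next_compact(ranges: &[RangeMapping], value: u128) -> CompactHit {
    // Index of the first range whose end is not below `value`.
    let pos = ranges.partition_point(|r| *r.value_range.end() < value);
    match ranges.get(pos) {
        None => CompactHit::AfterLast,
        Some(r) if r.value_range.contains(&value) => {
            let offset = (value - *r.value_range.start()) as u32;
            CompactHit::Exact(r.compact_start + offset)
        }
        Some(r) => CompactHit::Next(r.compact_start),
    }
}
```

A range-query caller can use `Next` to round a lower bound up to the nearest stored value, and treat `AfterLast` as "no document can match".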
--- a/columnar/src/column_values/u128_based/mod.rs
+++ b/columnar/src/column_values/u128_based/mod.rs
@@ -7,7 +7,7 @@ mod compact_space;

 use common::{BinarySerializable, OwnedBytes, VInt};
 pub use compact_space::{
-    CompactSpaceCompressor, CompactSpaceDecompressor, CompactSpaceU64Accessor,
+    CompactHit, CompactSpaceCompressor, CompactSpaceDecompressor, CompactSpaceU64Accessor,
 };

 use crate::column_values::monotonic_map_column;
--- a/columnar/src/lib.rs
+++ b/columnar/src/lib.rs
@@ -59,7 +59,7 @@ pub struct RowAddr {
    pub row_id: RowId,
 }

-pub use sstable::Dictionary;
+pub use sstable::{Dictionary, TermOrdHit};
 pub type Streamer<'a> = sstable::Streamer<'a, VoidSSTable>;

 pub use common::DateTime;
--- a/common/Cargo.toml
+++ b/common/Cargo.toml
@@ -1,6 +1,6 @@
 [package]
 name = "tantivy-common"
-version = "0.10.0"
+version = "0.11.0"
 authors = ["Paul Masurel <paul@quickwit.io>", "Pascal Seitz <pascal@quickwit.io>"]
 license = "MIT"
 edition = "2024"
@@ -15,11 +15,10 @@ repository = "https://github.com/quickwit-oss/tantivy"
 byteorder = "1.4.3"
 ownedbytes = { version= "0.9", path="../ownedbytes" }
 async-trait = "0.1"
-time = { version = "0.3.10", features = ["serde-well-known"] }
+time = { version = "0.3.47", features = ["serde-well-known"] }
 serde = { version = "1.0.136", features = ["derive"] }

 [dev-dependencies]
-binggan = "0.14.0"
+binggan = "0.17.0"
 proptest = "1.0.0"
 rand = "0.9"
-
--- a/common/src/bitset.rs
+++ b/common/src/bitset.rs
@@ -47,6 +47,9 @@ impl TinySet {
        TinySet(val)
    }

+    /// An empty `TinySet` constant.
+    pub const EMPTY: TinySet = TinySet(0u64);
+
    /// Returns an empty `TinySet`.
    #[inline]
    pub fn empty() -> TinySet {
@@ -153,7 +156,22 @@ impl TinySet {
            None
        } else {
            let lowest = self.0.trailing_zeros();
-            self.0 ^= TinySet::singleton(lowest).0;
+            // Kernighan's trick: `n &= n - 1` clears the lowest set bit
+            // without depending on `lowest`. This lets the CPU execute
+            // `trailing_zeros` and the bit-clear in parallel instead of
+            // serializing them.
+            //
+            // The previous form `self.0 ^= 1 << lowest` needs the result of
+            // `trailing_zeros` before it can shift, creating a dependency chain:
+            //   ARM64: rbit → clz → lsl → eor
+            //   x86:   tzcnt → btc
+            //
+            // With Kernighan's trick the clear path is independent of the count:
+            //   ARM64: sub → and  (trailing_zeros runs in parallel)
+            //   x86:   blsr       (tzcnt runs in parallel)
+            //
+            // https://godbolt.org/z/fnfrP1T5f
+            self.0 &= self.0 - 1;
            Some(lowest)
        }
    }
@@ -178,13 +196,11 @@ impl TinySet {
 #[derive(Clone)]
 pub struct BitSet {
    tinysets: Box<[TinySet]>,
-    len: u64,
    max_value: u32,
 }
 impl std::fmt::Debug for BitSet {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("BitSet")
-            .field("len", &self.len)
            .field("max_value", &self.max_value)
            .finish()
    }
@@ -212,7 +228,6 @@ impl BitSet {
        let tinybitsets = vec![TinySet::empty(); num_buckets as usize].into_boxed_slice();
        BitSet {
            tinysets: tinybitsets,
-            len: 0,
            max_value,
        }
    }
@@ -230,7 +245,6 @@ impl BitSet {
        }
        BitSet {
            tinysets: tinybitsets,
-            len: max_value as u64,
            max_value,
        }
    }
@@ -249,17 +263,19 @@ impl BitSet {

    /// Intersect with tinysets
    fn intersect_update_with_iter(&mut self, other: impl Iterator<Item = TinySet>) {
-        self.len = 0;
        for (left, right) in self.tinysets.iter_mut().zip(other) {
            *left = left.intersect(right);
-            self.len += left.len() as u64;
        }
    }

    /// Returns the number of elements in the `BitSet`.
    #[inline]
    pub fn len(&self) -> usize {
-        self.len as usize
+        self.tinysets
+            .iter()
+            .copied()
+            .map(|tinyset| tinyset.len())
+            .sum::<u32>() as usize
    }

    /// Inserts an element in the `BitSet`
@@ -268,7 +284,7 @@ impl BitSet {
        // we do not check saturated els.
        let higher = el / 64u32;
        let lower = el % 64u32;
-        self.len += u64::from(self.tinysets[higher as usize].insert_mut(lower));
+        self.tinysets[higher as usize].insert_mut(lower);
    }

    /// Inserts an element in the `BitSet`
@@ -277,7 +293,7 @@ impl BitSet {
        // we do not check saturated els.
        let higher = el / 64u32;
        let lower = el % 64u32;
-        self.len -= u64::from(self.tinysets[higher as usize].remove_mut(lower));
+        self.tinysets[higher as usize].remove_mut(lower);
    }

    /// Returns true iff the elements is in the `BitSet`.
@@ -299,6 +315,9 @@ impl BitSet {
            .map(|delta_bucket| bucket + delta_bucket as u32)
    }

+    /// Returns the maximum number of elements in the bitset.
+    ///
+    /// Warning: The largest element the bitset can contain is `max_value - 1`.
    #[inline]
    pub fn max_value(&self) -> u32 {
        self.max_value
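
Reviewer note on the `bitset.rs` changes above: the two ideas in this file (clearing the lowest set bit with `n &= n - 1`, and counting elements on demand instead of maintaining a cached `len`) can be tried out on plain `u64` words. This is a minimal sketch with hypothetical free functions, not the `TinySet`/`BitSet` API itself.

```rust
/// Pops the lowest set bit of `bits` and returns its index,
/// or `None` when no bit is set.
fn pop_lowest(bits: &mut u64) -> Option<u32> {
    if *bits == 0 {
        return None;
    }
    let lowest = bits.trailing_zeros();
    // Kernighan's trick: `n & (n - 1)` clears the lowest set bit and does
    // not need the value of `lowest`. `*bits` is non-zero here, so the
    // subtraction cannot underflow.
    *bits &= *bits - 1;
    Some(lowest)
}

/// Collects the positions of all set bits in ascending order.
fn drain_bits(mut bits: u64) -> Vec<u32> {
    let mut out = Vec::new();
    while let Some(bit) = pop_lowest(&mut bits) {
        out.push(bit);
    }
    out
}

/// Counts set bits across all words on demand, rather than keeping a
/// counter that every insert and remove would have to update.
fn count_bits(words: &[u64]) -> usize {
    words.iter().map(|word| word.count_ones() as usize).sum()
}
```

The trade-off is that counting becomes linear in the number of words, while inserts and removes no longer touch a shared counter.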
--- a/common/src/file_slice.rs
+++ b/common/src/file_slice.rs
@@ -121,7 +121,7 @@ pub struct FileSlice {

 impl fmt::Debug for FileSlice {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
-        write!(f, "FileSlice({:?}, {:?})", &self.data, self.range)
+        write!(f, "FileSlice({:?}, {:?})", self.data, self.range)
    }
 }

--- a/common/src/writer.rs
+++ b/common/src/writer.rs
@@ -62,7 +62,9 @@ impl<W: TerminatingWrite> TerminatingWrite for CountingWriter<W> {
 pub struct AntiCallToken(());

 /// Trait used to indicate when no more write need to be done on a writer
-pub trait TerminatingWrite: Write + Send + Sync {
+///
+/// Thread-safety is enforced at the call sites that require it.
+pub trait TerminatingWrite: Write {
    /// Indicate that the writer will no longer be used. Internally call terminate_ref.
    fn terminate(mut self) -> io::Result<()>
    where Self: Sized {
--- a/examples/iterating_docs_and_positions.rs
+++ b/examples/iterating_docs_and_positions.rs
@@ -91,46 +91,10 @@ fn main() -> tantivy::Result<()> {
        }
    }

-    // A `Term` is a text token associated with a field.
-    // Let's go through all docs containing the term `title:the` and access their position
-    let term_the = Term::from_field_text(title, "the");
-
-    // Some other powerful operations (especially `.skip_to`) may be useful to consume these
+    // Some other powerful operations (especially `.seek`) may be useful to consume these
    // posting lists rapidly.
    // You can check for them in the [`DocSet`](https://docs.rs/tantivy/~0/tantivy/trait.DocSet.html) trait
    // and the [`Postings`](https://docs.rs/tantivy/~0/tantivy/trait.Postings.html) trait

-    // Also, for some VERY specific high performance use case like an OLAP analysis of logs,
-    // you can get better performance by accessing directly the blocks of doc ids.
-    for segment_reader in searcher.segment_readers() {
-        // A segment contains different data structure.
-        // Inverted index stands for the combination of
-        // - the term dictionary
-        // - the inverted lists associated with each terms and their positions
-        let inverted_index = segment_reader.inverted_index(title)?;
-
-        // This segment posting object is like a cursor over the documents matching the term.
-        // The `IndexRecordOption` arguments tells tantivy we will be interested in both term
-        // frequencies and positions.
-        //
-        // If you don't need all this information, you may get better performance by decompressing
-        // less information.
-        if let Some(mut block_segment_postings) =
-            inverted_index.read_block_postings(&term_the, IndexRecordOption::Basic)?
-        {
-            loop {
-                let docs = block_segment_postings.docs();
-                if docs.is_empty() {
-                    break;
-                }
-                // Once again these docs MAY contains deleted documents as well.
-                let docs = block_segment_postings.docs();
-                // Prints `Docs [0, 2].`
-                println!("Docs {docs:?}");
-                block_segment_postings.advance();
-            }
-        }
-    }
-
    Ok(())
 }
--- a/query-grammar/Cargo.toml
+++ b/query-grammar/Cargo.toml
@@ -1,6 +1,6 @@
 [package]
 name = "tantivy-query-grammar"
-version = "0.25.0"
+version = "0.26.0"
 authors = ["Paul Masurel <paul.masurel@gmail.com>"]
 license = "MIT"
 categories = ["database-implementations", "data-structures"]
--- a/query-grammar/src/query_grammar.rs
+++ b/query-grammar/src/query_grammar.rs
@@ -704,7 +704,11 @@ fn regex(inp: &str) -> IResult<&str, UserInputLeaf> {
                many1(alt((preceded(char('\\'), char('/')), none_of("/")))),
                char('/'),
            ),
-            peek(alt((multispace1, eof))),
+            peek(alt((
+                value((), multispace1),
+                value((), char(')')),
+                value((), eof),
+            ))),
        ),
        |elements| UserInputLeaf::Regex {
            field: None,
@@ -721,8 +725,12 @@ fn regex_infallible(inp: &str) -> JResult<&str, UserInputLeaf> {
            opt_i_err(char('/'), "missing delimiter /"),
        ),
        opt_i_err(
-            peek(alt((multispace1, eof))),
-            "expected whitespace or end of input",
+            peek(alt((
+                value((), multispace1),
+                value((), char(')')),
+                value((), eof),
+            ))),
+            "expected whitespace, closing parenthesis, or end of input",
        ),
    )(inp)
    {
@@ -1037,18 +1045,43 @@ fn operand_leaf(inp: &str) -> IResult<&str, (Option<BinaryOperand>, Option<Occur
 }

 fn ast(inp: &str) -> IResult<&str, UserInputAst> {
-    let boolean_expr = map_res(
-        separated_pair(occur_leaf, multispace1, many1(operand_leaf)),
-        |(left, right)| aggregate_binary_expressions(left, right),
-    );
-    let single_leaf = map(occur_leaf, |(occur, ast)| {
-        if occur == Some(Occur::MustNot) {
-            ast.unary(Occur::MustNot)
-        } else {
-            ast
-        }
-    });
-    delimited(multispace0, alt((boolean_expr, single_leaf)), multispace0)(inp)
+    // Parse `occur_leaf` once, then conditionally extend into a boolean
+    // expression. The previous implementation used `alt((boolean_expr,
+    // single_leaf))` which, when the input was a single leaf with no
+    // following operand, would parse `occur_leaf` once for `boolean_expr`,
+    // fail at `multispace1`, backtrack, then re-parse `occur_leaf` for
+    // `single_leaf`. With recursively-nested groups like `(+(+(+a)))`, that
+    // doubling at every level produced O(2^n) parse time. Parsing once and
+    // peeking ahead for the operand keeps it O(n).
+    delimited(
+        multispace0,
+        |inp| {
+            let (rest, first) = occur_leaf(inp)?;
+            // Only fall back on `Err::Error` (recoverable), mirroring
+            // `alt`'s behaviour. `Err::Failure` and `Err::Incomplete`
+            // must propagate so cut points and streaming needs are not
+            // accidentally swallowed if they are ever introduced in the
+            // operand parsers.
+            match preceded(multispace1, many1(operand_leaf))(rest) {
+                Ok((rest, more)) => {
+                    let combined = aggregate_binary_expressions(first, more)
+                        .map_err(|_| nom::Err::Error(Error::new(inp, ErrorKind::MapRes)))?;
+                    Ok((rest, combined))
+                }
+                Err(nom::Err::Error(_)) => {
+                    let (occur, ast) = first;
+                    let single = if occur == Some(Occur::MustNot) {
+                        ast.unary(Occur::MustNot)
+                    } else {
+                        ast
+                    };
+                    Ok((rest, single))
+                }
+                Err(e) => Err(e),
+            }
+        },
+        multispace0,
+    )(inp)
 }

 fn ast_infallible(inp: &str) -> JResult<&str, UserInputAst> {
@@ -1707,6 +1740,10 @@ mod test {
        test_parse_query_to_ast_helper("foo:(A OR B)", "(?\"foo\":A ?\"foo\":B)");
        test_parse_query_to_ast_helper("foo:(A* OR B*)", "(?\"foo\":A* ?\"foo\":B*)");
        test_parse_query_to_ast_helper("foo:(*A OR *B)", "(?\"foo\":*A ?\"foo\":*B)");
+
+        // Regexes between parentheses
+        test_parse_query_to_ast_helper("foo:(/A.*/)", "\"foo\":/A.*/");
+        test_parse_query_to_ast_helper("foo:(/A.*/ OR /B.*/)", "(?\"foo\":/A.*/ ?\"foo\":/B.*/)");
    }

    #[test]
@@ -1879,4 +1916,23 @@ mod test {
            r#"(+"field":'happy tax payer' +"other_field":1)"#,
        );
    }
+
+    // Regression test for https://github.com/quickwit-oss/tantivy/issues/2498:
+    // deeply nested parenthesized queries used to take O(2^n) time because the
+    // top-level `ast()` parser tried `boolean_expr` first and re-parsed the
+    // inner `occur_leaf` when it backtracked to `single_leaf`. Depth 60 would
+    // take ~10^18 operations under the regression; with the fix it parses
+    // instantly. We use `test_parse_query_to_ast_helper` so this test would
+    // never finish if the regression returned.
+    #[test]
+    fn test_parse_deeply_nested_query() {
+        let depth = 60;
+        let leading: String = "(".repeat(depth);
+        let trailing: String = ")".repeat(depth);
+        let query = format!("{leading}title:test{trailing}");
+        test_parse_query_to_ast_helper(&query, r#""title":test"#);
+
+        let query_with_plus = format!("+{leading}title:test{trailing}");
+        test_parse_query_to_ast_helper(&query_with_plus, r#""title":test"#);
+    }
 }
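
Reviewer note on the `ast()` change above: the exponential blow-up and its fix can be reproduced on a toy grammar without `nom`. The sketch below is an assumption-laden illustration, not the tantivy parser. It uses a hypothetical grammar (`leaf := '(' ast ')' | letters`, `ast := leaf (' ' leaf)*`) and counts calls to `leaf`, with a `parse_once` flag switching between the old "try boolean expression, then re-parse as a single leaf" shape and the new "parse once, then optionally extend" shape.

```rust
/// Parses a leaf: a parenthesized group or a run of ASCII letters.
/// Returns the remaining input on success.
fn leaf<'a>(inp: &'a str, parse_once: bool, calls: &mut u64) -> Option<&'a str> {
    *calls += 1;
    if let Some(inner) = inp.strip_prefix('(') {
        let rest = ast(inner, parse_once, calls)?;
        return rest.strip_prefix(')');
    }
    let end = inp
        .find(|c: char| !c.is_ascii_alphabetic())
        .unwrap_or(inp.len());
    if end == 0 { None } else { Some(&inp[end..]) }
}

/// Parses one or more ` leaf` continuations; `None` if there are none.
fn operands<'a>(mut inp: &'a str, parse_once: bool, calls: &mut u64) -> Option<&'a str> {
    let mut seen = false;
    while let Some(after_space) = inp.strip_prefix(' ') {
        match leaf(after_space, parse_once, calls) {
            Some(rest) => {
                inp = rest;
                seen = true;
            }
            None => break,
        }
    }
    if seen { Some(inp) } else { None }
}

fn ast<'a>(inp: &'a str, parse_once: bool, calls: &mut u64) -> Option<&'a str> {
    if parse_once {
        // New shape: parse the first leaf once, then try to extend it.
        let rest = leaf(inp, parse_once, calls)?;
        Some(operands(rest, parse_once, calls).unwrap_or(rest))
    } else {
        // Old shape: attempt "leaf + operands" first...
        if let Some(rest) = leaf(inp, parse_once, calls) {
            if let Some(rest) = operands(rest, parse_once, calls) {
                return Some(rest);
            }
        }
        // ...then backtrack and parse the same leaf a second time.
        leaf(inp, parse_once, calls)
    }
}

/// Parses `depth` nested parentheses around `a` and returns the number of
/// `leaf` invocations that were needed.
fn count_leaf_calls(depth: usize, parse_once: bool) -> u64 {
    let query = format!("{}a{}", "(".repeat(depth), ")".repeat(depth));
    let mut calls = 0;
    let rest = ast(&query, parse_once, &mut calls);
    assert_eq!(rest, Some(""));
    calls
}
```

In this toy model the backtracking shape doubles the work at every nesting level, while the parse-once shape grows by one call per level, which matches the O(2^n) versus O(n) reasoning in the code comment.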
--- a/query-grammar/src/user_input_ast.rs
+++ b/query-grammar/src/user_input_ast.rs
@@ -66,6 +66,7 @@ impl UserInputLeaf {
            }
            UserInputLeaf::Range { field, .. } if field.is_none() => *field = Some(default_field),
            UserInputLeaf::Set { field, .. } if field.is_none() => *field = Some(default_field),
+            UserInputLeaf::Regex { field, .. } if field.is_none() => *field = Some(default_field),
            _ => (), // field was already set, do nothing
        }
    }
--- a/src/aggregation/agg_data.rs
+++ b/src/aggregation/agg_data.rs
@@ -10,17 +10,18 @@ use crate::aggregation::accessor_helpers::{
 };
 use crate::aggregation::agg_req::{Aggregation, AggregationVariants, Aggregations};
 use crate::aggregation::bucket::{
-    build_segment_filter_collector, build_segment_range_collector, FilterAggReqData,
-    HistogramAggReqData, HistogramBounds, IncludeExcludeParam, MissingTermAggReqData,
-    RangeAggReqData, SegmentHistogramCollector, TermMissingAgg, TermsAggReqData, TermsAggregation,
+    build_segment_filter_collector, build_segment_range_collector, CompositeAggReqData,
+    CompositeAggregation, CompositeSourceAccessors, FilterAggReqData, HistogramAggReqData,
+    HistogramBounds, IncludeExcludeParam, MissingTermAggReqData, RangeAggReqData,
+    SegmentHistogramCollector, TermMissingAgg, TermsAggReqData, TermsAggregation,
    TermsAggregationInternal,
 };
 use crate::aggregation::metric::{
    build_segment_stats_collector, AverageAggregation, CardinalityAggReqData,
    CardinalityAggregationReq, CountAggregation, ExtendedStatsAggregation, MaxAggregation,
    MetricAggReqData, MinAggregation, SegmentCardinalityCollector, SegmentExtendedStatsCollector,
-    SegmentPercentilesCollector, StatsAggregation, StatsType, SumAggregation, TopHitsAggReqData,
-    TopHitsSegmentCollector,
+    SegmentPercentilesCollector, StatsAggregation, StatsType, SumAggregation, TermOrdSet,
+    TopHitsAggReqData, TopHitsSegmentCollector, BITSET_MAX_TERM_ORD,
 };
 use crate::aggregation::segment_agg_result::{
    GenericSegmentAggregationResultsCollector, SegmentAggregationCollector,
@@ -73,6 +74,12 @@ impl AggregationsSegmentCtx {
        self.per_request.filter_req_data.push(Some(Box::new(data)));
        self.per_request.filter_req_data.len() - 1
    }
+    pub(crate) fn push_composite_req_data(&mut self, data: CompositeAggReqData) -> usize {
+        self.per_request
+            .composite_req_data
+            .push(Some(Box::new(data)));
+        self.per_request.composite_req_data.len() - 1
+    }

    #[inline]
    pub(crate) fn get_term_req_data(&self, idx: usize) -> &TermsAggReqData {
@@ -108,6 +115,12 @@ impl AggregationsSegmentCtx {
            .as_deref()
            .expect("range_req_data slot is empty (taken)")
    }
+    #[inline]
+    pub(crate) fn get_composite_req_data(&self, idx: usize) -> &CompositeAggReqData {
+        self.per_request.composite_req_data[idx]
+            .as_deref()
+            .expect("composite_req_data slot is empty (taken)")
+    }

    // ---------- mutable getters ----------

@@ -181,6 +194,25 @@ impl AggregationsSegmentCtx {
        debug_assert!(self.per_request.filter_req_data[idx].is_none());
        self.per_request.filter_req_data[idx] = Some(value);
    }
+
+    /// Move out the Composite request at `idx`.
+    #[inline]
+    pub(crate) fn take_composite_req_data(&mut self, idx: usize) -> Box<CompositeAggReqData> {
+        self.per_request.composite_req_data[idx]
+            .take()
+            .expect("composite_req_data slot is empty (taken)")
+    }
+
+    /// Put back a Composite request into an empty slot at `idx`.
+    #[inline]
+    pub(crate) fn put_back_composite_req_data(
+        &mut self,
+        idx: usize,
+        value: Box<CompositeAggReqData>,
+    ) {
+        debug_assert!(self.per_request.composite_req_data[idx].is_none());
+        self.per_request.composite_req_data[idx] = Some(value);
+    }
 }

 /// Each type of aggregation has its own request data struct. This struct holds
@@ -208,6 +240,8 @@ pub struct PerRequestAggSegCtx {
    pub top_hits_req_data: Vec<TopHitsAggReqData>,
    /// MissingTermAggReqData contains the request data for a missing term aggregation.
    pub missing_term_req_data: Vec<MissingTermAggReqData>,
+    /// CompositeAggReqData contains the request data for a composite aggregation.
+    pub composite_req_data: Vec<Option<Box<CompositeAggReqData>>>,

    /// Request tree used to build collectors.
    pub agg_tree: Vec<AggRefNode>,
@@ -255,6 +289,11 @@ impl PerRequestAggSegCtx {
                .iter()
                .map(|t| t.get_memory_consumption())
                .sum::<usize>()
+            + self
+                .composite_req_data
+                .iter()
+                .map(|b| b.as_ref().map(|d| d.get_memory_consumption()).unwrap_or(0))
+                .sum::<usize>()
            + self.agg_tree.len() * std::mem::size_of::<AggRefNode>()
    }

@@ -291,6 +330,11 @@ impl PerRequestAggSegCtx {
                .expect("filter_req_data slot is empty (taken)")
                .name
                .as_str(),
+            AggKind::Composite => self.composite_req_data[idx]
+                .as_deref()
+                .expect("composite_req_data slot is empty (taken)")
+                .name
+                .as_str(),
        }
    }

@@ -369,12 +413,38 @@ pub(crate) fn build_segment_agg_collector(
        }
        AggKind::Cardinality => {
            let req_data = &mut req.get_cardinality_req_data_mut(node.idx_in_req_data);
-            Ok(Box::new(SegmentCardinalityCollector::from_req(
-                req_data.column_type,
-                node.idx_in_req_data,
-                req_data.accessor.clone(),
-                req_data.missing_value_for_accessor,
-            )))
+            // For str columns, choose the per-bucket entries representation
+            // based on the segment's column.max_value():
+            //   * small (< BITSET_MAX_TERM_ORD): `BitSet`, pre-allocated, no promotion machinery.
+            //   * large: `TermOrdSet` (sparse FxHashSet that promotes to a paged bitset).
+            // For non-str columns the `entries` field is unused (values go
+            // straight into the HLL sketch); we still pick `TermOrdSet`
+            // because its empty Sparse(FxHashSet) costs nothing.
+            let is_str = req_data.column_type == ColumnType::Str;
+            let max_term_ord_inclusive = if is_str {
+                req_data.accessor.max_value()
+            } else {
+                0
+            };
+            let collector: Box<dyn SegmentAggregationCollector> =
+                if is_str && max_term_ord_inclusive < BITSET_MAX_TERM_ORD {
+                    Box::new(SegmentCardinalityCollector::<BitSet>::from_req(
+                        req_data.column_type,
+                        node.idx_in_req_data,
+                        req_data.accessor.clone(),
+                        req_data.missing_value_for_accessor,
+                        max_term_ord_inclusive,
+                    ))
+                } else {
+                    Box::new(SegmentCardinalityCollector::<TermOrdSet>::from_req(
+                        req_data.column_type,
+                        node.idx_in_req_data,
+                        req_data.accessor.clone(),
+                        req_data.missing_value_for_accessor,
+                        max_term_ord_inclusive,
+                    ))
+                };
+            Ok(collector)
        }
        AggKind::StatsKind(stats_type) => {
            let req_data = &mut req.per_request.stats_metric_req_data[node.idx_in_req_data];
@@ -417,6 +487,11 @@ pub(crate) fn build_segment_agg_collector(
        )?)),
        AggKind::Range => Ok(build_segment_range_collector(req, node)?),
        AggKind::Filter => build_segment_filter_collector(req, node),
+        AggKind::Composite => Ok(Box::new(
+            crate::aggregation::bucket::SegmentCompositeCollector::from_req_and_validate(
+                req, node,
+            )?,
+        )),
    }
 }

@@ -447,6 +522,7 @@ pub enum AggKind {
    DateHistogram,
    Range,
    Filter,
+    Composite,
 }

 impl AggKind {
@@ -462,6 +538,7 @@ impl AggKind {
            AggKind::DateHistogram => "DateHistogram",
            AggKind::Range => "Range",
            AggKind::Filter => "Filter",
+            AggKind::Composite => "Composite",
        }
    }
 }
@@ -709,6 +786,14 @@ fn build_nodes(
                children,
            }])
        }
+        AggregationVariants::Composite(composite_req) => Ok(vec![build_composite_node(
+            agg_name,
+            reader,
+            segment_ordinal,
+            data,
+            &req.sub_aggregation,
+            composite_req,
+        )?]),
        AggregationVariants::Filter(filter_req) => {
            // Build the query and evaluator upfront
            let schema = reader.schema();
@@ -743,6 +828,35 @@ fn build_nodes(
    }
 }

+fn build_composite_node(
+    agg_name: &str,
+    reader: &SegmentReader,
+    _segment_ordinal: SegmentOrdinal,
+    data: &mut AggregationsSegmentCtx,
+    sub_aggs: &Aggregations,
+    req: &CompositeAggregation,
+) -> crate::Result<AggRefNode> {
+    let mut composite_accessors = Vec::with_capacity(req.sources.len());
+    for source in &req.sources {
+        let source_after_key_opt = req.after.get(source.name()).map(|k| &k.0);
+        let source_accessor =
+            CompositeSourceAccessors::build_for_source(reader, source, source_after_key_opt)?;
+        composite_accessors.push(source_accessor);
+    }
+    let agg = CompositeAggReqData {
+        name: agg_name.to_string(),
+        req: req.clone(),
+        composite_accessors,
+    };
+    let idx = data.push_composite_req_data(agg);
+    let children = build_children(sub_aggs, reader, _segment_ordinal, data)?;
+    Ok(AggRefNode {
+        kind: AggKind::Composite,
+        idx_in_req_data: idx,
+        children,
+    })
+}
+
 fn build_children(
    aggs: &Aggregations,
    reader: &SegmentReader,
@@ -897,8 +1011,12 @@ fn build_terms_or_cardinality_nodes(
                    let str_col = str_dict_column
                        .as_ref()
                        .expect("str_dict_column must exist for string column");
-                    allowed_term_ids =
-                        build_allowed_term_ids_for_str(str_col, &req.include, &req.exclude)?;
+                    allowed_term_ids = build_allowed_term_ids_for_str(
+                        str_col,
+                        &req.include,
+                        &req.exclude,
+                        missing.is_some(),
+                    )?;
                };
                let idx_in_req_data = data.push_term_req_data(TermsAggReqData {
                    accessor,
@@ -914,10 +1032,20 @@ fn build_terms_or_cardinality_nodes(
                (idx_in_req_data, AggKind::Terms)
            }
            TermsOrCardinalityRequest::Cardinality(ref req) => {
+                // `str_dict_column` is computed once per field; for JSON paths
+                // with mixed types it's `Some` even on the numeric req_data.
+                // Cardinality only consults it for the str column path, so
+                // gate by column_type to avoid driving non-str collectors
+                // through the coupon-cache path.
+                let str_dict_column_for_req = if column_type == ColumnType::Str {
+                    str_dict_column.clone()
+                } else {
+                    None
+                };
                let idx_in_req_data = data.push_cardinality_req_data(CardinalityAggReqData {
                    accessor,
                    column_type,
-                    str_dict_column: str_dict_column.clone(),
+                    str_dict_column: str_dict_column_for_req,
                    missing_value_for_accessor,
                    name: agg_name.to_string(),
                    req: req.clone(),
@@ -937,16 +1065,21 @@ fn build_terms_or_cardinality_nodes(

 /// Builds a single BitSet of allowed term ordinals for a string dictionary column according to
 /// include/exclude parameters.
+///
+/// When `reserve_missing_sentinel` is true, the bitset has one additional slot for the missing
+/// term ordinal.
 fn build_allowed_term_ids_for_str(
    str_col: &StrColumn,
    include: &Option<IncludeExcludeParam>,
    exclude: &Option<IncludeExcludeParam>,
+    reserve_missing_sentinel: bool,
 ) -> crate::Result<Option<BitSet>> {
    let mut allowed: Option<BitSet> = None;
-    let num_terms = str_col.dictionary().num_terms() as u32;
+    let missing_sentinel_adjustment = if reserve_missing_sentinel { 1 } else { 0 };
+    let allowed_capacity = str_col.dictionary().num_terms() as u32 + missing_sentinel_adjustment;
    if let Some(include) = include {
        // add matches
-        allowed = Some(BitSet::with_max_value(num_terms));
+        allowed = Some(BitSet::with_max_value(allowed_capacity));
        let allowed = allowed.as_mut().unwrap();
        for_each_matching_term_ord(str_col, include, |ord| allowed.insert(ord))?;
    };
@@ -954,7 +1087,7 @@ fn build_allowed_term_ids_for_str(
    if let Some(exclude) = exclude {
        if allowed.is_none() {
            // Start with all terms allowed
-            allowed = Some(BitSet::with_max_value_and_full(num_terms));
+            allowed = Some(BitSet::with_max_value_and_full(allowed_capacity));
        }
        let allowed = allowed.as_mut().unwrap();
        for_each_matching_term_ord(str_col, exclude, |ord| allowed.remove(ord))?;
--- a/src/aggregation/agg_req.rs
+++ b/src/aggregation/agg_req.rs
@@ -32,8 +32,8 @@ use rustc_hash::FxHashMap;
 use serde::{Deserialize, Serialize};

 use super::bucket::{
-    DateHistogramAggregationReq, FilterAggregation, HistogramAggregation, RangeAggregation,
-    TermsAggregation,
+    CompositeAggregation, DateHistogramAggregationReq, FilterAggregation, HistogramAggregation,
+    RangeAggregation, TermsAggregation,
 };
 use super::metric::{
    AverageAggregation, CardinalityAggregationReq, CountAggregation, ExtendedStatsAggregation,
@@ -115,6 +115,71 @@ pub fn get_fast_field_names(aggs: &Aggregations) -> HashSet<String> {
    fast_field_names
 }

+/// Validates that all fields referenced in the aggregation request exist in the schema
+/// and are configured as fast fields.
+///
+/// This is a convenience function for upfront validation before executing aggregations.
+/// Returns an error if any field doesn't exist or is not a fast field.
+///
+/// Validation is intentionally opt-in rather than baked into aggregation execution: the
+/// default lenient behavior (returning empty results for missing fields) supports
+/// schema evolution and federated queries where the same request runs against segments
+/// or indices with different schemas.
+///
+/// # Example
+/// ```
+/// use tantivy::aggregation::agg_req::{Aggregations, validate_aggregation_fields_exist};
+/// use tantivy::schema::{Schema, FAST};
+/// use tantivy::Index;
+///
+/// # fn main() -> tantivy::Result<()> {
+/// // Create a simple index
+/// let mut schema_builder = Schema::builder();
+/// schema_builder.add_f64_field("price", FAST);
+/// let schema = schema_builder.build();
+/// let index = Index::create_in_ram(schema);
+///
+/// // Parse aggregation request
+/// let agg_req: Aggregations = serde_json::from_str(r#"{
+///     "avg_price": { "avg": { "field": "price" } }
+/// }"#)?;
+///
+/// let reader = index.reader()?;
+/// let searcher = reader.searcher();
+///
+/// // Validate fields before executing
+/// for segment_reader in searcher.segment_readers() {
+///     validate_aggregation_fields_exist(&agg_req, segment_reader)?;
+/// }
+/// # Ok(())
+/// # }
+/// ```
+pub fn validate_aggregation_fields_exist(
+    aggs: &Aggregations,
+    reader: &crate::SegmentReader,
+) -> crate::Result<()> {
+    let field_names = get_fast_field_names(aggs);
+    let schema = reader.schema();
+
+    for field_name in field_names {
+        // Check if the field is either directly in the schema or could be part of a json field
+        // present in the schema, and verify it's a fast field.
+        if let Some((field, _path)) = schema.find_field(&field_name) {
+            let field_type = schema.get_field_entry(field).field_type();
+            if !field_type.is_fast() {
+                return Err(crate::TantivyError::SchemaError(format!(
+                    "Field '{}' is not a fast field. Aggregations require fast fields.",
+                    field_name
+                )));
+            }
+        } else {
+            return Err(crate::TantivyError::FieldNotFound(field_name));
+        }
+    }
+
+    Ok(())
+}
+
 #[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
 /// All aggregation types.
 pub enum AggregationVariants {
@@ -134,6 +199,9 @@ pub enum AggregationVariants {
    /// Filter documents into a single bucket.
    #[serde(rename = "filter")]
    Filter(FilterAggregation),
+    /// Multi-dimensional, paginable bucket aggregation.
+    #[serde(rename = "composite")]
+    Composite(CompositeAggregation),

    // Metric aggregation types
    /// Computes the average of the extracted values.
@@ -180,6 +248,11 @@ impl AggregationVariants {
            AggregationVariants::Histogram(histogram) => vec![histogram.field.as_str()],
            AggregationVariants::DateHistogram(histogram) => vec![histogram.field.as_str()],
            AggregationVariants::Filter(filter) => filter.get_fast_field_names(),
+            AggregationVariants::Composite(composite) => composite
+                .sources
+                .iter()
+                .map(|source| source.field())
+                .collect(),
            AggregationVariants::Average(avg) => vec![avg.field_name()],
            AggregationVariants::Count(count) => vec![count.field_name()],
            AggregationVariants::Max(max) => vec![max.field_name()],
@@ -214,6 +287,12 @@ impl AggregationVariants {
            _ => None,
        }
    }
+    pub(crate) fn as_composite(&self) -> Option<&CompositeAggregation> {
+        match &self {
+            AggregationVariants::Composite(composite) => Some(composite),
+            _ => None,
+        }
+    }
    pub(crate) fn as_percentile(&self) -> Option<&PercentilesAggregationReq> {
        match &self {
            AggregationVariants::Percentiles(percentile_req) => Some(percentile_req),
--- a/src/aggregation/agg_result.rs
+++ b/src/aggregation/agg_result.rs
@@ -9,11 +9,12 @@ use rustc_hash::FxHashMap;
 use serde::{Deserialize, Serialize};

 use super::bucket::GetDocCount;
+use super::intermediate_agg_result::CompositeIntermediateKey;
 use super::metric::{
-    AverageMetricResult, CardinalityMetricResult, ExtendedStats, PercentilesMetricResult,
-    SingleMetricResult, Stats, TopHitsMetricResult,
+    ExtendedStats, PercentilesMetricResult, SingleMetricResult, Stats, TopHitsMetricResult,
 };
 use super::{AggregationError, Key};
+use crate::aggregation::bucket::AfterKey;
 use crate::TantivyError;

 #[derive(Clone, Default, Debug, PartialEq, Serialize, Deserialize)]
@@ -82,8 +83,8 @@ impl AggregationResult {
 #[serde(untagged)]
 /// MetricResult
 pub enum MetricResult {
-    /// Average metric result with sum and count for multi-step merging.
-    Average(AverageMetricResult),
+    /// Average metric result.
+    Average(SingleMetricResult),
    /// Count metric result.
    Count(SingleMetricResult),
    /// Max metric result.
@@ -100,8 +101,8 @@ pub enum MetricResult {
    Percentiles(PercentilesMetricResult),
    /// Top hits metric result
    TopHits(TopHitsMetricResult),
-    /// Cardinality metric result with HLL sketch for multi-step merging.
-    Cardinality(CardinalityMetricResult),
+    /// Cardinality metric result
+    Cardinality(SingleMetricResult),
 }

 impl MetricResult {
@@ -120,7 +121,7 @@ impl MetricResult {
            MetricResult::TopHits(_) => Err(TantivyError::AggregationError(
                AggregationError::InvalidRequest("top_hits can't be used to order".to_string()),
            )),
-            MetricResult::Cardinality(card) => Ok(card.value), // CardinalityMetricResult.value
+            MetricResult::Cardinality(card) => Ok(card.value),
        }
    }
 }
@@ -159,6 +160,14 @@ pub enum BucketResult {
    },
    /// This is the filter result - a single bucket with sub-aggregations
    Filter(FilterBucketResult),
+    /// This is the composite result
+    Composite {
+        /// The buckets
+        buckets: Vec<CompositeBucketEntry>,
+        /// The key to start after when paginating
+        #[serde(skip_serializing_if = "FxHashMap::is_empty")]
+        after_key: FxHashMap<String, AfterKey>,
+    },
 }

 impl BucketResult {
@@ -180,6 +189,9 @@ impl BucketResult {
                // Only count sub-aggregation buckets
                filter_result.sub_aggregations.get_bucket_count()
            }
+            BucketResult::Composite { buckets, .. } => {
+                buckets.iter().map(|bucket| bucket.get_bucket_count()).sum()
+            }
        }
    }
 }
@@ -196,7 +208,8 @@ pub enum BucketEntries<T> {
 }

 impl<T> BucketEntries<T> {
-    fn iter<'a>(&'a self) -> Box<dyn Iterator<Item = &'a T> + 'a> {
+    /// Iterate over all bucket entries.
+    pub fn iter<'a>(&'a self) -> Box<dyn Iterator<Item = &'a T> + 'a> {
        match self {
            BucketEntries::Vec(vec) => Box::new(vec.iter()),
            BucketEntries::HashMap(map) => Box::new(map.values()),
@@ -338,3 +351,87 @@ pub struct FilterBucketResult {
    #[serde(flatten)]
    pub sub_aggregations: AggregationResults,
 }
+
+/// Key of one source in a composite bucket. Loses type information compared to
+/// `CompositeIntermediateKey`; pagination instead uses `AfterKey`, which encodes the type.
+#[derive(Clone, Debug, Serialize, Deserialize)]
+#[serde(untagged)]
+pub enum CompositeKey {
+    /// Boolean key
+    Bool(bool),
+    /// String key
+    Str(String),
+    /// `i64` key
+    I64(i64),
+    /// `u64` key
+    U64(u64),
+    /// `f64` key
+    F64(f64),
+    /// Null key
+    Null,
+}
+impl Eq for CompositeKey {}
+impl std::hash::Hash for CompositeKey {
+    fn hash<H: std::hash::Hasher>(&self, state: &mut H) {
+        core::mem::discriminant(self).hash(state);
+        match self {
+            Self::Bool(val) => val.hash(state),
+            Self::Str(text) => text.hash(state),
+            Self::F64(val) => val.to_bits().hash(state),
+            Self::U64(val) => val.hash(state),
+            Self::I64(val) => val.hash(state),
+            Self::Null => {}
+        }
+    }
+}
+impl PartialEq for CompositeKey {
+    fn eq(&self, other: &Self) -> bool {
+        match (self, other) {
+            (Self::Bool(l), Self::Bool(r)) => l == r,
+            (Self::Str(l), Self::Str(r)) => l == r,
+            (Self::F64(l), Self::F64(r)) => l.to_bits() == r.to_bits(),
+            (Self::I64(l), Self::I64(r)) => l == r,
+            (Self::U64(l), Self::U64(r)) => l == r,
+            (Self::Null, Self::Null) => true,
+            _ => false,
+        }
+    }
+}
+impl From<CompositeIntermediateKey> for CompositeKey {
+    fn from(value: CompositeIntermediateKey) -> Self {
+        match value {
+            CompositeIntermediateKey::Str(s) => Self::Str(s),
+            CompositeIntermediateKey::IpAddr(s) => {
+                if let Some(ip) = s.to_ipv4_mapped() {
+                    Self::Str(ip.to_string())
+                } else {
+                    Self::Str(s.to_string())
+                }
+            }
+            CompositeIntermediateKey::F64(f) => Self::F64(f),
+            CompositeIntermediateKey::Bool(f) => Self::Bool(f),
+            CompositeIntermediateKey::U64(f) => Self::U64(f),
+            CompositeIntermediateKey::I64(f) => Self::I64(f),
+            CompositeIntermediateKey::DateTime(f) => Self::I64(f / 1_000_000), // ns to ms
+            CompositeIntermediateKey::Null => Self::Null,
+        }
+    }
+}
+
+/// Composite bucket entry with a multi-dimensional key.
+#[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
+pub struct CompositeBucketEntry {
+    /// The identifier of the bucket.
+    pub key: FxHashMap<String, CompositeKey>,
+    /// Number of documents in the bucket.
+    pub doc_count: u64,
+    #[serde(flatten)]
+    /// Sub-aggregations in this bucket.
+    pub sub_aggregation: AggregationResults,
+}
+
+impl CompositeBucketEntry {
+    pub(crate) fn get_bucket_count(&self) -> u64 {
+        1 + self.sub_aggregation.get_bucket_count()
+    }
+}
--- a/src/aggregation/agg_tests.rs
+++ b/src/aggregation/agg_tests.rs
@@ -1359,10 +1359,10 @@ fn test_aggregation_on_json_object_mixed_types() {
        &serde_json::json!({
          "rangeagg": {
            "buckets": [
-              { "average_in_range": { "value": -20.5, "sum": -20.5, "count": 1 }, "doc_count": 1, "key": "*-3", "to": 3.0 },
-              { "average_in_range": { "value": 10.0, "sum": 10.0, "count": 1 }, "doc_count": 1, "from": 3.0, "key": "3-19", "to": 19.0 },
-              { "average_in_range": { "value": null, "sum": 0.0, "count": 0 }, "doc_count": 0, "from": 19.0, "key": "19-20", "to": 20.0 },
-              { "average_in_range": { "value": null, "sum": 0.0, "count": 0 }, "doc_count": 0, "from": 20.0, "key": "20-*" }
+              { "average_in_range": { "value": -20.5 }, "doc_count": 1, "key": "*-3", "to": 3.0 },
+              { "average_in_range": { "value": 10.0 }, "doc_count": 1, "from": 3.0, "key": "3-19", "to": 19.0 },
+              { "average_in_range": { "value": null }, "doc_count": 0, "from": 19.0, "key": "19-20", "to": 20.0 },
+              { "average_in_range": { "value": null }, "doc_count": 0, "from": 20.0, "key": "20-*" }
            ]
          },
          "termagg": {
@@ -1436,3 +1436,46 @@ fn test_aggregation_on_json_object_mixed_numerical_segments() {
        )
    );
 }
+
+#[test]
+fn test_aggregation_field_validation_helper() {
+    // Test the standalone validation helper function for field validation
+    let index = get_test_index_2_segments(false).unwrap();
+    let reader = index.reader().unwrap();
+    let searcher = reader.searcher();
+    let segment_reader = searcher.segment_reader(0);
+
+    // Test with invalid field
+    let agg_req: Aggregations = serde_json::from_str(
+        r#"{
+        "avg_test": {
+            "avg": { "field": "nonexistent_field" }
+        }
+    }"#,
+    )
+    .unwrap();
+
+    let result =
+        crate::aggregation::agg_req::validate_aggregation_fields_exist(&agg_req, segment_reader);
+    assert!(result.is_err());
+    match result {
+        Err(crate::TantivyError::FieldNotFound(field_name)) => {
+            assert_eq!(field_name, "nonexistent_field");
+        }
+        _ => panic!("Expected FieldNotFound error, got: {:?}", result),
+    }
+
+    // Test with valid field
+    let agg_req: Aggregations = serde_json::from_str(
+        r#"{
+        "avg_test": {
+            "avg": { "field": "score" }
+        }
+    }"#,
+    )
+    .unwrap();
+
+    let result =
+        crate::aggregation::agg_req::validate_aggregation_fields_exist(&agg_req, segment_reader);
+    assert!(result.is_ok());
+}
--- a/src/aggregation/bucket/composite/accessors.rs
+++ b/src/aggregation/bucket/composite/accessors.rs
@@ -0,0 +1,518 @@
+use std::net::Ipv6Addr;
+
+use columnar::column_values::{CompactHit, CompactSpaceU64Accessor};
+use columnar::{Column, ColumnType, MonotonicallyMappableToU64, StrColumn, TermOrdHit};
+
+use crate::aggregation::accessor_helpers::get_numeric_or_date_column_types;
+use crate::aggregation::bucket::composite::numeric_types::num_proj;
+use crate::aggregation::bucket::composite::numeric_types::num_proj::ProjectedNumber;
+use crate::aggregation::bucket::composite::ToTypePaginationOrder;
+use crate::aggregation::bucket::{
+    parse_into_milliseconds, CalendarInterval, CompositeAggregation, CompositeAggregationSource,
+    MissingOrder, Order,
+};
+use crate::aggregation::intermediate_agg_result::CompositeIntermediateKey;
+use crate::{SegmentReader, TantivyError};
+
+/// Contains all information required by the SegmentCompositeCollector to perform the
+/// composite aggregation on a segment.
+pub struct CompositeAggReqData {
+    /// The name of the aggregation.
+    pub name: String,
+    /// The normalized composite aggregation request.
+    pub req: CompositeAggregation,
+    /// Accessors for each source, each source can have multiple accessors (columns).
+    pub composite_accessors: Vec<CompositeSourceAccessors>,
+}
+
+impl CompositeAggReqData {
+    /// Estimate the memory consumption of this struct in bytes.
+    pub fn get_memory_consumption(&self) -> usize {
+        std::mem::size_of::<Self>()
+            + self.composite_accessors.len() * std::mem::size_of::<CompositeSourceAccessors>()
+    }
+}
+
+/// Accessors for a single column in a composite source.
+pub struct CompositeAccessor {
+    /// The fast field column
+    pub column: Column<u64>,
+    /// The column type
+    pub column_type: ColumnType,
+    /// Term dictionary if the column type is Str
+    ///
+    /// Only used by term sources
+    pub str_dict_column: Option<StrColumn>,
+    /// Parsed date interval for date histogram sources
+    pub date_histogram_interval: PrecomputedDateInterval,
+}
+
+/// Accessors to all the columns that belong to the field of a composite source.
+pub struct CompositeSourceAccessors {
+    /// The accessors for this source
+    pub accessors: Vec<CompositeAccessor>,
+    /// The key after which to start collecting results. Applies to the column
+    /// at `after_key_accessor_idx`.
+    pub after_key: PrecomputedAfterKey,
+
+    /// The column index the after_key applies to. The after_key only applies to
+    /// one column. Columns before should be skipped. Columns after should be
+    /// kept without comparison to the after_key.
+    pub after_key_accessor_idx: usize,
+
+    /// Whether to skip missing values because of the after_key. Skipping only
+    /// applies if the values for previous columns were exactly equal to the
+    /// corresponding after keys (is_on_after_key).
+    pub skip_missing: bool,
+
+    /// The after key was set to null to indicate that the last collected key
+    /// was a missing value.
+    pub is_after_key_explicit_missing: bool,
+}
+
+impl CompositeSourceAccessors {
+    /// Creates a new set of accessors for the composite source.
+    ///
+    /// Precomputes some values to make collection faster.
+    pub fn build_for_source(
+        reader: &SegmentReader,
+        source: &CompositeAggregationSource,
+        // `None` when no after key was set in the query;
+        // `Some(CompositeIntermediateKey::Null)` when the after key was set but
+        // its value for this source was `null`.
+        source_after_key_opt: Option<&CompositeIntermediateKey>,
+    ) -> crate::Result<Self> {
+        let is_after_key_explicit_missing = source_after_key_opt
+            .map(|after_key| matches!(after_key, CompositeIntermediateKey::Null))
+            .unwrap_or(false);
+        let mut skip_missing = false;
+        if let Some(CompositeIntermediateKey::Null) = source_after_key_opt {
+            if !source.missing_bucket() {
+                return Err(TantivyError::InvalidArgument(
+                    "the 'after' key for a source cannot be null when 'missing_bucket' is false"
+                        .to_string(),
+                ));
+            }
+        } else if source_after_key_opt.is_some() {
+            // if missing buckets come first and we have a non null after key, we skip missing
+            if MissingOrder::First == source.missing_order() {
+                skip_missing = true;
+            }
+            if MissingOrder::Default == source.missing_order() && Order::Asc == source.order() {
+                skip_missing = true;
+            }
+        };
+
+        match source {
+            CompositeAggregationSource::Terms(source) => {
+                let allowed_column_types = [
+                    ColumnType::I64,
+                    ColumnType::U64,
+                    ColumnType::F64,
+                    ColumnType::Str,
+                    ColumnType::DateTime,
+                    ColumnType::Bool,
+                    ColumnType::IpAddr,
+                    // ColumnType::Bytes Unsupported
+                ];
+                let mut columns_and_types = reader
+                    .fast_fields()
+                    .u64_lenient_for_type_all(Some(&allowed_column_types), &source.field)?;
+
+                // Sort columns by their pagination order and determine which to skip
+                columns_and_types.sort_by_key(|(_, col_type): &(Column, ColumnType)| {
+                    col_type.column_pagination_order()
+                });
+                if source.order == Order::Desc {
+                    columns_and_types.reverse();
+                }
+                let after_key_accessor_idx = find_first_column_to_collect(
+                    &columns_and_types,
+                    source_after_key_opt,
+                    source.missing_order,
+                    source.order,
+                )?;
+
+                let source_collectors: Vec<CompositeAccessor> = columns_and_types
+                    .into_iter()
+                    .map(|(column, column_type)| {
+                        Ok(CompositeAccessor {
+                            column,
+                            column_type,
+                            str_dict_column: reader.fast_fields().str(&source.field)?,
+                            date_histogram_interval: PrecomputedDateInterval::NotApplicable,
+                        })
+                    })
+                    .collect::<crate::Result<_>>()?;
+
+                let after_key = if let Some(first_col) =
+                    source_collectors.get(after_key_accessor_idx)
+                {
+                    match source_after_key_opt {
+                        Some(after_key) => PrecomputedAfterKey::precompute(
+                            first_col,
+                            after_key,
+                            &source.field,
+                            source.missing_order,
+                            source.order,
+                        )?,
+                        None => {
+                            precompute_missing_after_key(false, source.missing_order, source.order)
+                        }
+                    }
+                } else {
+                    // if no columns, we don't care about the after_key
+                    PrecomputedAfterKey::Next(0)
+                };
+
+                Ok(CompositeSourceAccessors {
+                    accessors: source_collectors,
+                    is_after_key_explicit_missing,
+                    skip_missing,
+                    after_key,
+                    after_key_accessor_idx,
+                })
+            }
+            CompositeAggregationSource::Histogram(source) => {
+                let column_and_types: Vec<(Column, ColumnType)> =
+                    reader.fast_fields().u64_lenient_for_type_all(
+                        Some(get_numeric_or_date_column_types()),
+                        &source.field,
+                    )?;
+                let source_collectors: Vec<CompositeAccessor> = column_and_types
+                    .into_iter()
+                    .map(|(column, column_type)| {
+                        Ok(CompositeAccessor {
+                            column,
+                            column_type,
+                            str_dict_column: None,
+                            date_histogram_interval: PrecomputedDateInterval::NotApplicable,
+                        })
+                    })
+                    .collect::<crate::Result<_>>()?;
+                let after_key = match source_after_key_opt {
+                    Some(CompositeIntermediateKey::F64(key)) => {
+                        let normalized_key = *key / source.interval;
+                        num_proj::f64_to_i64(normalized_key).into()
+                    }
+                    Some(CompositeIntermediateKey::Null) => {
+                        precompute_missing_after_key(true, source.missing_order, source.order)
+                    }
+                    None => precompute_missing_after_key(true, source.missing_order, source.order),
+                    _ => {
+                        return Err(crate::TantivyError::InvalidArgument(
+                            "After key type invalid for interval composite source".to_string(),
+                        ));
+                    }
+                };
+                Ok(CompositeSourceAccessors {
+                    accessors: source_collectors,
+                    is_after_key_explicit_missing,
+                    skip_missing,
+                    after_key,
+                    after_key_accessor_idx: 0,
+                })
+            }
+            CompositeAggregationSource::DateHistogram(source) => {
+                let column_and_types = reader
+                    .fast_fields()
+                    .u64_lenient_for_type_all(Some(&[ColumnType::DateTime]), &source.field)?;
+                let date_histogram_interval =
+                    PrecomputedDateInterval::from_date_histogram_source_intervals(
+                        &source.fixed_interval,
+                        source.calendar_interval,
+                    )?;
+                let source_collectors: Vec<CompositeAccessor> = column_and_types
+                    .into_iter()
+                    .map(|(column, column_type)| {
+                        Ok(CompositeAccessor {
+                            column,
+                            column_type,
+                            str_dict_column: None,
+                            date_histogram_interval,
+                        })
+                    })
+                    .collect::<crate::Result<_>>()?;
+                let after_key = match source_after_key_opt {
+                    Some(CompositeIntermediateKey::DateTime(key)) => {
+                        PrecomputedAfterKey::Exact(key.to_u64())
+                    }
+                    Some(CompositeIntermediateKey::Null) => {
+                        precompute_missing_after_key(true, source.missing_order, source.order)
+                    }
+                    None => precompute_missing_after_key(true, source.missing_order, source.order),
+                    _ => {
+                        return Err(crate::TantivyError::InvalidArgument(
+                            "After key type invalid for interval composite source".to_string(),
+                        ));
+                    }
+                };
+                Ok(CompositeSourceAccessors {
+                    accessors: source_collectors,
+                    is_after_key_explicit_missing,
+                    skip_missing,
+                    after_key,
+                    after_key_accessor_idx: 0,
+                })
+            }
+        }
+    }
+}
+
+/// Finds the index of the first column we should start collecting from to
+/// resume the pagination from the after_key.
+fn find_first_column_to_collect<T>(
+    sorted_columns: &[(T, ColumnType)],
+    after_key_opt: Option<&CompositeIntermediateKey>,
+    missing_order: MissingOrder,
+    order: Order,
+) -> crate::Result<usize> {
+    let after_key = match after_key_opt {
+        None => return Ok(0), // No pagination, start from beginning
+        Some(key) => key,
+    };
+    // Handle null after_key (we were on a missing value last time)
+    if matches!(after_key, CompositeIntermediateKey::Null) {
+        return match (missing_order, order) {
+            // Missing values come first, so all columns remain
+            (MissingOrder::First, _) | (MissingOrder::Default, Order::Asc) => Ok(0),
+            // Missing values come last, so all columns are done
+            (MissingOrder::Last, _) | (MissingOrder::Default, Order::Desc) => {
+                Ok(sorted_columns.len())
+            }
+        };
+    }
+    // Find the first column whose type order matches or follows the after_key's
+    // type in the pagination sequence
+    let after_key_column_order = after_key.column_pagination_order();
+    for (idx, (_, col_type)) in sorted_columns.iter().enumerate() {
+        let col_order = col_type.column_pagination_order();
+        let is_first_to_collect = match order {
+            Order::Asc => col_order >= after_key_column_order,
+            Order::Desc => col_order <= after_key_column_order,
+        };
+        if is_first_to_collect {
+            return Ok(idx);
+        }
+    }
+    // All columns are before the after_key, nothing left to collect
+    Ok(sorted_columns.len())
+}
+
+fn precompute_missing_after_key(
+    is_after_key_explicit_missing: bool,
+    missing_order: MissingOrder,
+    order: Order,
+) -> PrecomputedAfterKey {
+    let after_last = PrecomputedAfterKey::AfterLast;
+    let before_first = PrecomputedAfterKey::Next(0);
+    match (is_after_key_explicit_missing, missing_order, order) {
+        (true, MissingOrder::First, Order::Asc) => before_first,
+        (true, MissingOrder::First, Order::Desc) => after_last,
+        (true, MissingOrder::Last, Order::Asc) => after_last,
+        (true, MissingOrder::Last, Order::Desc) => before_first,
+        (true, MissingOrder::Default, Order::Asc) => before_first,
+        (true, MissingOrder::Default, Order::Desc) => after_last,
+        (false, _, Order::Asc) => before_first,
+        (false, _, Order::Desc) => after_last,
+    }
+}
+
+/// A parsed representation of the date interval for date histogram sources
+#[derive(Clone, Copy, Debug)]
+pub enum PrecomputedDateInterval {
+    /// This is not a date histogram source
+    NotApplicable,
+    /// Source was configured with a fixed interval
+    FixedNanoseconds(i64),
+    /// Source was configured with a calendar interval
+    Calendar(CalendarInterval),
+}
+
+impl PrecomputedDateInterval {
+    /// Validates the date histogram source interval fields and parses a date interval from them.
+    pub fn from_date_histogram_source_intervals(
+        fixed_interval: &Option<String>,
+        calendar_interval: Option<CalendarInterval>,
+    ) -> crate::Result<Self> {
+        match (fixed_interval, calendar_interval) {
+            (Some(_), Some(_)) | (None, None) => Err(TantivyError::InvalidArgument(
+                "date histogram source must have one and only one of fixed_interval or \
+                 calendar_interval set"
+                    .to_string(),
+            )),
+            (Some(fixed_interval), None) => {
+                let fixed_interval_ms = parse_into_milliseconds(fixed_interval)?;
+                Ok(PrecomputedDateInterval::FixedNanoseconds(
+                    fixed_interval_ms * 1_000_000,
+                ))
+            }
+            (None, Some(calendar_interval)) => {
+                Ok(PrecomputedDateInterval::Calendar(calendar_interval))
+            }
+        }
+    }
+}
+
+/// The after key projected to the u64 column space
+///
+/// Some column types (term, IP) might not have an exact representation of the
+/// specified after key
+#[derive(Debug)]
+pub enum PrecomputedAfterKey {
+    /// The after key could be exactly represented in the column space.
+    Exact(u64),
+    /// The after key could not be exactly represented, so this is the next
+    /// closest one.
+    Next(u64),
+    /// The after key could not be represented in the column space; it is
+    /// greater than all values.
+    AfterLast,
+}
+
+impl From<CompactHit> for PrecomputedAfterKey {
+    fn from(hit: CompactHit) -> Self {
+        match hit {
+            CompactHit::Exact(ord) => PrecomputedAfterKey::Exact(ord as u64),
+            CompactHit::Next(ord) => PrecomputedAfterKey::Next(ord as u64),
+            CompactHit::AfterLast => PrecomputedAfterKey::AfterLast,
+        }
+    }
+}
+
+impl From<TermOrdHit> for PrecomputedAfterKey {
+    fn from(hit: TermOrdHit) -> Self {
+        match hit {
+            TermOrdHit::Exact(ord) => PrecomputedAfterKey::Exact(ord),
+            // TermOrdHit represents AfterLast as Next(u64::MAX), we keep it as is
+            TermOrdHit::Next(ord) => PrecomputedAfterKey::Next(ord),
+        }
+    }
+}
+
+impl<T: MonotonicallyMappableToU64> From<ProjectedNumber<T>> for PrecomputedAfterKey {
+    fn from(num: ProjectedNumber<T>) -> Self {
+        match num {
+            ProjectedNumber::Exact(number) => PrecomputedAfterKey::Exact(number.to_u64()),
+            ProjectedNumber::Next(number) => PrecomputedAfterKey::Next(number.to_u64()),
+            ProjectedNumber::AfterLast => PrecomputedAfterKey::AfterLast,
+        }
+    }
+}
+
+// /!\ These operators only make sense if both values are in the same column space
+impl PrecomputedAfterKey {
+    pub fn equals(&self, column_value: u64) -> bool {
+        match self {
+            PrecomputedAfterKey::Exact(v) => *v == column_value,
+            PrecomputedAfterKey::Next(_) => false,
+            PrecomputedAfterKey::AfterLast => false,
+        }
+    }
+
+    pub fn gt(&self, column_value: u64) -> bool {
+        match self {
+            PrecomputedAfterKey::Exact(v) => *v > column_value,
+            PrecomputedAfterKey::Next(v) => *v > column_value,
+            PrecomputedAfterKey::AfterLast => true,
+        }
+    }
+
+    pub fn lt(&self, column_value: u64) -> bool {
+        match self {
+            PrecomputedAfterKey::Exact(v) => *v < column_value,
+            // a value equal to the next is greater than the after key
+            PrecomputedAfterKey::Next(v) => *v <= column_value,
+            PrecomputedAfterKey::AfterLast => false,
+        }
+    }
+
+    fn precompute_ip_addr(column: &Column<u64>, key: &Ipv6Addr) -> crate::Result<Self> {
+        let compact_space_accessor = column
+            .values
+            .clone()
+            .downcast_arc::<CompactSpaceU64Accessor>()
+            .map_err(|_| {
+                TantivyError::AggregationError(crate::aggregation::AggregationError::InternalError(
+                    "type mismatch: could not downcast to CompactSpaceU64Accessor".to_string(),
+                ))
+            })?;
+        let ip_u128 = key.to_bits();
+        let ip_next_compact = compact_space_accessor.u128_to_next_compact(ip_u128);
+        Ok(ip_next_compact.into())
+    }
+
+    fn precompute_term_ord(
+        str_dict_column: &Option<StrColumn>,
+        key: &str,
+        field: &str,
+    ) -> crate::Result<Self> {
+        let dict = str_dict_column
+            .as_ref()
+            .expect("dictionary missing for str accessor")
+            .dictionary();
+        let next_ord = dict.term_ord_or_next(key).map_err(|_| {
+            TantivyError::InvalidArgument(format!(
+                "failed to lookup after_key '{}' for field '{}'",
+                key, field
+            ))
+        })?;
+        Ok(next_ord.into())
+    }
+
+    /// Projects the after key into the column space of the given accessor.
+    ///
+    /// The computed after key does not take care of skipping entire columns
+    /// when the after key type is ordered after the accessor's type; that
+    /// should be performed earlier.
+    pub fn precompute(
+        composite_accessor: &CompositeAccessor,
+        source_after_key: &CompositeIntermediateKey,
+        field: &str,
+        missing_order: MissingOrder,
+        order: Order,
+    ) -> crate::Result<Self> {
+        use CompositeIntermediateKey as CIKey;
+        let precomputed_key = match (composite_accessor.column_type, source_after_key) {
+            (ColumnType::Bytes, _) => panic!("unsupported"),
+            // null after key
+            (_, CIKey::Null) => precompute_missing_after_key(false, missing_order, order),
+            // numerical
+            (ColumnType::I64, CIKey::I64(k)) => PrecomputedAfterKey::Exact(k.to_u64()),
+            (ColumnType::I64, CIKey::U64(k)) => num_proj::u64_to_i64(*k).into(),
+            (ColumnType::I64, CIKey::F64(k)) => num_proj::f64_to_i64(*k).into(),
+            (ColumnType::U64, CIKey::I64(k)) => num_proj::i64_to_u64(*k).into(),
+            (ColumnType::U64, CIKey::U64(k)) => PrecomputedAfterKey::Exact(*k),
+            (ColumnType::U64, CIKey::F64(k)) => num_proj::f64_to_u64(*k).into(),
+            (ColumnType::F64, CIKey::I64(k)) => num_proj::i64_to_f64(*k).into(),
+            (ColumnType::F64, CIKey::U64(k)) => num_proj::u64_to_f64(*k).into(),
+            (ColumnType::F64, CIKey::F64(k)) => PrecomputedAfterKey::Exact(k.to_u64()),
+            // boolean
+            (ColumnType::Bool, CIKey::Bool(key)) => PrecomputedAfterKey::Exact(key.to_u64()),
+            // string
+            (ColumnType::Str, CIKey::Str(key)) => PrecomputedAfterKey::precompute_term_ord(
+                &composite_accessor.str_dict_column,
+                key,
+                field,
+            )?,
+            // date time
+            (ColumnType::DateTime, CIKey::DateTime(key)) => {
+                PrecomputedAfterKey::Exact(key.to_u64())
+            }
+            // ip address
+            (ColumnType::IpAddr, CIKey::IpAddr(key)) => {
+                PrecomputedAfterKey::precompute_ip_addr(&composite_accessor.column, key)?
+            }
+            // assume the column's type is ordered after the after_key's type
+            _ => PrecomputedAfterKey::keep_all(order),
+        };
+        Ok(precomputed_key)
+    }
+
+    fn keep_all(order: Order) -> Self {
+        match order {
+            Order::Asc => PrecomputedAfterKey::Next(0),
+            Order::Desc => PrecomputedAfterKey::Next(u64::MAX),
+        }
+    }
+}
--- a/src/aggregation/bucket/composite/calendar_interval.rs
+++ b/src/aggregation/bucket/composite/calendar_interval.rs
@@ -0,0 +1,136 @@
+use time::convert::{Day, Nanosecond};
+use time::{Time, UtcDateTime};
+
+const NS_IN_DAY: i64 = Nanosecond::per_t::<i128>(Day) as i64;
+
+/// Computes the timestamp in nanoseconds corresponding to the beginning of the
+/// year (January 1st at midnight UTC).
+pub(super) fn try_year_bucket(timestamp_ns: i64) -> crate::Result<i64> {
+    year_bucket_using_time_crate(timestamp_ns).map_err(|e| {
+        crate::TantivyError::InvalidArgument(format!(
+            "Failed to compute year bucket for timestamp {}: {e}",
+            timestamp_ns
+        ))
+    })
+}
+
+/// Computes the timestamp in nanoseconds corresponding to the beginning of the
+/// month (1st at midnight UTC).
+pub(super) fn try_month_bucket(timestamp_ns: i64) -> crate::Result<i64> {
+    month_bucket_using_time_crate(timestamp_ns).map_err(|e| {
+        crate::TantivyError::InvalidArgument(format!(
+            "Failed to compute month bucket for timestamp {}: {e}",
+            timestamp_ns
+        ))
+    })
+}
+
+/// Computes the timestamp in nanoseconds corresponding to the beginning of the
+/// week (Monday at midnight UTC).
+pub(super) fn week_bucket(timestamp_ns: i64) -> i64 {
+    // 1970-01-01 was a Thursday (weekday 3 with 0=Monday), hence the +3 offset below.
+    let days_since_epoch = timestamp_ns.div_euclid(NS_IN_DAY);
+    // Find the weekday: 0=Monday, ..., 6=Sunday
+    let weekday = (days_since_epoch + 3).rem_euclid(7);
+    let monday_days_since_epoch = days_since_epoch - weekday;
+    monday_days_since_epoch * NS_IN_DAY
+}
+
+fn year_bucket_using_time_crate(timestamp_ns: i64) -> Result<i64, time::Error> {
+    let timestamp_ns = UtcDateTime::from_unix_timestamp_nanos(timestamp_ns as i128)?
+        .replace_ordinal(1)?
+        .replace_time(Time::MIDNIGHT)
+        .unix_timestamp_nanos();
+    Ok(timestamp_ns as i64)
+}
+
+fn month_bucket_using_time_crate(timestamp_ns: i64) -> Result<i64, time::Error> {
+    let timestamp_ns = UtcDateTime::from_unix_timestamp_nanos(timestamp_ns as i128)?
+        .replace_day(1)?
+        .replace_time(Time::MIDNIGHT)
+        .unix_timestamp_nanos();
+    Ok(timestamp_ns as i64)
+}
+
+#[cfg(test)]
+mod tests {
+    use time::format_description::well_known::Iso8601;
+    use time::UtcDateTime;
+
+    use super::*;
+
+    fn ts_ns(iso: &str) -> i64 {
+        UtcDateTime::parse(iso, &Iso8601::DEFAULT)
+            .unwrap()
+            .unix_timestamp_nanos() as i64
+    }
+
+    #[test]
+    fn test_year_bucket() {
+        let ts = ts_ns("1970-01-01T00:00:00Z");
+        let res = try_year_bucket(ts).unwrap();
+        assert_eq!(res, ts_ns("1970-01-01T00:00:00Z"));
+
+        let ts = ts_ns("1970-06-01T10:00:01.010Z");
+        let res = try_year_bucket(ts).unwrap();
+        assert_eq!(res, ts_ns("1970-01-01T00:00:00Z"));
+
+        let ts = ts_ns("2008-12-31T23:59:59.999999999Z"); // leap year
+        let res = try_year_bucket(ts).unwrap();
+        assert_eq!(res, ts_ns("2008-01-01T00:00:00Z"));
+
+        let ts = ts_ns("2008-01-01T00:00:00Z"); // leap year
+        let res = try_year_bucket(ts).unwrap();
+        assert_eq!(res, ts_ns("2008-01-01T00:00:00Z"));
+
+        let ts = ts_ns("2010-12-31T23:59:59.999999999Z");
+        let res = try_year_bucket(ts).unwrap();
+        assert_eq!(res, ts_ns("2010-01-01T00:00:00Z"));
+
+        let ts = ts_ns("1972-06-01T00:10:00Z");
+        let res = try_year_bucket(ts).unwrap();
+        assert_eq!(res, ts_ns("1972-01-01T00:00:00Z"));
+    }
+
+    #[test]
+    fn test_month_bucket() {
+        let ts = ts_ns("1970-01-15T00:00:00Z");
+        let res = try_month_bucket(ts).unwrap();
+        assert_eq!(res, ts_ns("1970-01-01T00:00:00Z"));
+
+        let ts = ts_ns("1970-02-01T00:00:00Z");
+        let res = try_month_bucket(ts).unwrap();
+        assert_eq!(res, ts_ns("1970-02-01T00:00:00Z"));
+
+        let ts = ts_ns("2000-01-31T23:59:59.999999999Z");
+        let res = try_month_bucket(ts).unwrap();
+        assert_eq!(res, ts_ns("2000-01-01T00:00:00Z"));
+    }
+
+    #[test]
+    fn test_week_bucket() {
+        let ts = ts_ns("1970-01-05T00:00:00Z"); // Monday
+        let res = week_bucket(ts);
+        assert_eq!(res, ts_ns("1970-01-05T00:00:00Z"));
+
+        let ts = ts_ns("1970-01-05T23:59:59Z"); // Monday
+        let res = week_bucket(ts);
+        assert_eq!(res, ts_ns("1970-01-05T00:00:00Z"));
+
+        let ts = ts_ns("1970-01-07T01:13:00Z"); // Wednesday
+        let res = week_bucket(ts);
+        assert_eq!(res, ts_ns("1970-01-05T00:00:00Z"));
+
+        let ts = ts_ns("1970-01-11T23:59:59.999999999Z"); // Sunday
+        let res = week_bucket(ts);
+        assert_eq!(res, ts_ns("1970-01-05T00:00:00Z"));
+
+        let ts = ts_ns("2025-10-16T10:41:59.010Z"); // Thursday
+        let res = week_bucket(ts);
+        assert_eq!(res, ts_ns("2025-10-13T00:00:00Z"));
+
+        let ts = ts_ns("1970-01-01T00:00:00Z"); // Thursday
+        let res = week_bucket(ts);
+        assert_eq!(res, ts_ns("1969-12-29T00:00:00Z")); // Negative
+    }
+}
--- a/src/aggregation/bucket/composite/collector.rs
+++ b/src/aggregation/bucket/composite/collector.rs
@@ -0,0 +1,660 @@
+use std::fmt::Debug;
+use std::mem;
+use std::net::Ipv6Addr;
+
+use columnar::column_values::CompactSpaceU64Accessor;
+use columnar::{
+    Column, ColumnType, Dictionary, MonotonicallyMappableToU128, MonotonicallyMappableToU64,
+    NumericalValue, StrColumn,
+};
+use rustc_hash::FxHashMap;
+use smallvec::SmallVec;
+
+use crate::aggregation::agg_data::{
+    build_segment_agg_collectors, AggRefNode, AggregationsSegmentCtx,
+};
+use crate::aggregation::bucket::composite::accessors::{
+    CompositeAccessor, CompositeAggReqData, PrecomputedDateInterval,
+};
+use crate::aggregation::bucket::composite::calendar_interval;
+use crate::aggregation::bucket::composite::map::{DynArrayHeapMap, MAX_DYN_ARRAY_SIZE};
+use crate::aggregation::bucket::{
+    CalendarInterval, CompositeAggregationSource, MissingOrder, Order,
+};
+use crate::aggregation::buffered_sub_aggs::{BufferedSubAggs, HighCardSubAggBuffer};
+use crate::aggregation::intermediate_agg_result::{
+    CompositeIntermediateKey, IntermediateAggregationResult, IntermediateAggregationResults,
+    IntermediateBucketResult, IntermediateCompositeBucketEntry, IntermediateCompositeBucketResult,
+};
+use crate::aggregation::segment_agg_result::{BucketIdProvider, SegmentAggregationCollector};
+use crate::aggregation::BucketId;
+use crate::TantivyError;
+
+#[derive(Clone, Debug)]
+struct CompositeBucketCollector {
+    count: u32,
+    bucket_id: BucketId,
+}
+
+/// Compact sortable representation of a single source value within a composite key.
+///
+/// The struct encodes both the column identity and the fast field value in a way
+/// that preserves the desired sort order via the derived `Ord` implementation
+/// (fields are compared top-to-bottom: `sort_key` first, then `encoded_value`).
+///
+/// ## `sort_key` encoding
+/// - `0` — missing value, sorted first
+/// - `1..=254` — present value; the original accessor index is `sort_key - 1`
+/// - `u8::MAX` (255) — missing value, sorted last
+///
+/// ## `encoded_value` encoding
+/// - `0` when the field is missing
+/// - The raw u64 fast-field representation when order is ascending
+/// - Bitwise NOT of the raw u64 when order is descending
+#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord, Default, Hash)]
+struct InternalValueRepr {
+    /// Column index biased by +1 (so 0 and u8::MAX are reserved for missing sentinels).
+    sort_key: u8,
+    /// Fast field value, possibly bit-flipped for descending order.
+    encoded_value: u64,
+}
+
+impl InternalValueRepr {
+    #[inline]
+    fn new_term(raw: u64, accessor_idx: u8, order: Order) -> Self {
+        let encoded_value = match order {
+            Order::Asc => raw,
+            Order::Desc => !raw,
+        };
+        InternalValueRepr {
+            sort_key: accessor_idx + 1,
+            encoded_value,
+        }
+    }
+
+    /// For histogram sources the accessor index is irrelevant, so `sort_key` is always 1.
+    #[inline]
+    fn new_histogram(raw: u64, order: Order) -> Self {
+        let encoded_value = match order {
+            Order::Asc => raw,
+            Order::Desc => !raw,
+        };
+        InternalValueRepr {
+            sort_key: 1,
+            encoded_value,
+        }
+    }
+
+    #[inline]
+    fn new_missing(order: Order, missing_order: MissingOrder) -> Self {
+        let sort_key = match (missing_order, order) {
+            (MissingOrder::First, _) | (MissingOrder::Default, Order::Asc) => 0,
+            (MissingOrder::Last, _) | (MissingOrder::Default, Order::Desc) => u8::MAX,
+        };
+        InternalValueRepr {
+            sort_key,
+            encoded_value: 0,
+        }
+    }
+
+    /// Decode back to `(accessor_idx, raw_value)`.
+    /// Returns `None` when the value represents a missing field.
+    #[inline]
+    fn decode(self, order: Order) -> Option<(u8, u64)> {
+        if self.sort_key == 0 || self.sort_key == u8::MAX {
+            return None;
+        }
+        let raw = match order {
+            Order::Asc => self.encoded_value,
+            Order::Desc => !self.encoded_value,
+        };
+        Some((self.sort_key - 1, raw))
+    }
+}
+
+/// The collector puts values from the fast field into the correct buckets and
+/// does a conversion to the correct datatype.
+#[derive(Debug)]
+pub struct SegmentCompositeCollector {
+    /// One DynArrayHeapMap per parent bucket.
+    parent_buckets: Vec<DynArrayHeapMap<InternalValueRepr, CompositeBucketCollector>>,
+    accessor_idx: usize,
+    sub_agg: Option<BufferedSubAggs<HighCardSubAggBuffer>>,
+    bucket_id_provider: BucketIdProvider,
+    /// Number of sources, needed when creating new DynArrayHeapMaps.
+    num_sources: usize,
+}
+
+impl SegmentAggregationCollector for SegmentCompositeCollector {
+    fn add_intermediate_aggregation_result(
+        &mut self,
+        agg_data: &AggregationsSegmentCtx,
+        results: &mut IntermediateAggregationResults,
+        parent_bucket_id: BucketId,
+    ) -> crate::Result<()> {
+        let name = agg_data
+            .get_composite_req_data(self.accessor_idx)
+            .name
+            .clone();
+
+        let buckets = self.add_intermediate_bucket_result(agg_data, parent_bucket_id)?;
+        results.push(
+            name,
+            IntermediateAggregationResult::Bucket(IntermediateBucketResult::Composite { buckets }),
+        )?;
+
+        Ok(())
+    }
+
+    fn collect(
+        &mut self,
+        parent_bucket_id: BucketId,
+        docs: &[crate::DocId],
+        agg_data: &mut AggregationsSegmentCtx,
+    ) -> crate::Result<()> {
+        let mem_pre = self.get_memory_consumption(parent_bucket_id);
+        let composite_agg_data = agg_data.take_composite_req_data(self.accessor_idx);
+
+        for doc in docs {
+            let mut visitor = CompositeKeyVisitor {
+                doc_id: *doc,
+                composite_agg_data: &composite_agg_data,
+                buckets: &mut self.parent_buckets[parent_bucket_id as usize],
+                sub_agg: &mut self.sub_agg,
+                bucket_id_provider: &mut self.bucket_id_provider,
+                sub_level_values: SmallVec::new(),
+            };
+            visitor.visit(0, true)?;
+        }
+        agg_data.put_back_composite_req_data(self.accessor_idx, composite_agg_data);
+
+        if let Some(sub_agg) = &mut self.sub_agg {
+            sub_agg.check_flush_local(agg_data)?;
+        }
+
+        let mem_delta = self.get_memory_consumption(parent_bucket_id).saturating_sub(mem_pre);
+        if mem_delta > 0 {
+            agg_data.context.limits.add_memory_consumed(mem_delta)?;
+        }
+
+        Ok(())
+    }
+
+    fn flush(&mut self, agg_data: &mut AggregationsSegmentCtx) -> crate::Result<()> {
+        if let Some(sub_agg) = &mut self.sub_agg {
+            sub_agg.flush(agg_data)?;
+        }
+        Ok(())
+    }
+
+    fn prepare_max_bucket(
+        &mut self,
+        max_bucket: BucketId,
+        _agg_data: &AggregationsSegmentCtx,
+    ) -> crate::Result<()> {
+        let required_len = max_bucket as usize + 1;
+        while self.parent_buckets.len() < required_len {
+            let map = DynArrayHeapMap::try_new(self.num_sources)?;
+            self.parent_buckets.push(map);
+        }
+        Ok(())
+    }
+
+    fn compute_metric_value(
+        &self,
+        _bucket_id: BucketId,
+        _sub_agg_name: &str,
+        _sub_agg_property: &str,
+        _agg_data: &AggregationsSegmentCtx,
+    ) -> Option<f64> {
+        // Composite is a multi-bucket agg with no single value to extract.
+        None
+    }
+}
+
+impl SegmentCompositeCollector {
+    fn get_memory_consumption(&self, parent_bucket_id: BucketId) -> u64 {
+        self.parent_buckets[parent_bucket_id as usize].memory_consumption()
+    }
+
+    pub(crate) fn from_req_and_validate(
+        req_data: &mut AggregationsSegmentCtx,
+        node: &AggRefNode,
+    ) -> crate::Result<Self> {
+        validate_req(req_data, node.idx_in_req_data)?;
+
+        let has_sub_aggregations = !node.children.is_empty();
+        let sub_agg = if has_sub_aggregations {
+            let sub_agg_collector = build_segment_agg_collectors(req_data, &node.children)?;
+            Some(BufferedSubAggs::new(sub_agg_collector))
+        } else {
+            None
+        };
+
+        let composite_req_data = req_data.get_composite_req_data(node.idx_in_req_data);
+        let num_sources = composite_req_data.req.sources.len();
+
+        Ok(SegmentCompositeCollector {
+            parent_buckets: vec![DynArrayHeapMap::try_new(num_sources)?],
+            accessor_idx: node.idx_in_req_data,
+            sub_agg,
+            bucket_id_provider: BucketIdProvider::default(),
+            num_sources,
+        })
+    }
+
+    #[inline]
+    fn add_intermediate_bucket_result(
+        &mut self,
+        agg_data: &AggregationsSegmentCtx,
+        parent_bucket_id: BucketId,
+    ) -> crate::Result<IntermediateCompositeBucketResult> {
+        let empty_map = DynArrayHeapMap::try_new(self.num_sources)?;
+        let heap_map = mem::replace(
+            &mut self.parent_buckets[parent_bucket_id as usize],
+            empty_map,
+        );
+
+        let mut dict: FxHashMap<Vec<CompositeIntermediateKey>, IntermediateCompositeBucketEntry> =
+            Default::default();
+        dict.reserve(heap_map.size());
+        let composite_data = agg_data.get_composite_req_data(self.accessor_idx);
+        for (key_internal_repr, agg) in heap_map.into_iter() {
+            let key = resolve_key(&key_internal_repr, composite_data)?;
+            let mut sub_aggregation_res = IntermediateAggregationResults::default();
+            if let Some(sub_agg) = &mut self.sub_agg {
+                sub_agg
+                    .get_sub_agg_collector()
+                    .add_intermediate_aggregation_result(
+                        agg_data,
+                        &mut sub_aggregation_res,
+                        agg.bucket_id,
+                    )?;
+            }
+
+            dict.insert(
+                key,
+                IntermediateCompositeBucketEntry {
+                    doc_count: agg.count,
+                    sub_aggregation: sub_aggregation_res,
+                },
+            );
+        }
+
+        Ok(IntermediateCompositeBucketResult {
+            entries: dict,
+            target_size: composite_data.req.size,
+            orders: composite_data
+                .req
+                .sources
+                .iter()
+                .map(|source| match source {
+                    CompositeAggregationSource::Terms(t) => (t.order, t.missing_order),
+                    CompositeAggregationSource::Histogram(h) => (h.order, h.missing_order),
+                    CompositeAggregationSource::DateHistogram(d) => (d.order, d.missing_order),
+                })
+                .collect(),
+        })
+    }
+}
+
+fn validate_req(req_data: &mut AggregationsSegmentCtx, accessor_idx: usize) -> crate::Result<()> {
+    let composite_data = req_data.get_composite_req_data(accessor_idx);
+    let req = &composite_data.req;
+    if req.sources.is_empty() {
+        return Err(TantivyError::InvalidArgument(
+            "composite aggregation must have at least one source".to_string(),
+        ));
+    }
+    if req.size == 0 {
+        return Err(TantivyError::InvalidArgument(
+            "composite aggregation 'size' must be > 0".to_string(),
+        ));
+    }
+
+    if composite_data.composite_accessors.len() > MAX_DYN_ARRAY_SIZE {
+        return Err(TantivyError::InvalidArgument(format!(
+            "composite aggregation source supports maximum {MAX_DYN_ARRAY_SIZE} sources",
+        )));
+    }
+
+    let column_types_for_sources = composite_data.composite_accessors.iter().map(|item| {
+        item.accessors
+            .iter()
+            .map(|a| a.column_type)
+            .collect::<Vec<_>>()
+    });
+
+    for column_types in column_types_for_sources {
+        if column_types.contains(&ColumnType::Bytes) {
+            return Err(TantivyError::InvalidArgument(
+                "composite aggregation does not support 'bytes' field type".to_string(),
+            ));
+        }
+    }
+    Ok(())
+}
+
+fn collect_bucket_with_limit(
+    doc_id: crate::DocId,
+    limit_num_buckets: usize,
+    buckets: &mut DynArrayHeapMap<InternalValueRepr, CompositeBucketCollector>,
+    key: &[InternalValueRepr],
+    sub_agg: &mut Option<BufferedSubAggs<HighCardSubAggBuffer>>,
+    bucket_id_provider: &mut BucketIdProvider,
+) {
+    let mut record_in_bucket = |bucket: &mut CompositeBucketCollector| {
+        bucket.count += 1;
+        if let Some(sub_agg) = sub_agg {
+            sub_agg.push(bucket.bucket_id, doc_id);
+        }
+    };
+
+    // We still have room for buckets, just insert
+    if buckets.size() < limit_num_buckets {
+        let bucket = buckets.get_or_insert_with(key, || CompositeBucketCollector {
+            count: 0,
+            bucket_id: bucket_id_provider.next_bucket_id(),
+        });
+        record_in_bucket(bucket);
+        return;
+    }
+
+    // Map is full, but we can still update the bucket if it already exists
+    if let Some(bucket) = buckets.get_mut(key) {
+        record_in_bucket(bucket);
+        return;
+    }
+
+    // Check if the item qualifies to enter the top-k, and evict the highest if it does
+    if let Some(highest_key) = buckets.peek_highest() {
+        if key < highest_key {
+            buckets.evict_highest();
+            let bucket = buckets.get_or_insert_with(key, || CompositeBucketCollector {
+                count: 0,
+                bucket_id: bucket_id_provider.next_bucket_id(),
+            });
+            record_in_bucket(bucket);
+        }
+    }
+}
+
+/// Converts the composite key from its internal column space representation
+/// (segment specific) into its intermediate form.
+fn resolve_key(
+    internal_key: &[InternalValueRepr],
+    agg_data: &CompositeAggReqData,
+) -> crate::Result<Vec<CompositeIntermediateKey>> {
+    internal_key
+        .iter()
+        .enumerate()
+        .map(|(idx, val)| {
+            resolve_internal_value_repr(
+                *val,
+                &agg_data.req.sources[idx],
+                &agg_data.composite_accessors[idx].accessors,
+            )
+        })
+        .collect()
+}
+
+fn resolve_internal_value_repr(
+    internal_value_repr: InternalValueRepr,
+    source: &CompositeAggregationSource,
+    composite_accessors: &[CompositeAccessor],
+) -> crate::Result<CompositeIntermediateKey> {
+    let decoded_value_opt = match source {
+        CompositeAggregationSource::Terms(source) => internal_value_repr.decode(source.order),
+        CompositeAggregationSource::Histogram(source) => internal_value_repr.decode(source.order),
+        CompositeAggregationSource::DateHistogram(source) => {
+            internal_value_repr.decode(source.order)
+        }
+    };
+    let Some((decoded_accessor_idx, val)) = decoded_value_opt else {
+        return Ok(CompositeIntermediateKey::Null);
+    };
+    let key = match source {
+        CompositeAggregationSource::Terms(_) => {
+            let CompositeAccessor {
+                column_type,
+                str_dict_column,
+                column,
+                ..
+            } = &composite_accessors[decoded_accessor_idx as usize];
+            resolve_term(val, column_type, str_dict_column, column)?
+        }
+        CompositeAggregationSource::Histogram(source) => {
+            CompositeIntermediateKey::F64(i64::from_u64(val) as f64 * source.interval)
+        }
+        CompositeAggregationSource::DateHistogram(_) => {
+            CompositeIntermediateKey::DateTime(i64::from_u64(val))
+        }
+    };
+
+    Ok(key)
+}
+
+fn resolve_term(
+    val: u64,
+    column_type: &ColumnType,
+    str_dict_column: &Option<StrColumn>,
+    column: &Column,
+) -> crate::Result<CompositeIntermediateKey> {
+    let key = if *column_type == ColumnType::Str {
+        let fallback_dict = Dictionary::empty();
+        let term_dict = str_dict_column
+            .as_ref()
+            .map(|el| el.dictionary())
+            .unwrap_or(&fallback_dict);
+
+        let mut buffer = Vec::new();
+        term_dict.ord_to_term(val, &mut buffer)?;
+        CompositeIntermediateKey::Str(
+            String::from_utf8(buffer).expect("could not convert to String"),
+        )
+    } else if *column_type == ColumnType::DateTime {
+        let val = i64::from_u64(val);
+        CompositeIntermediateKey::DateTime(val)
+    } else if *column_type == ColumnType::Bool {
+        let val = bool::from_u64(val);
+        CompositeIntermediateKey::Bool(val)
+    } else if *column_type == ColumnType::IpAddr {
+        let compact_space_accessor = column
+            .values
+            .clone()
+            .downcast_arc::<CompactSpaceU64Accessor>()
+            .map_err(|_| {
+                TantivyError::AggregationError(crate::aggregation::AggregationError::InternalError(
+                    "Type mismatch: Could not downcast to CompactSpaceU64Accessor".to_string(),
+                ))
+            })?;
+        let val: u128 = compact_space_accessor.compact_to_u128(val as u32);
+        let val = Ipv6Addr::from_u128(val);
+        CompositeIntermediateKey::IpAddr(val)
+    } else if *column_type == ColumnType::U64 {
+        CompositeIntermediateKey::U64(val)
+    } else if *column_type == ColumnType::I64 {
+        CompositeIntermediateKey::I64(i64::from_u64(val))
+    } else {
+        let val = f64::from_u64(val);
+        let val: NumericalValue = val.into();
+
+        match val.normalize() {
+            NumericalValue::U64(val) => CompositeIntermediateKey::U64(val),
+            NumericalValue::I64(val) => CompositeIntermediateKey::I64(val),
+            NumericalValue::F64(val) => CompositeIntermediateKey::F64(val),
+        }
+    };
+    Ok(key)
+}
+
+/// Walks the Cartesian product of the values that a document has for each composite key
+/// source.
+///
+/// For each resulting key tuple that sorts after the after key, calls `collect_bucket_with_limit`.
+struct CompositeKeyVisitor<'a> {
+    doc_id: crate::DocId,
+    composite_agg_data: &'a CompositeAggReqData,
+    buckets: &'a mut DynArrayHeapMap<InternalValueRepr, CompositeBucketCollector>,
+    sub_agg: &'a mut Option<BufferedSubAggs<HighCardSubAggBuffer>>,
+    bucket_id_provider: &'a mut BucketIdProvider,
+    sub_level_values: SmallVec<[InternalValueRepr; MAX_DYN_ARRAY_SIZE]>,
+}
+
+impl CompositeKeyVisitor<'_> {
+    /// Depth-first walk of the accessors to build the composite key combinations
+    /// and update the buckets.
+    ///
+    /// `source_idx` is the current source index in the recursion.
+    /// `is_on_after_key` tracks whether we still need to consider the after_key
+    /// for pruning at this level and below.
+    fn visit(&mut self, source_idx: usize, is_on_after_key: bool) -> crate::Result<()> {
+        if source_idx == self.composite_agg_data.req.sources.len() {
+            if !is_on_after_key {
+                collect_bucket_with_limit(
+                    self.doc_id,
+                    self.composite_agg_data.req.size as usize,
+                    self.buckets,
+                    &self.sub_level_values,
+                    self.sub_agg,
+                    self.bucket_id_provider,
+                );
+            }
+            return Ok(());
+        }
+
+        let current_level_accessors = &self.composite_agg_data.composite_accessors[source_idx];
+        let current_level_source = &self.composite_agg_data.req.sources[source_idx];
+        let mut missing = true;
+        for (accessor_idx, accessor) in current_level_accessors.accessors.iter().enumerate() {
+            let values = accessor.column.values_for_doc(self.doc_id);
+            for value in values {
+                missing = false;
+                match current_level_source {
+                    CompositeAggregationSource::Terms(_) => {
+                        let precedes_after_key_type =
+                            accessor_idx < current_level_accessors.after_key_accessor_idx;
+                        if is_on_after_key && precedes_after_key_type {
+                            break;
+                        }
+                        let matches_after_key_type =
+                            accessor_idx == current_level_accessors.after_key_accessor_idx;
+
+                        if matches_after_key_type && is_on_after_key {
+                            let should_skip = match current_level_source.order() {
+                                Order::Asc => current_level_accessors.after_key.gt(value),
+                                Order::Desc => current_level_accessors.after_key.lt(value),
+                            };
+                            if should_skip {
+                                continue;
+                            }
+                        }
+                        self.sub_level_values.push(InternalValueRepr::new_term(
+                            value,
+                            accessor_idx as u8,
+                            current_level_source.order(),
+                        ));
+                        let still_on_after_key = matches_after_key_type
+                            && current_level_accessors.after_key.equals(value);
+                        self.visit(source_idx + 1, is_on_after_key && still_on_after_key)?;
+                        self.sub_level_values.pop();
+                    }
+                    CompositeAggregationSource::Histogram(source) => {
+                        let float_value = match accessor.column_type {
+                            ColumnType::U64 => value as f64,
+                            ColumnType::I64 => i64::from_u64(value) as f64,
+                            ColumnType::DateTime => i64::from_u64(value) as f64 / 1_000_000.,
+                            ColumnType::F64 => f64::from_u64(value),
+                            _ => {
+                                panic!(
+                                    "unexpected type {:?}. This should not happen",
+                                    accessor.column_type
+                                )
+                            }
+                        };
+                        let bucket_index = (float_value / source.interval).floor() as i64;
+                        let bucket_value = i64::to_u64(bucket_index);
+                        if is_on_after_key {
+                            let should_skip = match current_level_source.order() {
+                                Order::Asc => current_level_accessors.after_key.gt(bucket_value),
+                                Order::Desc => current_level_accessors.after_key.lt(bucket_value),
+                            };
+                            if should_skip {
+                                continue;
+                            }
+                        }
+                        self.sub_level_values.push(InternalValueRepr::new_histogram(
+                            bucket_value,
+                            current_level_source.order(),
+                        ));
+                        let still_on_after_key =
+                            current_level_accessors.after_key.equals(bucket_value);
+                        self.visit(source_idx + 1, is_on_after_key && still_on_after_key)?;
+                        self.sub_level_values.pop();
+                    }
+                    CompositeAggregationSource::DateHistogram(_) => {
+                        let value_ns = match accessor.column_type {
+                            ColumnType::DateTime => i64::from_u64(value),
+                            _ => {
+                                panic!(
+                                    "unexpected type {:?}. This should not happen",
+                                    accessor.column_type
+                                )
+                            }
+                        };
+                        let bucket_index = match accessor.date_histogram_interval {
+                            PrecomputedDateInterval::FixedNanoseconds(fixed_interval_ns) => {
+                                value_ns.div_euclid(fixed_interval_ns) * fixed_interval_ns
+                            }
+                            PrecomputedDateInterval::Calendar(CalendarInterval::Year) => {
+                                calendar_interval::try_year_bucket(value_ns)?
+                            }
+                            PrecomputedDateInterval::Calendar(CalendarInterval::Month) => {
+                                calendar_interval::try_month_bucket(value_ns)?
+                            }
+                            PrecomputedDateInterval::Calendar(CalendarInterval::Week) => {
+                                calendar_interval::week_bucket(value_ns)
+                            }
+                            PrecomputedDateInterval::NotApplicable => {
+                                panic!("interval not precomputed for date histogram source")
+                            }
+                        };
+                        let bucket_value = i64::to_u64(bucket_index);
+                        if is_on_after_key {
+                            let should_skip = match current_level_source.order() {
+                                Order::Asc => current_level_accessors.after_key.gt(bucket_value),
+                                Order::Desc => current_level_accessors.after_key.lt(bucket_value),
+                            };
+                            if should_skip {
+                                continue;
+                            }
+                        }
+                        self.sub_level_values.push(InternalValueRepr::new_histogram(
+                            bucket_value,
+                            current_level_source.order(),
+                        ));
+                        let still_on_after_key =
+                            current_level_accessors.after_key.equals(bucket_value);
+                        self.visit(source_idx + 1, is_on_after_key && still_on_after_key)?;
+                        self.sub_level_values.pop();
+                    }
+                };
+            }
+        }
+        if missing && current_level_source.missing_bucket() {
+            if is_on_after_key && current_level_accessors.skip_missing {
+                return Ok(());
+            }
+            self.sub_level_values.push(InternalValueRepr::new_missing(
+                current_level_source.order(),
+                current_level_source.missing_order(),
+            ));
+            self.visit(
+                source_idx + 1,
+                is_on_after_key && current_level_accessors.is_after_key_explicit_missing,
+            )?;
+            self.sub_level_values.pop();
+        }
+        Ok(())
+    }
+}
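
Note on the `FixedNanoseconds` branch above: the bucket key is computed as `(value_ns / fixed_interval_ns) * fixed_interval_ns`. Rust's `/` on `i64` truncates toward zero, so for negative timestamps (before the Unix epoch) this rounds up to the next bucket boundary instead of down. The sketch below contrasts the two roundings. The helper names `fixed_bucket_trunc` and `fixed_bucket_floor` are hypothetical and not part of this PR, and the diff does not show whether pre-epoch values can reach this code path.

```rust
/// Mirrors the expression used in the diff: truncating division.
/// Hypothetical helper, for illustration only.
fn fixed_bucket_trunc(value_ns: i64, interval_ns: i64) -> i64 {
    (value_ns / interval_ns) * interval_ns
}

/// Rounds `value_ns` down to the start of its bucket, also for negative
/// values. Assumes `interval_ns > 0`. Hypothetical helper.
fn fixed_bucket_floor(value_ns: i64, interval_ns: i64) -> i64 {
    // div_euclid rounds toward negative infinity when the divisor is positive
    value_ns.div_euclid(interval_ns) * interval_ns
}
```

For non-negative timestamps both helpers agree, so the difference only matters if dates before 1970 are indexed.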
--- a/src/aggregation/bucket/composite/map.rs
+++ b/src/aggregation/bucket/composite/map.rs
@@ -0,0 +1,329 @@
+use std::collections::BinaryHeap;
+use std::fmt::Debug;
+use std::hash::Hash;
+
+use rustc_hash::FxHashMap;
+use smallvec::SmallVec;
+
+use crate::TantivyError;
+
+/// Map backed by a hash map for fast access and a binary heap to track the
+/// highest key. The key is an array of fixed size S.
+#[derive(Clone, Debug)]
+struct ArrayHeapMap<K: Ord, V, const S: usize> {
+    pub(crate) buckets: FxHashMap<[K; S], V>,
+    pub(crate) heap: BinaryHeap<[K; S]>,
+}
+
+impl<K: Ord, V, const S: usize> Default for ArrayHeapMap<K, V, S> {
+    fn default() -> Self {
+        ArrayHeapMap {
+            buckets: FxHashMap::default(),
+            heap: BinaryHeap::default(),
+        }
+    }
+}
+
+impl<K: Eq + Hash + Clone + Ord, V, const S: usize> ArrayHeapMap<K, V, S> {
+    /// Panics if the length of `key` is not S.
+    fn get_or_insert_with<F: FnOnce() -> V>(&mut self, key: &[K], f: F) -> &mut V {
+        let key_array: &[K; S] = key.try_into().expect("Key length mismatch");
+        self.buckets.entry(key_array.clone()).or_insert_with(|| {
+            self.heap.push(key_array.clone());
+            f()
+        })
+    }
+
+    /// Panics if the length of `key` is not S.
+    fn get_mut(&mut self, key: &[K]) -> Option<&mut V> {
+        let key_array: &[K; S] = key.try_into().expect("Key length mismatch");
+        self.buckets.get_mut(key_array)
+    }
+
+    fn peek_highest(&self) -> Option<&[K]> {
+        self.heap.peek().map(|k_array| k_array.as_slice())
+    }
+
+    fn evict_highest(&mut self) {
+        if let Some(highest) = self.heap.pop() {
+            self.buckets.remove(&highest);
+        }
+    }
+
+    fn memory_consumption(&self) -> u64 {
+        let key_size = std::mem::size_of::<[K; S]>();
+        let map_size = (key_size + std::mem::size_of::<V>()) * self.buckets.capacity();
+        let heap_size = key_size * self.heap.capacity();
+        (map_size + heap_size) as u64
+    }
+}
+
+impl<K: Copy + Ord + Clone + 'static, V: 'static, const S: usize> ArrayHeapMap<K, V, S> {
+    fn into_iter(self) -> Box<dyn Iterator<Item = (SmallVec<[K; MAX_DYN_ARRAY_SIZE]>, V)>> {
+        Box::new(
+            self.buckets
+                .into_iter()
+                .map(|(k, v)| (SmallVec::from_slice(&k), v)),
+        )
+    }
+}
+
+pub(super) const MAX_DYN_ARRAY_SIZE: usize = 16;
+const MAX_DYN_ARRAY_SIZE_PLUS_ONE: usize = MAX_DYN_ARRAY_SIZE + 1;
+
+/// A map optimized for memory footprint, fast access and efficient eviction of
+/// the highest key.
+///
+/// Keys are inlined arrays of size 1 to [MAX_DYN_ARRAY_SIZE], but for a given
+/// instance the key size is fixed. This avoids heap allocations for the
+/// keys.
+#[derive(Clone, Debug)]
+pub(super) struct DynArrayHeapMap<K: Ord, V>(DynArrayHeapMapInner<K, V>);
+
+/// Wrapper around ArrayHeapMap to dynamically dispatch on the array size.
+#[derive(Clone, Debug)]
+enum DynArrayHeapMapInner<K: Ord, V> {
+    Dim1(ArrayHeapMap<K, V, 1>),
+    Dim2(ArrayHeapMap<K, V, 2>),
+    Dim3(ArrayHeapMap<K, V, 3>),
+    Dim4(ArrayHeapMap<K, V, 4>),
+    Dim5(ArrayHeapMap<K, V, 5>),
+    Dim6(ArrayHeapMap<K, V, 6>),
+    Dim7(ArrayHeapMap<K, V, 7>),
+    Dim8(ArrayHeapMap<K, V, 8>),
+    Dim9(ArrayHeapMap<K, V, 9>),
+    Dim10(ArrayHeapMap<K, V, 10>),
+    Dim11(ArrayHeapMap<K, V, 11>),
+    Dim12(ArrayHeapMap<K, V, 12>),
+    Dim13(ArrayHeapMap<K, V, 13>),
+    Dim14(ArrayHeapMap<K, V, 14>),
+    Dim15(ArrayHeapMap<K, V, 15>),
+    Dim16(ArrayHeapMap<K, V, 16>),
+}
+
+impl<K: Ord, V> DynArrayHeapMap<K, V> {
+    /// Creates a new heap map with dynamic array keys of size `key_dimension`.
+    pub(super) fn try_new(key_dimension: usize) -> crate::Result<Self> {
+        let inner = match key_dimension {
+            0 => {
+                return Err(TantivyError::InvalidArgument(
+                    "DynArrayHeapMap dimension must be at least 1".to_string(),
+                ))
+            }
+            1 => DynArrayHeapMapInner::Dim1(ArrayHeapMap::default()),
+            2 => DynArrayHeapMapInner::Dim2(ArrayHeapMap::default()),
+            3 => DynArrayHeapMapInner::Dim3(ArrayHeapMap::default()),
+            4 => DynArrayHeapMapInner::Dim4(ArrayHeapMap::default()),
+            5 => DynArrayHeapMapInner::Dim5(ArrayHeapMap::default()),
+            6 => DynArrayHeapMapInner::Dim6(ArrayHeapMap::default()),
+            7 => DynArrayHeapMapInner::Dim7(ArrayHeapMap::default()),
+            8 => DynArrayHeapMapInner::Dim8(ArrayHeapMap::default()),
+            9 => DynArrayHeapMapInner::Dim9(ArrayHeapMap::default()),
+            10 => DynArrayHeapMapInner::Dim10(ArrayHeapMap::default()),
+            11 => DynArrayHeapMapInner::Dim11(ArrayHeapMap::default()),
+            12 => DynArrayHeapMapInner::Dim12(ArrayHeapMap::default()),
+            13 => DynArrayHeapMapInner::Dim13(ArrayHeapMap::default()),
+            14 => DynArrayHeapMapInner::Dim14(ArrayHeapMap::default()),
+            15 => DynArrayHeapMapInner::Dim15(ArrayHeapMap::default()),
+            16 => DynArrayHeapMapInner::Dim16(ArrayHeapMap::default()),
+            MAX_DYN_ARRAY_SIZE_PLUS_ONE.. => {
+                return Err(TantivyError::InvalidArgument(format!(
+                    "DynArrayHeapMap supports maximum {MAX_DYN_ARRAY_SIZE} dimensions, got \
+                     {key_dimension}",
+                )))
+            }
+        };
+        Ok(DynArrayHeapMap(inner))
+    }
+
+    /// Number of elements in the map. This is not the dimension of the keys.
+    pub(super) fn size(&self) -> usize {
+        match &self.0 {
+            DynArrayHeapMapInner::Dim1(map) => map.buckets.len(),
+            DynArrayHeapMapInner::Dim2(map) => map.buckets.len(),
+            DynArrayHeapMapInner::Dim3(map) => map.buckets.len(),
+            DynArrayHeapMapInner::Dim4(map) => map.buckets.len(),
+            DynArrayHeapMapInner::Dim5(map) => map.buckets.len(),
+            DynArrayHeapMapInner::Dim6(map) => map.buckets.len(),
+            DynArrayHeapMapInner::Dim7(map) => map.buckets.len(),
+            DynArrayHeapMapInner::Dim8(map) => map.buckets.len(),
+            DynArrayHeapMapInner::Dim9(map) => map.buckets.len(),
+            DynArrayHeapMapInner::Dim10(map) => map.buckets.len(),
+            DynArrayHeapMapInner::Dim11(map) => map.buckets.len(),
+            DynArrayHeapMapInner::Dim12(map) => map.buckets.len(),
+            DynArrayHeapMapInner::Dim13(map) => map.buckets.len(),
+            DynArrayHeapMapInner::Dim14(map) => map.buckets.len(),
+            DynArrayHeapMapInner::Dim15(map) => map.buckets.len(),
+            DynArrayHeapMapInner::Dim16(map) => map.buckets.len(),
+        }
+    }
+}
+
+impl<K: Ord + Hash + Clone, V> DynArrayHeapMap<K, V> {
+    /// Returns a mutable reference to the value corresponding to `key`, inserting
+    /// a new value created by calling `f` if the key is absent.
+    ///
+    /// Panics if the length of `key` does not match the key dimension of the map.
+    pub(super) fn get_or_insert_with<F: FnOnce() -> V>(&mut self, key: &[K], f: F) -> &mut V {
+        match &mut self.0 {
+            DynArrayHeapMapInner::Dim1(map) => map.get_or_insert_with(key, f),
+            DynArrayHeapMapInner::Dim2(map) => map.get_or_insert_with(key, f),
+            DynArrayHeapMapInner::Dim3(map) => map.get_or_insert_with(key, f),
+            DynArrayHeapMapInner::Dim4(map) => map.get_or_insert_with(key, f),
+            DynArrayHeapMapInner::Dim5(map) => map.get_or_insert_with(key, f),
+            DynArrayHeapMapInner::Dim6(map) => map.get_or_insert_with(key, f),
+            DynArrayHeapMapInner::Dim7(map) => map.get_or_insert_with(key, f),
+            DynArrayHeapMapInner::Dim8(map) => map.get_or_insert_with(key, f),
+            DynArrayHeapMapInner::Dim9(map) => map.get_or_insert_with(key, f),
+            DynArrayHeapMapInner::Dim10(map) => map.get_or_insert_with(key, f),
+            DynArrayHeapMapInner::Dim11(map) => map.get_or_insert_with(key, f),
+            DynArrayHeapMapInner::Dim12(map) => map.get_or_insert_with(key, f),
+            DynArrayHeapMapInner::Dim13(map) => map.get_or_insert_with(key, f),
+            DynArrayHeapMapInner::Dim14(map) => map.get_or_insert_with(key, f),
+            DynArrayHeapMapInner::Dim15(map) => map.get_or_insert_with(key, f),
+            DynArrayHeapMapInner::Dim16(map) => map.get_or_insert_with(key, f),
+        }
+    }
+
+    /// Returns a mutable reference to the value corresponding to `key`.
+    ///
+    /// Panics if the length of `key` does not match the key dimension of the map.
+    pub fn get_mut(&mut self, key: &[K]) -> Option<&mut V> {
+        match &mut self.0 {
+            DynArrayHeapMapInner::Dim1(map) => map.get_mut(key),
+            DynArrayHeapMapInner::Dim2(map) => map.get_mut(key),
+            DynArrayHeapMapInner::Dim3(map) => map.get_mut(key),
+            DynArrayHeapMapInner::Dim4(map) => map.get_mut(key),
+            DynArrayHeapMapInner::Dim5(map) => map.get_mut(key),
+            DynArrayHeapMapInner::Dim6(map) => map.get_mut(key),
+            DynArrayHeapMapInner::Dim7(map) => map.get_mut(key),
+            DynArrayHeapMapInner::Dim8(map) => map.get_mut(key),
+            DynArrayHeapMapInner::Dim9(map) => map.get_mut(key),
+            DynArrayHeapMapInner::Dim10(map) => map.get_mut(key),
+            DynArrayHeapMapInner::Dim11(map) => map.get_mut(key),
+            DynArrayHeapMapInner::Dim12(map) => map.get_mut(key),
+            DynArrayHeapMapInner::Dim13(map) => map.get_mut(key),
+            DynArrayHeapMapInner::Dim14(map) => map.get_mut(key),
+            DynArrayHeapMapInner::Dim15(map) => map.get_mut(key),
+            DynArrayHeapMapInner::Dim16(map) => map.get_mut(key),
+        }
+    }
+
+    /// Returns a reference to the highest key in the map.
+    pub(super) fn peek_highest(&self) -> Option<&[K]> {
+        match &self.0 {
+            DynArrayHeapMapInner::Dim1(map) => map.peek_highest(),
+            DynArrayHeapMapInner::Dim2(map) => map.peek_highest(),
+            DynArrayHeapMapInner::Dim3(map) => map.peek_highest(),
+            DynArrayHeapMapInner::Dim4(map) => map.peek_highest(),
+            DynArrayHeapMapInner::Dim5(map) => map.peek_highest(),
+            DynArrayHeapMapInner::Dim6(map) => map.peek_highest(),
+            DynArrayHeapMapInner::Dim7(map) => map.peek_highest(),
+            DynArrayHeapMapInner::Dim8(map) => map.peek_highest(),
+            DynArrayHeapMapInner::Dim9(map) => map.peek_highest(),
+            DynArrayHeapMapInner::Dim10(map) => map.peek_highest(),
+            DynArrayHeapMapInner::Dim11(map) => map.peek_highest(),
+            DynArrayHeapMapInner::Dim12(map) => map.peek_highest(),
+            DynArrayHeapMapInner::Dim13(map) => map.peek_highest(),
+            DynArrayHeapMapInner::Dim14(map) => map.peek_highest(),
+            DynArrayHeapMapInner::Dim15(map) => map.peek_highest(),
+            DynArrayHeapMapInner::Dim16(map) => map.peek_highest(),
+        }
+    }
+
+    /// Removes the entry with the highest key from the map.
+    pub(super) fn evict_highest(&mut self) {
+        match &mut self.0 {
+            DynArrayHeapMapInner::Dim1(map) => map.evict_highest(),
+            DynArrayHeapMapInner::Dim2(map) => map.evict_highest(),
+            DynArrayHeapMapInner::Dim3(map) => map.evict_highest(),
+            DynArrayHeapMapInner::Dim4(map) => map.evict_highest(),
+            DynArrayHeapMapInner::Dim5(map) => map.evict_highest(),
+            DynArrayHeapMapInner::Dim6(map) => map.evict_highest(),
+            DynArrayHeapMapInner::Dim7(map) => map.evict_highest(),
+            DynArrayHeapMapInner::Dim8(map) => map.evict_highest(),
+            DynArrayHeapMapInner::Dim9(map) => map.evict_highest(),
+            DynArrayHeapMapInner::Dim10(map) => map.evict_highest(),
+            DynArrayHeapMapInner::Dim11(map) => map.evict_highest(),
+            DynArrayHeapMapInner::Dim12(map) => map.evict_highest(),
+            DynArrayHeapMapInner::Dim13(map) => map.evict_highest(),
+            DynArrayHeapMapInner::Dim14(map) => map.evict_highest(),
+            DynArrayHeapMapInner::Dim15(map) => map.evict_highest(),
+            DynArrayHeapMapInner::Dim16(map) => map.evict_highest(),
+        }
+    }
+
+    pub(crate) fn memory_consumption(&self) -> u64 {
+        match &self.0 {
+            DynArrayHeapMapInner::Dim1(map) => map.memory_consumption(),
+            DynArrayHeapMapInner::Dim2(map) => map.memory_consumption(),
+            DynArrayHeapMapInner::Dim3(map) => map.memory_consumption(),
+            DynArrayHeapMapInner::Dim4(map) => map.memory_consumption(),
+            DynArrayHeapMapInner::Dim5(map) => map.memory_consumption(),
+            DynArrayHeapMapInner::Dim6(map) => map.memory_consumption(),
+            DynArrayHeapMapInner::Dim7(map) => map.memory_consumption(),
+            DynArrayHeapMapInner::Dim8(map) => map.memory_consumption(),
+            DynArrayHeapMapInner::Dim9(map) => map.memory_consumption(),
+            DynArrayHeapMapInner::Dim10(map) => map.memory_consumption(),
+            DynArrayHeapMapInner::Dim11(map) => map.memory_consumption(),
+            DynArrayHeapMapInner::Dim12(map) => map.memory_consumption(),
+            DynArrayHeapMapInner::Dim13(map) => map.memory_consumption(),
+            DynArrayHeapMapInner::Dim14(map) => map.memory_consumption(),
+            DynArrayHeapMapInner::Dim15(map) => map.memory_consumption(),
+            DynArrayHeapMapInner::Dim16(map) => map.memory_consumption(),
+        }
+    }
+}
+
+impl<K: Ord + Clone + Copy + 'static, V: 'static> DynArrayHeapMap<K, V> {
+    /// Turns this map into an iterator over key-value pairs.
+    pub fn into_iter(self) -> impl Iterator<Item = (SmallVec<[K; MAX_DYN_ARRAY_SIZE]>, V)> {
+        match self.0 {
+            DynArrayHeapMapInner::Dim1(map) => map.into_iter(),
+            DynArrayHeapMapInner::Dim2(map) => map.into_iter(),
+            DynArrayHeapMapInner::Dim3(map) => map.into_iter(),
+            DynArrayHeapMapInner::Dim4(map) => map.into_iter(),
+            DynArrayHeapMapInner::Dim5(map) => map.into_iter(),
+            DynArrayHeapMapInner::Dim6(map) => map.into_iter(),
+            DynArrayHeapMapInner::Dim7(map) => map.into_iter(),
+            DynArrayHeapMapInner::Dim8(map) => map.into_iter(),
+            DynArrayHeapMapInner::Dim9(map) => map.into_iter(),
+            DynArrayHeapMapInner::Dim10(map) => map.into_iter(),
+            DynArrayHeapMapInner::Dim11(map) => map.into_iter(),
+            DynArrayHeapMapInner::Dim12(map) => map.into_iter(),
+            DynArrayHeapMapInner::Dim13(map) => map.into_iter(),
+            DynArrayHeapMapInner::Dim14(map) => map.into_iter(),
+            DynArrayHeapMapInner::Dim15(map) => map.into_iter(),
+            DynArrayHeapMapInner::Dim16(map) => map.into_iter(),
+        }
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn test_dyn_array_heap_map() {
+        let mut map = DynArrayHeapMap::<u32, &str>::try_new(2).unwrap();
+        // insert
+        let key1 = [1u32, 2u32];
+        let key2 = [2u32, 1u32];
+        map.get_or_insert_with(&key1, || "a");
+        map.get_or_insert_with(&key2, || "b");
+        assert_eq!(map.size(), 2);
+
+        // evict highest
+        assert_eq!(map.peek_highest(), Some(&key2[..]));
+        map.evict_highest();
+        assert_eq!(map.size(), 1);
+        assert_eq!(map.peek_highest(), Some(&key1[..]));
+
+        // into_iter
+        let mut iter = map.into_iter();
+        let (k, v) = iter.next().unwrap();
+        assert_eq!(k.as_slice(), &key1);
+        assert_eq!(v, "a");
+        assert_eq!(iter.next(), None);
+    }
+}
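
Note on `map.rs`: the hash map gives O(1) lookup per key while the binary heap keeps the highest key available for eviction. One plausible use, assuming the collector only needs the N lowest composite keys, is to insert every key and evict the highest whenever the map grows past N. The sketch below is a simplified stand-in built on `std::collections::HashMap` rather than `FxHashMap`, with a fixed key length of 2. `TinyHeapMap`, `record` and `lowest_keys` are hypothetical names, not part of this PR.

```rust
use std::collections::{BinaryHeap, HashMap};

/// Simplified stand-in for `ArrayHeapMap` with keys of fixed length 2.
#[derive(Default)]
struct TinyHeapMap {
    buckets: HashMap<[u32; 2], u64>,
    heap: BinaryHeap<[u32; 2]>,
}

impl TinyHeapMap {
    /// Increments the count for `key`; new keys are also pushed on the heap.
    fn record(&mut self, key: [u32; 2]) {
        let heap = &mut self.heap;
        let count = self.buckets.entry(key).or_insert_with(|| {
            heap.push(key);
            0
        });
        *count += 1;
    }

    /// Drops the entry with the largest key from both structures.
    fn evict_highest(&mut self) {
        if let Some(highest) = self.heap.pop() {
            self.buckets.remove(&highest);
        }
    }
}

/// Counts occurrences while keeping only the `limit` smallest keys.
fn lowest_keys(keys: &[[u32; 2]], limit: usize) -> Vec<([u32; 2], u64)> {
    let mut map = TinyHeapMap::default();
    for &key in keys {
        map.record(key);
        if map.buckets.len() > limit {
            map.evict_highest();
        }
    }
    // Sort for a deterministic result; hash map iteration order is arbitrary.
    let mut out: Vec<_> = map.buckets.into_iter().collect();
    out.sort();
    out
}
```

Arrays compare lexicographically, so `[2, 1]` ranks above `[1, 2]`, which matches the order checked in `test_dyn_array_heap_map`. An evicted key is larger than every retained key, so it can never re-enter the retained set and no count is lost for the keys that survive.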
--- a/src/aggregation/bucket/composite/mod.rs
+++ b/src/aggregation/bucket/composite/mod.rs
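
Note on `numeric_types.rs`: the `num_cmp` helpers avoid casting the integer side to `f64`, because that cast is lossy above 2^53. The sketch below shows the failure of the naive cast and one way to compare exactly. It is an illustration under stated assumptions, not the PR's implementation: `naive_cmp` and `exact_cmp` are hypothetical names, and NaN is reported as `None` here instead of an error.

```rust
use std::cmp::Ordering;

/// Naive comparison: cast the integer to f64 first.
/// Loses precision for integers above 2^53.
fn naive_cmp(left: u64, right: f64) -> Option<Ordering> {
    (left as f64).partial_cmp(&right)
}

/// Exact comparison of a u64 with an f64. Returns None for NaN.
fn exact_cmp(left: u64, right: f64) -> Option<Ordering> {
    if right.is_nan() {
        return None;
    }
    if right < 0.0 {
        return Some(Ordering::Greater);
    }
    // 2^64 is exactly representable as f64, and every u64 is strictly below it.
    if right >= 18_446_744_073_709_551_616.0 {
        return Some(Ordering::Less);
    }
    // right is in [0, 2^64), so the cast yields its exact integer part.
    let truncated = right as u64;
    Some(match left.cmp(&truncated) {
        // Same integer part: a non-zero fraction makes `right` the larger one.
        Ordering::Equal if right.fract() != 0.0 => Ordering::Less,
        other => other,
    })
}
```

With `left = 2^54 + 1` and `right = 2^54`, `naive_cmp` reports `Equal` while `exact_cmp` reports `Greater`.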
--- a/src/aggregation/bucket/composite/numeric_types.rs
+++ b/src/aggregation/bucket/composite/numeric_types.rs
@@ -0,0 +1,460 @@
+/// This module helps compare numerical values of different types (i64, u64
+/// and f64).
+pub(super) mod num_cmp {
+    use std::cmp::Ordering;
+
+    use crate::TantivyError;
+
+    pub fn cmp_i64_f64(left_i: i64, right_f: f64) -> crate::Result<Ordering> {
+        if right_f.is_nan() {
+            return Err(TantivyError::InvalidArgument(
+                "NaN comparison is not supported".to_string(),
+            ));
+        }
+
+        // If right_f is < i64::MIN then left_i > right_f (i64::MIN=-2^63 can be
+        // exactly represented as f64)
+        if right_f < i64::MIN as f64 {
+            return Ok(Ordering::Greater);
+        }
+        // If right_f is >= i64::MAX then left_i < right_f (i64::MAX=2^63-1 cannot
+        // be exactly represented as f64)
+        if right_f >= i64::MAX as f64 {
+            return Ok(Ordering::Less);
+        }
+
+        // Now right_f is in [i64::MIN, 2^63), so `right_f as i64` is
+        // well-defined (truncation toward 0)
+        let right_as_i = right_f as i64;
+
+        let result = match left_i.cmp(&right_as_i) {
+            Ordering::Less => Ordering::Less,
+            Ordering::Greater => Ordering::Greater,
+            Ordering::Equal => {
+                // they have the same integer part, compare the fraction
+                let rem = right_f - (right_as_i as f64);
+                if rem == 0.0 {
+                    Ordering::Equal
+                } else if right_f > 0.0 {
+                    Ordering::Less
+                } else {
+                    Ordering::Greater
+                }
+            }
+        };
+        Ok(result)
+    }
+
+    pub fn cmp_u64_f64(left_u: u64, right_f: f64) -> crate::Result<Ordering> {
+        if right_f.is_nan() {
+            return Err(TantivyError::InvalidArgument(
+                "NaN comparison is not supported".to_string(),
+            ));
+        }
+
+        // Negative floats are always less than any u64 >= 0
+        if right_f < 0.0 {
+            return Ok(Ordering::Greater);
+        }
+
+        // If right_f is >= 2^64 then left_u < right_f (`u64::MAX as f64` rounds up to 2^64)
+        let max_as_f = u64::MAX as f64;
+        if right_f >= max_as_f {
+            return Ok(Ordering::Less);
+        }
+
+        // Now right_f is in [0, 2^64), so `right_f as u64` is well-defined
+        // (truncation toward 0)
+        let right_as_u = right_f as u64;
+
+        let result = match left_u.cmp(&right_as_u) {
+            Ordering::Less => Ordering::Less,
+            Ordering::Greater => Ordering::Greater,
+            Ordering::Equal => {
+                // they have the same integer part, compare the fraction
+                let rem = right_f - (right_as_u as f64);
+                if rem == 0.0 {
+                    Ordering::Equal
+                } else {
+                    Ordering::Less
+                }
+            }
+        };
+        Ok(result)
+    }
+
+    pub fn cmp_i64_u64(left_i: i64, right_u: u64) -> Ordering {
+        if left_i < 0 {
+            Ordering::Less
+        } else {
+            let left_as_u = left_i as u64;
+            left_as_u.cmp(&right_u)
+        }
+    }
+}
+
+/// This module helps project numerical values to other numerical types.
+/// When the target value space cannot exactly represent the source value, the
+/// next representable value is returned (or AfterLast if the source value is
+/// larger than the largest representable value).
+///
+/// All functions in this module assume that f64 values are not NaN.
+pub(super) mod num_proj {
+    #[derive(Debug, PartialEq)]
+    pub enum ProjectedNumber<T> {
+        Exact(T),
+        Next(T),
+        AfterLast,
+    }
+
+    pub fn i64_to_u64(value: i64) -> ProjectedNumber<u64> {
+        if value < 0 {
+            ProjectedNumber::Next(0)
+        } else {
+            ProjectedNumber::Exact(value as u64)
+        }
+    }
+
+    pub fn u64_to_i64(value: u64) -> ProjectedNumber<i64> {
+        if value > i64::MAX as u64 {
+            ProjectedNumber::AfterLast
+        } else {
+            ProjectedNumber::Exact(value as i64)
+        }
+    }
+
+    pub fn f64_to_u64(value: f64) -> ProjectedNumber<u64> {
+        if value < 0.0 {
+            ProjectedNumber::Next(0)
+        } else if value >= u64::MAX as f64 {
+            ProjectedNumber::AfterLast
+        } else if value.fract() == 0.0 {
+            ProjectedNumber::Exact(value as u64)
+        } else {
+            // casting f64 to u64 truncates toward zero
+            ProjectedNumber::Next(value as u64 + 1)
+        }
+    }
+
+    pub fn f64_to_i64(value: f64) -> ProjectedNumber<i64> {
+        if value < (i64::MIN as f64) {
+            ProjectedNumber::Next(i64::MIN)
+        } else if value >= (i64::MAX as f64) {
+            ProjectedNumber::AfterLast
+        } else if value.fract() == 0.0 {
+            ProjectedNumber::Exact(value as i64)
+        } else if value > 0.0 {
+            // casting f64 to i64 truncates toward zero
+            ProjectedNumber::Next(value as i64 + 1)
+        } else {
+            ProjectedNumber::Next(value as i64)
+        }
+    }
+
+    pub fn i64_to_f64(value: i64) -> ProjectedNumber<f64> {
+        let value_f = value as f64;
+        // Round-trip in i128: `value_f as i64` saturates, so i64::MAX would look exact.
+        let (k_roundtrip, value) = (value_f as i128, i128::from(value));
+        if k_roundtrip == value {
+            // between -2^53 and 2^53 all i64 are exactly represented as f64
+            ProjectedNumber::Exact(value_f)
+        } else if k_roundtrip > value {
+            // the cast rounded up: value_f is already the next f64
+            ProjectedNumber::Next(value_f)
+        } else {
+            // the cast rounded down: take the following f64
+            ProjectedNumber::Next(value_f.next_up())
+        }
+    }
+
+    pub fn u64_to_f64(value: u64) -> ProjectedNumber<f64> {
+        let value_f = value as f64;
+        let (k_roundtrip, value) = (value_f as u128, u128::from(value)); // u64 cast saturates
+        if k_roundtrip == value {
+            // between 0 and 2^53 all u64 are exactly represented as f64
+            ProjectedNumber::Exact(value_f)
+        } else if k_roundtrip > value {
+            ProjectedNumber::Next(value_f)
+        } else {
+            ProjectedNumber::Next(value_f.next_up())
+        }
+    }
+}
+
+#[cfg(test)]
+mod num_cmp_tests {
+    use std::cmp::Ordering;
+
+    use super::num_cmp::*;
+
+    #[test]
+    fn test_cmp_u64_f64() {
+        // Basic comparisons
+        assert_eq!(cmp_u64_f64(5, 5.0).unwrap(), Ordering::Equal);
+        assert_eq!(cmp_u64_f64(5, 6.0).unwrap(), Ordering::Less);
+        assert_eq!(cmp_u64_f64(6, 5.0).unwrap(), Ordering::Greater);
+        assert_eq!(cmp_u64_f64(0, 0.0).unwrap(), Ordering::Equal);
+        assert_eq!(cmp_u64_f64(0, 0.1).unwrap(), Ordering::Less);
+
+        // Negative float values should always be less than any u64
+        assert_eq!(cmp_u64_f64(0, -0.1).unwrap(), Ordering::Greater);
+        assert_eq!(cmp_u64_f64(5, -5.0).unwrap(), Ordering::Greater);
+        assert_eq!(cmp_u64_f64(u64::MAX, -1e20).unwrap(), Ordering::Greater);
+
+        // Tests with extreme values
+        assert_eq!(cmp_u64_f64(u64::MAX, 1e20).unwrap(), Ordering::Less);
+
+        // Precision edge cases: large u64 that loses precision when converted to f64
+        // => 2^54, exactly represented as f64
+        let large_f64 = 18_014_398_509_481_984.0;
+        let large_u64 = 18_014_398_509_481_984;
+        // prove that large_u64 is exactly represented as f64
+        assert_eq!(large_u64 as f64, large_f64);
+        assert_eq!(cmp_u64_f64(large_u64, large_f64).unwrap(), Ordering::Equal);
+        // => (2^54 + 1) cannot be exactly represented in f64
+        let large_u64_plus_1 = 18_014_398_509_481_985;
+        // prove that it is represented as f64 by large_f64
+        assert_eq!(large_u64_plus_1 as f64, large_f64);
+        assert_eq!(
+            cmp_u64_f64(large_u64_plus_1, large_f64).unwrap(),
+            Ordering::Greater
+        );
+        // => (2^54 - 1) cannot be exactly represented in f64
+        let large_u64_minus_1 = 18_014_398_509_481_983;
+        // prove that it is also represented as f64 by large_f64
+        assert_eq!(large_u64_minus_1 as f64, large_f64);
+        assert_eq!(
+            cmp_u64_f64(large_u64_minus_1, large_f64).unwrap(),
+            Ordering::Less
+        );
+
+        // NaN comparison results in an error
+        assert!(cmp_u64_f64(0, f64::NAN).is_err());
+    }
+
+    #[test]
+    fn test_cmp_i64_f64() {
+        // Basic comparisons
+        assert_eq!(cmp_i64_f64(5, 5.0).unwrap(), Ordering::Equal);
+        assert_eq!(cmp_i64_f64(5, 6.0).unwrap(), Ordering::Less);
+        assert_eq!(cmp_i64_f64(6, 5.0).unwrap(), Ordering::Greater);
+        assert_eq!(cmp_i64_f64(-5, -5.0).unwrap(), Ordering::Equal);
+        assert_eq!(cmp_i64_f64(-5, -4.0).unwrap(), Ordering::Less);
+        assert_eq!(cmp_i64_f64(-4, -5.0).unwrap(), Ordering::Greater);
+        assert_eq!(cmp_i64_f64(-5, 5.0).unwrap(), Ordering::Less);
+        assert_eq!(cmp_i64_f64(5, -5.0).unwrap(), Ordering::Greater);
+        assert_eq!(cmp_i64_f64(0, -0.1).unwrap(), Ordering::Greater);
+        assert_eq!(cmp_i64_f64(0, 0.1).unwrap(), Ordering::Less);
+        assert_eq!(cmp_i64_f64(-1, -0.5).unwrap(), Ordering::Less);
+        assert_eq!(cmp_i64_f64(-1, 0.0).unwrap(), Ordering::Less);
+        assert_eq!(cmp_i64_f64(0, 0.0).unwrap(), Ordering::Equal);
+
+        // Tests with extreme values
+        assert_eq!(cmp_i64_f64(i64::MAX, 1e20).unwrap(), Ordering::Less);
+        assert_eq!(cmp_i64_f64(i64::MIN, -1e20).unwrap(), Ordering::Greater);
+
+        // Precision edge cases: large i64 that loses precision when converted to f64
+        // => 2^54, exactly represented as f64
+        let large_f64 = 18_014_398_509_481_984.0;
+        let large_i64 = 18_014_398_509_481_984;
+        // prove that large_i64 is exactly represented as f64
+        assert_eq!(large_i64 as f64, large_f64);
+        assert_eq!(cmp_i64_f64(large_i64, large_f64).unwrap(), Ordering::Equal);
+        // => (1_i64 << 54) + 1 cannot be exactly represented in f64
+        let large_i64_plus_1 = 18_014_398_509_481_985;
+        // prove that it is represented as f64 by large_f64
+        assert_eq!(large_i64_plus_1 as f64, large_f64);
+        assert_eq!(
+            cmp_i64_f64(large_i64_plus_1, large_f64).unwrap(),
+            Ordering::Greater
+        );
+        // => (1_i64 << 54) - 1 cannot be exactly represented in f64
+        let large_i64_minus_1 = 18_014_398_509_481_983;
+        // prove that it is also represented as f64 by large_f64
+        assert_eq!(large_i64_minus_1 as f64, large_f64);
+        assert_eq!(
+            cmp_i64_f64(large_i64_minus_1, large_f64).unwrap(),
+            Ordering::Less
+        );
+
+        // Same precision edge case but with negative values
+        // => -2^54, exactly represented as f64
+        let large_neg_f64 = -18_014_398_509_481_984.0;
+        let large_neg_i64 = -18_014_398_509_481_984;
+        // prove that large_neg_i64 is exactly represented as f64
+        assert_eq!(large_neg_i64 as f64, large_neg_f64);
+        assert_eq!(
+            cmp_i64_f64(large_neg_i64, large_neg_f64).unwrap(),
+            Ordering::Equal
+        );
+        // => -(2^54 + 1) cannot be exactly represented in f64
+        let large_neg_i64_plus_1 = -18_014_398_509_481_985;
+        // prove that it is represented as f64 by large_neg_f64
+        assert_eq!(large_neg_i64_plus_1 as f64, large_neg_f64);
+        assert_eq!(
+            cmp_i64_f64(large_neg_i64_plus_1, large_neg_f64).unwrap(),
+            Ordering::Less
+        );
+        // => -(2^54 - 1) cannot be exactly represented in f64
+        let large_neg_i64_minus_1 = -18_014_398_509_481_983;
+        // prove that it is also represented as f64 by large_neg_f64
+        assert_eq!(large_neg_i64_minus_1 as f64, large_neg_f64);
+        assert_eq!(
+            cmp_i64_f64(large_neg_i64_minus_1, large_neg_f64).unwrap(),
+            Ordering::Greater
+        );
+
+        // NaN comparison results in an error
+        assert!(cmp_i64_f64(0, f64::NAN).is_err());
+    }
+
+    #[test]
+    fn test_cmp_i64_u64() {
+        // Test with negative i64 values (should always be less than any u64)
+        assert_eq!(cmp_i64_u64(-1, 0), Ordering::Less);
+        assert_eq!(cmp_i64_u64(i64::MIN, 0), Ordering::Less);
+        assert_eq!(cmp_i64_u64(i64::MIN, u64::MAX), Ordering::Less);
+
+        // Test with non-negative i64 values
+        assert_eq!(cmp_i64_u64(0, 0), Ordering::Equal);
+        assert_eq!(cmp_i64_u64(1, 0), Ordering::Greater);
+        assert_eq!(cmp_i64_u64(1, 1), Ordering::Equal);
+        assert_eq!(cmp_i64_u64(0, 1), Ordering::Less);
+        assert_eq!(cmp_i64_u64(5, 10), Ordering::Less);
+        assert_eq!(cmp_i64_u64(10, 5), Ordering::Greater);
+
+        // Test with values near i64::MAX and u64 conversion
+        assert_eq!(cmp_i64_u64(i64::MAX, i64::MAX as u64), Ordering::Equal);
+        assert_eq!(cmp_i64_u64(i64::MAX, (i64::MAX as u64) + 1), Ordering::Less);
+        assert_eq!(cmp_i64_u64(i64::MAX, u64::MAX), Ordering::Less);
+    }
+}
+
+#[cfg(test)]
+mod num_proj_tests {
+    use super::num_proj::{self, ProjectedNumber};
+
+    #[test]
+    fn test_i64_to_u64() {
+        assert_eq!(num_proj::i64_to_u64(-1), ProjectedNumber::Next(0));
+        assert_eq!(num_proj::i64_to_u64(i64::MIN), ProjectedNumber::Next(0));
+        assert_eq!(num_proj::i64_to_u64(0), ProjectedNumber::Exact(0));
+        assert_eq!(num_proj::i64_to_u64(42), ProjectedNumber::Exact(42));
+        assert_eq!(
+            num_proj::i64_to_u64(i64::MAX),
+            ProjectedNumber::Exact(i64::MAX as u64)
+        );
+    }
+
+    #[test]
+    fn test_u64_to_i64() {
+        assert_eq!(num_proj::u64_to_i64(0), ProjectedNumber::Exact(0));
+        assert_eq!(num_proj::u64_to_i64(42), ProjectedNumber::Exact(42));
+        assert_eq!(
+            num_proj::u64_to_i64(i64::MAX as u64),
+            ProjectedNumber::Exact(i64::MAX)
+        );
+        assert_eq!(
+            num_proj::u64_to_i64((i64::MAX as u64) + 1),
+            ProjectedNumber::AfterLast
+        );
+        assert_eq!(num_proj::u64_to_i64(u64::MAX), ProjectedNumber::AfterLast);
+    }
+
+    #[test]
+    fn test_f64_to_u64() {
+        assert_eq!(num_proj::f64_to_u64(-1e25), ProjectedNumber::Next(0));
+        assert_eq!(num_proj::f64_to_u64(-0.1), ProjectedNumber::Next(0));
+        assert_eq!(num_proj::f64_to_u64(1e20), ProjectedNumber::AfterLast);
+        assert_eq!(
+            num_proj::f64_to_u64(f64::INFINITY),
+            ProjectedNumber::AfterLast
+        );
+        assert_eq!(num_proj::f64_to_u64(0.0), ProjectedNumber::Exact(0));
+        assert_eq!(num_proj::f64_to_u64(42.0), ProjectedNumber::Exact(42));
+        assert_eq!(num_proj::f64_to_u64(0.5), ProjectedNumber::Next(1));
+        assert_eq!(num_proj::f64_to_u64(42.1), ProjectedNumber::Next(43));
+    }
+
+    #[test]
+    fn test_f64_to_i64() {
+        assert_eq!(num_proj::f64_to_i64(-1e20), ProjectedNumber::Next(i64::MIN));
+        assert_eq!(
+            num_proj::f64_to_i64(f64::NEG_INFINITY),
+            ProjectedNumber::Next(i64::MIN)
+        );
+        assert_eq!(num_proj::f64_to_i64(1e20), ProjectedNumber::AfterLast);
+        assert_eq!(
+            num_proj::f64_to_i64(f64::INFINITY),
+            ProjectedNumber::AfterLast
+        );
+        assert_eq!(num_proj::f64_to_i64(0.0), ProjectedNumber::Exact(0));
+        assert_eq!(num_proj::f64_to_i64(42.0), ProjectedNumber::Exact(42));
+        assert_eq!(num_proj::f64_to_i64(-42.0), ProjectedNumber::Exact(-42));
+        assert_eq!(num_proj::f64_to_i64(0.5), ProjectedNumber::Next(1));
+        assert_eq!(num_proj::f64_to_i64(42.1), ProjectedNumber::Next(43));
+        assert_eq!(num_proj::f64_to_i64(-0.5), ProjectedNumber::Next(0));
+        assert_eq!(num_proj::f64_to_i64(-42.1), ProjectedNumber::Next(-42));
+    }
+
+    #[test]
+    fn test_i64_to_f64() {
+        assert_eq!(num_proj::i64_to_f64(0), ProjectedNumber::Exact(0.0));
+        assert_eq!(num_proj::i64_to_f64(42), ProjectedNumber::Exact(42.0));
+        assert_eq!(num_proj::i64_to_f64(-42), ProjectedNumber::Exact(-42.0));
+
+        let max_exact = 9_007_199_254_740_992; // 2^53
+        assert_eq!(
+            num_proj::i64_to_f64(max_exact),
+            ProjectedNumber::Exact(max_exact as f64)
+        );
+
+        // 2^53 + 1 is the smallest positive integer that f64 cannot represent exactly
+        let large_i64 = 9_007_199_254_740_993; // 2^53 + 1
+        let closest_f64 = 9_007_199_254_740_992.0;
+        assert_eq!(large_i64 as f64, closest_f64);
+        if let ProjectedNumber::Next(val) = num_proj::i64_to_f64(large_i64) {
+            // Verify that the returned float is different from the direct cast
+            assert!(val > closest_f64);
+            assert!(val - closest_f64 < 2. * f64::EPSILON * closest_f64);
+        } else {
+            panic!("Expected ProjectedNumber::Next for large_i64");
+        }
+
+        // Test a negative value that f64 cannot represent exactly
+        let large_neg_i64 = -9_007_199_254_740_993; // -(2^53 + 1)
+        let closest_neg_f64 = -9_007_199_254_740_992.0;
+        assert_eq!(large_neg_i64 as f64, closest_neg_f64);
+        if let ProjectedNumber::Next(val) = num_proj::i64_to_f64(large_neg_i64) {
+            // Verify that the returned float is the closest representable f64
+            assert_eq!(val, closest_neg_f64);
+        } else {
+            panic!("Expected ProjectedNumber::Next for large_neg_i64");
+        }
+    }
+
+    #[test]
+    fn test_u64_to_f64() {
+        assert_eq!(num_proj::u64_to_f64(0), ProjectedNumber::Exact(0.0));
+        assert_eq!(num_proj::u64_to_f64(42), ProjectedNumber::Exact(42.0));
+
+        // Test 2^53: every integer up to this value is exactly representable as f64
+        let max_exact = 9_007_199_254_740_992; // 2^53
+        assert_eq!(
+            num_proj::u64_to_f64(max_exact),
+            ProjectedNumber::Exact(max_exact as f64)
+        );
+
+        // 2^53 + 1 is the smallest positive integer that f64 cannot represent exactly
+        let large_u64 = 9_007_199_254_740_993; // 2^53 + 1
+        let closest_f64 = 9_007_199_254_740_992.0;
+        assert_eq!(large_u64 as f64, closest_f64);
+        if let ProjectedNumber::Next(val) = num_proj::u64_to_f64(large_u64) {
+            // Verify that the returned float is different from the direct cast
+            assert!(val > closest_f64);
+            assert!(val - closest_f64 < 2. * f64::EPSILON * closest_f64);
+        } else {
+            panic!("Expected ProjectedNumber::Next for large_u64");
+        }
+    }
+}
--- a/src/aggregation/bucket/filter.rs
+++ b/src/aggregation/bucket/filter.rs
@@ -6,8 +6,8 @@ use serde::{Deserialize, Deserializer, Serialize, Serializer};
 use crate::aggregation::agg_data::{
    build_segment_agg_collectors, AggRefNode, AggregationsSegmentCtx,
 };
-use crate::aggregation::cached_sub_aggs::{
-    CachedSubAggs, HighCardSubAggCache, LowCardSubAggCache, SubAggCache,
+use crate::aggregation::buffered_sub_aggs::{
+    BufferedSubAggs, HighCardSubAggBuffer, LowCardSubAggBuffer, SubAggBuffer,
 };
 use crate::aggregation::intermediate_agg_result::{
    IntermediateAggregationResult, IntermediateAggregationResults, IntermediateBucketResult,
@@ -503,17 +503,17 @@ struct DocCount {
 }

 /// Segment collector for filter aggregation
-pub struct SegmentFilterCollector<C: SubAggCache> {
+pub struct SegmentFilterCollector<B: SubAggBuffer> {
    /// Document counts per parent bucket
    parent_buckets: Vec<DocCount>,
    /// Sub-aggregation collectors
-    sub_aggregations: Option<CachedSubAggs<C>>,
+    sub_aggregations: Option<BufferedSubAggs<B>>,
    bucket_id_provider: BucketIdProvider,
    /// Accessor index for this filter aggregation (to access FilterAggReqData)
    accessor_idx: usize,
 }

-impl<C: SubAggCache> SegmentFilterCollector<C> {
+impl<B: SubAggBuffer> SegmentFilterCollector<B> {
    /// Create a new filter segment collector following the new agg_data pattern
    pub(crate) fn from_req_and_validate(
        req: &mut AggregationsSegmentCtx,
@@ -525,7 +525,7 @@ impl<C: SubAggCache> SegmentFilterCollector<C> {
        } else {
            None
        };
-        let sub_agg_collector = sub_agg_collector.map(CachedSubAggs::new);
+        let sub_agg_collector = sub_agg_collector.map(BufferedSubAggs::new);

        Ok(SegmentFilterCollector {
            parent_buckets: Vec::new(),
@@ -547,16 +547,16 @@ pub(crate) fn build_segment_filter_collector(

    if is_top_level {
        Ok(Box::new(
-            SegmentFilterCollector::<LowCardSubAggCache>::from_req_and_validate(req, node)?,
+            SegmentFilterCollector::<LowCardSubAggBuffer>::from_req_and_validate(req, node)?,
        ))
    } else {
        Ok(Box::new(
-            SegmentFilterCollector::<HighCardSubAggCache>::from_req_and_validate(req, node)?,
+            SegmentFilterCollector::<HighCardSubAggBuffer>::from_req_and_validate(req, node)?,
        ))
    }
 }

-impl<C: SubAggCache> Debug for SegmentFilterCollector<C> {
+impl<B: SubAggBuffer> Debug for SegmentFilterCollector<B> {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        f.debug_struct("SegmentFilterCollector")
            .field("buckets", &self.parent_buckets)
@@ -566,7 +566,7 @@ impl<C: SubAggCache> Debug for SegmentFilterCollector<C> {
    }
 }

-impl<C: SubAggCache> SegmentAggregationCollector for SegmentFilterCollector<C> {
+impl<B: SubAggBuffer> SegmentAggregationCollector for SegmentFilterCollector<B> {
    fn add_intermediate_aggregation_result(
        &mut self,
        agg_data: &AggregationsSegmentCtx,
@@ -674,6 +674,17 @@ impl<C: SubAggCache> SegmentAggregationCollector for SegmentFilterCollector<C> {
        }
        Ok(())
    }
+
+    fn compute_metric_value(
+        &self,
+        _bucket_id: BucketId,
+        _sub_agg_name: &str,
+        _sub_agg_property: &str,
+        _agg_data: &AggregationsSegmentCtx,
+    ) -> Option<f64> {
+        // TODO: forward into the inner `sub_agg` for nested order paths (`filter.metric`).
+        None
+    }
 }

 /// Intermediate result for filter aggregation
@@ -838,7 +849,7 @@ mod tests {
        let expected = json!({
            "electronics": {
                "doc_count": 2,
-                "avg_price": { "value": 899.0, "sum": 1798.0, "count": 2 }  // (999 + 799) / 2
+                "avg_price": { "value": 899.0 }  // (999 + 799) / 2
            }
        });

@@ -868,7 +879,7 @@ mod tests {
        let expected = json!({
            "furniture": {
                "doc_count": 0,
-                "avg_price": { "value": null, "sum": 0.0, "count": 0 }
+                "avg_price": { "value": null }
            }
        });

@@ -904,7 +915,7 @@ mod tests {
        let expected = json!({
            "electronics": {
                "doc_count": 2,
-                "avg_price": { "value": 899.0, "sum": 1798.0, "count": 2 }
+                "avg_price": { "value": 899.0 }
            },
            "in_stock": {
                "doc_count": 3,  // apple, samsung, penguin
@@ -1000,7 +1011,7 @@ mod tests {
        let expected = json!({
            "premium_electronics": {
                "doc_count": 1,  // Only apple (999) is >= 800 in tantivy's range semantics
-                "avg_rating": { "value": 4.5, "sum": 4.5, "count": 1 }
+                "avg_rating": { "value": 4.5 }
            }
        });

@@ -1032,7 +1043,7 @@ mod tests {
        let expected = json!({
            "in_stock": {
                "doc_count": 3,  // apple, samsung, penguin
-                "avg_price": { "value": 607.67, "sum": 1823.0, "count": 3 }  // (999 + 799 + 25) / 3 ≈ 607.67
+                "avg_price": { "value": 607.67 }  // (999 + 799 + 25) / 3 ≈ 607.67
            },
            "out_of_stock": {
                "doc_count": 1,  // nike
@@ -1183,7 +1194,7 @@ mod tests {
                "doc_count": 4,
                "electronics_branch": {
                    "doc_count": 2,
-                    "avg_price": { "value": 899.0, "sum": 1798.0, "count": 2 }
+                    "avg_price": { "value": 899.0 }
                },
                "in_stock_branch": {
                    "doc_count": 3,
@@ -1259,7 +1270,7 @@ mod tests {
                    "doc_count": 2,  // apple (999), samsung (799)
                    "electronics": {
                        "doc_count": 2,  // both are electronics
-                        "avg_rating": { "value": 4.35, "sum": 8.7, "count": 2 }  // (4.5 + 4.2) / 2
+                        "avg_rating": { "value": 4.35 }  // (4.5 + 4.2) / 2
                    },
                    "in_stock": {
                        "doc_count": 2,  // both are in stock
@@ -1321,12 +1332,12 @@ mod tests {
                        {
                            "key": "samsung",
                            "doc_count": 1,
-                            "avg_price": { "value": 799.0, "sum": 799.0, "count": 1 }
+                            "avg_price": { "value": 799.0 }
                        },
                        {
                            "key": "apple",
                            "doc_count": 1,
-                            "avg_price": { "value": 999.0, "sum": 999.0, "count": 1 }
+                            "avg_price": { "value": 999.0 }
                        }
                    ],
                    "sum_other_doc_count": 0,
@@ -1370,7 +1381,7 @@ mod tests {
                    "sum": 1798.0,
                    "avg": 899.0
                },
-                "rating_avg": { "value": 4.35, "sum": 8.7, "count": 2 },
+                "rating_avg": { "value": 4.35 },
                "count": { "value": 2.0 }
            }
        });
@@ -1411,7 +1422,7 @@ mod tests {
        let expected = json!({
            "electronics": {
                "doc_count": 0,
-                "avg_price": { "value": null, "sum": 0.0, "count": 0 }
+                "avg_price": { "value": null }
            }
        });

@@ -1698,15 +1709,13 @@ mod tests {
        let filter_expected = json!({
            "electronics": {
                "doc_count": 2,
-                "avg_price": { "value": 899.0, "sum": 1798.0, "count": 2 }
+                "avg_price": { "value": 899.0 }
            }
        });

        let separate_expected = json!({
            "result": {
-                "value": 899.0,
-                "sum": 1798.0,
-                "count": 2
+                "value": 899.0
            }
        });

--- a/src/aggregation/bucket/histogram/date_histogram.rs
+++ b/src/aggregation/bucket/histogram/date_histogram.rs
@@ -207,7 +207,7 @@ fn parse_offset_into_milliseconds(input: &str) -> Result<i64, AggregationError>
    }
 }

-fn parse_into_milliseconds(input: &str) -> Result<i64, AggregationError> {
+pub(crate) fn parse_into_milliseconds(input: &str) -> Result<i64, AggregationError> {
    let split_boundary = input
        .as_bytes()
        .iter()
--- a/src/aggregation/bucket/histogram/histogram.rs
+++ b/src/aggregation/bucket/histogram/histogram.rs
@@ -10,7 +10,7 @@ use crate::aggregation::agg_data::{
 };
 use crate::aggregation::agg_req::Aggregations;
 use crate::aggregation::agg_result::BucketEntry;
-use crate::aggregation::cached_sub_aggs::{CachedSubAggs, HighCardCachedSubAggs};
+use crate::aggregation::buffered_sub_aggs::{BufferedSubAggs, HighCardBufferedSubAggs};
 use crate::aggregation::intermediate_agg_result::{
    IntermediateAggregationResult, IntermediateAggregationResults, IntermediateBucketResult,
    IntermediateHistogramBucketEntry,
@@ -258,7 +258,7 @@ pub(crate) struct SegmentHistogramBucketEntry {
 impl SegmentHistogramBucketEntry {
    pub(crate) fn into_intermediate_bucket_entry(
        self,
-        sub_aggregation: &mut Option<HighCardCachedSubAggs>,
+        sub_aggregation: &mut Option<HighCardBufferedSubAggs>,
        agg_data: &AggregationsSegmentCtx,
    ) -> crate::Result<IntermediateHistogramBucketEntry> {
        let mut sub_aggregation_res = IntermediateAggregationResults::default();
@@ -283,6 +283,11 @@ impl SegmentHistogramBucketEntry {
 struct HistogramBuckets {
    pub buckets: FxHashMap<i64, SegmentHistogramBucketEntry>,
 }
+impl HistogramBuckets {
+    fn memory_consumption(&self) -> u64 {
+        self.buckets.capacity() as u64 * std::mem::size_of::<SegmentHistogramBucketEntry>() as u64
+    }
+}

 /// The collector puts values from the fast field into the correct buckets and does a conversion to
 /// the correct datatype.
@@ -291,7 +296,7 @@ pub struct SegmentHistogramCollector {
    /// The buckets containing the aggregation data.
    /// One Histogram bucket per parent bucket id.
    parent_buckets: Vec<HistogramBuckets>,
-    sub_agg: Option<HighCardCachedSubAggs>,
+    sub_agg: Option<HighCardBufferedSubAggs>,
    accessor_idx: usize,
    bucket_id_provider: BucketIdProvider,
 }
@@ -324,7 +329,7 @@ impl SegmentAggregationCollector for SegmentHistogramCollector {
        agg_data: &mut AggregationsSegmentCtx,
    ) -> crate::Result<()> {
        let req = agg_data.take_histogram_req_data(self.accessor_idx);
-        let mem_pre = self.get_memory_consumption();
+        let mem_pre = self.get_memory_consumption(parent_bucket_id);
        let buckets = &mut self.parent_buckets[parent_bucket_id as usize].buckets;

        let bounds = req.bounds;
@@ -358,12 +363,9 @@ impl SegmentAggregationCollector for SegmentHistogramCollector {
        }
        agg_data.put_back_histogram_req_data(self.accessor_idx, req);

-        let mem_delta = self.get_memory_consumption() - mem_pre;
+        let mem_delta = self.get_memory_consumption(parent_bucket_id) - mem_pre;
        if mem_delta > 0 {
-            agg_data
-                .context
-                .limits
-                .add_memory_consumed(mem_delta as u64)?;
+            agg_data.context.limits.add_memory_consumed(mem_delta)?;
        }

        if let Some(sub_agg) = &mut self.sub_agg {
@@ -392,14 +394,24 @@ impl SegmentAggregationCollector for SegmentHistogramCollector {
        }
        Ok(())
    }
+
+    fn compute_metric_value(
+        &self,
+        _bucket_id: BucketId,
+        _sub_agg_name: &str,
+        _sub_agg_property: &str,
+        _agg_data: &AggregationsSegmentCtx,
+    ) -> Option<f64> {
+        // Histogram is a multi-bucket agg with no single value to extract.
+        None
+    }
 }

 impl SegmentHistogramCollector {
-    fn get_memory_consumption(&self) -> usize {
-        let self_mem = std::mem::size_of::<Self>();
-        let buckets_mem = self.parent_buckets.len() * std::mem::size_of::<HistogramBuckets>();
-        self_mem + buckets_mem
+    fn get_memory_consumption(&self, parent_bucket_id: BucketId) -> u64 {
+        self.parent_buckets[parent_bucket_id as usize].memory_consumption()
    }
+
    /// Converts the collector result into a intermediate bucket result.
    fn add_intermediate_bucket_result(
        &mut self,
@@ -444,7 +456,7 @@ impl SegmentHistogramCollector {
            max: f64::MAX,
        });
        req_data.offset = req_data.req.offset.unwrap_or(0.0);
-        let sub_agg = sub_agg.map(CachedSubAggs::new);
+        let sub_agg = sub_agg.map(BufferedSubAggs::new);

        Ok(Self {
            parent_buckets: Default::default(),
@@ -1222,9 +1234,7 @@ mod tests {
            res["histogram"]["buckets"][0],
            json!({
                "avg": {
-                    "value": Value::Null,
-                    "sum": 0.0,
-                    "count": 0
+                    "value": Value::Null
                },
                "doc_count": 0,
                "key": 2.0,
--- a/src/aggregation/bucket/mod.rs
+++ b/src/aggregation/bucket/mod.rs
@@ -22,6 +22,7 @@
 //! - [Range](RangeAggregation)
 //! - [Terms](TermsAggregation)

+mod composite;
 mod filter;
 mod histogram;
 mod range;
@@ -31,6 +32,7 @@ mod term_missing_agg;
 use std::collections::HashMap;
 use std::fmt;

+pub use composite::*;
 pub use filter::*;
 pub use histogram::*;
 pub use range::*;
--- a/src/aggregation/bucket/range.rs
+++ b/src/aggregation/bucket/range.rs
@@ -9,8 +9,9 @@ use crate::aggregation::agg_data::{
    build_segment_agg_collectors, AggRefNode, AggregationsSegmentCtx,
 };
 use crate::aggregation::agg_limits::AggregationLimitsGuard;
-use crate::aggregation::cached_sub_aggs::{
-    CachedSubAggs, HighCardSubAggCache, LowCardCachedSubAggs, LowCardSubAggCache, SubAggCache,
+use crate::aggregation::buffered_sub_aggs::{
+    BufferedSubAggs, HighCardSubAggBuffer, LowCardBufferedSubAggs, LowCardSubAggBuffer,
+    SubAggBuffer,
 };
 use crate::aggregation::intermediate_agg_result::{
    IntermediateAggregationResult, IntermediateAggregationResults, IntermediateBucketResult,
@@ -155,13 +156,13 @@ pub(crate) struct SegmentRangeAndBucketEntry {

 /// The collector puts values from the fast field into the correct buckets and does a conversion to
 /// the correct datatype.
-pub struct SegmentRangeCollector<C: SubAggCache> {
+pub struct SegmentRangeCollector<B: SubAggBuffer> {
    /// The buckets containing the aggregation data.
    /// One for each ParentBucketId
    parent_buckets: Vec<Vec<SegmentRangeAndBucketEntry>>,
    column_type: ColumnType,
    pub(crate) accessor_idx: usize,
-    sub_agg: Option<CachedSubAggs<C>>,
+    sub_agg: Option<BufferedSubAggs<B>>,
    /// Here things get a bit weird. We need to assign unique bucket ids across all
    /// parent buckets. So we keep track of the next available bucket id here.
    /// This allows a kind of flattening of the bucket ids across all parent buckets.
@@ -178,7 +179,7 @@ pub struct SegmentRangeCollector<C: SubAggCache> {
    limits: AggregationLimitsGuard,
 }

-impl<C: SubAggCache> Debug for SegmentRangeCollector<C> {
+impl<B: SubAggBuffer> Debug for SegmentRangeCollector<B> {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        f.debug_struct("SegmentRangeCollector")
            .field("parent_buckets_len", &self.parent_buckets.len())
@@ -229,7 +230,7 @@ impl SegmentRangeBucketEntry {
    }
 }

-impl<C: SubAggCache> SegmentAggregationCollector for SegmentRangeCollector<C> {
+impl<B: SubAggBuffer> SegmentAggregationCollector for SegmentRangeCollector<B> {
    fn add_intermediate_aggregation_result(
        &mut self,
        agg_data: &AggregationsSegmentCtx,
@@ -327,6 +328,17 @@ impl<C: SubAggCache> SegmentAggregationCollector for SegmentRangeCollector<C> {

        Ok(())
    }
+
+    fn compute_metric_value(
+        &self,
+        _bucket_id: BucketId,
+        _sub_agg_name: &str,
+        _sub_agg_property: &str,
+        _agg_data: &AggregationsSegmentCtx,
+    ) -> Option<f64> {
+        // Range is a multi-bucket agg with no single value to extract.
+        None
+    }
 }
 /// Build a concrete `SegmentRangeCollector` with either a Vec- or HashMap-backed
 /// bucket storage, depending on the column type and aggregation level.
@@ -350,8 +362,8 @@ pub(crate) fn build_segment_range_collector(
    };

    if is_low_card {
-        Ok(Box::new(SegmentRangeCollector::<LowCardSubAggCache> {
-            sub_agg: sub_agg.map(LowCardCachedSubAggs::new),
+        Ok(Box::new(SegmentRangeCollector::<LowCardSubAggBuffer> {
+            sub_agg: sub_agg.map(LowCardBufferedSubAggs::new),
            column_type: field_type,
            accessor_idx,
            parent_buckets: Vec::new(),
@@ -359,8 +371,8 @@ pub(crate) fn build_segment_range_collector(
            limits: agg_data.context.limits.clone(),
        }))
    } else {
-        Ok(Box::new(SegmentRangeCollector::<HighCardSubAggCache> {
-            sub_agg: sub_agg.map(CachedSubAggs::new),
+        Ok(Box::new(SegmentRangeCollector::<HighCardSubAggBuffer> {
+            sub_agg: sub_agg.map(BufferedSubAggs::new),
            column_type: field_type,
            accessor_idx,
            parent_buckets: Vec::new(),
@@ -370,7 +382,7 @@ pub(crate) fn build_segment_range_collector(
    }
 }

-impl<C: SubAggCache> SegmentRangeCollector<C> {
+impl<B: SubAggBuffer> SegmentRangeCollector<B> {
    pub(crate) fn create_new_buckets(
        &mut self,
        agg_data: &AggregationsSegmentCtx,
@@ -554,7 +566,7 @@ mod tests {
    pub fn get_collector_from_ranges(
        ranges: Vec<RangeAggregationRange>,
        field_type: ColumnType,
-    ) -> SegmentRangeCollector<HighCardSubAggCache> {
+    ) -> SegmentRangeCollector<HighCardSubAggBuffer> {
        let req = RangeAggregation {
            field: "dummy".to_string(),
            ranges,
--- a/src/aggregation/bucket/term_agg.rs
+++ b/src/aggregation/bucket/term_agg.rs
@@ -1,5 +1,4 @@
 use std::fmt::Debug;
-use std::io;
 use std::net::Ipv6Addr;

 use columnar::column_values::CompactSpaceU64Accessor;
@@ -17,8 +16,9 @@ use crate::aggregation::agg_data::{
 };
 use crate::aggregation::agg_limits::MemoryConsumption;
 use crate::aggregation::agg_req::Aggregations;
-use crate::aggregation::cached_sub_aggs::{
-    CachedSubAggs, HighCardSubAggCache, LowCardCachedSubAggs, LowCardSubAggCache, SubAggCache,
+use crate::aggregation::buffered_sub_aggs::{
+    BufferedSubAggs, HighCardSubAggBuffer, LowCardBufferedSubAggs, LowCardSubAggBuffer,
+    SubAggBuffer,
 };
 use crate::aggregation::intermediate_agg_result::{
    IntermediateAggregationResult, IntermediateAggregationResults, IntermediateBucketResult,
@@ -352,19 +352,15 @@ pub(crate) fn build_segment_term_collector(
        )));
    }

-    // Validate sub aggregation exists when ordering by sub-aggregation.
-    {
-        if let OrderTarget::SubAggregation(sub_agg_name) = &terms_req_data.req.order.target {
-            let (agg_name, _agg_property) = get_agg_name_and_property(sub_agg_name);
-
-            node.get_sub_agg(agg_name, &req_data.per_request)
-                .ok_or_else(|| {
-                    TantivyError::InvalidArgument(format!(
-                        "could not find aggregation with name {agg_name} in metric \
-                         sub_aggregations"
-                    ))
-                })?;
-        }
+    // Validate that the referenced sub-aggregation exists when ordering by one.
+    if let OrderTarget::SubAggregation(sub_agg_name) = &terms_req_data.req.order.target {
+        let (agg_name, _agg_property) = get_agg_name_and_property(sub_agg_name);
+        node.get_sub_agg(agg_name, &req_data.per_request)
+            .ok_or_else(|| {
+                TantivyError::InvalidArgument(format!(
+                    "could not find aggregation with name {agg_name} in metric sub_aggregations"
+                ))
+            })?;
    }

    // Build sub-aggregation blueprint if there are children.
@@ -391,7 +387,7 @@ pub(crate) fn build_segment_term_collector(
    // Decide which bucket storage is best suited for this aggregation.
    if is_top_level && max_term_id < MAX_NUM_TERMS_FOR_VEC && !has_sub_aggregations {
        let term_buckets = VecTermBucketsNoAgg::new(max_term_id + 1, &mut bucket_id_provider);
-        let collector: SegmentTermCollector<_, HighCardSubAggCache> = SegmentTermCollector {
+        let collector: SegmentTermCollector<_, HighCardSubAggBuffer> = SegmentTermCollector {
            parent_buckets: vec![term_buckets],
            sub_agg: None,
            bucket_id_provider,
@@ -401,8 +397,8 @@ pub(crate) fn build_segment_term_collector(
        Ok(Box::new(collector))
    } else if is_top_level && max_term_id < MAX_NUM_TERMS_FOR_VEC {
        let term_buckets = VecTermBuckets::new(max_term_id + 1, &mut bucket_id_provider);
-        let sub_agg = sub_agg_collector.map(LowCardCachedSubAggs::new);
-        let collector: SegmentTermCollector<_, LowCardSubAggCache> = SegmentTermCollector {
+        let sub_agg = sub_agg_collector.map(LowCardBufferedSubAggs::new);
+        let collector: SegmentTermCollector<_, LowCardSubAggBuffer> = SegmentTermCollector {
            parent_buckets: vec![term_buckets],
            sub_agg,
            bucket_id_provider,
@@ -414,8 +410,8 @@ pub(crate) fn build_segment_term_collector(
        let term_buckets: PagedTermMap =
            PagedTermMap::new(max_term_id + 1, &mut bucket_id_provider);
        // Build sub-aggregation blueprint (flat pairs)
-        let sub_agg = sub_agg_collector.map(CachedSubAggs::new);
-        let collector: SegmentTermCollector<PagedTermMap, HighCardSubAggCache> =
+        let sub_agg = sub_agg_collector.map(BufferedSubAggs::new);
+        let collector: SegmentTermCollector<PagedTermMap, HighCardSubAggBuffer> =
            SegmentTermCollector {
                parent_buckets: vec![term_buckets],
                sub_agg,
@@ -427,8 +423,8 @@ pub(crate) fn build_segment_term_collector(
    } else {
        let term_buckets: HashMapTermBuckets = HashMapTermBuckets::default();
        // Build sub-aggregation blueprint (flat pairs)
-        let sub_agg = sub_agg_collector.map(CachedSubAggs::new);
-        let collector: SegmentTermCollector<HashMapTermBuckets, HighCardSubAggCache> =
+        let sub_agg = sub_agg_collector.map(BufferedSubAggs::new);
+        let collector: SegmentTermCollector<HashMapTermBuckets, HighCardSubAggBuffer> =
            SegmentTermCollector {
                parent_buckets: vec![term_buckets],
                sub_agg,
@@ -758,10 +754,10 @@ impl TermAggregationMap for VecTermBuckets {
 /// The collector puts values from the fast field into the correct buckets and does a conversion to
 /// the correct datatype.
 #[derive(Debug)]
-struct SegmentTermCollector<TermMap: TermAggregationMap, C: SubAggCache> {
+struct SegmentTermCollector<TermMap: TermAggregationMap, B: SubAggBuffer> {
    /// The buckets containing the aggregation data.
    parent_buckets: Vec<TermMap>,
-    sub_agg: Option<CachedSubAggs<C>>,
+    sub_agg: Option<BufferedSubAggs<B>>,
    bucket_id_provider: BucketIdProvider,
    max_term_id: u64,
    terms_req_data: TermsAggReqData,
@@ -772,8 +768,8 @@ pub(crate) fn get_agg_name_and_property(name: &str) -> (&str, &str) {
    (agg_name, agg_property)
 }

-impl<TermMap: TermAggregationMap, C: SubAggCache> SegmentAggregationCollector
-    for SegmentTermCollector<TermMap, C>
+impl<TermMap: TermAggregationMap, B: SubAggBuffer> SegmentAggregationCollector
+    for SegmentTermCollector<TermMap, B>
 {
    fn add_intermediate_aggregation_result(
        &mut self,
@@ -790,8 +786,14 @@ impl<TermMap: TermAggregationMap, C: SubAggCache> SegmentAggregationCollector
        let term_req = &self.terms_req_data;
        let name = term_req.name.clone();

-        let bucket =
-            Self::into_intermediate_bucket_result(term_req, &mut self.sub_agg, bucket, agg_data)?;
+        let bucket = Self::into_intermediate_bucket_result(
+            term_req,
+            self.sub_agg
+                .as_mut()
+                .map(BufferedSubAggs::get_sub_agg_collector),
+            bucket,
+            agg_data,
+        )?;
        results.push(name, IntermediateAggregationResult::Bucket(bucket))?;
        Ok(())
    }
@@ -803,15 +805,17 @@ impl<TermMap: TermAggregationMap, C: SubAggCache> SegmentAggregationCollector
        docs: &[crate::DocId],
        agg_data: &mut AggregationsSegmentCtx,
    ) -> crate::Result<()> {
-        let mem_pre = self.get_memory_consumption();
+        let mem_pre = self.get_memory_consumption(parent_bucket_id);

        let req_data = &mut self.terms_req_data;

-        agg_data.column_block_accessor.fetch_block_with_missing(
-            docs,
-            &req_data.accessor,
-            req_data.missing_value_for_accessor,
-        );
+        agg_data
+            .column_block_accessor
+            .fetch_block_with_missing_unique_per_doc(
+                docs,
+                &req_data.accessor,
+                req_data.missing_value_for_accessor,
+            );

        if let Some(sub_agg) = &mut self.sub_agg {
            let term_buckets = &mut self.parent_buckets[parent_bucket_id as usize];
@@ -845,7 +849,7 @@ impl<TermMap: TermAggregationMap, C: SubAggCache> SegmentAggregationCollector
            }
        }

-        let mem_delta = self.get_memory_consumption() - mem_pre;
+        let mem_delta = self.get_memory_consumption(parent_bucket_id) - mem_pre;
        if mem_delta > 0 {
            agg_data
                .context
@@ -879,6 +883,17 @@ impl<TermMap: TermAggregationMap, C: SubAggCache> SegmentAggregationCollector
        }
        Ok(())
    }
+
+    fn compute_metric_value(
+        &self,
+        _bucket_id: BucketId,
+        _sub_agg_name: &str,
+        _sub_agg_property: &str,
+        _agg_data: &AggregationsSegmentCtx,
+    ) -> Option<f64> {
+        // Terms is a multi-bucket agg with no single value to extract.
+        None
+    }
 }

 /// Missing value are represented as a sentinel value in the column.
@@ -905,30 +920,53 @@ fn extract_missing_value<T>(
    Some((key, bucket))
 }

-impl<TermMap, C> SegmentTermCollector<TermMap, C>
+fn reborrow_opt_collector<'a>(
+    opt: &'a mut Option<&mut dyn SegmentAggregationCollector>,
+) -> Option<&'a mut dyn SegmentAggregationCollector> {
+    match opt {
+        Some(inner) => Some(*inner),
+        None => None,
+    }
+}
+
+fn into_intermediate_bucket_entry(
+    bucket: Bucket,
+    sub_agg_collector: Option<&mut dyn SegmentAggregationCollector>,
+    agg_data: &AggregationsSegmentCtx,
+) -> crate::Result<IntermediateTermBucketEntry> {
+    let mut sub_aggregation_res = IntermediateAggregationResults::default();
+    if let Some(sub_agg_collector) = sub_agg_collector {
+        sub_agg_collector.add_intermediate_aggregation_result(
+            agg_data,
+            &mut sub_aggregation_res,
+            bucket.bucket_id,
+        )?;
+    }
+    Ok(IntermediateTermBucketEntry {
+        doc_count: bucket.count,
+        sub_aggregation: sub_aggregation_res,
+    })
+}
+
+impl<TermMap, B> SegmentTermCollector<TermMap, B>
 where
    TermMap: TermAggregationMap,
-    C: SubAggCache,
+    B: SubAggBuffer,
 {
-    fn get_memory_consumption(&self) -> usize {
-        self.parent_buckets
-            .iter()
-            .map(|b| b.get_memory_consumption())
-            .sum()
+    #[inline]
+    fn get_memory_consumption(&self, parent_bucket_id: BucketId) -> usize {
+        self.parent_buckets[parent_bucket_id as usize].get_memory_consumption()
    }

    #[inline]
    pub(crate) fn into_intermediate_bucket_result(
        term_req: &TermsAggReqData,
-        sub_agg: &mut Option<CachedSubAggs<C>>,
+        mut sub_agg_collector: Option<&mut dyn SegmentAggregationCollector>,
        term_buckets: TermMap,
        agg_data: &AggregationsSegmentCtx,
    ) -> crate::Result<IntermediateBucketResult> {
        let mut entries: Vec<(u64, Bucket)> = term_buckets.into_vec();

-        let order_by_sub_aggregation =
-            matches!(term_req.req.order.target, OrderTarget::SubAggregation(_));
-
        match &term_req.req.order.target {
            OrderTarget::Key => {
                // We rely on the fact, that term ordinals match the order of the strings
@@ -940,10 +978,37 @@ where
                    entries.sort_unstable_by_key(|bucket| bucket.0);
                }
            }
-            OrderTarget::SubAggregation(_name) => {
-                // don't sort and cut off since it's hard to make assumptions on the quality of the
-                // results when cutting off du to unknown nature of the sub_aggregation (possible
-                // to check).
+            OrderTarget::SubAggregation(sub_agg_path) => {
+                // Peek segment-level metric values, sort, then fall through to
+                // `cut_off_buckets`. Like Elasticsearch, we always cut off when ordering
+                // by a sub-agg: top-K results are approximate and may differ from the
+                // global ordering, especially for non-monotonic metrics like avg/min.
+                let coll = sub_agg_collector.as_deref().ok_or_else(|| {
+                    TantivyError::InvalidArgument(format!(
+                        "Could not find sub-aggregation collector for path {sub_agg_path}"
+                    ))
+                })?;
+                let (agg_name, agg_prop) = get_agg_name_and_property(sub_agg_path);
+                // Fetch values up-front; otherwise sort would re-compute per comparison
+                let mut keyed: Vec<(f64, (u64, Bucket))> = entries
+                    .into_iter()
+                    .map(|bucket| {
+                        let metric_value = coll
+                            .compute_metric_value(bucket.1.bucket_id, agg_name, agg_prop, agg_data)
+                            .unwrap_or(0.0);
+                        (metric_value, bucket)
+                    })
+                    .collect();
+                if term_req.req.order.order == Order::Desc {
+                    keyed.sort_unstable_by(|a, b| {
+                        b.0.partial_cmp(&a.0).unwrap_or(std::cmp::Ordering::Equal)
+                    });
+                } else {
+                    keyed.sort_unstable_by(|a, b| {
+                        a.0.partial_cmp(&b.0).unwrap_or(std::cmp::Ordering::Equal)
+                    });
+                }
+                entries = keyed.into_iter().map(|(_, e)| e).collect();
            }
            OrderTarget::Count => {
                if term_req.req.order.order == Order::Desc {
@@ -954,40 +1019,12 @@ where
            }
        }

-        let (term_doc_count_before_cutoff, sum_other_doc_count) = if order_by_sub_aggregation {
-            (0, 0)
-        } else {
-            cut_off_buckets(&mut entries, term_req.req.segment_size as usize)
-        };
+        let (term_doc_count_before_cutoff, sum_other_doc_count) =
+            cut_off_buckets(&mut entries, term_req.req.segment_size as usize);

        let mut dict: FxHashMap<IntermediateKey, IntermediateTermBucketEntry> = Default::default();
        dict.reserve(entries.len());

-        let into_intermediate_bucket_entry =
-            |bucket: Bucket,
-             sub_agg: &mut Option<CachedSubAggs<C>>|
-             -> crate::Result<IntermediateTermBucketEntry> {
-                if let Some(sub_agg) = sub_agg {
-                    let mut sub_aggregation_res = IntermediateAggregationResults::default();
-                    sub_agg
-                        .get_sub_agg_collector()
-                        .add_intermediate_aggregation_result(
-                            agg_data,
-                            &mut sub_aggregation_res,
-                            bucket.bucket_id,
-                        )?;
-                    Ok(IntermediateTermBucketEntry {
-                        doc_count: bucket.count,
-                        sub_aggregation: sub_aggregation_res,
-                    })
-                } else {
-                    Ok(IntermediateTermBucketEntry {
-                        doc_count: bucket.count,
-                        sub_aggregation: Default::default(),
-                    })
-                }
-            };
-
        if term_req.column_type == ColumnType::Str {
            let fallback_dict = Dictionary::empty();
            let term_dict = term_req
@@ -998,7 +1035,11 @@ where

            if let Some((intermediate_key, bucket)) = extract_missing_value(&mut entries, term_req)
            {
-                let intermediate_entry = into_intermediate_bucket_entry(bucket, sub_agg)?;
+                let intermediate_entry = into_intermediate_bucket_entry(
+                    bucket,
+                    reborrow_opt_collector(&mut sub_agg_collector),
+                    agg_data,
+                )?;
                dict.insert(intermediate_key, intermediate_entry);
            }

@@ -1006,19 +1047,28 @@ where
            entries.sort_unstable_by_key(|bucket| bucket.0);

            let (term_ids, buckets): (Vec<u64>, Vec<Bucket>) = entries.into_iter().unzip();
-            let mut buckets_it = buckets.into_iter();

-            term_dict.sorted_ords_to_term_cb(term_ids.into_iter(), |term| {
-                let bucket = buckets_it.next().unwrap();
-                let intermediate_entry =
-                    into_intermediate_bucket_entry(bucket, sub_agg).map_err(io::Error::other)?;
+            let intermediate_entries: Vec<IntermediateTermBucketEntry> = buckets
+                .into_iter()
+                .map(|bucket| {
+                    into_intermediate_bucket_entry(
+                        bucket,
+                        reborrow_opt_collector(&mut sub_agg_collector),
+                        agg_data,
+                    )
+                })
+                .collect::<crate::Result<_>>()?;
+
+            let mut intermediate_entry_it = intermediate_entries.into_iter();
+
+            term_dict.sorted_ords_to_term_cb(&term_ids[..], |term| {
+                let intermediate_entry = intermediate_entry_it.next().unwrap();
                dict.insert(
                    IntermediateKey::Str(
                        String::from_utf8(term.to_vec()).expect("could not convert to String"),
                    ),
                    intermediate_entry,
                );
-                Ok(())
            })?;

            if term_req.req.min_doc_count == 0 {
@@ -1053,14 +1103,22 @@ where
            }
        } else if term_req.column_type == ColumnType::DateTime {
            for (val, doc_count) in entries {
-                let intermediate_entry = into_intermediate_bucket_entry(doc_count, sub_agg)?;
+                let intermediate_entry = into_intermediate_bucket_entry(
+                    doc_count,
+                    reborrow_opt_collector(&mut sub_agg_collector),
+                    agg_data,
+                )?;
                let val = i64::from_u64(val);
                let date = format_date(val)?;
                dict.insert(IntermediateKey::Str(date), intermediate_entry);
            }
        } else if term_req.column_type == ColumnType::Bool {
            for (val, doc_count) in entries {
-                let intermediate_entry = into_intermediate_bucket_entry(doc_count, sub_agg)?;
+                let intermediate_entry = into_intermediate_bucket_entry(
+                    doc_count,
+                    reborrow_opt_collector(&mut sub_agg_collector),
+                    agg_data,
+                )?;
                let val = bool::from_u64(val);
                dict.insert(IntermediateKey::Bool(val), intermediate_entry);
            }
@@ -1080,14 +1138,22 @@ where
                })?;

            for (val, doc_count) in entries {
-                let intermediate_entry = into_intermediate_bucket_entry(doc_count, sub_agg)?;
+                let intermediate_entry = into_intermediate_bucket_entry(
+                    doc_count,
+                    reborrow_opt_collector(&mut sub_agg_collector),
+                    agg_data,
+                )?;
                let val: u128 = compact_space_accessor.compact_to_u128(val as u32);
                let val = Ipv6Addr::from_u128(val);
                dict.insert(IntermediateKey::IpAddr(val), intermediate_entry);
            }
        } else {
            for (val, doc_count) in entries {
-                let intermediate_entry = into_intermediate_bucket_entry(doc_count, sub_agg)?;
+                let intermediate_entry = into_intermediate_bucket_entry(
+                    doc_count,
+                    reborrow_opt_collector(&mut sub_agg_collector),
+                    agg_data,
+                )?;
                if term_req.column_type == ColumnType::U64 {
                    dict.insert(IntermediateKey::U64(val), intermediate_entry);
                } else if term_req.column_type == ColumnType::I64 {
@@ -1121,13 +1187,13 @@ where
    }
 }

-impl<TermMap: TermAggregationMap, C: SubAggCache> SegmentTermCollector<TermMap, C> {
+impl<TermMap: TermAggregationMap, B: SubAggBuffer> SegmentTermCollector<TermMap, B> {
    #[inline]
    fn collect_terms_with_docs(
        iter: impl Iterator<Item = (crate::DocId, u64)>,
        term_buckets: &mut TermMap,
        bucket_id_provider: &mut BucketIdProvider,
-        sub_agg: &mut CachedSubAggs<C>,
+        sub_agg: &mut BufferedSubAggs<B>,
    ) {
        for (doc, term_id) in iter {
            let bucket_id = term_buckets.term_entry(term_id, bucket_id_provider);
@@ -1200,7 +1266,7 @@ mod tests {
    use crate::aggregation::{AggregationLimitsGuard, DistributedAggregationCollector};
    use crate::indexer::NoMergePolicy;
    use crate::query::AllQuery;
-    use crate::schema::{IntoIpv6Addr, Schema, FAST, STRING};
+    use crate::schema::{IntoIpv6Addr, Schema, FAST, INDEXED, STRING, TEXT};
    use crate::{Index, IndexWriter};

    #[test]
@@ -1729,6 +1795,263 @@ mod tests {
        Ok(())
    }

+    #[test]
+    fn terms_aggregation_order_by_cardinality_desc_single_segment() -> crate::Result<()> {
+        terms_aggregation_order_by_cardinality_desc(true)
+    }
+    #[test]
+    fn terms_aggregation_order_by_cardinality_desc_multi_segment() -> crate::Result<()> {
+        terms_aggregation_order_by_cardinality_desc(false)
+    }
+    fn terms_aggregation_order_by_cardinality_desc(merge_segments: bool) -> crate::Result<()> {
+        // Distinct score values per bucket key: A→5, B→1, C→3.
+        // Order by cardinality desc must yield A, C, B.
+        let segment_and_terms = vec![vec![
+            (1.0, "A".to_string()),
+            (2.0, "A".to_string()),
+            (3.0, "A".to_string()),
+            (4.0, "A".to_string()),
+            (5.0, "A".to_string()),
+            (1.0, "B".to_string()),
+            (1.0, "B".to_string()),
+            (1.0, "B".to_string()),
+            (1.0, "C".to_string()),
+            (2.0, "C".to_string()),
+            (3.0, "C".to_string()),
+        ]];
+        let index = get_test_index_from_values_and_terms(merge_segments, &segment_and_terms)?;
+
+        let agg_req: Aggregations = serde_json::from_value(json!({
+            "my_texts": {
+                "terms": {
+                    "field": "string_id",
+                    "order": { "card": "desc" }
+                },
+                "aggs": {
+                    "card": { "cardinality": { "field": "score" } }
+                }
+            }
+        }))
+        .unwrap();
+
+        let res = exec_request(agg_req, &index)?;
+        assert_eq!(res["my_texts"]["buckets"][0]["key"], "A");
+        assert_eq!(res["my_texts"]["buckets"][0]["card"]["value"], 5.0);
+        assert_eq!(res["my_texts"]["buckets"][1]["key"], "C");
+        assert_eq!(res["my_texts"]["buckets"][1]["card"]["value"], 3.0);
+        assert_eq!(res["my_texts"]["buckets"][2]["key"], "B");
+        assert_eq!(res["my_texts"]["buckets"][2]["card"]["value"], 1.0);
+
+        // Asc engages the segment-cutoff path too (monotonic-safe: discarded buckets had
+        // local card >= cutoff, so merged card >= cutoff and they cannot be globally smallest).
+        let agg_req: Aggregations = serde_json::from_value(json!({
+            "my_texts": {
+                "terms": {
+                    "field": "string_id",
+                    "order": { "card": "asc" }
+                },
+                "aggs": {
+                    "card": { "cardinality": { "field": "score" } }
+                }
+            }
+        }))
+        .unwrap();
+        let res = exec_request(agg_req, &index)?;
+        assert_eq!(res["my_texts"]["buckets"][0]["key"], "B");
+        assert_eq!(res["my_texts"]["buckets"][1]["key"], "C");
+        assert_eq!(res["my_texts"]["buckets"][2]["key"], "A");
+
+        // size=2 with desc engages the segment cutoff: must keep top-2 by cardinality (A, C),
+        // and `sum_other_doc_count` reflects the dropped B (3 docs).
+        let agg_req: Aggregations = serde_json::from_value(json!({
+            "my_texts": {
+                "terms": {
+                    "field": "string_id",
+                    "size": 2,
+                    "order": { "card": "desc" }
+                },
+                "aggs": {
+                    "card": { "cardinality": { "field": "score" } }
+                }
+            }
+        }))
+        .unwrap();
+        let res = exec_request(agg_req, &index)?;
+        assert_eq!(res["my_texts"]["buckets"][0]["key"], "A");
+        assert_eq!(res["my_texts"]["buckets"][1]["key"], "C");
+        assert_eq!(res["my_texts"]["buckets"].as_array().unwrap().len(), 2);
+
+        // size=2 with asc engages the segment cutoff: must keep bottom-2 by cardinality (B, C).
+        let agg_req: Aggregations = serde_json::from_value(json!({
+            "my_texts": {
+                "terms": {
+                    "field": "string_id",
+                    "size": 2,
+                    "order": { "card": "asc" }
+                },
+                "aggs": {
+                    "card": { "cardinality": { "field": "score" } }
+                }
+            }
+        }))
+        .unwrap();
+        let res = exec_request(agg_req, &index)?;
+        assert_eq!(res["my_texts"]["buckets"][0]["key"], "B");
+        assert_eq!(res["my_texts"]["buckets"][1]["key"], "C");
+        assert_eq!(res["my_texts"]["buckets"].as_array().unwrap().len(), 2);
+
+        Ok(())
+    }
+
+    #[test]
+    fn terms_aggregation_order_by_sum_single_segment() -> crate::Result<()> {
+        terms_aggregation_order_by_sum(true)
+    }
+    #[test]
+    fn terms_aggregation_order_by_sum_multi_segment() -> crate::Result<()> {
+        terms_aggregation_order_by_sum(false)
+    }
+    fn terms_aggregation_order_by_sum(merge_segments: bool) -> crate::Result<()> {
+        // Per-bucket sums on the U64 `score` column (non-negative => sum is monotonic):
+        //   A → 1+2+3+4+5 = 15, B → 1+1+1 = 3, C → 1+2+3 = 6.
+        let segment_and_terms = vec![
+            vec![
+                (1.0, "A".to_string()),
+                (2.0, "A".to_string()),
+                (3.0, "A".to_string()),
+                (1.0, "B".to_string()),
+                (1.0, "C".to_string()),
+            ],
+            vec![
+                (4.0, "A".to_string()),
+                (5.0, "A".to_string()),
+                (1.0, "B".to_string()),
+                (1.0, "B".to_string()),
+                (2.0, "C".to_string()),
+                (3.0, "C".to_string()),
+            ],
+        ];
+        let index = get_test_index_from_values_and_terms(merge_segments, &segment_and_terms)?;
+
+        // Desc on a Sum metric engages the fast path (column is U64).
+        let agg_req: Aggregations = serde_json::from_value(json!({
+            "my_texts": {
+                "terms": {
+                    "field": "string_id",
+                    "order": { "total": "desc" }
+                },
+                "aggs": {
+                    "total": { "sum": { "field": "score" } }
+                }
+            }
+        }))
+        .unwrap();
+        let res = exec_request(agg_req, &index)?;
+        assert_eq!(res["my_texts"]["buckets"][0]["key"], "A");
+        assert_eq!(res["my_texts"]["buckets"][0]["total"]["value"], 15.0);
+        assert_eq!(res["my_texts"]["buckets"][1]["key"], "C");
+        assert_eq!(res["my_texts"]["buckets"][1]["total"]["value"], 6.0);
+        assert_eq!(res["my_texts"]["buckets"][2]["key"], "B");
+        assert_eq!(res["my_texts"]["buckets"][2]["total"]["value"], 3.0);
+
+        // Asc engages the fast path too — discarded buckets had local sum >= cutoff,
+        // and merged sum >= local (non-negative addends), so they cannot be globally smallest.
+        let agg_req: Aggregations = serde_json::from_value(json!({
+            "my_texts": {
+                "terms": {
+                    "field": "string_id",
+                    "order": { "total": "asc" }
+                },
+                "aggs": {
+                    "total": { "sum": { "field": "score" } }
+                }
+            }
+        }))
+        .unwrap();
+        let res = exec_request(agg_req, &index)?;
+        assert_eq!(res["my_texts"]["buckets"][0]["key"], "B");
+        assert_eq!(res["my_texts"]["buckets"][1]["key"], "C");
+        assert_eq!(res["my_texts"]["buckets"][2]["key"], "A");
+
+        // size=2 desc with cutoff: top-2 by sum (A, C).
+        let agg_req: Aggregations = serde_json::from_value(json!({
+            "my_texts": {
+                "terms": {
+                    "field": "string_id",
+                    "size": 2,
+                    "order": { "total": "desc" }
+                },
+                "aggs": {
+                    "total": { "sum": { "field": "score" } }
+                }
+            }
+        }))
+        .unwrap();
+        let res = exec_request(agg_req, &index)?;
+        assert_eq!(res["my_texts"]["buckets"][0]["key"], "A");
+        assert_eq!(res["my_texts"]["buckets"][1]["key"], "C");
+        assert_eq!(res["my_texts"]["buckets"].as_array().unwrap().len(), 2);
+
+        // Stats sub-property: ordering by `mystats.sum` (U64 column) also engages the fast path.
+        let agg_req: Aggregations = serde_json::from_value(json!({
+            "my_texts": {
+                "terms": {
+                    "field": "string_id",
+                    "order": { "mystats.sum": "desc" }
+                },
+                "aggs": {
+                    "mystats": { "stats": { "field": "score" } }
+                }
+            }
+        }))
+        .unwrap();
+        let res = exec_request(agg_req, &index)?;
+        assert_eq!(res["my_texts"]["buckets"][0]["key"], "A");
+        assert_eq!(res["my_texts"]["buckets"][1]["key"], "C");
+        assert_eq!(res["my_texts"]["buckets"][2]["key"], "B");
+
+        // Sum on a signed column (I64) takes the same cutoff path. Results may be
+        // approximate near the boundary on adversarial data, but for this dataset the
+        // top-K is unambiguous.
+        let agg_req: Aggregations = serde_json::from_value(json!({
+            "my_texts": {
+                "terms": {
+                    "field": "string_id",
+                    "order": { "total": "desc" }
+                },
+                "aggs": {
+                    "total": { "sum": { "field": "score_i64" } }
+                }
+            }
+        }))
+        .unwrap();
+        let res = exec_request(agg_req, &index)?;
+        assert_eq!(res["my_texts"]["buckets"][0]["key"], "A");
+        assert_eq!(res["my_texts"]["buckets"][1]["key"], "C");
+        assert_eq!(res["my_texts"]["buckets"][2]["key"], "B");
+
+        // Order by extended_stats sub-property exercises compute_metric_value on the
+        // ExtendedStats collector. A→max=5, B→max=1, C→max=3, so desc by max → A, C, B.
+        let agg_req: Aggregations = serde_json::from_value(json!({
+            "my_texts": {
+                "terms": {
+                    "field": "string_id",
+                    "order": { "ext.max": "desc" }
+                },
+                "aggs": {
+                    "ext": { "extended_stats": { "field": "score" } }
+                }
+            }
+        }))
+        .unwrap();
+        let res = exec_request(agg_req, &index)?;
+        assert_eq!(res["my_texts"]["buckets"][0]["key"], "A");
+        assert_eq!(res["my_texts"]["buckets"][1]["key"], "C");
+        assert_eq!(res["my_texts"]["buckets"][2]["key"], "B");
+
+        Ok(())
+    }
+
    #[test]
    fn terms_aggregation_test_order_key_single_segment() -> crate::Result<()> {
        terms_aggregation_test_order_key_merge_segment(true)
@@ -2347,7 +2670,7 @@ mod tests {

        // text field
        assert_eq!(res["my_texts"]["buckets"][0]["key"], "Hello Hello");
-        assert_eq!(res["my_texts"]["buckets"][0]["doc_count"], 5);
+        assert_eq!(res["my_texts"]["buckets"][0]["doc_count"], 4);
        assert_eq!(res["my_texts"]["buckets"][1]["key"], "Empty");
        assert_eq!(res["my_texts"]["buckets"][1]["doc_count"], 2);
        assert_eq!(
@@ -2356,7 +2679,7 @@ mod tests {
        );
        // text field with number as missing fallback
        assert_eq!(res["my_texts2"]["buckets"][0]["key"], "Hello Hello");
-        assert_eq!(res["my_texts2"]["buckets"][0]["doc_count"], 5);
+        assert_eq!(res["my_texts2"]["buckets"][0]["doc_count"], 4);
        assert_eq!(res["my_texts2"]["buckets"][1]["key"], 1337.0);
        assert_eq!(res["my_texts2"]["buckets"][1]["doc_count"], 2);
        assert_eq!(
@@ -2370,7 +2693,7 @@ mod tests {
        assert_eq!(res["my_ids"]["buckets"][0]["key"], 1337.0);
        assert_eq!(res["my_ids"]["buckets"][0]["doc_count"], 4);
        assert_eq!(res["my_ids"]["buckets"][1]["key"], 1.0);
-        assert_eq!(res["my_ids"]["buckets"][1]["doc_count"], 3);
+        assert_eq!(res["my_ids"]["buckets"][1]["doc_count"], 2);
        assert_eq!(res["my_ids"]["buckets"][2]["key"], serde_json::Value::Null);

        Ok(())
@@ -2894,4 +3217,101 @@ mod tests {

        Ok(())
    }
+
+    fn prep_index_with_n_unique_terms_plus_one_null(n: u64) -> crate::Result<Index> {
+        let mut schema_builder = Schema::builder();
+        let id_field = schema_builder.add_u64_field("id", INDEXED);
+        let title_field = schema_builder.add_text_field("title", TEXT | FAST);
+        let schema = schema_builder.build();
+        let index = Index::create_in_ram(schema.clone());
+        // set to one thread to guarantee all docs end up in the same segment
+        let mut writer = index.writer_with_num_threads(1, 50_000_000)?;
+
+        writer.add_document(doc!(
+            id_field => 0u64,
+        ))?;
+        for i in 1u64..=n {
+            let title = format!("foo{i}");
+            writer.add_document(doc!(
+                id_field => i,
+                title_field => title,
+            ))?;
+        }
+
+        writer.commit()?;
+
+        Ok(index)
+    }
+
+    #[test]
+    fn null_bitset_bounds_check_regression() -> crate::Result<()> {
+        // normal, include and exclude cases, for 0, 64, 128, 192 and 256 unique terms
+        for i in 0..=4 {
+            let index = prep_index_with_n_unique_terms_plus_one_null(i * 64)?;
+            let normal_req: Aggregations = serde_json::from_value(json!({
+                "my_bool": {
+                    "terms": {
+                        "field": "title",
+                        "missing": "__NULL__",
+                        "size": 1000,
+                    }
+                }
+            }))?;
+            let include_req: Aggregations = serde_json::from_value(json!({
+                "my_bool": {
+                    "terms": {
+                        "field": "title",
+                        "include": "foo(.*)",
+                        "missing": "__NULL__",
+                        "size": 1000,
+                    }
+                }
+            }))?;
+            let exclude_req: Aggregations = serde_json::from_value(json!({
+                "my_bool": {
+                    "terms": {
+                        "field": "title",
+                        "exclude": "foo(.*)",
+                        "missing": "__NULL__",
+                        "size": 1000,
+                    }
+                }
+            }))?;
+
+            let normal_res = exec_request(normal_req, &index)?;
+            let normal_buckets = normal_res["my_bool"]["buckets"].as_array().unwrap();
+            assert_eq!(
+                normal_buckets.len(),
+                (i * 64) as usize + 1,
+                "The normal request should return all 'foo' buckets, plus the missing term bucket",
+            );
+
+            let include_res = exec_request(include_req, &index)?;
+            eprintln!("include_res: {include_res:?}");
+            let include_buckets = include_res["my_bool"]["buckets"].as_array().unwrap();
+            assert_eq!(
+                include_buckets.len(),
+                (i * 64) as usize,
+                "The include request should return all 'foo' buckets, and not the missing term \
+                 bucket",
+            );
+            assert!(include_buckets
+                .iter()
+                .all(|b| b["key"].as_str().unwrap().starts_with("foo")));
+
+            let exclude_res = exec_request(exclude_req, &index)?;
+            let exclude_buckets = exclude_res["my_bool"]["buckets"].as_array().unwrap();
+            if i != 0 {
+                // TODO: Remove this `if` after fixing the exclude + missing bug
+                assert_eq!(
+                    exclude_buckets.len(),
+                    1,
+                    "The exclude request should exclude all 'foo' buckets and return only the \
+                     missing term bucket",
+                );
+                assert_eq!(exclude_buckets[0]["key"], "__NULL__");
+            }
+        }
+        Ok(())
+    }
 }
--- a/src/aggregation/bucket/term_missing_agg.rs
+++ b/src/aggregation/bucket/term_missing_agg.rs
@@ -5,7 +5,7 @@ use crate::aggregation::agg_data::{
    build_segment_agg_collectors, AggRefNode, AggregationsSegmentCtx,
 };
 use crate::aggregation::bucket::term_agg::TermsAggregation;
-use crate::aggregation::cached_sub_aggs::{CachedSubAggs, HighCardCachedSubAggs};
+use crate::aggregation::buffered_sub_aggs::{BufferedSubAggs, HighCardBufferedSubAggs};
 use crate::aggregation::intermediate_agg_result::{
    IntermediateAggregationResult, IntermediateAggregationResults, IntermediateBucketResult,
    IntermediateKey, IntermediateTermBucketEntry, IntermediateTermBucketResult,
@@ -47,7 +47,7 @@ struct MissingCount {
 #[derive(Default, Debug)]
 pub struct TermMissingAgg {
    accessor_idx: usize,
-    sub_agg: Option<HighCardCachedSubAggs>,
+    sub_agg: Option<HighCardBufferedSubAggs>,
    /// Idx = parent bucket id, Value = missing count for that bucket
    missing_count_per_bucket: Vec<MissingCount>,
    bucket_id_provider: BucketIdProvider,
@@ -66,7 +66,7 @@ impl TermMissingAgg {
            None
        };

-        let sub_agg = sub_agg.map(CachedSubAggs::new);
+        let sub_agg = sub_agg.map(BufferedSubAggs::new);
        let bucket_id_provider = BucketIdProvider::default();

        Ok(Self {
@@ -177,6 +177,17 @@ impl SegmentAggregationCollector for TermMissingAgg {
        }
        Ok(())
    }
+
+    fn compute_metric_value(
+        &self,
+        _bucket_id: BucketId,
+        _sub_agg_name: &str,
+        _sub_agg_property: &str,
+        _agg_data: &AggregationsSegmentCtx,
+    ) -> Option<f64> {
+        // TODO: forward to `sub_agg` for nested order paths (`missing_agg>metric`).
+        None
+    }
 }

 #[cfg(test)]
--- a/src/aggregation/buffered_sub_aggs.rs
+++ b/src/aggregation/buffered_sub_aggs.rs
@@ -6,7 +6,7 @@ use crate::aggregation::bucket::MAX_NUM_TERMS_FOR_VEC;
 use crate::aggregation::BucketId;
 use crate::DocId;

-/// A cache for sub-aggregations, storing doc ids per bucket id.
+/// A buffer for sub-aggregations, storing doc ids per bucket id.
 /// Depending on the cardinality of the parent aggregation, we use different
 /// storage strategies.
 ///
@@ -24,21 +24,21 @@ use crate::DocId;
 /// aggregations.
 /// What this datastructure does in general is to group docs by bucket id.
 #[derive(Debug)]
-pub(crate) struct CachedSubAggs<C: SubAggCache> {
-    cache: C,
+pub(crate) struct BufferedSubAggs<B: SubAggBuffer> {
+    buffer: B,
    sub_agg_collector: Box<dyn SegmentAggregationCollector>,
    num_docs: usize,
 }

-pub type LowCardCachedSubAggs = CachedSubAggs<LowCardSubAggCache>;
-pub type HighCardCachedSubAggs = CachedSubAggs<HighCardSubAggCache>;
+pub type LowCardBufferedSubAggs = BufferedSubAggs<LowCardSubAggBuffer>;
+pub type HighCardBufferedSubAggs = BufferedSubAggs<HighCardSubAggBuffer>;

 const FLUSH_THRESHOLD: usize = 2048;

-/// A trait for caching sub-aggregation doc ids per bucket id.
+/// A trait for buffering sub-aggregation doc ids per bucket id.
 /// Different implementations can be used depending on the cardinality
 /// of the parent aggregation.
-pub trait SubAggCache: Debug {
+pub trait SubAggBuffer: Debug {
    fn new() -> Self;
    fn push(&mut self, bucket_id: BucketId, doc_id: DocId);
    fn flush_local(
@@ -49,22 +49,22 @@ pub trait SubAggCache: Debug {
    ) -> crate::Result<()>;
 }

-impl<Backend: SubAggCache + Debug> CachedSubAggs<Backend> {
+impl<Backend: SubAggBuffer + Debug> BufferedSubAggs<Backend> {
    pub fn new(sub_agg: Box<dyn SegmentAggregationCollector>) -> Self {
        Self {
-            cache: Backend::new(),
+            buffer: Backend::new(),
            sub_agg_collector: sub_agg,
            num_docs: 0,
        }
    }

-    pub fn get_sub_agg_collector(&mut self) -> &mut Box<dyn SegmentAggregationCollector> {
-        &mut self.sub_agg_collector
+    pub fn get_sub_agg_collector(&mut self) -> &mut dyn SegmentAggregationCollector {
+        &mut *self.sub_agg_collector
    }

    #[inline]
    pub fn push(&mut self, bucket_id: BucketId, doc_id: DocId) {
-        self.cache.push(bucket_id, doc_id);
+        self.buffer.push(bucket_id, doc_id);
        self.num_docs += 1;
    }

@@ -75,7 +75,7 @@ impl<Backend: SubAggCache + Debug> CachedSubAggs<Backend> {
        agg_data: &mut AggregationsSegmentCtx,
    ) -> crate::Result<()> {
        if self.num_docs >= FLUSH_THRESHOLD {
-            self.cache
+            self.buffer
                .flush_local(&mut self.sub_agg_collector, agg_data, false)?;
            self.num_docs = 0;
        }
@@ -85,7 +85,7 @@ impl<Backend: SubAggCache + Debug> CachedSubAggs<Backend> {
    /// Note: this _does_ flush the sub aggregations.
    pub fn flush(&mut self, agg_data: &mut AggregationsSegmentCtx) -> crate::Result<()> {
        if self.num_docs != 0 {
-            self.cache
+            self.buffer
                .flush_local(&mut self.sub_agg_collector, agg_data, true)?;
            self.num_docs = 0;
        }
@@ -94,11 +94,11 @@ impl<Backend: SubAggCache + Debug> CachedSubAggs<Backend> {
    }
 }

-/// Number of partitions for high cardinality sub-aggregation cache.
+/// Number of partitions for high cardinality sub-aggregation buffer.
 const NUM_PARTITIONS: usize = 16;

 #[derive(Debug)]
-pub(crate) struct HighCardSubAggCache {
+pub(crate) struct HighCardSubAggBuffer {
    /// This weird partitioning is used to do some cheap grouping on the bucket ids.
    /// bucket ids are dense, e.g. when we don't detect the cardinality as low cardinality,
    /// but there are just 16 bucket ids, each bucket id will go to its own partition.
@@ -108,7 +108,7 @@ pub(crate) struct HighCardSubAggCache {
    partitions: Box<[PartitionEntry; NUM_PARTITIONS]>,
 }

-impl HighCardSubAggCache {
+impl HighCardSubAggBuffer {
    #[inline]
    fn clear(&mut self) {
        for partition in self.partitions.iter_mut() {
@@ -131,7 +131,7 @@ impl PartitionEntry {
    }
 }

-impl SubAggCache for HighCardSubAggCache {
+impl SubAggBuffer for HighCardSubAggBuffer {
    fn new() -> Self {
        Self {
            partitions: Box::new(core::array::from_fn(|_| PartitionEntry::default())),
@@ -173,14 +173,14 @@ impl SubAggCache for HighCardSubAggCache {
 }

 #[derive(Debug)]
-pub(crate) struct LowCardSubAggCache {
-    /// Cache doc ids per bucket for sub-aggregations.
+pub(crate) struct LowCardSubAggBuffer {
+    /// Buffer doc ids per bucket for sub-aggregations.
    ///
    /// The outer Vec is indexed by BucketId.
    per_bucket_docs: Vec<Vec<DocId>>,
 }

-impl LowCardSubAggCache {
+impl LowCardSubAggBuffer {
    #[inline]
    fn clear(&mut self) {
        for v in &mut self.per_bucket_docs {
@@ -189,7 +189,7 @@ impl LowCardSubAggCache {
    }
 }

-impl SubAggCache for LowCardSubAggCache {
+impl SubAggBuffer for LowCardSubAggBuffer {
    fn new() -> Self {
        Self {
            per_bucket_docs: Vec::new(),
--- a/src/aggregation/collector.rs
+++ b/src/aggregation/collector.rs
@@ -1,6 +1,6 @@
 use super::agg_req::Aggregations;
 use super::agg_result::AggregationResults;
-use super::cached_sub_aggs::LowCardCachedSubAggs;
+use super::buffered_sub_aggs::LowCardBufferedSubAggs;
 use super::intermediate_agg_result::IntermediateAggregationResults;
 use super::AggContextParams;
 // group buffering strategy is chosen explicitly by callers; no need to hash-group on the fly.
@@ -136,7 +136,7 @@ fn merge_fruits(
 /// `AggregationSegmentCollector` does the aggregation collection on a segment.
 pub struct AggregationSegmentCollector {
    aggs_with_accessor: AggregationsSegmentCtx,
-    agg_collector: LowCardCachedSubAggs,
+    agg_collector: LowCardBufferedSubAggs,
    error: Option<TantivyError>,
 }

@@ -152,7 +152,7 @@ impl AggregationSegmentCollector {
        let mut agg_data =
            build_aggregations_data_from_req(agg, reader, segment_ordinal, context.clone())?;
        let mut result =
-            LowCardCachedSubAggs::new(build_segment_agg_collectors_root(&mut agg_data)?);
+            LowCardBufferedSubAggs::new(build_segment_agg_collectors_root(&mut agg_data)?);
        result
            .get_sub_agg_collector()
            .prepare_max_bucket(0, &agg_data)?; // prepare for bucket zero
--- a/src/aggregation/intermediate_agg_result.rs
+++ b/src/aggregation/intermediate_agg_result.rs
@@ -15,18 +15,18 @@ use serde::{Deserialize, Serialize};
 use super::agg_req::{Aggregation, AggregationVariants, Aggregations};
 use super::agg_result::{AggregationResult, BucketResult, MetricResult, RangeBucketEntry};
 use super::bucket::{
-    cut_off_buckets, get_agg_name_and_property, intermediate_histogram_buckets_to_final_buckets,
-    GetDocCount, Order, OrderTarget, RangeAggregation, TermsAggregation,
+    composite_intermediate_key_ordering, cut_off_buckets, get_agg_name_and_property,
+    intermediate_histogram_buckets_to_final_buckets, CompositeAggregation, GetDocCount,
+    MissingOrder, Order, OrderTarget, RangeAggregation, TermsAggregation,
 };
 use super::metric::{
-    AverageMetricResult, CardinalityMetricResult, IntermediateAverage, IntermediateCount,
-    IntermediateExtendedStats, IntermediateMax, IntermediateMin, IntermediateStats,
-    IntermediateSum, PercentilesCollector, TopHitsTopNComputer,
+    IntermediateAverage, IntermediateCount, IntermediateExtendedStats, IntermediateMax,
+    IntermediateMin, IntermediateStats, IntermediateSum, PercentilesCollector, TopHitsTopNComputer,
 };
 use super::segment_agg_result::AggregationLimitsGuard;
 use super::{format_date, AggregationError, Key, SerializedKey};
 use crate::aggregation::agg_result::{
-    AggregationResults, BucketEntries, BucketEntry, FilterBucketResult,
+    AggregationResults, BucketEntries, BucketEntry, CompositeBucketEntry, FilterBucketResult,
 };
 use crate::aggregation::bucket::TermsAggregationInternal;
 use crate::aggregation::metric::CardinalityCollector;
@@ -91,6 +91,19 @@ impl From<IntermediateKey> for Key {

 impl Eq for IntermediateKey {}

+impl std::fmt::Display for IntermediateKey {
+    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+        match self {
+            IntermediateKey::Str(val) => f.write_str(val),
+            IntermediateKey::F64(val) => write!(f, "{val}"),
+            IntermediateKey::U64(val) => write!(f, "{val}"),
+            IntermediateKey::I64(val) => write!(f, "{val}"),
+            IntermediateKey::Bool(val) => write!(f, "{val}"),
+            IntermediateKey::IpAddr(val) => write!(f, "{val}"),
+        }
+    }
+}
+
 impl std::hash::Hash for IntermediateKey {
    fn hash<H: std::hash::Hasher>(&self, state: &mut H) {
        core::mem::discriminant(self).hash(state);
@@ -106,6 +119,21 @@ impl std::hash::Hash for IntermediateKey {
 }

 impl IntermediateAggregationResults {
+    /// Returns a reference to the intermediate aggregation result for the given key.
+    pub fn get(&self, key: &str) -> Option<&IntermediateAggregationResult> {
+        self.aggs_res.get(key)
+    }
+
+    /// Removes and returns the intermediate aggregation result for the given key.
+    pub fn remove(&mut self, key: &str) -> Option<IntermediateAggregationResult> {
+        self.aggs_res.remove(key)
+    }
+
+    /// Returns an iterator over the keys in the intermediate aggregation results.
+    pub fn keys(&self) -> impl Iterator<Item = &String> {
+        self.aggs_res.keys()
+    }
+
    /// Add a result
    pub fn push(&mut self, key: String, value: IntermediateAggregationResult) -> crate::Result<()> {
        let entry = self.aggs_res.entry(key);
@@ -253,6 +281,11 @@ pub(crate) fn empty_from_req(req: &Aggregation) -> IntermediateAggregationResult
            doc_count: 0,
            sub_aggregations: IntermediateAggregationResults::default(),
        }),
+        Composite(_) => {
+            IntermediateAggregationResult::Bucket(IntermediateBucketResult::Composite {
+                buckets: IntermediateCompositeBucketResult::default(),
+            })
+        }
    }
 }

@@ -326,11 +359,7 @@ impl IntermediateMetricResult {
    fn into_final_metric_result(self, req: &Aggregation) -> MetricResult {
        match self {
            IntermediateMetricResult::Average(intermediate_avg) => {
-                MetricResult::Average(AverageMetricResult {
-                    value: intermediate_avg.finalize(),
-                    sum: intermediate_avg.sum(),
-                    count: intermediate_avg.count(),
-                })
+                MetricResult::Average(intermediate_avg.finalize().into())
            }
            IntermediateMetricResult::Count(intermediate_count) => {
                MetricResult::Count(intermediate_count.finalize().into())
@@ -358,11 +387,7 @@ impl IntermediateMetricResult {
                MetricResult::TopHits(top_hits.into_final_result())
            }
            IntermediateMetricResult::Cardinality(cardinality) => {
-                let value = cardinality.finalize();
-                MetricResult::Cardinality(CardinalityMetricResult {
-                    value,
-                    sketch: Some(cardinality),
-                })
+                MetricResult::Cardinality(cardinality.finalize().into())
            }
        }
    }
@@ -454,6 +479,11 @@ pub enum IntermediateBucketResult {
        /// Sub-aggregation results
        sub_aggregations: IntermediateAggregationResults,
    },
+    /// Composite aggregation
+    Composite {
+        /// The composite buckets
+        buckets: IntermediateCompositeBucketResult,
+    },
 }

 impl IntermediateBucketResult {
@@ -549,6 +579,13 @@ impl IntermediateBucketResult {
                    sub_aggregations: final_sub_aggregations,
                }))
            }
+            IntermediateBucketResult::Composite { buckets } => {
+                let composite_req = req
+                    .agg
+                    .as_composite()
+                    .expect("unexpected aggregation, expected composite aggregation");
+                buckets.into_final_result(composite_req, req.sub_aggregation(), limits)
+            }
        }
    }

@@ -615,6 +652,16 @@ impl IntermediateBucketResult {
                *doc_count_left += doc_count_right;
                sub_aggs_left.merge_fruits(sub_aggs_right)?;
            }
+            (
+                IntermediateBucketResult::Composite {
+                    buckets: composite_left,
+                },
+                IntermediateBucketResult::Composite {
+                    buckets: composite_right,
+                },
+            ) => {
+                composite_left.merge_fruits(composite_right)?;
+            }
            (IntermediateBucketResult::Range(_), _) => {
                panic!("try merge on different types")
            }
@@ -627,6 +674,9 @@ impl IntermediateBucketResult {
            (IntermediateBucketResult::Filter { .. }, _) => {
                panic!("try merge on different types")
            }
+            (IntermediateBucketResult::Composite { .. }, _) => {
+                panic!("try merge on different types")
+            }
        }
        Ok(())
    }
@@ -648,6 +698,21 @@ pub struct IntermediateTermBucketResult {
 }

 impl IntermediateTermBucketResult {
+    /// Returns a reference to the map of bucket entries keyed by [`IntermediateKey`].
+    pub fn entries(&self) -> &FxHashMap<IntermediateKey, IntermediateTermBucketEntry> {
+        &self.entries
+    }
+
+    /// Returns the count of documents not included in the returned buckets.
+    pub fn sum_other_doc_count(&self) -> u64 {
+        self.sum_other_doc_count
+    }
+
+    /// Returns the upper bound of the error on document counts in the returned buckets.
+    pub fn doc_count_error_upper_bound(&self) -> u64 {
+        self.doc_count_error_upper_bound
+    }
+
    pub(crate) fn into_final_result(
        self,
        req: &TermsAggregation,
@@ -880,6 +945,172 @@ impl MergeFruits for IntermediateHistogramBucketEntry {
    }
 }

+/// Entry for the composite bucket.
+pub type IntermediateCompositeBucketEntry = IntermediateTermBucketEntry;
+
+/// The fully typed key for composite aggregation
+#[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
+pub enum CompositeIntermediateKey {
+    /// Bool key
+    Bool(bool),
+    /// String key
+    Str(String),
+    /// Float key
+    F64(f64),
+    /// Signed integer key
+    I64(i64),
+    /// Unsigned integer key
+    U64(u64),
+    /// DateTime key, nanoseconds since epoch
+    DateTime(i64),
+    /// IP Address key
+    IpAddr(Ipv6Addr),
+    /// Missing value key
+    Null,
+}
+
+impl Eq for CompositeIntermediateKey {}
+
+impl std::hash::Hash for CompositeIntermediateKey {
+    fn hash<H: std::hash::Hasher>(&self, state: &mut H) {
+        core::mem::discriminant(self).hash(state);
+        match self {
+            CompositeIntermediateKey::Bool(val) => val.hash(state),
+            CompositeIntermediateKey::Str(text) => text.hash(state),
+            CompositeIntermediateKey::F64(val) => val.to_bits().hash(state),
+            CompositeIntermediateKey::U64(val) => val.hash(state),
+            CompositeIntermediateKey::I64(val) => val.hash(state),
+            CompositeIntermediateKey::DateTime(val) => val.hash(state),
+            CompositeIntermediateKey::IpAddr(val) => val.hash(state),
+            CompositeIntermediateKey::Null => {}
+        }
+    }
+}
+
+/// Composite aggregation page.
+#[derive(Default, Clone, Debug, PartialEq, Serialize, Deserialize)]
+pub struct IntermediateCompositeBucketResult {
+    pub(crate) entries: FxHashMap<Vec<CompositeIntermediateKey>, IntermediateCompositeBucketEntry>,
+    pub(crate) target_size: u32,
+    pub(crate) orders: Vec<(Order, MissingOrder)>,
+}
+
+impl IntermediateCompositeBucketResult {
+    pub(crate) fn into_final_result(
+        self,
+        req: &CompositeAggregation,
+        sub_aggregation_req: &Aggregations,
+        limits: &mut AggregationLimitsGuard,
+    ) -> crate::Result<BucketResult> {
+        let trimmed_entry_vec =
+            trim_composite_buckets(self.entries, &self.orders, self.target_size)?;
+        let after_key = trimmed_entry_vec
+            .last()
+            .map(|bucket| {
+                let (intermediate_key, _entry) = bucket;
+                intermediate_key
+                    .iter()
+                    .enumerate()
+                    .map(|(idx, intermediate_key)| {
+                        let source = &req.sources[idx];
+                        (source.name().to_string(), intermediate_key.clone().into())
+                    })
+                    .collect()
+            })
+            .unwrap_or_default();
+
+        let buckets = trimmed_entry_vec
+            .into_iter()
+            .map(|(intermediate_key, entry)| {
+                let key = intermediate_key
+                    .into_iter()
+                    .enumerate()
+                    .map(|(idx, intermediate_key)| {
+                        let source = &req.sources[idx];
+                        (source.name().to_string(), intermediate_key.into())
+                    })
+                    .collect();
+                Ok(CompositeBucketEntry {
+                    key,
+                    doc_count: entry.doc_count as u64,
+                    sub_aggregation: entry
+                        .sub_aggregation
+                        .into_final_result_internal(sub_aggregation_req, limits)?,
+                })
+            })
+            .collect::<crate::Result<Vec<_>>>()?;
+
+        Ok(BucketResult::Composite { after_key, buckets })
+    }
+
+    fn merge_fruits(&mut self, other: IntermediateCompositeBucketResult) -> crate::Result<()> {
+        merge_maps(&mut self.entries, other.entries)?;
+        if self.entries.len() as u32 > 2 * self.target_size {
+            self.trim()?;
+        }
+        Ok(())
+    }
+
+    /// Trim the composite buckets to the target size, according to the ordering.
+    pub(crate) fn trim(&mut self) -> crate::Result<()> {
+        if self.entries.len() as u32 <= self.target_size {
+            return Ok(());
+        }
+
+        let sorted_entries = trim_composite_buckets(
+            std::mem::take(&mut self.entries),
+            &self.orders,
+            self.target_size,
+        )?;
+
+        self.entries = sorted_entries.into_iter().collect();
+        Ok(())
+    }
+}
+
+fn trim_composite_buckets(
+    entries: FxHashMap<Vec<CompositeIntermediateKey>, IntermediateCompositeBucketEntry>,
+    orders: &[(Order, MissingOrder)],
+    target_size: u32,
+) -> crate::Result<
+    Vec<(
+        Vec<CompositeIntermediateKey>,
+        IntermediateCompositeBucketEntry,
+    )>,
+> {
+    let mut entries: Vec<_> = entries.into_iter().collect();
+    let mut sort_error: Option<TantivyError> = None;
+    entries.sort_by(|(left_key, _), (right_key, _)| {
+        if sort_error.is_some() {
+            return Ordering::Equal;
+        }
+
+        for idx in 0..orders.len() {
+            match composite_intermediate_key_ordering(
+                &left_key[idx],
+                &right_key[idx],
+                orders[idx].0,
+                orders[idx].1,
+            ) {
+                Ok(ordering) if ordering != Ordering::Equal => return ordering,
+                Ok(_) => continue,
+                Err(err) => {
+                    sort_error = Some(err);
+                    break;
+                }
+            }
+        }
+        Ordering::Equal
+    });
+
+    if let Some(err) = sort_error {
+        return Err(err);
+    }
+
+    entries.truncate(target_size as usize);
+    Ok(entries)
+}
+
 #[cfg(test)]
 mod tests {
    use std::collections::HashMap;
--- a/src/aggregation/metric/average.rs
+++ b/src/aggregation/metric/average.rs
@@ -55,6 +55,12 @@ impl IntermediateAverage {
    pub(crate) fn from_stats(stats: IntermediateStats) -> Self {
        Self { stats }
    }
+
+    /// Returns a reference to the underlying [`IntermediateStats`].
+    pub fn stats(&self) -> &IntermediateStats {
+        &self.stats
+    }
+
    /// Merges the other intermediate result into self.
    pub fn merge_fruits(&mut self, other: IntermediateAverage) {
        self.stats.merge_fruits(other.stats);
@@ -63,16 +69,6 @@ impl IntermediateAverage {
    pub fn finalize(&self) -> Option<f64> {
        self.stats.finalize().avg
    }
-
-    /// Returns the sum of all collected values.
-    pub fn sum(&self) -> f64 {
-        self.stats.sum
-    }
-
-    /// Returns the count of all collected values.
-    pub fn count(&self) -> u64 {
-        self.stats.count
-    }
 }

 #[cfg(test)]
--- a/src/aggregation/metric/cardinality.rs
+++ b/src/aggregation/metric/cardinality.rs
--- a/src/aggregation/metric/extended_stats.rs
+++ b/src/aggregation/metric/extended_stats.rs
@@ -399,6 +399,26 @@ impl SegmentAggregationCollector for SegmentExtendedStatsCollector {
        }
        Ok(())
    }
+
+    fn compute_metric_value(
+        &self,
+        bucket_id: BucketId,
+        sub_agg_name: &str,
+        sub_agg_property: &str,
+        _agg_data: &AggregationsSegmentCtx,
+    ) -> Option<f64> {
+        if self.name != sub_agg_name {
+            return None;
+        }
+        let extended = self.buckets.get(bucket_id as usize)?;
+        // Finalize is a pure read of accumulators, so calling it here for the cutoff sort
+        // doesn't disturb the eventual intermediate result.
+        extended
+            .finalize()
+            .get_value(sub_agg_property)
+            .ok()
+            .flatten()
+    }
 }

 #[cfg(test)]
--- a/src/aggregation/metric/mod.rs
+++ b/src/aggregation/metric/mod.rs
@@ -93,41 +93,6 @@ impl From<Option<f64>> for SingleMetricResult {
    }
 }

-/// Average metric result with intermediate data for merging.
-///
-/// Unlike [`SingleMetricResult`], this struct includes the raw `sum` and `count`
-/// values that can be used for multi-step query merging.
-#[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
-pub struct AverageMetricResult {
-    /// The computed average value. None if no documents matched.
-    pub value: Option<f64>,
-    /// The sum of all values (for multi-step merging).
-    pub sum: f64,
-    /// The count of all values (for multi-step merging).
-    pub count: u64,
-}
-
-/// Cardinality metric result with computed value and raw HLL sketch for multi-step merging.
-///
-/// The `value` field contains the computed cardinality estimate.
-/// The `sketch` field contains the serialized HyperLogLog++ sketch that can be used
-/// for merging results across multiple query steps.
-#[derive(Clone, Debug, Serialize, Deserialize)]
-pub struct CardinalityMetricResult {
-    /// The computed cardinality estimate.
-    pub value: Option<f64>,
-    /// The serialized HyperLogLog++ sketch for multi-step merging.
-    #[serde(skip_serializing_if = "Option::is_none")]
-    pub sketch: Option<CardinalityCollector>,
-}
-
-impl PartialEq for CardinalityMetricResult {
-    fn eq(&self, other: &Self) -> bool {
-        // Only compare values, not sketch (sketch comparison is complex)
-        self.value == other.value
-    }
-}
-
 /// This is the wrapper of percentile entries, which can be vector or hashmap
 /// depending on if it's keyed or not.
 #[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
@@ -142,30 +107,19 @@ pub enum PercentileValues {
 #[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
 /// The entry when requesting percentiles with keyed: false
 pub struct PercentileValuesVecEntry {
-    key: f64,
-    value: f64,
+    /// The percentile key (e.g. 1.0, 5.0, 25.0).
+    pub key: f64,
+    /// The percentile value. `NaN` when there are no values.
+    pub value: f64,
 }

-/// Percentiles metric result with computed values and raw sketch for multi-step merging.
+/// Result of a percentiles metric aggregation.
 ///
-/// The `values` field contains the computed percentile values.
-/// The `sketch` field contains the serialized DDSketch that can be used for merging
-/// results across multiple query steps.
-#[derive(Clone, Debug, Serialize, Deserialize)]
+/// The percentiles are wrapped in `values` to match the Elasticsearch output structure.
+#[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
 pub struct PercentilesMetricResult {
-    /// The computed percentile values.
+    /// The result of the percentile metric.
    pub values: PercentileValues,
-    /// The serialized DDSketch for multi-step merging.
-    /// This is the raw sketch data that can be deserialized and merged with other sketches.
-    #[serde(skip_serializing_if = "Option::is_none")]
-    pub sketch: Option<PercentilesCollector>,
-}
-
-impl PartialEq for PercentilesMetricResult {
-    fn eq(&self, other: &Self) -> bool {
-        // Only compare values, not sketch (sketch comparison is complex)
-        self.values == other.values
-    }
 }

 /// The top_hits metric results entry
@@ -246,105 +200,4 @@ mod tests {
        assert_eq!(aggregations_res_json["price_min"]["value"], 0.0);
        assert_eq!(aggregations_res_json["price_sum"]["value"], 15.0);
    }
-
-    #[test]
-    fn test_average_returns_sum_and_count() {
-        let mut schema_builder = Schema::builder();
-        let field_options = NumericOptions::default().set_fast();
-        let field = schema_builder.add_f64_field("price", field_options);
-        let index = Index::create_in_ram(schema_builder.build());
-        let mut index_writer: IndexWriter = index.writer_for_tests().unwrap();
-
-        // Add documents with values 0, 1, 2, 3, 4, 5
-        // sum = 15, count = 6, avg = 2.5
-        for i in 0..6 {
-            index_writer
-                .add_document(doc!(
-                    field => i as f64,
-                ))
-                .unwrap();
-        }
-        index_writer.commit().unwrap();
-
-        let aggregations_json = r#"{ "price_avg": { "avg": { "field": "price" } } }"#;
-        let aggregations: Aggregations = serde_json::from_str(aggregations_json).unwrap();
-        let collector = AggregationCollector::from_aggs(aggregations, Default::default());
-        let reader = index.reader().unwrap();
-        let searcher = reader.searcher();
-        let aggregations_res: AggregationResults = searcher.search(&AllQuery, &collector).unwrap();
-        let aggregations_res_json = serde_json::to_value(aggregations_res).unwrap();
-
-        // Verify all three fields are present and correct
-        assert_eq!(aggregations_res_json["price_avg"]["value"], 2.5);
-        assert_eq!(aggregations_res_json["price_avg"]["sum"], 15.0);
-        assert_eq!(aggregations_res_json["price_avg"]["count"], 6);
-    }
-
-    #[test]
-    fn test_percentiles_returns_sketch() {
-        let mut schema_builder = Schema::builder();
-        let field_options = NumericOptions::default().set_fast();
-        let field = schema_builder.add_f64_field("latency", field_options);
-        let index = Index::create_in_ram(schema_builder.build());
-        let mut index_writer: IndexWriter = index.writer_for_tests().unwrap();
-
-        // Add documents with latency values
-        for i in 0..100 {
-            index_writer
-                .add_document(doc!(
-                    field => i as f64,
-                ))
-                .unwrap();
-        }
-        index_writer.commit().unwrap();
-
-        let aggregations_json =
-            r#"{ "latency_percentiles": { "percentiles": { "field": "latency" } } }"#;
-        let aggregations: Aggregations = serde_json::from_str(aggregations_json).unwrap();
-        let collector = AggregationCollector::from_aggs(aggregations, Default::default());
-        let reader = index.reader().unwrap();
-        let searcher = reader.searcher();
-        let aggregations_res: AggregationResults = searcher.search(&AllQuery, &collector).unwrap();
-        let aggregations_res_json = serde_json::to_value(aggregations_res).unwrap();
-
-        // Verify percentile values are present
-        assert!(aggregations_res_json["latency_percentiles"]["values"].is_object());
-        // Verify sketch is present (serialized DDSketch)
-        assert!(aggregations_res_json["latency_percentiles"]["sketch"].is_object());
-    }
-
-    #[test]
-    fn test_cardinality_returns_sketch() {
-        let mut schema_builder = Schema::builder();
-        let field_options = NumericOptions::default().set_fast();
-        let field = schema_builder.add_u64_field("user_id", field_options);
-        let index = Index::create_in_ram(schema_builder.build());
-        let mut index_writer: IndexWriter = index.writer_for_tests().unwrap();
-
-        // Add documents with some duplicate user_ids
-        for i in 0..50 {
-            index_writer
-                .add_document(doc!(
-                    field => (i % 10) as u64,  // 10 unique values
-                ))
-                .unwrap();
-        }
-        index_writer.commit().unwrap();
-
-        let aggregations_json = r#"{ "unique_users": { "cardinality": { "field": "user_id" } } }"#;
-        let aggregations: Aggregations = serde_json::from_str(aggregations_json).unwrap();
-        let collector = AggregationCollector::from_aggs(aggregations, Default::default());
-        let reader = index.reader().unwrap();
-        let searcher = reader.searcher();
-        let aggregations_res: AggregationResults = searcher.search(&AllQuery, &collector).unwrap();
-        let aggregations_res_json = serde_json::to_value(aggregations_res).unwrap();
-
-        // Verify cardinality value is present and approximately correct
-        let cardinality = aggregations_res_json["unique_users"]["value"]
-            .as_f64()
-            .unwrap();
-        assert!(cardinality >= 9.0 && cardinality <= 11.0); // HLL is approximate
-                                                            // Verify sketch is present (serialized HyperLogLog++)
-        assert!(aggregations_res_json["unique_users"]["sketch"].is_object());
-    }
 }
--- a/src/aggregation/metric/percentiles.rs
+++ b/src/aggregation/metric/percentiles.rs
@@ -178,9 +178,6 @@ fn format_percentile(percentile: f64) -> String {
 impl PercentilesCollector {
    /// Convert result into final result. This will query the quantils from the underlying quantil
    /// collector.
-    ///
-    /// The result includes both the computed percentile values and the raw DDSketch
-    /// for multi-step query merging.
    pub fn into_final_result(self, req: &PercentilesAggregationReq) -> PercentilesMetricResult {
        let percentiles: &[f64] = req
            .percents
@@ -213,15 +210,7 @@ impl PercentilesCollector {
                    .collect(),
            )
        };
-        PercentilesMetricResult {
-            values,
-            sketch: Some(self),
-        }
-    }
-
-    /// Returns a reference to the underlying DDSketch.
-    pub fn sketch(&self) -> &sketches_ddsketch::DDSketch {
-        &self.sketch
+        PercentilesMetricResult { values }
    }

    fn new() -> Self {
@@ -233,6 +222,12 @@ impl PercentilesCollector {
        self.sketch.add(val);
    }

+    /// Encode the underlying DDSketch to Java-compatible binary format
+    /// for cross-language serialization with Java consumers.
+    pub fn to_sketch_bytes(&self) -> Vec<u8> {
+        self.sketch.to_java_bytes()
+    }
+
    pub(crate) fn merge_fruits(&mut self, right: PercentilesCollector) -> crate::Result<()> {
        self.sketch.merge(&right.sketch).map_err(|err| {
            TantivyError::AggregationError(AggregationError::InternalError(format!(
@@ -317,6 +312,26 @@ impl SegmentAggregationCollector for SegmentPercentilesCollector {
        }
        Ok(())
    }
+
+    fn compute_metric_value(
+        &self,
+        bucket_id: BucketId,
+        sub_agg_name: &str,
+        sub_agg_property: &str,
+        agg_data: &AggregationsSegmentCtx,
+    ) -> Option<f64> {
+        if agg_data.get_metric_req_data(self.accessor_idx).name != sub_agg_name {
+            return None;
+        }
+        let percentile: f64 = sub_agg_property.parse().ok()?;
+        if !(0.0..=100.0).contains(&percentile) {
+            return None;
+        }
+        let bucket = self.buckets.get(bucket_id as usize)?;
+        // DDSketch.quantile is a pure read; calling it here for the cutoff sort does
+        // not affect the intermediate state used for the final result.
+        bucket.sketch.quantile(percentile / 100.0).ok().flatten()
+    }
 }

 #[cfg(test)]
@@ -336,7 +351,7 @@ mod tests {
    use crate::aggregation::AggregationCollector;
    use crate::query::AllQuery;
    use crate::schema::{Schema, FAST};
-    use crate::Index;
+    use crate::{assert_nearly_equals, Index};

    #[test]
    fn test_aggregation_percentiles_empty_index() -> crate::Result<()> {
@@ -619,12 +634,16 @@ mod tests {
        let res = exec_request_with_query(agg_req, &index, None)?;
        assert_eq!(res["range_with_stats"]["buckets"][0]["doc_count"], 3);

-        assert_eq!(
-            res["range_with_stats"]["buckets"][0]["percentiles"]["values"]["1.0"],
+        assert_nearly_equals!(
+            res["range_with_stats"]["buckets"][0]["percentiles"]["values"]["1.0"]
+                .as_f64()
+                .unwrap(),
            5.0028295751107414
        );
-        assert_eq!(
-            res["range_with_stats"]["buckets"][0]["percentiles"]["values"]["99.0"],
+        assert_nearly_equals!(
+            res["range_with_stats"]["buckets"][0]["percentiles"]["values"]["99.0"]
+                .as_f64()
+                .unwrap(),
            10.07469668951144
        );

@@ -670,8 +689,14 @@ mod tests {

        let res = exec_request_with_query(agg_req, &index, None)?;

-        assert_eq!(res["percentiles"]["values"]["1.0"], 5.0028295751107414);
-        assert_eq!(res["percentiles"]["values"]["99.0"], 10.07469668951144);
+        assert_nearly_equals!(
+            res["percentiles"]["values"]["1.0"].as_f64().unwrap(),
+            5.0028295751107414
+        );
+        assert_nearly_equals!(
+            res["percentiles"]["values"]["99.0"].as_f64().unwrap(),
+            10.07469668951144
+        );

        Ok(())
    }
--- a/src/aggregation/metric/stats.rs
+++ b/src/aggregation/metric/stats.rs
@@ -110,6 +110,16 @@ impl Default for IntermediateStats {
 }

 impl IntermediateStats {
+    /// Returns the number of values collected.
+    pub fn count(&self) -> u64 {
+        self.count
+    }
+
+    /// Returns the sum of all values collected.
+    pub fn sum(&self) -> f64 {
+        self.sum
+    }
+
    /// Merges the other stats intermediate result into self.
    pub fn merge_fruits(&mut self, other: IntermediateStats) {
        self.count += other.count;
@@ -311,6 +321,40 @@ impl<const COLUMN_TYPE_ID: u8> SegmentAggregationCollector
        }
        Ok(())
    }
+
+    fn compute_metric_value(
+        &self,
+        bucket_id: BucketId,
+        sub_agg_name: &str,
+        sub_agg_property: &str,
+        _agg_data: &AggregationsSegmentCtx,
+    ) -> Option<f64> {
+        if self.name != sub_agg_name {
+            return None;
+        }
+        let stats = self.buckets.get(bucket_id as usize)?;
+        // The property depends on what we're collecting:
+        //   - StatsType::Stats exposes count/sum/min/max/avg via dotted property.
+        //   - Single-value kinds (Sum/Count/Min/Max/Average) expect an empty property and return
+        //     the value they were configured to collect.
+        let prop = match self.collecting_for {
+            StatsType::Stats if !sub_agg_property.is_empty() => sub_agg_property,
+            StatsType::Sum if sub_agg_property.is_empty() => "sum",
+            StatsType::Count if sub_agg_property.is_empty() => "count",
+            StatsType::Max if sub_agg_property.is_empty() => "max",
+            StatsType::Min if sub_agg_property.is_empty() => "min",
+            StatsType::Average if sub_agg_property.is_empty() => "avg",
+            _ => return None,
+        };
+        match prop {
+            "count" => Some(stats.count as f64),
+            "sum" => Some(stats.sum),
+            "min" if stats.count > 0 => Some(stats.min),
+            "max" if stats.count > 0 => Some(stats.max),
+            "avg" if stats.count > 0 => Some(stats.sum / stats.count as f64),
+            _ => None,
+        }
+    }
 }

 #[inline]
--- a/src/aggregation/metric/top_hits.rs
+++ b/src/aggregation/metric/top_hits.rs
@@ -644,6 +644,17 @@ impl SegmentAggregationCollector for TopHitsSegmentCollector {
        );
        Ok(())
    }
+
+    fn compute_metric_value(
+        &self,
+        _bucket_id: BucketId,
+        _sub_agg_name: &str,
+        _sub_agg_property: &str,
+        _agg_data: &AggregationsSegmentCtx,
+    ) -> Option<f64> {
+        // top_hits is not a numeric metric and cannot be used as an order target.
+        None
+    }
 }

 #[cfg(test)]
--- a/src/aggregation/mod.rs
+++ b/src/aggregation/mod.rs
@@ -133,7 +133,7 @@ mod agg_limits;
 pub mod agg_req;
 pub mod agg_result;
 pub mod bucket;
-pub(crate) mod cached_sub_aggs;
+pub(crate) mod buffered_sub_aggs;
 mod collector;
 mod date;
 mod error;
--- a/src/aggregation/segment_agg_result.rs
+++ b/src/aggregation/segment_agg_result.rs
@@ -76,6 +76,31 @@ pub trait SegmentAggregationCollector: Debug {
    fn flush(&mut self, _agg_data: &mut AggregationsSegmentCtx) -> crate::Result<()> {
        Ok(())
    }
+
+    /// Compute the segment-level metric value of the named direct-child metric for `bucket_id`.
+    ///
+    /// Used by parent term aggs that order by a sub-aggregation: the parent sorts on
+    /// this value and cuts off at segment time, matching the approximation tradeoff
+    /// Elasticsearch makes for any sub-agg ordering.
+    ///
+    /// `sub_agg_property` is the dotted suffix (e.g. `"sum"` in `mystats.sum`); empty when
+    /// the metric is a single-value kind such as cardinality.
+    ///
+    /// Returns `None` only on name mismatch, unknown property, or empty bucket. Implementations
+    /// may finalize their per-bucket state (e.g. compute a percentile from a sketch); calls
+    /// must be idempotent so the final intermediate result is unaffected.
+    ///
+    /// No default impl on purpose: every collector must decide explicitly whether it
+    /// produces a metric value, forwards into children (single-bucket aggs), or rejects
+    /// the lookup. A silent `None` default would let a parent term agg's cutoff sort all
+    /// buckets to the same key and drop arbitrary winners.
+    fn compute_metric_value(
+        &self,
+        bucket_id: BucketId,
+        sub_agg_name: &str,
+        sub_agg_property: &str,
+        agg_data: &AggregationsSegmentCtx,
+    ) -> Option<f64>;
 }

 #[derive(Default)]
@@ -137,4 +162,21 @@ impl SegmentAggregationCollector for GenericSegmentAggregationResultsCollector {
        }
        Ok(())
    }
+
+    fn compute_metric_value(
+        &self,
+        bucket_id: BucketId,
+        sub_agg_name: &str,
+        sub_agg_property: &str,
+        agg_data: &AggregationsSegmentCtx,
+    ) -> Option<f64> {
+        for agg in &self.aggs {
+            if let Some(value) =
+                agg.compute_metric_value(bucket_id, sub_agg_name, sub_agg_property, agg_data)
+            {
+                return Some(value);
+            }
+        }
+        None
+    }
 }
--- a/src/codec/mod.rs
+++ b/src/codec/mod.rs
@@ -0,0 +1,229 @@
+/// Codec specific to postings data.
+pub mod postings;
+
+/// Standard tantivy codec. This is the codec you use by default.
+pub mod standard;
+
+use std::io;
+
+pub use standard::StandardCodec;
+
+use crate::codec::postings::PostingsCodec;
+use crate::fieldnorm::FieldNormReader;
+use crate::postings::{Postings, TermInfo};
+use crate::query::score_combiner::DoNothingCombiner;
+use crate::query::term_query::TermScorer;
+use crate::query::{box_scorer, Bm25Weight, BufferedUnionScorer, Scorer, SumCombiner};
+use crate::schema::IndexRecordOption;
+use crate::{DocId, InvertedIndexReader, Score};
+
+/// A codec describes how data is laid out on disk.
+///
+/// For the moment, only the postings codec can be customized.
+pub trait Codec: Clone + std::fmt::Debug + Send + Sync + 'static {
+    /// The postings codec type used by this codec.
+    type PostingsCodec: PostingsCodec;
+
+    /// ID of the codec. It should be unique to your codec.
+    /// Make it human-readable, descriptive, short and unique.
+    const ID: &'static str;
+
+    /// Load codec based on the codec configuration.
+    fn from_json_props(json_value: &serde_json::Value) -> crate::Result<Self>;
+
+    /// Get codec configuration.
+    fn to_json_props(&self) -> serde_json::Value;
+
+    /// Returns the postings codec.
+    fn postings_codec(&self) -> &Self::PostingsCodec;
+}
+
+/// Object-safe codec is a Codec that can be used in a trait object.
+///
+/// The point of it is to offer a way to use a codec without a proliferation of generics.
+pub trait ObjectSafeCodec: 'static + Send + Sync {
+    /// Loads a type-erased Postings object for the given term.
+    ///
+    /// If the schema used to build the index did not provide enough
+    /// information to match the requested `option`, a Postings is still
+    /// returned in a best-effort manner.
+    fn load_postings_type_erased(
+        &self,
+        term_info: &TermInfo,
+        option: IndexRecordOption,
+        inverted_index_reader: &InvertedIndexReader,
+    ) -> io::Result<Box<dyn Postings>>;
+
+    /// Loads a type-erased TermScorer object for the given term.
+    ///
+    /// If the schema used to build the index did not provide enough
+    /// information to match the requested `option`, a TermScorer is still
+    /// returned in a best-effort manner.
+    ///
+    /// The point of this contraption is that the returned TermScorer is backed,
+    /// not by `Box<dyn Postings>` but by the codec's concrete Postings type.
+    fn load_term_scorer_type_erased(
+        &self,
+        term_info: &TermInfo,
+        option: IndexRecordOption,
+        inverted_index_reader: &InvertedIndexReader,
+        fieldnorm_reader: FieldNormReader,
+        similarity_weight: Bm25Weight,
+    ) -> io::Result<Box<dyn Scorer>>;
+
+    /// Loads a type-erased PhraseScorer object for the given terms.
+    ///
+    /// If the schema used to build the index did not provide enough
+    /// information to match the requested `option`, a PhraseScorer is still
+    /// returned in a best-effort manner.
+    ///
+    /// The point of this contraption is that the returned PhraseScorer is backed,
+    /// not by `Box<dyn Postings>` but by the codec's concrete Postings type.
+    fn new_phrase_scorer_type_erased(
+        &self,
+        term_infos: &[(usize, TermInfo)],
+        similarity_weight: Option<Bm25Weight>,
+        fieldnorm_reader: FieldNormReader,
+        slop: u32,
+        inverted_index_reader: &InvertedIndexReader,
+    ) -> io::Result<Box<dyn Scorer>>;
+
+    /// Performs a for_each_pruning operation on the given scorer.
+    ///
+    /// The function will go through matching documents and call the callback
+    /// function for all docs with a score exceeding the threshold.
+    ///
+    /// The callback returns the new threshold value (typically larger),
+    /// which is then used to prune the remaining documents.
+    ///
+    /// If the codec and the scorer allow it, this function can rely on
+    /// optimizations like the block-max wand.
+    fn for_each_pruning(
+        &self,
+        threshold: Score,
+        scorer: Box<dyn Scorer>,
+        callback: &mut dyn FnMut(DocId, Score) -> Score,
+    );
+
+    /// Builds a union scorer, possibly specialized if all scorers are
+    /// `TermScorer`s backed by the codec's concrete Postings type.
+    fn build_union_scorer_with_sum_combiner(
+        &self,
+        scorers: Vec<Box<dyn Scorer>>,
+        num_docs: DocId,
+        score_combiner_type: SumOrDoNothingCombiner,
+    ) -> Box<dyn Scorer>;
+}
+
+impl<TCodec: Codec> ObjectSafeCodec for TCodec {
+    fn load_postings_type_erased(
+        &self,
+        term_info: &TermInfo,
+        option: IndexRecordOption,
+        inverted_index_reader: &InvertedIndexReader,
+    ) -> io::Result<Box<dyn Postings>> {
+        let postings = inverted_index_reader
+            .read_postings_from_terminfo_specialized(term_info, option, self)?;
+        Ok(Box::new(postings))
+    }
+
+    fn load_term_scorer_type_erased(
+        &self,
+        term_info: &TermInfo,
+        option: IndexRecordOption,
+        inverted_index_reader: &InvertedIndexReader,
+        fieldnorm_reader: FieldNormReader,
+        similarity_weight: Bm25Weight,
+    ) -> io::Result<Box<dyn Scorer>> {
+        let scorer = inverted_index_reader.new_term_scorer_specialized(
+            term_info,
+            option,
+            fieldnorm_reader,
+            similarity_weight,
+            self,
+        )?;
+        Ok(box_scorer(scorer))
+    }
+
+    fn new_phrase_scorer_type_erased(
+        &self,
+        term_infos: &[(usize, TermInfo)],
+        similarity_weight: Option<Bm25Weight>,
+        fieldnorm_reader: FieldNormReader,
+        slop: u32,
+        inverted_index_reader: &InvertedIndexReader,
+    ) -> io::Result<Box<dyn Scorer>> {
+        let scorer = inverted_index_reader.new_phrase_scorer_type_specialized(
+            term_infos,
+            similarity_weight,
+            fieldnorm_reader,
+            slop,
+            self,
+        )?;
+        Ok(box_scorer(scorer))
+    }
+
+    fn build_union_scorer_with_sum_combiner(
+        &self,
+        scorers: Vec<Box<dyn Scorer>>,
+        num_docs: DocId,
+        sum_or_do_nothing_combiner: SumOrDoNothingCombiner,
+    ) -> Box<dyn Scorer> {
+        if !scorers.iter().all(|scorer| {
+            scorer.is::<TermScorer<<<Self as Codec>::PostingsCodec as PostingsCodec>::Postings>>()
+        }) {
+            return box_scorer(BufferedUnionScorer::build(
+                scorers,
+                SumCombiner::default,
+                num_docs,
+            ));
+        }
+        let specialized_scorers: Vec<
+            TermScorer<<<Self as Codec>::PostingsCodec as PostingsCodec>::Postings>,
+        > = scorers
+            .into_iter()
+            .map(|scorer| {
+                *scorer.downcast::<TermScorer<_>>().ok().expect(
+                    "Downcast failed despite the fact we already checked the type was correct",
+                )
+            })
+            .collect();
+        match sum_or_do_nothing_combiner {
+            SumOrDoNothingCombiner::Sum => box_scorer(BufferedUnionScorer::build(
+                specialized_scorers,
+                SumCombiner::default,
+                num_docs,
+            )),
+            SumOrDoNothingCombiner::DoNothing => box_scorer(BufferedUnionScorer::build(
+                specialized_scorers,
+                DoNothingCombiner::default,
+                num_docs,
+            )),
+        }
+    }
+
+    fn for_each_pruning(
+        &self,
+        threshold: Score,
+        scorer: Box<dyn Scorer>,
+        callback: &mut dyn FnMut(DocId, Score) -> Score,
+    ) {
+        let accelerated_for_each_pruning_res =
+            <TCodec as Codec>::PostingsCodec::try_accelerated_for_each_pruning(
+                threshold, scorer, callback,
+            );
+        if let Err(mut scorer) = accelerated_for_each_pruning_res {
+            // No acceleration available. We need to do things manually.
+            scorer.for_each_pruning(threshold, callback);
+        }
+    }
+}
+
+/// SumCombiner or DoNothingCombiner
+#[derive(Copy, Clone)]
+pub enum SumOrDoNothingCombiner {
+    /// Sum scores together
+    Sum,
+    /// Do not track any score.
+    DoNothing,
+}
--- a/src/query/boolean_query/block_wand.rs
+++ b/src/query/boolean_query/block_wand.rs
@@ -1,5 +1,6 @@
 use std::ops::{Deref, DerefMut};

+use crate::codec::postings::PostingsWithBlockMax;
 use crate::query::term_query::TermScorer;
 use crate::query::Scorer;
 use crate::{DocId, DocSet, Score, TERMINATED};
@@ -13,8 +14,8 @@ use crate::{DocId, DocSet, Score, TERMINATED};
 /// We always have `before_pivot_len` < `pivot_len`.
 ///
 /// `None` is returned if we establish that no document can exceed the threshold.
-fn find_pivot_doc(
-    term_scorers: &[TermScorerWithMaxScore],
+fn find_pivot_doc<TPostings: PostingsWithBlockMax>(
+    term_scorers: &[TermScorerWithMaxScore<TPostings>],
    threshold: Score,
 ) -> Option<(usize, usize, DocId)> {
    let mut max_score = 0.0;
@@ -46,11 +47,11 @@ fn find_pivot_doc(
 /// the next doc candidate defined by the min of `last_doc_in_block + 1` for
 /// scorer in scorers[..pivot_len] and `scorer.doc()` for scorer in scorers[pivot_len..].
 /// Note: before and after calling this method, scorers need to be sorted by their `.doc()`.
-fn block_max_was_too_low_advance_one_scorer(
-    scorers: &mut [TermScorerWithMaxScore],
+fn block_max_was_too_low_advance_one_scorer<TPostings: PostingsWithBlockMax>(
+    scorers: &mut [TermScorerWithMaxScore<TPostings>],
    pivot_len: usize,
 ) {
-    debug_assert!(is_sorted(scorers.iter().map(|scorer| scorer.doc())));
+    debug_assert!(scorers.iter().map(|scorer| scorer.doc()).is_sorted());
    let mut scorer_to_seek = pivot_len - 1;
    let mut global_max_score = scorers[scorer_to_seek].max_score;
    let mut doc_to_seek_after = scorers[scorer_to_seek].last_doc_in_block();
@@ -76,13 +77,16 @@ fn block_max_was_too_low_advance_one_scorer(
    scorers[scorer_to_seek].seek(doc_to_seek_after);

    restore_ordering(scorers, scorer_to_seek);
-    debug_assert!(is_sorted(scorers.iter().map(|scorer| scorer.doc())));
+    debug_assert!(scorers.iter().map(|scorer| scorer.doc()).is_sorted());
 }

 // Given a list of term_scorers and a `ord` and assuming that `term_scorers[ord]` is sorted
 // except term_scorers[ord] that might be in advance compared to its ranks,
 // bubble up term_scorers[ord] in order to restore the ordering.
-fn restore_ordering(term_scorers: &mut [TermScorerWithMaxScore], ord: usize) {
+fn restore_ordering<TPostings: PostingsWithBlockMax>(
+    term_scorers: &mut [TermScorerWithMaxScore<TPostings>],
+    ord: usize,
+) {
    let doc = term_scorers[ord].doc();
    for i in ord + 1..term_scorers.len() {
        if term_scorers[i].doc() >= doc {
@@ -90,16 +94,17 @@ fn restore_ordering(term_scorers: &mut [TermScorerWithMaxScore], ord: usize) {
        }
        term_scorers.swap(i, i - 1);
    }
-    debug_assert!(is_sorted(term_scorers.iter().map(|scorer| scorer.doc())));
+    debug_assert!(term_scorers.iter().map(|scorer| scorer.doc()).is_sorted());
 }

 // Attempts to advance all term_scorers between `&term_scorers[0..before_len]` to the pivot.
 // If this works, return true.
 // If this fails (ie: one of the term_scorer does not contain `pivot_doc` and seek goes past the
 // pivot), reorder the term_scorers to ensure the list is still sorted and returns `false`.
-// If a term_scorer reach TERMINATED in the process return false remove the term_scorer and return.
-fn align_scorers(
-    term_scorers: &mut Vec<TermScorerWithMaxScore>,
+// If a term_scorer reaches TERMINATED in the process, remove the term_scorer and
+// return false.
+fn align_scorers<TPostings: PostingsWithBlockMax>(
+    term_scorers: &mut Vec<TermScorerWithMaxScore<TPostings>>,
    pivot_doc: DocId,
    before_pivot_len: usize,
 ) -> bool {
@@ -126,7 +131,10 @@ fn align_scorers(
 // Assumes terms_scorers[..pivot_len] are positioned on the same doc (pivot_doc).
 // Advance term_scorers[..pivot_len] and out of these removes the terminated scores.
 // Restores the ordering of term_scorers.
-fn advance_all_scorers_on_pivot(term_scorers: &mut Vec<TermScorerWithMaxScore>, pivot_len: usize) {
+fn advance_all_scorers_on_pivot<TPostings: PostingsWithBlockMax>(
+    term_scorers: &mut Vec<TermScorerWithMaxScore<TPostings>>,
+    pivot_len: usize,
+) {
    for term_scorer in &mut term_scorers[..pivot_len] {
        term_scorer.advance();
    }
@@ -145,31 +153,32 @@ fn advance_all_scorers_on_pivot(term_scorers: &mut Vec<TermScorerWithMaxScore>,
 /// Implements the WAND (Weak AND) algorithm for dynamic pruning
 /// described in the paper "Faster Top-k Document Retrieval Using Block-Max Indexes".
 /// Link: <http://engineering.nyu.edu/~suel/papers/bmw.pdf>
-pub fn block_wand(
-    mut scorers: Vec<TermScorer>,
+pub fn block_wand<TPostings: PostingsWithBlockMax>(
+    mut scorers: Vec<TermScorer<TPostings>>,
    mut threshold: Score,
    callback: &mut dyn FnMut(u32, Score) -> Score,
 ) {
-    let mut scorers: Vec<TermScorerWithMaxScore> = scorers
+    scorers.retain(|scorer| scorer.doc() < TERMINATED);
+    if scorers.len() == 1 {
+        let scorer = scorers.pop().unwrap();
+        return block_wand_single_scorer(scorer, threshold, callback);
+    }
+    let mut scorers: Vec<TermScorerWithMaxScore<TPostings>> = scorers
        .iter_mut()
        .map(TermScorerWithMaxScore::from)
        .collect();
-    scorers.sort_by_key(|scorer| scorer.doc());
    // At this point we need to ensure that the scorers are sorted!
-    debug_assert!(is_sorted(scorers.iter().map(|scorer| scorer.doc())));
+    scorers.sort_by_key(|scorer| scorer.doc());
    while let Some((before_pivot_len, pivot_len, pivot_doc)) =
        find_pivot_doc(&scorers[..], threshold)
    {
-        debug_assert!(is_sorted(scorers.iter().map(|scorer| scorer.doc())));
+        debug_assert!(scorers.iter().map(|scorer| scorer.doc()).is_sorted());
        debug_assert_ne!(pivot_doc, TERMINATED);
        debug_assert!(before_pivot_len < pivot_len);

        let block_max_score_upperbound: Score = scorers[..pivot_len]
            .iter_mut()
-            .map(|scorer| {
-                scorer.seek_block(pivot_doc);
-                scorer.block_max_score()
-            })
+            .map(|scorer| scorer.seek_block_max(pivot_doc))
            .sum();

        // Beware after shallow advance, skip readers can be in advance compared to
@@ -220,21 +229,22 @@ pub fn block_wand(
 ///   - On a block, advance until the end and execute `callback` when the doc score is greater or
 ///     equal to the `threshold`.
 pub fn block_wand_single_scorer(
-    mut scorer: TermScorer,
+    mut scorer: TermScorer<impl PostingsWithBlockMax>,
    mut threshold: Score,
    callback: &mut dyn FnMut(u32, Score) -> Score,
 ) {
    let mut doc = scorer.doc();
+    let mut block_max_score = scorer.seek_block_max(doc);
    loop {
        // We position the scorer on a block that can reach
        // the threshold.
-        while scorer.block_max_score() < threshold {
+        while block_max_score < threshold {
            let last_doc_in_block = scorer.last_doc_in_block();
            if last_doc_in_block == TERMINATED {
                return;
            }
            doc = last_doc_in_block + 1;
-            scorer.seek_block(doc);
+            block_max_score = scorer.seek_block_max(doc);
        }
        // Seek will effectively load that block.
        doc = scorer.seek(doc);
@@ -256,48 +266,38 @@ pub fn block_wand_single_scorer(
            }
        }
        doc += 1;
-        scorer.seek_block(doc);
+        block_max_score = scorer.seek_block_max(doc);
    }
 }

-struct TermScorerWithMaxScore<'a> {
-    scorer: &'a mut TermScorer,
+struct TermScorerWithMaxScore<'a, TPostings: PostingsWithBlockMax> {
+    scorer: &'a mut TermScorer<TPostings>,
    max_score: Score,
 }

-impl<'a> From<&'a mut TermScorer> for TermScorerWithMaxScore<'a> {
-    fn from(scorer: &'a mut TermScorer) -> Self {
+impl<'a, TPostings: PostingsWithBlockMax> From<&'a mut TermScorer<TPostings>>
+    for TermScorerWithMaxScore<'a, TPostings>
+{
+    fn from(scorer: &'a mut TermScorer<TPostings>) -> Self {
        let max_score = scorer.max_score();
        TermScorerWithMaxScore { scorer, max_score }
    }
 }

-impl Deref for TermScorerWithMaxScore<'_> {
-    type Target = TermScorer;
+impl<TPostings: PostingsWithBlockMax> Deref for TermScorerWithMaxScore<'_, TPostings> {
+    type Target = TermScorer<TPostings>;

    fn deref(&self) -> &Self::Target {
        self.scorer
    }
 }

-impl DerefMut for TermScorerWithMaxScore<'_> {
+impl<TPostings: PostingsWithBlockMax> DerefMut for TermScorerWithMaxScore<'_, TPostings> {
    fn deref_mut(&mut self) -> &mut Self::Target {
        self.scorer
    }
 }

-fn is_sorted<I: Iterator<Item = DocId>>(mut it: I) -> bool {
-    if let Some(first) = it.next() {
-        let mut prev = first;
-        for doc in it {
-            if doc < prev {
-                return false;
-            }
-            prev = doc;
-        }
-    }
-    true
-}
 #[cfg(test)]
 mod tests {
    use std::cmp::Ordering;
--- a/src/codec/postings/mod.rs
+++ b/src/codec/postings/mod.rs
@@ -0,0 +1,122 @@
+use std::io;
+
+/// Block-max WAND algorithm.
+pub mod block_wand;
+use common::OwnedBytes;
+
+use crate::fieldnorm::FieldNormReader;
+use crate::postings::Postings;
+use crate::query::{Bm25Weight, Scorer};
+use crate::schema::IndexRecordOption;
+use crate::{DocId, Score};
+
+/// Postings codec.
+pub trait PostingsCodec: Send + Sync + 'static {
+    /// Serializer type for the postings codec.
+    type PostingsSerializer: PostingsSerializer;
+    /// Postings type for the postings codec.
+    type Postings: Postings + Clone;
+    /// Creates a new postings serializer.
+    fn new_serializer(
+        &self,
+        avg_fieldnorm: Score,
+        mode: IndexRecordOption,
+        fieldnorm_reader: Option<FieldNormReader>,
+    ) -> Self::PostingsSerializer;
+
+    /// Loads postings
+    ///
+    /// Record option is the option that was passed at indexing time.
+    /// Requested option is the option that is requested.
+    ///
+    /// For instance, term frequencies may be present in the postings list,
+    /// but we can skip decompressing them if they were not requested.
+    ///
+    /// If record option does not support the requested option,
+    /// this method does NOT return an error and will in fact restrict
+    /// requested_option to what is available.
+    fn load_postings(
+        &self,
+        doc_freq: u32,
+        postings_data: OwnedBytes,
+        record_option: IndexRecordOption,
+        requested_option: IndexRecordOption,
+        positions_data: Option<OwnedBytes>,
+    ) -> io::Result<Self::Postings>;
+
+    /// If your codec supports different ways to accelerate `for_each_pruning`, this is
+    /// where you should implement them.
+    ///
+    /// Returning `Err(scorer)` without mutating the scorer or calling the callback function
+    /// is never "wrong". It just leaves the responsibility to the caller to call a fallback
+    /// implementation on the scorer.
+    ///
+    /// If your codec supports BlockMax-Wand, you just need to have your
+    /// postings implement `PostingsWithBlockMax` and copy what is done in the StandardPostings
+    /// codec to enable it.
+    fn try_accelerated_for_each_pruning(
+        _threshold: Score,
+        scorer: Box<dyn Scorer>,
+        _callback: &mut dyn FnMut(DocId, Score) -> Score,
+    ) -> Result<(), Box<dyn Scorer>> {
+        Err(scorer)
+    }
+}
+
+/// A postings serializer is a listener that is in charge of serializing postings.
+///
+/// IO is done only once per postings, once all of the data has been received.
+/// A serializer will therefore contain internal buffers.
+///
+/// A serializer is created once and recycled for all postings.
+///
+/// Clients should use PostingsSerializer as follows.
+/// ```text
+/// // First postings list
+/// serializer.new_term(2, true);
+/// serializer.write_doc(2, 1);
+/// serializer.write_doc(6, 2);
+/// serializer.close_term(2, &mut wrt)?;
+/// // Second postings list
+/// serializer.new_term(1, true);
+/// serializer.write_doc(3, 1);
+/// serializer.close_term(1, &mut wrt)?;
+/// ```
+pub trait PostingsSerializer {
+    /// The `term_doc_freq` here is the number of documents
+    /// in the postings list.
+    ///
+    /// It can be used to compute the idf that will be used for the
+    /// blockmax parameters.
+    ///
+    /// If not available (e.g. term frequencies are not collected, in which case
+    /// block-max WAND is disabled), the `term_doc_freq` passed is 0.
+    fn new_term(&mut self, term_doc_freq: u32, record_term_freq: bool);
+
+    /// Records a new document id for the current term.
+    /// The serializer may ignore it.
+    fn write_doc(&mut self, doc_id: DocId, term_freq: u32);
+
+    /// Closes the current term and writes the postings list associated.
+    fn close_term(&mut self, doc_freq: u32, wrt: &mut impl io::Write) -> io::Result<()>;
+}
+
+/// A light complement interface to Postings to allow block-max wand acceleration.
+pub trait PostingsWithBlockMax: Postings {
+    /// Moves the postings to the block containing `target_doc` and returns
+    /// an upper bound of the score for documents in the block.
+    ///
+    /// Warning: Calling this method may leave the postings in an invalid state.
+    /// Callers are required to call `seek` before calling any other
+    /// `Postings` method (like `doc`, `advance`, etc.).
+    fn seek_block_max(
+        &mut self,
+        target_doc: crate::DocId,
+        fieldnorm_reader: &FieldNormReader,
+        similarity_weight: &Bm25Weight,
+    ) -> Score;
+
+    /// Returns the last document in the current block (or `TERMINATED` if this
+    /// is the last block).
+    fn last_doc_in_block(&self) -> crate::DocId;
+}
--- a/src/codec/standard/mod.rs
+++ b/src/codec/standard/mod.rs
@@ -0,0 +1,35 @@
+use serde::{Deserialize, Serialize};
+
+use crate::codec::standard::postings::StandardPostingsCodec;
+use crate::codec::Codec;
+
+/// Tantivy's default postings codec.
+pub mod postings;
+
+/// Tantivy's default codec.
+#[derive(Debug, Default, Clone, Serialize, Deserialize)]
+pub struct StandardCodec;
+
+impl Codec for StandardCodec {
+    type PostingsCodec = StandardPostingsCodec;
+
+    const ID: &'static str = "tantivy-default";
+
+    fn from_json_props(json_value: &serde_json::Value) -> crate::Result<Self> {
+        if !json_value.is_null() {
+            return Err(crate::TantivyError::InvalidArgument(format!(
+                "Unexpected codec properties for the StandardCodec: expected null, got {}",
+                json_value
+            )));
+        }
+        Ok(StandardCodec)
+    }
+
+    fn to_json_props(&self) -> serde_json::Value {
+        serde_json::Value::Null
+    }
+
+    fn postings_codec(&self) -> &Self::PostingsCodec {
+        &StandardPostingsCodec
+    }
+}
--- a/src/codec/standard/postings/block.rs
+++ b/src/codec/standard/postings/block.rs
@@ -0,0 +1,50 @@
+use crate::postings::compression::COMPRESSION_BLOCK_SIZE;
+use crate::DocId;
+
+pub struct Block {
+    doc_ids: [DocId; COMPRESSION_BLOCK_SIZE],
+    term_freqs: [u32; COMPRESSION_BLOCK_SIZE],
+    len: usize,
+}
+
+impl Block {
+    pub fn new() -> Self {
+        Block {
+            doc_ids: [0u32; COMPRESSION_BLOCK_SIZE],
+            term_freqs: [0u32; COMPRESSION_BLOCK_SIZE],
+            len: 0,
+        }
+    }
+
+    pub fn doc_ids(&self) -> &[DocId] {
+        &self.doc_ids[..self.len]
+    }
+
+    pub fn term_freqs(&self) -> &[u32] {
+        &self.term_freqs[..self.len]
+    }
+
+    pub fn clear(&mut self) {
+        self.len = 0;
+    }
+
+    pub fn append_doc(&mut self, doc: DocId, term_freq: u32) {
+        let len = self.len;
+        self.doc_ids[len] = doc;
+        self.term_freqs[len] = term_freq;
+        self.len = len + 1;
+    }
+
+    pub fn is_full(&self) -> bool {
+        self.len == COMPRESSION_BLOCK_SIZE
+    }
+
+    pub fn is_empty(&self) -> bool {
+        self.len == 0
+    }
+
+    pub fn last_doc(&self) -> DocId {
+        assert_eq!(self.len, COMPRESSION_BLOCK_SIZE);
+        self.doc_ids[COMPRESSION_BLOCK_SIZE - 1]
+    }
+}
--- a/src/codec/standard/postings/block_segment_postings.rs
+++ b/src/codec/standard/postings/block_segment_postings.rs
@@ -1,28 +1,19 @@
 use std::io;

-use common::VInt;
+use common::{OwnedBytes, VInt};

-use crate::directory::{FileSlice, OwnedBytes};
+use crate::codec::standard::postings::skip::{BlockInfo, SkipReader};
+use crate::codec::standard::postings::FreqReadingOption;
 use crate::fieldnorm::FieldNormReader;
-use crate::postings::compression::{BlockDecoder, VIntDecoder, COMPRESSION_BLOCK_SIZE};
-use crate::postings::{BlockInfo, FreqReadingOption, SkipReader};
+use crate::postings::compression::{BlockDecoder, VIntDecoder as _, COMPRESSION_BLOCK_SIZE};
 use crate::query::Bm25Weight;
 use crate::schema::IndexRecordOption;
 use crate::{DocId, Score, TERMINATED};

-fn max_score<I: Iterator<Item = Score>>(mut it: I) -> Option<Score> {
-    it.next().map(|first| it.fold(first, Score::max))
-}
-
 /// `BlockSegmentPostings` is a cursor iterating over blocks
 /// of documents.
-///
-/// # Warning
-///
-/// While it is useful for some very specific high-performance
-/// use cases, you should prefer using `SegmentPostings` for most usage.
 #[derive(Clone)]
-pub struct BlockSegmentPostings {
+pub(crate) struct BlockSegmentPostings {
    pub(crate) doc_decoder: BlockDecoder,
    block_loaded: bool,
    freq_decoder: BlockDecoder,
@@ -88,7 +79,7 @@ fn split_into_skips_and_postings(
 }

 impl BlockSegmentPostings {
-    /// Opens a `BlockSegmentPostings`.
+    /// Opens a `StandardPostingsReader`.
    /// `doc_freq` is the number of documents in the posting list.
    /// `record_option` represents the amount of data available according to the schema.
    /// `requested_option` is the amount of data requested by the user.
@@ -96,11 +87,10 @@ impl BlockSegmentPostings {
    /// term frequency blocks.
    pub(crate) fn open(
        doc_freq: u32,
-        data: FileSlice,
+        bytes: OwnedBytes,
        mut record_option: IndexRecordOption,
        requested_option: IndexRecordOption,
    ) -> io::Result<BlockSegmentPostings> {
-        let bytes = data.read_bytes()?;
        let (skip_data_opt, postings_data) = split_into_skips_and_postings(doc_freq, bytes)?;
        let skip_reader = match skip_data_opt {
            Some(skip_data) => {
@@ -138,6 +128,86 @@ impl BlockSegmentPostings {
        block_segment_postings.load_block();
        Ok(block_segment_postings)
    }
+}
+
+fn max_score<I: Iterator<Item = Score>>(mut it: I) -> Option<Score> {
+    it.next().map(|first| it.fold(first, Score::max))
+}
+
+impl BlockSegmentPostings {
+    /// Returns the overall number of documents in the block postings.
+    /// It does not take into account whether documents are deleted or not.
+    ///
+    /// This `doc_freq` is simply the sum of the lengths of all of the
+    /// blocks, deleted documents included.
+    pub fn doc_freq(&self) -> u32 {
+        self.doc_freq
+    }
+
+    /// Returns the array of docs in the current block.
+    ///
+    /// Before the first call to `.advance()`, the block
+    /// returned by `.docs()` is empty.
+    #[inline]
+    pub fn docs(&self) -> &[DocId] {
+        debug_assert!(self.block_loaded);
+        self.doc_decoder.output_array()
+    }
+
+    /// Return the document at index `idx` of the block.
+    #[inline]
+    pub fn doc(&self, idx: usize) -> u32 {
+        self.doc_decoder.output(idx)
+    }
+
+    /// Return the array of `term freq` in the block.
+    #[inline]
+    pub fn freqs(&self) -> &[u32] {
+        debug_assert!(self.block_loaded);
+        self.freq_decoder.output_array()
+    }
+
+    /// Return the frequency at index `idx` of the block.
+    #[inline]
+    pub fn freq(&self, idx: usize) -> u32 {
+        debug_assert!(self.block_loaded);
+        self.freq_decoder.output(idx)
+    }
+
+    /// Positions the cursor on the block that may contain `target_doc`.
+    ///
+    /// If all docs are smaller than target, the block loaded may be empty,
+    /// or be the last, incomplete VInt block.
+    pub fn seek(&mut self, target_doc: DocId) -> usize {
+        // Move to the block that might contain our document.
+        self.seek_block_without_loading(target_doc);
+        self.load_block();
+
+        // At this point we are on the block that might contain our document.
+        let doc = self.doc_decoder.seek_within_block(target_doc);
+
+        // The last block is not full and padded with TERMINATED,
+        // so we are guaranteed to have at least one value (real or padding)
+        // that is >= target_doc.
+        debug_assert!(doc < COMPRESSION_BLOCK_SIZE);
+
+        // `doc` is now the first element >= `target_doc`.
+        // If all docs are smaller than target, the current block is incomplete and padded
+        // with TERMINATED. After the search, the cursor points to the first TERMINATED.
+        doc
+    }
+
+    pub fn position_offset(&self) -> u64 {
+        self.skip_reader.position_offset()
+    }
+
+    /// Advance to the next block.
+    pub fn advance(&mut self) {
+        self.skip_reader.advance();
+        self.block_loaded = false;
+        self.block_max_score_cache = None;
+        self.load_block();
+    }

    /// Returns the block_max_score for the current block.
    /// It does not require the block to be loaded. For instance, it is ok to call this method
@@ -160,7 +230,7 @@ impl BlockSegmentPostings {
        }
        // this is the last block of the segment posting list.
        // If it is actually loaded, we can compute block max manually.
-        if self.block_is_loaded() {
+        if self.block_loaded {
            let docs = self.doc_decoder.output_array().iter().cloned();
            let freqs = self.freq_decoder.output_array().iter().cloned();
            let bm25_scores = docs.zip(freqs).map(|(doc, term_freq)| {
@@ -177,112 +247,25 @@ impl BlockSegmentPostings {
        // We do not cache it however, so that it gets computed when once block is loaded.
        bm25_weight.max_score()
    }
+}

-    pub(crate) fn freq_reading_option(&self) -> FreqReadingOption {
-        self.freq_reading_option
-    }
-
-    // Resets the block segment postings on another position
-    // in the postings file.
-    //
-    // This is useful for enumerating through a list of terms,
-    // and consuming the associated posting lists while avoiding
-    // reallocating a `BlockSegmentPostings`.
-    //
-    // # Warning
-    //
-    // This does not reset the positions list.
-    pub(crate) fn reset(&mut self, doc_freq: u32, postings_data: OwnedBytes) -> io::Result<()> {
-        let (skip_data_opt, postings_data) =
-            split_into_skips_and_postings(doc_freq, postings_data)?;
-        self.data = postings_data;
-        self.block_max_score_cache = None;
-        self.block_loaded = false;
-        if let Some(skip_data) = skip_data_opt {
-            self.skip_reader.reset(skip_data, doc_freq);
-        } else {
-            self.skip_reader.reset(OwnedBytes::empty(), doc_freq);
+impl BlockSegmentPostings {
+    /// Returns an empty segment postings object
+    pub fn empty() -> BlockSegmentPostings {
+        BlockSegmentPostings {
+            doc_decoder: BlockDecoder::with_val(TERMINATED),
+            block_loaded: true,
+            freq_decoder: BlockDecoder::with_val(1),
+            freq_reading_option: FreqReadingOption::NoFreq,
+            block_max_score_cache: None,
+            doc_freq: 0,
+            data: OwnedBytes::empty(),
+            skip_reader: SkipReader::new(OwnedBytes::empty(), 0, IndexRecordOption::Basic),
        }
-        self.doc_freq = doc_freq;
-        self.load_block();
-        Ok(())
    }

-    /// Returns the overall number of documents in the block postings.
-    /// It does not take in account whether documents are deleted or not.
-    ///
-    /// This `doc_freq` is simply the sum of the length of all of the blocks
-    /// length, and it does not take in account deleted documents.
-    pub fn doc_freq(&self) -> u32 {
-        self.doc_freq
-    }
-
-    /// Returns the array of docs in the current block.
-    ///
-    /// Before the first call to `.advance()`, the block
-    /// returned by `.docs()` is empty.
-    #[inline]
-    pub fn docs(&self) -> &[DocId] {
-        debug_assert!(self.block_is_loaded());
-        self.doc_decoder.output_array()
-    }
-
-    /// Return the document at index `idx` of the block.
-    #[inline]
-    pub fn doc(&self, idx: usize) -> u32 {
-        self.doc_decoder.output(idx)
-    }
-
-    /// Return the array of `term freq` in the block.
-    #[inline]
-    pub fn freqs(&self) -> &[u32] {
-        debug_assert!(self.block_is_loaded());
-        self.freq_decoder.output_array()
-    }
-
-    /// Return the frequency at index `idx` of the block.
-    #[inline]
-    pub fn freq(&self, idx: usize) -> u32 {
-        debug_assert!(self.block_is_loaded());
-        self.freq_decoder.output(idx)
-    }
-
-    /// Returns the length of the current block.
-    ///
-    /// All blocks have a length of `NUM_DOCS_PER_BLOCK`,
-    /// except the last block that may have a length
-    /// of any number between 1 and `NUM_DOCS_PER_BLOCK - 1`
-    #[inline]
-    pub fn block_len(&self) -> usize {
-        debug_assert!(self.block_is_loaded());
-        self.doc_decoder.output_len
-    }
-
-    /// Position on a block that may contains `target_doc`.
-    ///
-    /// If all docs are smaller than target, the block loaded may be empty,
-    /// or be the last an incomplete VInt block.
-    pub fn seek(&mut self, target_doc: DocId) -> usize {
-        // Move to the block that might contain our document.
-        self.seek_block(target_doc);
-        self.load_block();
-
-        // At this point we are on the block that might contain our document.
-        let doc = self.doc_decoder.seek_within_block(target_doc);
-
-        // The last block is not full and padded with TERMINATED,
-        // so we are guaranteed to have at least one value (real or padding)
-        // that is >= target_doc.
-        debug_assert!(doc < COMPRESSION_BLOCK_SIZE);
-
-        // `doc` is now the first element >= `target_doc`.
-        // If all docs are smaller than target, the current block is incomplete and padded
-        // with TERMINATED. After the search, the cursor points to the first TERMINATED.
-        doc
-    }
-
-    pub(crate) fn position_offset(&self) -> u64 {
-        self.skip_reader.position_offset()
+    pub(crate) fn skip_reader(&self) -> &SkipReader {
+        &self.skip_reader
    }

    /// Dangerous API! This calls seeks the next block on the skip list,
@@ -291,19 +274,15 @@ impl BlockSegmentPostings {
    /// `.load_block()` needs to be called manually afterwards.
    /// If all docs are smaller than target, the block loaded may be empty,
    /// or be the last an incomplete VInt block.
-    pub(crate) fn seek_block(&mut self, target_doc: DocId) {
+    pub(crate) fn seek_block_without_loading(&mut self, target_doc: DocId) {
        if self.skip_reader.seek(target_doc) {
            self.block_max_score_cache = None;
            self.block_loaded = false;
        }
    }

-    pub(crate) fn block_is_loaded(&self) -> bool {
-        self.block_loaded
-    }
-
    pub(crate) fn load_block(&mut self) {
-        if self.block_is_loaded() {
+        if self.block_loaded {
            return;
        }
        let offset = self.skip_reader.byte_offset();
@@ -351,68 +330,40 @@ impl BlockSegmentPostings {
        }
        self.block_loaded = true;
    }
-
-    /// Advance to the next block.
-    pub fn advance(&mut self) {
-        self.skip_reader.advance();
-        self.block_loaded = false;
-        self.block_max_score_cache = None;
-        self.load_block();
-    }
-
-    /// Returns an empty segment postings object
-    pub fn empty() -> BlockSegmentPostings {
-        BlockSegmentPostings {
-            doc_decoder: BlockDecoder::with_val(TERMINATED),
-            block_loaded: true,
-            freq_decoder: BlockDecoder::with_val(1),
-            freq_reading_option: FreqReadingOption::NoFreq,
-            block_max_score_cache: None,
-            doc_freq: 0,
-            data: OwnedBytes::empty(),
-            skip_reader: SkipReader::new(OwnedBytes::empty(), 0, IndexRecordOption::Basic),
-        }
-    }
-
-    pub(crate) fn skip_reader(&self) -> &SkipReader {
-        &self.skip_reader
-    }
 }

 #[cfg(test)]
 mod tests {
-    use common::HasLen;
+    use common::OwnedBytes;

    use super::BlockSegmentPostings;
+    use crate::codec::postings::PostingsSerializer;
+    use crate::codec::standard::postings::segment_postings::SegmentPostings;
+    use crate::codec::standard::postings::StandardPostingsSerializer;
    use crate::docset::{DocSet, TERMINATED};
-    use crate::index::Index;
    use crate::postings::compression::COMPRESSION_BLOCK_SIZE;
-    use crate::postings::postings::Postings;
-    use crate::postings::SegmentPostings;
-    use crate::schema::{IndexRecordOption, Schema, Term, INDEXED};
-    use crate::DocId;
+    use crate::schema::IndexRecordOption;

-    #[test]
-    fn test_empty_segment_postings() {
-        let mut postings = SegmentPostings::empty();
-        assert_eq!(postings.doc(), TERMINATED);
-        assert_eq!(postings.advance(), TERMINATED);
-        assert_eq!(postings.advance(), TERMINATED);
-        assert_eq!(postings.doc_freq(), 0);
-        assert_eq!(postings.len(), 0);
-    }
-
-    #[test]
-    fn test_empty_postings_doc_returns_terminated() {
-        let mut postings = SegmentPostings::empty();
-        assert_eq!(postings.doc(), TERMINATED);
-        assert_eq!(postings.advance(), TERMINATED);
-    }
-
-    #[test]
-    fn test_empty_postings_doc_term_freq_returns_0() {
-        let postings = SegmentPostings::empty();
-        assert_eq!(postings.term_freq(), 1);
+    #[cfg(test)]
+    fn build_block_postings(docs: &[u32]) -> BlockSegmentPostings {
+        let doc_freq = docs.len() as u32;
+        let mut postings_serializer =
+            StandardPostingsSerializer::new(1.0f32, IndexRecordOption::Basic, None);
+        postings_serializer.new_term(docs.len() as u32, false);
+        for doc in docs {
+            postings_serializer.write_doc(*doc, 1u32);
+        }
+        let mut buffer: Vec<u8> = Vec::new();
+        postings_serializer
+            .close_term(doc_freq, &mut buffer)
+            .unwrap();
+        BlockSegmentPostings::open(
+            doc_freq,
+            OwnedBytes::new(buffer),
+            IndexRecordOption::Basic,
+            IndexRecordOption::Basic,
+        )
+        .unwrap()
    }

    #[test]
@@ -427,7 +378,7 @@ mod tests {

    #[test]
    fn test_block_segment_postings() -> crate::Result<()> {
-        let mut block_segments = build_block_postings(&(0..100_000).collect::<Vec<u32>>())?;
+        let mut block_segments = build_block_postings(&(0..100_000).collect::<Vec<u32>>());
        let mut offset: u32 = 0u32;
        // checking that the `doc_freq` is correct
        assert_eq!(block_segments.doc_freq(), 100_000);
@@ -452,7 +403,7 @@ mod tests {
        doc_ids.push(129);
        doc_ids.push(130);
        {
-            let block_segments = build_block_postings(&doc_ids)?;
+            let block_segments = build_block_postings(&doc_ids);
            let mut docset = SegmentPostings::from_block_postings(block_segments, None);
            assert_eq!(docset.seek(128), 129);
            assert_eq!(docset.doc(), 129);
@@ -461,7 +412,7 @@ mod tests {
            assert_eq!(docset.advance(), TERMINATED);
        }
        {
-            let block_segments = build_block_postings(&doc_ids).unwrap();
+            let block_segments = build_block_postings(&doc_ids);
            let mut docset = SegmentPostings::from_block_postings(block_segments, None);
            assert_eq!(docset.seek(129), 129);
            assert_eq!(docset.doc(), 129);
@@ -470,7 +421,7 @@ mod tests {
            assert_eq!(docset.advance(), TERMINATED);
        }
        {
-            let block_segments = build_block_postings(&doc_ids)?;
+            let block_segments = build_block_postings(&doc_ids);
            let mut docset = SegmentPostings::from_block_postings(block_segments, None);
            assert_eq!(docset.doc(), 0);
            assert_eq!(docset.seek(131), TERMINATED);
@@ -479,38 +430,13 @@ mod tests {
        Ok(())
    }

-    fn build_block_postings(docs: &[DocId]) -> crate::Result<BlockSegmentPostings> {
-        let mut schema_builder = Schema::builder();
-        let int_field = schema_builder.add_u64_field("id", INDEXED);
-        let schema = schema_builder.build();
-        let index = Index::create_in_ram(schema);
-        let mut index_writer = index.writer_for_tests()?;
-        let mut last_doc = 0u32;
-        for &doc in docs {
-            for _ in last_doc..doc {
-                index_writer.add_document(doc!(int_field=>1u64))?;
-            }
-            index_writer.add_document(doc!(int_field=>0u64))?;
-            last_doc = doc + 1;
-        }
-        index_writer.commit()?;
-        let searcher = index.reader()?.searcher();
-        let segment_reader = searcher.segment_reader(0);
-        let inverted_index = segment_reader.inverted_index(int_field).unwrap();
-        let term = Term::from_field_u64(int_field, 0u64);
-        let term_info = inverted_index.get_term_info(&term)?.unwrap();
-        let block_postings = inverted_index
-            .read_block_postings_from_terminfo(&term_info, IndexRecordOption::Basic)?;
-        Ok(block_postings)
-    }
-
    #[test]
    fn test_block_segment_postings_seek() -> crate::Result<()> {
-        let mut docs = vec![0];
+        let mut docs = Vec::new();
        for i in 0..1300 {
            docs.push((i * i / 100) + i);
        }
-        let mut block_postings = build_block_postings(&docs[..])?;
+        let mut block_postings = build_block_postings(&docs[..]);
        for i in &[0, 424, 10000] {
            block_postings.seek(*i);
            let docs = block_postings.docs();
@@ -521,40 +447,4 @@ mod tests {
        assert_eq!(block_postings.doc(COMPRESSION_BLOCK_SIZE - 1), TERMINATED);
        Ok(())
    }
-
-    #[test]
-    fn test_reset_block_segment_postings() -> crate::Result<()> {
-        let mut schema_builder = Schema::builder();
-        let int_field = schema_builder.add_u64_field("id", INDEXED);
-        let schema = schema_builder.build();
-        let index = Index::create_in_ram(schema);
-        let mut index_writer = index.writer_for_tests()?;
-        // create two postings list, one containing even number,
-        // the other containing odd numbers.
-        for i in 0..6 {
-            let doc = doc!(int_field=> (i % 2) as u64);
-            index_writer.add_document(doc)?;
-        }
-        index_writer.commit()?;
-        let searcher = index.reader()?.searcher();
-        let segment_reader = searcher.segment_reader(0);
-
-        let mut block_segments;
-        {
-            let term = Term::from_field_u64(int_field, 0u64);
-            let inverted_index = segment_reader.inverted_index(int_field)?;
-            let term_info = inverted_index.get_term_info(&term)?.unwrap();
-            block_segments = inverted_index
-                .read_block_postings_from_terminfo(&term_info, IndexRecordOption::Basic)?;
-        }
-        assert_eq!(block_segments.docs(), &[0, 2, 4]);
-        {
-            let term = Term::from_field_u64(int_field, 1u64);
-            let inverted_index = segment_reader.inverted_index(int_field)?;
-            let term_info = inverted_index.get_term_info(&term)?.unwrap();
-            inverted_index.reset_block_postings_from_terminfo(&term_info, &mut block_segments)?;
-        }
-        assert_eq!(block_segments.docs(), &[1, 3, 5]);
-        Ok(())
-    }
 }
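
Note on the `seek` contract in the file above: its comments state that the last block is padded with `TERMINATED`, so a search within the block always lands on an index below the block size. The sketch below illustrates that invariant in isolation. It is not tantivy's decoder: `pad_block`, `BLOCK_SIZE` and this `seek_within_block` are hypothetical stand-ins, and the value of `TERMINATED` is assumed.

```rust
/// Padding sentinel. Assumed value for this sketch, chosen to be larger
/// than any real doc id.
const TERMINATED: u32 = i32::MAX as u32;

/// Hypothetical block size for the sketch.
const BLOCK_SIZE: usize = 128;

/// Copies a sorted, strictly incomplete list of doc ids into a full block,
/// filling the tail with `TERMINATED`.
fn pad_block(docs: &[u32]) -> [u32; BLOCK_SIZE] {
    assert!(docs.len() < BLOCK_SIZE, "the sketch only models an incomplete block");
    let mut block = [TERMINATED; BLOCK_SIZE];
    block[..docs.len()].copy_from_slice(docs);
    block
}

/// Returns the index of the first value that is >= `target`.
/// Because the block ends with at least one `TERMINATED`, the result is
/// always < `BLOCK_SIZE` for any `target <= TERMINATED`.
fn seek_within_block(block: &[u32; BLOCK_SIZE], target: u32) -> usize {
    block.partition_point(|&doc| doc < target)
}
```

When every real doc id is smaller than the target, the returned index points at the first padding slot, which matches the behaviour described in the comments of `seek`.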
--- a/src/codec/standard/postings/mod.rs
+++ b/src/codec/standard/postings/mod.rs
@@ -0,0 +1,164 @@
+use std::io;
+
+use crate::codec::postings::block_wand::{block_wand, block_wand_single_scorer};
+use crate::codec::postings::PostingsCodec;
+use crate::codec::standard::postings::block_segment_postings::BlockSegmentPostings;
+pub use crate::codec::standard::postings::segment_postings::SegmentPostings;
+use crate::fieldnorm::FieldNormReader;
+use crate::positions::PositionReader;
+use crate::query::term_query::TermScorer;
+use crate::query::{BufferedUnionScorer, Scorer, SumCombiner};
+use crate::schema::IndexRecordOption;
+use crate::{DocSet as _, Score, TERMINATED};
+
+mod block;
+mod block_segment_postings;
+mod segment_postings;
+mod skip;
+mod standard_postings_serializer;
+
+pub use segment_postings::SegmentPostings as StandardPostings;
+pub use standard_postings_serializer::StandardPostingsSerializer;
+
+/// The default postings codec for tantivy.
+pub struct StandardPostingsCodec;
+
+#[expect(clippy::enum_variant_names)]
+#[derive(Debug, PartialEq, Clone, Copy, Eq)]
+pub(crate) enum FreqReadingOption {
+    NoFreq,
+    SkipFreq,
+    ReadFreq,
+}
+
+impl PostingsCodec for StandardPostingsCodec {
+    type PostingsSerializer = StandardPostingsSerializer;
+    type Postings = SegmentPostings;
+
+    fn new_serializer(
+        &self,
+        avg_fieldnorm: Score,
+        mode: IndexRecordOption,
+        fieldnorm_reader: Option<FieldNormReader>,
+    ) -> Self::PostingsSerializer {
+        StandardPostingsSerializer::new(avg_fieldnorm, mode, fieldnorm_reader)
+    }
+
+    fn load_postings(
+        &self,
+        doc_freq: u32,
+        postings_data: common::OwnedBytes,
+        record_option: IndexRecordOption,
+        requested_option: IndexRecordOption,
+        positions_data_opt: Option<common::OwnedBytes>,
+    ) -> io::Result<Self::Postings> {
+        // Rationalize record_option/requested_option.
+        let requested_option = requested_option.downgrade(record_option);
+        let block_segment_postings =
+            BlockSegmentPostings::open(doc_freq, postings_data, record_option, requested_option)?;
+        let position_reader = positions_data_opt.map(PositionReader::open).transpose()?;
+        Ok(SegmentPostings::from_block_postings(
+            block_segment_postings,
+            position_reader,
+        ))
+    }
+
+    fn try_accelerated_for_each_pruning(
+        mut threshold: Score,
+        mut scorer: Box<dyn Scorer>,
+        callback: &mut dyn FnMut(crate::DocId, Score) -> Score,
+    ) -> Result<(), Box<dyn Scorer>> {
+        scorer = match scorer.downcast::<TermScorer<Self::Postings>>() {
+            Ok(term_scorer) => {
+                block_wand_single_scorer(*term_scorer, threshold, callback);
+                return Ok(());
+            }
+            Err(scorer) => scorer,
+        };
+        let mut union_scorer =
+            scorer.downcast::<BufferedUnionScorer<Box<dyn Scorer>, SumCombiner>>()?;
+        if !union_scorer
+            .scorers()
+            .iter()
+            .all(|scorer| scorer.is::<TermScorer<Self::Postings>>())
+        {
+            return Err(union_scorer);
+        }
+        let doc = union_scorer.doc();
+        if doc == TERMINATED {
+            return Ok(());
+        }
+        let score = union_scorer.score();
+        if score > threshold {
+            threshold = callback(doc, score);
+        }
+        let boxed_scorers: Vec<Box<dyn Scorer>> = union_scorer.into_scorers();
+        let scorers: Vec<TermScorer<Self::Postings>> = boxed_scorers
+            .into_iter()
+            .map(|scorer| {
+                *scorer.downcast::<TermScorer<Self::Postings>>().ok().expect(
+                    "Downcast failed despite the fact we already checked the type was correct",
+                )
+            })
+            .collect();
+        block_wand(scorers, threshold, callback);
+        Ok(())
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use common::OwnedBytes;
+
+    use super::*;
+    use crate::codec::postings::PostingsSerializer as _;
+    use crate::postings::Postings as _;
+
+    fn test_segment_postings_tf_aux(num_docs: u32, include_term_freq: bool) -> SegmentPostings {
+        let mut postings_serializer =
+            StandardPostingsCodec.new_serializer(1.0f32, IndexRecordOption::WithFreqs, None);
+        let mut buffer = Vec::new();
+        postings_serializer.new_term(num_docs, include_term_freq);
+        for i in 0..num_docs {
+            postings_serializer.write_doc(i, 2);
+        }
+        postings_serializer
+            .close_term(num_docs, &mut buffer)
+            .unwrap();
+        StandardPostingsCodec
+            .load_postings(
+                num_docs,
+                OwnedBytes::new(buffer),
+                IndexRecordOption::WithFreqs,
+                IndexRecordOption::WithFreqs,
+                None,
+            )
+            .unwrap()
+    }
+
+    #[test]
+    fn test_segment_postings_small_block_with_and_without_freq() {
+        let small_block_without_term_freq = test_segment_postings_tf_aux(1, false);
+        assert!(!small_block_without_term_freq.has_freq());
+        assert_eq!(small_block_without_term_freq.doc(), 0);
+        assert_eq!(small_block_without_term_freq.term_freq(), 1);
+
+        let small_block_with_term_freq = test_segment_postings_tf_aux(1, true);
+        assert!(small_block_with_term_freq.has_freq());
+        assert_eq!(small_block_with_term_freq.doc(), 0);
+        assert_eq!(small_block_with_term_freq.term_freq(), 2);
+    }
+
+    #[test]
+    fn test_segment_postings_large_block_with_and_without_freq() {
+        let large_block_without_term_freq = test_segment_postings_tf_aux(128, false);
+        assert!(!large_block_without_term_freq.has_freq());
+        assert_eq!(large_block_without_term_freq.doc(), 0);
+        assert_eq!(large_block_without_term_freq.term_freq(), 1);
+
+        let large_block_with_term_freq = test_segment_postings_tf_aux(128, true);
+        assert!(large_block_with_term_freq.has_freq());
+        assert_eq!(large_block_with_term_freq.doc(), 0);
+        assert_eq!(large_block_with_term_freq.term_freq(), 2);
+    }
+}
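
Note on `try_accelerated_for_each_pruning` in the file above: it returns `Result<(), Box<dyn Scorer>>`, handing the scorer back unchanged when the codec-specific fast path does not apply. The sketch below shows that "downcast or give it back" shape using `std::any::Any` as a stand-in. `FastScorer`, `OtherScorer` and `try_accelerated` are hypothetical names, and the downcast mechanism behind tantivy's `Scorer` may differ from `Any`.

```rust
use std::any::Any;

/// Hypothetical concrete type for which a specialised path exists.
struct FastScorer {
    boost: u32,
}

/// Hypothetical type with no specialised path.
struct OtherScorer;

/// Tries the specialised path. On a type mismatch the boxed value is
/// returned untouched in `Err`, so the caller can fall back to a generic path.
fn try_accelerated(scorer: Box<dyn Any>) -> Result<u32, Box<dyn Any>> {
    // `downcast` yields `Err(original_box)` on mismatch; `?` forwards it as is.
    let fast: Box<FastScorer> = scorer.downcast::<FastScorer>()?;
    Ok(fast.boost * 2)
}
```

Returning the original box in the error variant means a failed attempt costs no allocation and loses no state, which is what allows the caller to chain several attempts before falling back.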
--- a/src/codec/standard/postings/segment_postings.rs
+++ b/src/codec/standard/postings/segment_postings.rs
@@ -1,11 +1,14 @@
-use common::HasLen;
+use common::BitSet;

+use super::BlockSegmentPostings;
+use crate::codec::postings::PostingsWithBlockMax;
 use crate::docset::DocSet;
-use crate::fastfield::AliveBitSet;
+use crate::fieldnorm::FieldNormReader;
 use crate::positions::PositionReader;
 use crate::postings::compression::COMPRESSION_BLOCK_SIZE;
-use crate::postings::{BlockSegmentPostings, Postings};
-use crate::{DocId, TERMINATED};
+use crate::postings::{DocFreq, Postings};
+use crate::query::Bm25Weight;
+use crate::{DocId, Score};

 /// `SegmentPostings` represents the inverted list or postings associated with
 /// a term in a `Segment`.
@@ -29,31 +32,6 @@ impl SegmentPostings {
        }
    }

-    /// Compute the number of non-deleted documents.
-    ///
-    /// This method will clone and scan through the posting lists.
-    /// (this is a rather expensive operation).
-    pub fn doc_freq_given_deletes(&self, alive_bitset: &AliveBitSet) -> u32 {
-        let mut docset = self.clone();
-        let mut doc_freq = 0;
-        loop {
-            let doc = docset.doc();
-            if doc == TERMINATED {
-                return doc_freq;
-            }
-            if alive_bitset.is_alive(doc) {
-                doc_freq += 1u32;
-            }
-            docset.advance();
-        }
-    }
-
-    /// Returns the overall number of documents in the block postings.
-    /// It does not take in account whether documents are deleted or not.
-    pub fn doc_freq(&self) -> u32 {
-        self.block_cursor.doc_freq()
-    }
-
    /// Creates a segment postings object with the given documents
    /// and no frequency encoded.
    ///
@@ -64,13 +42,19 @@ impl SegmentPostings {
    /// buffer with the serialized data.
    #[cfg(test)]
    pub fn create_from_docs(docs: &[u32]) -> SegmentPostings {
-        use crate::directory::FileSlice;
-        use crate::postings::serializer::PostingsSerializer;
+        use common::OwnedBytes;
+
        use crate::schema::IndexRecordOption;
        let mut buffer = Vec::new();
        {
+            use crate::codec::postings::PostingsSerializer;
+
            let mut postings_serializer =
-                PostingsSerializer::new(0.0, IndexRecordOption::Basic, None);
+                crate::codec::standard::postings::StandardPostingsSerializer::new(
+                    0.0,
+                    IndexRecordOption::Basic,
+                    None,
+                );
            postings_serializer.new_term(docs.len() as u32, false);
            for &doc in docs {
                postings_serializer.write_doc(doc, 1u32);
@@ -81,7 +65,7 @@ impl SegmentPostings {
        }
        let block_segment_postings = BlockSegmentPostings::open(
            docs.len() as u32,
-            FileSlice::from(buffer),
+            OwnedBytes::new(buffer),
            IndexRecordOption::Basic,
            IndexRecordOption::Basic,
        )
@@ -95,9 +79,11 @@ impl SegmentPostings {
        doc_and_tfs: &[(u32, u32)],
        fieldnorms: Option<&[u32]>,
    ) -> SegmentPostings {
-        use crate::directory::FileSlice;
+        use common::OwnedBytes;
+
+        use crate::codec::postings::PostingsSerializer as _;
+        use crate::codec::standard::postings::StandardPostingsSerializer;
        use crate::fieldnorm::FieldNormReader;
-        use crate::postings::serializer::PostingsSerializer;
        use crate::schema::IndexRecordOption;
        use crate::Score;
        let mut buffer: Vec<u8> = Vec::new();
@@ -114,7 +100,7 @@ impl SegmentPostings {
                total_num_tokens as Score / fieldnorms.len() as Score
            })
            .unwrap_or(0.0);
-        let mut postings_serializer = PostingsSerializer::new(
+        let mut postings_serializer = StandardPostingsSerializer::new(
            average_field_norm,
            IndexRecordOption::WithFreqs,
            fieldnorm_reader,
@@ -128,7 +114,7 @@ impl SegmentPostings {
            .unwrap();
        let block_segment_postings = BlockSegmentPostings::open(
            doc_and_tfs.len() as u32,
-            FileSlice::from(buffer),
+            OwnedBytes::new(buffer),
            IndexRecordOption::WithFreqs,
            IndexRecordOption::WithFreqs,
        )
@@ -158,7 +144,6 @@ impl DocSet for SegmentPostings {
    // next needs to be called a first time to point to the correct element.
    #[inline]
    fn advance(&mut self) -> DocId {
-        debug_assert!(self.block_cursor.block_is_loaded());
        if self.cur == COMPRESSION_BLOCK_SIZE - 1 {
            self.cur = 0;
            self.block_cursor.advance();
@@ -197,13 +182,31 @@ impl DocSet for SegmentPostings {
    }

    fn size_hint(&self) -> u32 {
-        self.len() as u32
+        self.doc_freq().into()
    }
-}

-impl HasLen for SegmentPostings {
-    fn len(&self) -> usize {
-        self.block_cursor.doc_freq() as usize
+    fn fill_bitset(&mut self, bitset: &mut BitSet) {
+        let bitset_max_value: DocId = bitset.max_value();
+        loop {
+            let docs = self.block_cursor.docs();
+            let Some(&last_doc) = docs.last() else {
+                break;
+            };
+            if last_doc < bitset_max_value {
+                // All docs are within the range of the bitset
+                for &doc in docs {
+                    bitset.insert(doc);
+                }
+            } else {
+                for &doc in docs {
+                    if doc < bitset_max_value {
+                        bitset.insert(doc);
+                    }
+                }
+                break;
+            }
+            self.block_cursor.advance();
+        }
    }
 }

@@ -229,6 +232,13 @@ impl Postings for SegmentPostings {
        self.block_cursor.freq(self.cur)
    }

+    /// Returns the overall number of documents in the block postings.
+    /// It does not take into account whether documents are deleted or not.
+    #[inline(always)]
+    fn doc_freq(&self) -> DocFreq {
+        DocFreq::Exact(self.block_cursor.doc_freq())
+    }
+
    fn append_positions_with_offset(&mut self, offset: u32, output: &mut Vec<u32>) {
        let term_freq = self.term_freq();
        let prev_len = output.len();
@@ -252,24 +262,42 @@ impl Postings for SegmentPostings {
            }
        }
    }
+
+    fn has_freq(&self) -> bool {
+        !self.block_cursor.freqs().is_empty()
+    }
+}
+
+impl PostingsWithBlockMax for SegmentPostings {
+    fn seek_block_max(
+        &mut self,
+        target_doc: crate::DocId,
+        fieldnorm_reader: &FieldNormReader,
+        similarity_weight: &Bm25Weight,
+    ) -> Score {
+        self.block_cursor.seek_block_without_loading(target_doc);
+        self.block_cursor
+            .block_max_score(fieldnorm_reader, similarity_weight)
+    }
+
+    fn last_doc_in_block(&self) -> crate::DocId {
+        self.block_cursor.skip_reader().last_doc_in_block()
+    }
 }

 #[cfg(test)]
 mod tests {
-
-    use common::HasLen;
-
    use super::SegmentPostings;
    use crate::docset::{DocSet, TERMINATED};
-    use crate::fastfield::AliveBitSet;
-    use crate::postings::postings::Postings;
+    use crate::postings::Postings;

    #[test]
    fn test_empty_segment_postings() {
        let mut postings = SegmentPostings::empty();
+        assert_eq!(postings.doc(), TERMINATED);
        assert_eq!(postings.advance(), TERMINATED);
        assert_eq!(postings.advance(), TERMINATED);
-        assert_eq!(postings.len(), 0);
+        assert_eq!(postings.doc_freq(), crate::postings::DocFreq::Exact(0));
    }

    #[test]
@@ -284,15 +312,4 @@ mod tests {
        let postings = SegmentPostings::empty();
        assert_eq!(postings.term_freq(), 1);
    }
-
-    #[test]
-    fn test_doc_freq() {
-        let docs = SegmentPostings::create_from_docs(&[0, 2, 10]);
-        assert_eq!(docs.doc_freq(), 3);
-        let alive_bitset = AliveBitSet::for_test_from_deleted_docs(&[2], 12);
-        assert_eq!(docs.doc_freq_given_deletes(&alive_bitset), 2);
-        let all_deleted =
-            AliveBitSet::for_test_from_deleted_docs(&[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], 12);
-        assert_eq!(docs.doc_freq_given_deletes(&all_deleted), 0);
-    }
 }
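
Note on `fill_bitset` in the file above: it walks the postings block by block and checks only the last doc of each block to decide whether the whole block fits in the bitset. The sketch below reproduces that control flow on plain data. It is an illustration under assumptions: a `&mut [bool]` stands in for the bitset, `blocks` stands in for the block cursor, and doc ids are assumed to be sorted across blocks.

```rust
/// Marks every doc id smaller than `bitset.len()` as present.
/// `blocks` is a hypothetical stand-in for successive decoded blocks.
fn fill_bitset(blocks: &[Vec<u32>], bitset: &mut [bool]) {
    let max_value = bitset.len() as u32;
    for docs in blocks {
        // An empty block means there is nothing left to read.
        let Some(&last_doc) = docs.last() else {
            break;
        };
        if last_doc < max_value {
            // Fast path: the whole block is in range, no per-doc check needed.
            for &doc in docs {
                bitset[doc as usize] = true;
            }
        } else {
            // Boundary block: filter doc by doc, then stop, since later
            // blocks only hold larger doc ids.
            for &doc in docs {
                if doc < max_value {
                    bitset[doc as usize] = true;
                }
            }
            break;
        }
    }
}
```

The per-doc bound check is paid only on the single block that straddles the end of the bitset.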
--- a/src/codec/standard/postings/skip.rs
+++ b/src/codec/standard/postings/skip.rs
@@ -14,7 +14,11 @@ use crate::{DocId, Score, TERMINATED};
 //   (requiring a 6th bit), but the biggest doc_id we can want to encode is TERMINATED-1, which can
 //   be represented on 31b without delta encoding.
 fn encode_bitwidth(bitwidth: u8, delta_1: bool) -> u8 {
-    assert!(bitwidth < 32);
+    assert!(
+        bitwidth < 32,
+        "bitwidth needs to be less than 32, but got {}",
+        bitwidth
+    );
    bitwidth | ((delta_1 as u8) << 6)
 }

@@ -142,23 +146,6 @@ impl SkipReader {
        skip_reader
    }

-    pub fn reset(&mut self, data: OwnedBytes, doc_freq: u32) {
-        self.last_doc_in_block = if doc_freq >= COMPRESSION_BLOCK_SIZE as u32 {
-            0
-        } else {
-            TERMINATED
-        };
-        self.last_doc_in_previous_block = 0u32;
-        self.owned_read = data;
-        self.block_info = BlockInfo::VInt { num_docs: doc_freq };
-        self.byte_offset = 0;
-        self.remaining_docs = doc_freq;
-        self.position_offset = 0u64;
-        if doc_freq >= COMPRESSION_BLOCK_SIZE as u32 {
-            self.read_block_info();
-        }
-    }
-
    // Returns the block max score for this block if available.
    //
    // The block max score is available for all full bitpacked block,
--- a/src/codec/standard/postings/standard_postings_serializer.rs
+++ b/src/codec/standard/postings/standard_postings_serializer.rs
@@ -0,0 +1,184 @@
+use std::cmp::Ordering;
+use std::io::{self, Write as _};
+
+use common::{BinarySerializable as _, VInt};
+
+use crate::codec::postings::PostingsSerializer;
+use crate::codec::standard::postings::block::Block;
+use crate::codec::standard::postings::skip::SkipSerializer;
+use crate::fieldnorm::FieldNormReader;
+use crate::postings::compression::{BlockEncoder, VIntEncoder as _, COMPRESSION_BLOCK_SIZE};
+use crate::query::Bm25Weight;
+use crate::schema::IndexRecordOption;
+use crate::{DocId, Score};
+
+/// Serializer object for tantivy's default postings format.
+pub struct StandardPostingsSerializer {
+    last_doc_id_encoded: u32,
+
+    block_encoder: BlockEncoder,
+    block: Box<Block>,
+
+    postings_write: Vec<u8>,
+    skip_write: SkipSerializer,
+
+    mode: IndexRecordOption,
+    fieldnorm_reader: Option<FieldNormReader>,
+
+    bm25_weight: Option<Bm25Weight>,
+    avg_fieldnorm: Score, /* Average number of terms in the field for that segment.
+                           * This value is used to compute the block-WAND information. */
+    term_has_freq: bool,
+}
+
+impl StandardPostingsSerializer {
+    pub(crate) fn new(
+        avg_fieldnorm: Score,
+        mode: IndexRecordOption,
+        fieldnorm_reader: Option<FieldNormReader>,
+    ) -> StandardPostingsSerializer {
+        Self {
+            last_doc_id_encoded: 0,
+            block_encoder: BlockEncoder::new(),
+            block: Box::new(Block::new()),
+            postings_write: Vec::new(),
+            skip_write: SkipSerializer::new(),
+            mode,
+            fieldnorm_reader,
+            bm25_weight: None,
+            avg_fieldnorm,
+            term_has_freq: false,
+        }
+    }
+}
+
+impl PostingsSerializer for StandardPostingsSerializer {
+    fn new_term(&mut self, term_doc_freq: u32, record_term_freq: bool) {
+        self.clear();
+
+        self.term_has_freq = self.mode.has_freq() && record_term_freq;
+        if !self.term_has_freq {
+            return;
+        }
+
+        let num_docs_in_segment: u64 =
+            if let Some(fieldnorm_reader) = self.fieldnorm_reader.as_ref() {
+                fieldnorm_reader.num_docs() as u64
+            } else {
+                return;
+            };
+
+        if num_docs_in_segment == 0 {
+            return;
+        }
+
+        self.bm25_weight = Some(Bm25Weight::for_one_term_without_explain(
+            term_doc_freq as u64,
+            num_docs_in_segment,
+            self.avg_fieldnorm,
+        ));
+    }
+
+    fn write_doc(&mut self, doc_id: DocId, term_freq: u32) {
+        self.block.append_doc(doc_id, term_freq);
+        if self.block.is_full() {
+            self.write_block();
+        }
+    }
+
+    fn close_term(&mut self, doc_freq: u32, output_write: &mut impl io::Write) -> io::Result<()> {
+        if !self.block.is_empty() {
+            // we have doc ids waiting to be written
+            // this happens when the number of doc ids is
+            // not a perfect multiple of our block size.
+            //
+            // In that case, the remaining part is encoded
+            // using variable int encoding.
+            {
+                let block_encoded = self
+                    .block_encoder
+                    .compress_vint_sorted(self.block.doc_ids(), self.last_doc_id_encoded);
+                self.postings_write.write_all(block_encoded)?;
+            }
+            // ... Idem for term frequencies
+            if self.term_has_freq {
+                let block_encoded = self
+                    .block_encoder
+                    .compress_vint_unsorted(self.block.term_freqs());
+                self.postings_write.write_all(block_encoded)?;
+            }
+            self.block.clear();
+        }
+        if doc_freq >= COMPRESSION_BLOCK_SIZE as u32 {
+            let skip_data = self.skip_write.data();
+            VInt(skip_data.len() as u64).serialize(output_write)?;
+            output_write.write_all(skip_data)?;
+        }
+        output_write.write_all(&self.postings_write[..])?;
+        self.skip_write.clear();
+        self.postings_write.clear();
+        self.bm25_weight = None;
+        Ok(())
+    }
+}
+
+impl StandardPostingsSerializer {
+    fn clear(&mut self) {
+        self.bm25_weight = None;
+        self.block.clear();
+        self.last_doc_id_encoded = 0;
+    }
+
+    fn write_block(&mut self) {
+        {
+            // encode the doc ids
+            let (num_bits, block_encoded): (u8, &[u8]) = self
+                .block_encoder
+                .compress_block_sorted(self.block.doc_ids(), self.last_doc_id_encoded);
+            self.last_doc_id_encoded = self.block.last_doc();
+            self.skip_write
+                .write_doc(self.last_doc_id_encoded, num_bits);
+            // last el block 0, offset block 1,
+            self.postings_write.extend(block_encoded);
+        }
+        if self.term_has_freq {
+            let (num_bits, block_encoded): (u8, &[u8]) = self
+                .block_encoder
+                .compress_block_unsorted(self.block.term_freqs(), true);
+            self.postings_write.extend(block_encoded);
+            self.skip_write.write_term_freq(num_bits);
+            if self.mode.has_positions() {
+                // We serialize the sum of term freqs within the skip information
+                // in order to navigate through positions.
+                let sum_freq = self.block.term_freqs().iter().cloned().sum();
+                self.skip_write.write_total_term_freq(sum_freq);
+            }
+            let mut blockwand_params = (0u8, 0u32);
+            if let Some(bm25_weight) = self.bm25_weight.as_ref() {
+                if let Some(fieldnorm_reader) = self.fieldnorm_reader.as_ref() {
+                    let docs = self.block.doc_ids().iter().cloned();
+                    let term_freqs = self.block.term_freqs().iter().cloned();
+                    let fieldnorms = docs.map(|doc| fieldnorm_reader.fieldnorm_id(doc));
+                    blockwand_params = fieldnorms
+                        .zip(term_freqs)
+                        .max_by(
+                            |(left_fieldnorm_id, left_term_freq),
+                             (right_fieldnorm_id, right_term_freq)| {
+                                let left_score =
+                                    bm25_weight.tf_factor(*left_fieldnorm_id, *left_term_freq);
+                                let right_score =
+                                    bm25_weight.tf_factor(*right_fieldnorm_id, *right_term_freq);
+                                left_score
+                                    .partial_cmp(&right_score)
+                                    .unwrap_or(Ordering::Equal)
+                            },
+                        )
+                        .unwrap();
+                }
+            }
+            let (fieldnorm_id, term_freq) = blockwand_params;
+            self.skip_write.write_blockwand_max(fieldnorm_id, term_freq);
+        }
+        self.block.clear();
+    }
+}
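Review note: the serializer above bit-packs every full block in `write_block`, leaves the remainder for VInt encoding in `close_term`, and only emits skip data when `doc_freq >= COMPRESSION_BLOCK_SIZE`. The sketch below restates that layout decision as plain arithmetic. It is an illustration rather than code from this change: `TermLayout`, `term_layout`, and the block size of 128 are assumptions made for the example.

```rust
/// Assumed block size for this sketch; the real value is whatever
/// `COMPRESSION_BLOCK_SIZE` is defined as in the crate.
const BLOCK_SIZE: u32 = 128;

/// Hypothetical summary of how one term's postings are laid out.
#[derive(Debug, PartialEq, Eq)]
struct TermLayout {
    /// Number of full blocks, each written through the bit-packed path.
    full_blocks: u32,
    /// Number of trailing docs that do not fill a block (VInt path).
    vint_tail_docs: u32,
    /// Whether a skip section (length prefix + skip data) precedes the postings.
    has_skip_data: bool,
}

/// Derives the layout from the term's document frequency alone.
fn term_layout(doc_freq: u32) -> TermLayout {
    TermLayout {
        full_blocks: doc_freq / BLOCK_SIZE,
        vint_tail_docs: doc_freq % BLOCK_SIZE,
        has_skip_data: doc_freq >= BLOCK_SIZE,
    }
}
```

Under these assumptions, a term with fewer docs than one block has no skip section at all, which is why `close_term` guards the skip write with the `doc_freq` comparison.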
--- a/src/collector/count_collector.rs
+++ b/src/collector/count_collector.rs
@@ -1,5 +1,6 @@
 use super::Collector;
 use crate::collector::SegmentCollector;
+use crate::query::Weight;
 use crate::{DocId, Score, SegmentOrdinal, SegmentReader};

 /// `CountCollector` collector only counts how many
@@ -55,6 +56,15 @@ impl Collector for Count {
    fn merge_fruits(&self, segment_counts: Vec<usize>) -> crate::Result<usize> {
        Ok(segment_counts.into_iter().sum())
    }
+
+    fn collect_segment(
+        &self,
+        weight: &dyn Weight,
+        _segment_ord: u32,
+        reader: &SegmentReader,
+    ) -> crate::Result<usize> {
+        Ok(weight.count(reader)? as usize)
+    }
 }

 #[derive(Default)]
--- a/src/collector/facet_collector.rs
+++ b/src/collector/facet_collector.rs
@@ -389,6 +389,13 @@ impl SegmentCollector for FacetSegmentCollector {
            }
            let mut facet = vec![];
            let (facet_ord, facet_depth) = self.unique_facet_ords[collapsed_facet_ord];
+            // u64::MAX is used as a sentinel for unmapped ordinals (e.g. when a
+            // document has the exact registered facet, not a child of it).
+            // Passing it to ord_to_term would resolve to the last dictionary
+            // entry and produce a spurious facet from an unrelated branch.
+            if facet_ord == u64::MAX {
+                continue;
+            }
            // TODO handle errors.
            if facet_dict.ord_to_term(facet_ord, &mut facet).is_ok() {
                if let Some((end_collapsed_facet, _)) = facet
@@ -814,6 +821,63 @@ mod tests {
        assert!(!super::is_child_facet(&b"foo\0bar"[..], &b"foo"[..]));
        assert!(!super::is_child_facet(&b"foo"[..], &b"foobar\0baz"[..]));
    }
+
+    // Regression test for https://github.com/quickwit-oss/tantivy/issues/2494
+    // When a document has the exact registered facet path (not just a child),
+    // harvest() must not turn the unmapped sentinel into a spurious root entry.
+    #[test]
+    fn test_facet_collector_wrong_root() -> crate::Result<()> {
+        let mut schema_builder = Schema::builder();
+        let facet_field = schema_builder.add_facet_field("facet", FacetOptions::default());
+        let schema = schema_builder.build();
+        let index = Index::create_in_ram(schema);
+
+        let mut index_writer: IndexWriter = index.writer_for_tests()?;
+        let facets: Vec<&str> = vec![
+            "/science-fiction/asimov",
+            "/science-fiction/clarke",
+            "/science-fiction/dick",
+            "/science-fiction/herbert",
+            "/science-fiction/orwell",
+            // This exact match on the registered facet is the bug trigger:
+            // its ordinal maps to the sentinel (u64::MAX, 0) in the collapse
+            // mapping, which without the fix resolves to an unrelated term.
+            "/fantasy/epic-fantasy",
+            "/fantasy/epic-fantasy/tolkien",
+            "/fantasy/epic-fantasy/martin",
+        ];
+        for facet_str in &facets {
+            index_writer.add_document(doc!(
+                facet_field => Facet::from(*facet_str)
+            ))?;
+        }
+        index_writer.commit()?;
+
+        let reader = index.reader()?;
+        let searcher = reader.searcher();
+
+        let term = Term::from_facet(facet_field, &Facet::from("/fantasy/epic-fantasy"));
+        let query = TermQuery::new(term, IndexRecordOption::Basic);
+
+        let mut facet_collector = FacetCollector::for_field("facet");
+        facet_collector.add_facet("/fantasy/epic-fantasy");
+        let counts: FacetCounts = searcher.search(&query, &facet_collector)?;
+
+        let result: Vec<(String, u64)> = counts
+            .get("/")
+            .map(|(facet, count)| (facet.to_string(), count))
+            .collect();
+
+        // Only children of /fantasy/epic-fantasy should appear, not /science-fiction
+        assert_eq!(
+            result,
+            vec![
+                ("/fantasy/epic-fantasy/martin".to_string(), 1),
+                ("/fantasy/epic-fantasy/tolkien".to_string(), 1),
+            ]
+        );
+        Ok(())
+    }
 }

 #[cfg(all(test, feature = "unstable"))]
--- a/src/collector/sort_key/mod.rs
+++ b/src/collector/sort_key/mod.rs
@@ -1,4 +1,5 @@
 mod order;
+mod sort_by_bytes;
 mod sort_by_erased_type;
 mod sort_by_score;
 mod sort_by_static_fast_value;
@@ -6,6 +7,7 @@ mod sort_by_string;
 mod sort_key_computer;

 pub use order::*;
+pub use sort_by_bytes::SortByBytes;
 pub use sort_by_erased_type::SortByErasedType;
 pub use sort_by_score::SortBySimilarityScore;
 pub use sort_by_static_fast_value::SortByStaticFastValue;
--- a/src/collector/sort_key/sort_by_bytes.rs
+++ b/src/collector/sort_key/sort_by_bytes.rs
@@ -0,0 +1,168 @@
+use columnar::BytesColumn;
+
+use crate::collector::sort_key::NaturalComparator;
+use crate::collector::{SegmentSortKeyComputer, SortKeyComputer};
+use crate::termdict::TermOrdinal;
+use crate::{DocId, Score};
+
+/// Sort by the first value of a bytes column.
+///
+/// If the field is multivalued, only the first value is considered.
+///
+/// Documents that do not have this value are still considered.
+/// Their sort key will simply be `None`.
+#[derive(Debug, Clone)]
+pub struct SortByBytes {
+    column_name: String,
+}
+
+impl SortByBytes {
+    /// Creates a new sort by bytes sort key computer.
+    pub fn for_field(column_name: impl ToString) -> Self {
+        SortByBytes {
+            column_name: column_name.to_string(),
+        }
+    }
+}
+
+impl SortKeyComputer for SortByBytes {
+    type SortKey = Option<Vec<u8>>;
+    type Child = ByBytesColumnSegmentSortKeyComputer;
+    type Comparator = NaturalComparator;
+
+    fn segment_sort_key_computer(
+        &self,
+        segment_reader: &crate::SegmentReader,
+    ) -> crate::Result<Self::Child> {
+        let bytes_column_opt = segment_reader.fast_fields().bytes(&self.column_name)?;
+        Ok(ByBytesColumnSegmentSortKeyComputer { bytes_column_opt })
+    }
+}
+
+/// Segment-level sort key computer for bytes columns.
+pub struct ByBytesColumnSegmentSortKeyComputer {
+    bytes_column_opt: Option<BytesColumn>,
+}
+
+impl SegmentSortKeyComputer for ByBytesColumnSegmentSortKeyComputer {
+    type SortKey = Option<Vec<u8>>;
+    type SegmentSortKey = Option<TermOrdinal>;
+    type SegmentComparator = NaturalComparator;
+
+    #[inline(always)]
+    fn segment_sort_key(&mut self, doc: DocId, _score: Score) -> Option<TermOrdinal> {
+        let bytes_column = self.bytes_column_opt.as_ref()?;
+        bytes_column.ords().first(doc)
+    }
+
+    fn convert_segment_sort_key(&self, term_ord_opt: Option<TermOrdinal>) -> Option<Vec<u8>> {
+        // TODO: Individual lookups to the dictionary like this are very likely to repeatedly
+        // decompress the same blocks. See https://github.com/quickwit-oss/tantivy/issues/2776
+        let term_ord = term_ord_opt?;
+        let bytes_column = self.bytes_column_opt.as_ref()?;
+        let mut bytes = Vec::new();
+        bytes_column
+            .dictionary()
+            .ord_to_term(term_ord, &mut bytes)
+            .ok()?;
+        Some(bytes)
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::SortByBytes;
+    use crate::collector::TopDocs;
+    use crate::query::AllQuery;
+    use crate::schema::{BytesOptions, Schema, FAST, INDEXED};
+    use crate::{Index, IndexWriter, Order, TantivyDocument};
+
+    #[test]
+    fn test_sort_by_bytes_asc() -> crate::Result<()> {
+        let mut schema_builder = Schema::builder();
+        let bytes_field = schema_builder
+            .add_bytes_field("data", BytesOptions::default().set_fast().set_indexed());
+        let id_field = schema_builder.add_u64_field("id", FAST | INDEXED);
+        let schema = schema_builder.build();
+        let index = Index::create_in_ram(schema);
+        let mut index_writer: IndexWriter = index.writer_for_tests()?;
+
+        // Insert documents with byte values in non-sorted order
+        let test_data: Vec<(u64, Vec<u8>)> = vec![
+            (1, vec![0x02, 0x00]),
+            (2, vec![0x00, 0x10]),
+            (3, vec![0x01, 0x00]),
+            (4, vec![0x00, 0x20]),
+        ];
+
+        for (id, bytes) in &test_data {
+            let mut doc = TantivyDocument::new();
+            doc.add_u64(id_field, *id);
+            doc.add_bytes(bytes_field, bytes);
+            index_writer.add_document(doc)?;
+        }
+        index_writer.commit()?;
+
+        let reader = index.reader()?;
+        let searcher = reader.searcher();
+
+        // Sort ascending by bytes
+        let top_docs =
+            TopDocs::with_limit(10).order_by((SortByBytes::for_field("data"), Order::Asc));
+        let results: Vec<(Option<Vec<u8>>, _)> = searcher.search(&AllQuery, &top_docs)?;
+
+        // Expected order: [0x00,0x10], [0x00,0x20], [0x01,0x00], [0x02,0x00]
+        let sorted_bytes: Vec<Option<Vec<u8>>> = results.into_iter().map(|(b, _)| b).collect();
+        assert_eq!(
+            sorted_bytes,
+            vec![
+                Some(vec![0x00, 0x10]),
+                Some(vec![0x00, 0x20]),
+                Some(vec![0x01, 0x00]),
+                Some(vec![0x02, 0x00]),
+            ]
+        );
+
+        Ok(())
+    }
+
+    #[test]
+    fn test_sort_by_bytes_desc() -> crate::Result<()> {
+        let mut schema_builder = Schema::builder();
+        let bytes_field = schema_builder
+            .add_bytes_field("data", BytesOptions::default().set_fast().set_indexed());
+        let schema = schema_builder.build();
+        let index = Index::create_in_ram(schema);
+        let mut index_writer: IndexWriter = index.writer_for_tests()?;
+
+        let test_data: Vec<Vec<u8>> = vec![vec![0x00, 0x10], vec![0x02, 0x00], vec![0x01, 0x00]];
+
+        for bytes in &test_data {
+            let mut doc = TantivyDocument::new();
+            doc.add_bytes(bytes_field, bytes);
+            index_writer.add_document(doc)?;
+        }
+        index_writer.commit()?;
+
+        let reader = index.reader()?;
+        let searcher = reader.searcher();
+
+        // Sort descending by bytes
+        let top_docs =
+            TopDocs::with_limit(10).order_by((SortByBytes::for_field("data"), Order::Desc));
+        let results: Vec<(Option<Vec<u8>>, _)> = searcher.search(&AllQuery, &top_docs)?;
+
+        // Expected order (descending): [0x02,0x00], [0x01,0x00], [0x00,0x10]
+        let sorted_bytes: Vec<Option<Vec<u8>>> = results.into_iter().map(|(b, _)| b).collect();
+        assert_eq!(
+            sorted_bytes,
+            vec![
+                Some(vec![0x02, 0x00]),
+                Some(vec![0x01, 0x00]),
+                Some(vec![0x00, 0x10]),
+            ]
+        );
+
+        Ok(())
+    }
+}
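Review note: `SortByBytes` compares term ordinals inside a segment and only resolves ordinals to bytes in `convert_segment_sort_key`. That works when the dictionary stores terms in ascending byte order, so ordinal order and byte order agree. The sketch below shows the idea with a plain sorted `Vec` standing in for the dictionary. `build_dictionary` and `top_k_descending` are hypothetical helpers, not tantivy APIs, and the sketch assumes missing values sort last in descending order, as in the `test_sort_by_owned_bytes` test of this change.

```rust
/// Builds a sorted, deduplicated term list plus one optional ordinal per doc.
/// The index of a term in `terms` plays the role of its term ordinal.
fn build_dictionary(values: &[Option<Vec<u8>>]) -> (Vec<Vec<u8>>, Vec<Option<u64>>) {
    let mut terms: Vec<Vec<u8>> = values.iter().flatten().cloned().collect();
    terms.sort();
    terms.dedup();
    let ords = values
        .iter()
        .map(|value| {
            value.as_ref().map(|bytes| {
                terms
                    .binary_search(bytes)
                    .expect("every value was inserted into `terms` above") as u64
            })
        })
        .collect();
    (terms, ords)
}

/// Returns the `k` largest values, comparing ordinals only and touching the
/// dictionary once per returned hit.
fn top_k_descending(values: &[Option<Vec<u8>>], k: usize) -> Vec<Option<Vec<u8>>> {
    let (terms, mut ords) = build_dictionary(values);
    // `None < Some(_)` for `Option`, so a descending sort puts missing values last.
    ords.sort_by(|left, right| right.cmp(left));
    ords.truncate(k);
    ords.into_iter()
        .map(|ord_opt| ord_opt.map(|ord| terms[ord as usize].clone()))
        .collect()
}
```

Deferring the dictionary lookup keeps the per-document work to an integer comparison; the cost noted in the TODO above is paid only for the hits that are actually returned.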
--- a/src/collector/sort_key/sort_by_erased_type.rs
+++ b/src/collector/sort_key/sort_by_erased_type.rs
@@ -1,7 +1,7 @@
 use columnar::{ColumnType, MonotonicallyMappableToU64};

 use crate::collector::sort_key::{
-    NaturalComparator, SortBySimilarityScore, SortByStaticFastValue, SortByString,
+    NaturalComparator, SortByBytes, SortBySimilarityScore, SortByStaticFastValue, SortByString,
 };
 use crate::collector::{SegmentSortKeyComputer, SortKeyComputer};
 use crate::fastfield::FastFieldNotAvailableError;
@@ -114,6 +114,16 @@ impl SortKeyComputer for SortByErasedType {
                            },
                        })
                    }
+                    ColumnType::Bytes => {
+                        let computer = SortByBytes::for_field(column_name);
+                        let inner = computer.segment_sort_key_computer(segment_reader)?;
+                        Box::new(ErasedSegmentSortKeyComputerWrapper {
+                            inner,
+                            converter: |val: Option<Vec<u8>>| {
+                                val.map(OwnedValue::Bytes).unwrap_or(OwnedValue::Null)
+                            },
+                        })
+                    }
                    ColumnType::U64 => {
                        let computer = SortByStaticFastValue::<u64>::for_field(column_name);
                        let inner = computer.segment_sort_key_computer(segment_reader)?;
@@ -281,6 +291,65 @@ mod tests {
        );
    }

+    #[test]
+    fn test_sort_by_owned_bytes() {
+        let mut schema_builder = Schema::builder();
+        let data_field = schema_builder.add_bytes_field("data", FAST);
+        let schema = schema_builder.build();
+        let index = Index::create_in_ram(schema);
+        let mut writer = index.writer_for_tests().unwrap();
+        writer
+            .add_document(doc!(data_field => vec![0x03u8, 0x00]))
+            .unwrap();
+        writer
+            .add_document(doc!(data_field => vec![0x01u8, 0x00]))
+            .unwrap();
+        writer
+            .add_document(doc!(data_field => vec![0x02u8, 0x00]))
+            .unwrap();
+        writer.add_document(doc!()).unwrap();
+        writer.commit().unwrap();
+
+        let reader = index.reader().unwrap();
+        let searcher = reader.searcher();
+
+        // Sort descending (Natural - highest first)
+        let collector = TopDocs::with_limit(10)
+            .order_by((SortByErasedType::for_field("data"), ComparatorEnum::Natural));
+        let top_docs = searcher.search(&AllQuery, &collector).unwrap();
+
+        let values: Vec<OwnedValue> = top_docs.into_iter().map(|(key, _)| key).collect();
+
+        assert_eq!(
+            values,
+            vec![
+                OwnedValue::Bytes(vec![0x03, 0x00]),
+                OwnedValue::Bytes(vec![0x02, 0x00]),
+                OwnedValue::Bytes(vec![0x01, 0x00]),
+                OwnedValue::Null
+            ]
+        );
+
+        // Sort ascending (ReverseNoneLower - lowest first, nulls last)
+        let collector = TopDocs::with_limit(10).order_by((
+            SortByErasedType::for_field("data"),
+            ComparatorEnum::ReverseNoneLower,
+        ));
+        let top_docs = searcher.search(&AllQuery, &collector).unwrap();
+
+        let values: Vec<OwnedValue> = top_docs.into_iter().map(|(key, _)| key).collect();
+
+        assert_eq!(
+            values,
+            vec![
+                OwnedValue::Bytes(vec![0x01, 0x00]),
+                OwnedValue::Bytes(vec![0x02, 0x00]),
+                OwnedValue::Bytes(vec![0x03, 0x00]),
+                OwnedValue::Null
+            ]
+        );
+    }
+
    #[test]
    fn test_sort_by_owned_reverse() {
        let mut schema_builder = Schema::builder();
--- a/src/collector/sort_key/sort_by_score.rs
+++ b/src/collector/sort_key/sort_by_score.rs
@@ -1,5 +1,8 @@
+use std::cmp::{Ordering, Reverse};
+use std::collections::BinaryHeap;
+
 use crate::collector::sort_key::NaturalComparator;
-use crate::collector::{SegmentSortKeyComputer, SortKeyComputer, TopNComputer};
+use crate::collector::{SegmentSortKeyComputer, SortKeyComputer};
 use crate::{DocAddress, DocId, Score};

 /// Sort by similarity score.
@@ -25,6 +28,10 @@ impl SortKeyComputer for SortBySimilarityScore {
    }

    // Sorting by score is special in that it allows for the Block-Wand optimization.
+    //
+    // We use a BinaryHeap (TopNHeap) instead of TopNComputer here so that the
+    // threshold is always the exact K-th best score. TopNComputer only updates its
+    // threshold every K docs (at truncation), giving Block-WAND a stale bound.
    fn collect_segment_top_k(
        &self,
        k: usize,
@@ -32,12 +39,10 @@ impl SortKeyComputer for SortBySimilarityScore {
        reader: &crate::SegmentReader,
        segment_ord: u32,
    ) -> crate::Result<Vec<(Self::SortKey, DocAddress)>> {
-        let mut top_n: TopNComputer<Score, DocId, Self::Comparator> =
-            TopNComputer::new_with_comparator(k, self.comparator());
+        let mut top_n = TopNHeap::new(k);

        if let Some(alive_bitset) = reader.alive_bitset() {
            let mut threshold = Score::MIN;
-            top_n.threshold = Some(threshold);
            weight.for_each_pruning(Score::MIN, reader, &mut |doc, score| {
                if alive_bitset.is_deleted(doc) {
                    return threshold;
@@ -56,7 +61,7 @@ impl SortKeyComputer for SortBySimilarityScore {
        Ok(top_n
            .into_vec()
            .into_iter()
-            .map(|cid| (cid.sort_key, DocAddress::new(segment_ord, cid.doc)))
+            .map(|(score, doc)| (score, DocAddress::new(segment_ord, doc)))
            .collect())
    }
 }
@@ -75,3 +80,204 @@ impl SegmentSortKeyComputer for SortBySimilarityScore {
        score
    }
 }
+
+/// Min-heap entry: higher score = greater, lower doc wins ties.
+struct ScoreHeapEntry {
+    score: Score,
+    doc: DocId,
+}
+
+impl Eq for ScoreHeapEntry {}
+
+impl PartialEq for ScoreHeapEntry {
+    fn eq(&self, other: &Self) -> bool {
+        self.cmp(other) == Ordering::Equal
+    }
+}
+
+impl PartialOrd for ScoreHeapEntry {
+    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
+        Some(self.cmp(other))
+    }
+}
+
+impl Ord for ScoreHeapEntry {
+    fn cmp(&self, other: &Self) -> Ordering {
+        self.score
+            .partial_cmp(&other.score)
+            .unwrap_or(Ordering::Equal)
+            .then_with(|| other.doc.cmp(&self.doc))
+    }
+}
+
+/// Heap-based top-K for score collection. O(log K) per insert, but the threshold
+/// is always tight, so Block-WAND prunes better than with [`TopNComputer`]'s
+/// buffer/median approach.
+///
+/// Like [`TopNComputer`], items must arrive in ascending doc order, and equal
+/// scores are rejected (strict `>`) so that lower doc IDs win ties.
+///
+/// [`TopNComputer`]: crate::collector::TopNComputer
+struct TopNHeap {
+    heap: BinaryHeap<Reverse<ScoreHeapEntry>>,
+    top_n: usize,
+    threshold: Option<Score>,
+}
+
+impl TopNHeap {
+    fn new(top_n: usize) -> Self {
+        TopNHeap {
+            heap: BinaryHeap::with_capacity(top_n),
+            top_n,
+            threshold: None,
+        }
+    }
+
+    #[inline]
+    fn push(&mut self, score: Score, doc: DocId) {
+        if self.heap.len() < self.top_n {
+            self.heap.push(Reverse(ScoreHeapEntry { score, doc }));
+            if self.heap.len() == self.top_n {
+                self.threshold = self.heap.peek().map(|Reverse(entry)| entry.score);
+            }
+        } else if let Some(threshold) = self.threshold {
+            if score > threshold {
+                // peek_mut + assign is a single sift-down, vs pop + push = two sifts.
+                if let Some(mut min) = self.heap.peek_mut() {
+                    *min = Reverse(ScoreHeapEntry { score, doc });
+                }
+                self.threshold = self.heap.peek().map(|Reverse(entry)| entry.score);
+            }
+        }
+    }
+
+    fn into_vec(self) -> Vec<(Score, DocId)> {
+        self.heap
+            .into_vec()
+            .into_iter()
+            .map(|Reverse(entry)| (entry.score, entry.doc))
+            .collect()
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use proptest::prelude::*;
+
+    use super::*;
+    use crate::collector::sort_key::NaturalComparator;
+    use crate::collector::TopNComputer;
+
+    #[test]
+    fn test_top_n_heap_zero_capacity() {
+        let mut heap = TopNHeap::new(0);
+        heap.push(1.0, 0);
+        heap.push(2.0, 1);
+        assert!(heap.into_vec().is_empty());
+    }
+
+    #[test]
+    fn test_top_n_heap_basic() {
+        let mut heap = TopNHeap::new(2);
+        heap.push(1.0, 0);
+        heap.push(3.0, 1);
+        heap.push(2.0, 2);
+
+        let mut results = heap.into_vec();
+        results.sort_by(|a, b| b.0.partial_cmp(&a.0).unwrap().then_with(|| a.1.cmp(&b.1)));
+        assert_eq!(results, vec![(3.0, 1), (2.0, 2)]);
+    }
+
+    #[test]
+    fn test_top_n_heap_threshold_always_accurate() {
+        let mut heap = TopNHeap::new(2);
+        assert_eq!(heap.threshold, None);
+
+        heap.push(1.0, 0);
+        assert_eq!(heap.threshold, None);
+
+        heap.push(3.0, 1);
+        assert_eq!(heap.threshold, Some(1.0));
+
+        heap.push(2.0, 2); // evicts 1.0
+        assert_eq!(heap.threshold, Some(2.0));
+
+        heap.push(4.0, 3); // evicts 2.0
+        assert_eq!(heap.threshold, Some(3.0));
+    }
+
+    #[test]
+    fn test_top_n_heap_tiebreaking_lower_doc_wins() {
+        let mut heap = TopNHeap::new(2);
+        heap.push(5.0, 0);
+        heap.push(5.0, 1);
+        heap.push(5.0, 2); // rejected: not strictly > threshold
+
+        let mut results = heap.into_vec();
+        results.sort_by_key(|&(_, doc)| doc);
+        assert_eq!(results, vec![(5.0, 0), (5.0, 1)]);
+    }
+
+    #[test]
+    fn test_top_n_heap_single_element() {
+        let mut heap = TopNHeap::new(1);
+        heap.push(1.0, 0);
+        assert_eq!(heap.threshold, Some(1.0));
+
+        heap.push(0.5, 1); // rejected
+        heap.push(2.0, 2); // accepted
+        assert_eq!(heap.threshold, Some(2.0));
+
+        let results = heap.into_vec();
+        assert_eq!(results, vec![(2.0, 2)]);
+    }
+
+    #[test]
+    fn test_top_n_heap_under_capacity() {
+        let mut heap = TopNHeap::new(5);
+        heap.push(3.0, 0);
+        heap.push(1.0, 1);
+        heap.push(2.0, 2);
+        // Only 3 elements, capacity is 5 — all should be kept
+        assert_eq!(heap.threshold, None);
+
+        let mut results = heap.into_vec();
+        results.sort_by(|a, b| b.0.partial_cmp(&a.0).unwrap().then_with(|| a.1.cmp(&b.1)));
+        assert_eq!(results, vec![(3.0, 0), (2.0, 2), (1.0, 1)]);
+    }
+
+    proptest! {
+        #[test]
+        fn test_top_n_heap_matches_top_n_computer(
+            limit in 0..20_usize,
+            mut docs in proptest::collection::vec((0..1000_u32, 0..1000_u32), 0..200_usize),
+        ) {
+            // Both require ascending doc order.
+            docs.sort_by_key(|(_, doc_id)| *doc_id);
+            docs.dedup_by_key(|(_, doc_id)| *doc_id);
+
+            let mut heap = TopNHeap::new(limit);
+            let mut computer: TopNComputer<Score, DocId, NaturalComparator> =
+                TopNComputer::new_with_comparator(limit, NaturalComparator);
+
+            for &(score_u32, doc) in &docs {
+                let score = score_u32 as Score;
+                heap.push(score, doc);
+                computer.push(score, doc);
+            }
+
+            let mut heap_results = heap.into_vec();
+            heap_results.sort_by(|a, b| {
+                b.0.partial_cmp(&a.0).unwrap().then_with(|| a.1.cmp(&b.1))
+            });
+
+            let computer_results: Vec<(Score, DocId)> = computer
+                .into_sorted_vec()
+                .into_iter()
+                .map(|cd| (cd.sort_key, cd.doc))
+                .collect();
+
+            prop_assert_eq!(heap_results, computer_results);
+        }
+    }
+}
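Review note: the point of `TopNHeap` is that the pruning threshold equals the K-th best score after every push. The standalone sketch below reproduces that behaviour with only the standard library, so it can be tried outside the crate. The names `Entry` and `TopK` are made up for the example, `f32` and `u32` stand in for `Score` and `DocId`, and the threshold is read from the heap on demand instead of being cached in a field as in the change above.

```rust
use std::cmp::{Ordering, Reverse};
use std::collections::BinaryHeap;

/// Heap entry: a higher score is greater; on equal scores the lower doc is greater.
#[derive(Debug, Clone, Copy)]
struct Entry {
    score: f32,
    doc: u32,
}

impl PartialEq for Entry {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}
impl Eq for Entry {}
impl PartialOrd for Entry {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl Ord for Entry {
    fn cmp(&self, other: &Self) -> Ordering {
        self.score
            .partial_cmp(&other.score)
            .unwrap_or(Ordering::Equal)
            .then_with(|| other.doc.cmp(&self.doc))
    }
}

/// Keeps the `k` best entries in a min-heap (`Reverse` flips the max-heap).
struct TopK {
    heap: BinaryHeap<Reverse<Entry>>,
    k: usize,
}

impl TopK {
    fn new(k: usize) -> Self {
        TopK { heap: BinaryHeap::with_capacity(k), k }
    }

    /// The exact K-th best score, available once `k` entries are held.
    fn threshold(&self) -> Option<f32> {
        if self.k > 0 && self.heap.len() == self.k {
            self.heap.peek().map(|Reverse(entry)| entry.score)
        } else {
            None
        }
    }

    /// Docs are expected in ascending order; equal scores are rejected,
    /// so the earlier (lower) doc keeps its slot.
    fn push(&mut self, score: f32, doc: u32) {
        if self.heap.len() < self.k {
            self.heap.push(Reverse(Entry { score, doc }));
            return;
        }
        if let Some(mut min) = self.heap.peek_mut() {
            if score > min.0.score {
                // Overwriting the root costs one sift-down when `min` is dropped.
                *min = Reverse(Entry { score, doc });
            }
        }
    }

    /// Best entry first.
    fn into_sorted_vec(self) -> Vec<(f32, u32)> {
        self.heap
            .into_sorted_vec()
            .into_iter()
            .map(|Reverse(entry)| (entry.score, entry.doc))
            .collect()
    }
}
```

Reading the threshold from `peek()` is O(1), so a caller can hand the up-to-date bound to a pruning callback after every accepted document.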
--- a/src/collector/sort_key/sort_by_static_fast_value.rs
+++ b/src/collector/sort_key/sort_by_static_fast_value.rs
@@ -52,7 +52,7 @@ impl<T: FastValue> SortKeyComputer for SortByStaticFastValue<T> {
        if schema_type != T::to_type() {
            return Err(crate::TantivyError::SchemaError(format!(
                "Field `{}` is of type {schema_type:?}, not of the type {:?}.",
-                &self.field,
+                self.field,
                T::to_type()
            )));
        }
--- a/src/collector/top_score_collector.rs
+++ b/src/collector/top_score_collector.rs
@@ -513,7 +513,9 @@ pub struct TopNComputer<Score, D, C> {
    /// The buffer reverses sort order to get top-semantics instead of bottom-semantics
    buffer: Vec<ComparableDoc<Score, D>>,
    top_n: usize,
-    pub(crate) threshold: Option<Score>,
+    /// The current threshold for pruning. Documents with scores at or below
+    /// this value are skipped by `push()`. Updated when the buffer is truncated.
+    pub threshold: Option<Score>,
    comparator: C,
 }

--- a/src/core/json_utils.rs
+++ b/src/core/json_utils.rs
@@ -4,7 +4,7 @@ use common::{replace_in_place, JsonPathWriter};
 use rustc_hash::FxHashMap;

 use crate::indexer::indexing_term::IndexingTerm;
-use crate::postings::{IndexingContext, IndexingPosition, PostingsWriter};
+use crate::postings::{IndexingContext, IndexingPosition, PostingsWriter as _, PostingsWriterEnum};
 use crate::schema::document::{ReferenceValue, ReferenceValueLeaf, Value};
 use crate::schema::{Type, DATE_TIME_PRECISION_INDEXED};
 use crate::time::format_description::well_known::Rfc3339;
@@ -80,7 +80,7 @@ fn index_json_object<'a, V: Value<'a>>(
    text_analyzer: &mut TextAnalyzer,
    term_buffer: &mut IndexingTerm,
    json_path_writer: &mut JsonPathWriter,
-    postings_writer: &mut dyn PostingsWriter,
+    postings_writer: &mut PostingsWriterEnum,
    ctx: &mut IndexingContext,
    positions_per_path: &mut IndexingPositionsPerPath,
 ) {
@@ -110,7 +110,7 @@ pub(crate) fn index_json_value<'a, V: Value<'a>>(
    text_analyzer: &mut TextAnalyzer,
    term_buffer: &mut IndexingTerm,
    json_path_writer: &mut JsonPathWriter,
-    postings_writer: &mut dyn PostingsWriter,
+    postings_writer: &mut PostingsWriterEnum,
    ctx: &mut IndexingContext,
    positions_per_path: &mut IndexingPositionsPerPath,
 ) {
--- a/src/directory/composite_file.rs
+++ b/src/directory/composite_file.rs
@@ -167,6 +167,7 @@ impl CompositeFile {
            .map(|byte_range| self.data.slice(byte_range.clone()))
    }

+    /// Returns the space usage per field in this composite file.
    pub fn space_usage(&self, schema: &Schema) -> PerFieldSpaceUsage {
        let mut fields = Vec::new();
        for (&field_addr, byte_range) in &self.offsets_index {
--- a/src/directory/mod.rs
+++ b/src/directory/mod.rs
@@ -21,7 +21,7 @@ use std::path::PathBuf;
 pub use common::file_slice::{FileHandle, FileSlice};
 pub use common::{AntiCallToken, OwnedBytes, TerminatingWrite};

-pub(crate) use self::composite_file::{CompositeFile, CompositeWrite};
+pub use self::composite_file::{CompositeFile, CompositeWrite};
 pub use self::directory::{Directory, DirectoryClone, DirectoryLock};
 pub use self::directory_lock::{Lock, INDEX_WRITER_LOCK, META_LOCK};
 pub use self::ram_directory::RamDirectory;
@@ -52,7 +52,7 @@ pub use self::mmap_directory::MmapDirectory;
 ///
 /// `WritePtr` are required to implement both Write
 /// and Seek.
-pub type WritePtr = BufWriter<Box<dyn TerminatingWrite>>;
+pub type WritePtr = BufWriter<Box<dyn TerminatingWrite + Send + Sync>>;

 #[cfg(test)]
 mod tests;
--- a/src/docset.rs
+++ b/src/docset.rs
@@ -1,4 +1,8 @@
-use std::borrow::{Borrow, BorrowMut};
+use std::ops::{Deref as _, DerefMut as _};
+
+use common::BitSet;
+
+use common::TinySet;

 use crate::fastfield::AliveBitSet;
 use crate::DocId;
@@ -14,6 +18,12 @@ pub const TERMINATED: DocId = i32::MAX as u32;
 /// exactly this size as long as we can fill the buffer.
 pub const COLLECT_BLOCK_BUFFER_LEN: usize = 64;

+/// Number of `TinySet` (64-bit) buckets in a block used by [`DocSet::fill_bitset_block`].
+pub const BLOCK_NUM_TINYBITSETS: usize = 16;
+
+/// Number of doc IDs covered by one block: `BLOCK_NUM_TINYBITSETS * 64 = 1024`.
+pub const BLOCK_WINDOW: u32 = BLOCK_NUM_TINYBITSETS as u32 * 64;
+
 /// Represents an iterable set of sorted doc ids.
 pub trait DocSet: Send {
    /// Goes to the next element.
@@ -130,6 +140,19 @@ pub trait DocSet: Send {
        buffer.len()
    }

+    /// Fills the given bitset with the documents in the docset.
+    ///
+    /// If the bitset's `max_value` is not greater than the largest doc, this function might not
+    /// consume the docset entirely.
+    fn fill_bitset(&mut self, bitset: &mut BitSet) {
+        let bitset_max_value: u32 = bitset.max_value();
+        let mut doc = self.doc();
+        while doc < bitset_max_value {
+            bitset.insert(doc);
+            doc = self.advance();
+        }
+    }
+
    /// Returns the current document
    /// Right after creating a new `DocSet`, the docset points to the first document.
    ///
@@ -160,6 +183,31 @@ pub trait DocSet: Send {
        self.size_hint() as u64
    }

+    /// Fills a bitmask representing which documents in `[min_doc, min_doc + BLOCK_WINDOW)` are
+    /// present in this docset.
+    ///
+    /// The window is divided into `BLOCK_NUM_TINYBITSETS` buckets of 64 docs each.
+    /// Returns the next doc `>= min_doc + BLOCK_WINDOW`, or `TERMINATED` if exhausted.
+    fn fill_bitset_block(
+        &mut self,
+        min_doc: DocId,
+        mask: &mut [TinySet; BLOCK_NUM_TINYBITSETS],
+    ) -> DocId {
+        self.seek(min_doc);
+        let horizon = min_doc + BLOCK_WINDOW;
+        loop {
+            let doc = self.doc();
+            if doc >= horizon {
+                return doc;
+            }
+            let delta = doc - min_doc;
+            mask[(delta / 64) as usize].insert_mut(delta % 64);
+            if self.advance() == TERMINATED {
+                return TERMINATED;
+            }
+        }
+    }
+
    /// Returns the number documents matching.
    /// Calling this method consumes the `DocSet`.
    fn count(&mut self, alive_bitset: &AliveBitSet) -> u32 {
@@ -214,6 +262,18 @@ impl DocSet for &mut dyn DocSet {
        (**self).seek_danger(target)
    }

+    fn fill_buffer(&mut self, buffer: &mut [DocId; COLLECT_BLOCK_BUFFER_LEN]) -> usize {
+        (**self).fill_buffer(buffer)
+    }
+
+    fn fill_bitset_block(
+        &mut self,
+        min_doc: DocId,
+        mask: &mut [TinySet; BLOCK_NUM_TINYBITSETS],
+    ) -> DocId {
+        (**self).fill_bitset_block(min_doc, mask)
+    }
+
    fn doc(&self) -> u32 {
        (**self).doc()
    }
@@ -233,51 +293,66 @@ impl DocSet for &mut dyn DocSet {
    fn count_including_deleted(&mut self) -> u32 {
        (**self).count_including_deleted()
    }
+
+    fn fill_bitset(&mut self, bitset: &mut BitSet) {
+        (**self).fill_bitset(bitset);
+    }
 }

 impl<TDocSet: DocSet + ?Sized> DocSet for Box<TDocSet> {
+    #[inline]
    fn advance(&mut self) -> DocId {
-        let unboxed: &mut TDocSet = self.borrow_mut();
-        unboxed.advance()
+        self.deref_mut().advance()
    }

+    #[inline]
    fn seek(&mut self, target: DocId) -> DocId {
-        let unboxed: &mut TDocSet = self.borrow_mut();
-        unboxed.seek(target)
+        self.deref_mut().seek(target)
    }

    fn seek_danger(&mut self, target: DocId) -> SeekDangerResult {
-        let unboxed: &mut TDocSet = self.borrow_mut();
-        unboxed.seek_danger(target)
+        self.deref_mut().seek_danger(target)
    }

+    #[inline]
    fn fill_buffer(&mut self, buffer: &mut [DocId; COLLECT_BLOCK_BUFFER_LEN]) -> usize {
-        let unboxed: &mut TDocSet = self.borrow_mut();
-        unboxed.fill_buffer(buffer)
+        self.deref_mut().fill_buffer(buffer)
    }

+    fn fill_bitset_block(
+        &mut self,
+        min_doc: DocId,
+        mask: &mut [TinySet; BLOCK_NUM_TINYBITSETS],
+    ) -> DocId {
+        let unboxed: &mut TDocSet = &mut **self;
+        unboxed.fill_bitset_block(min_doc, mask)
+    }
+
+    #[inline]
    fn doc(&self) -> DocId {
-        let unboxed: &TDocSet = self.borrow();
-        unboxed.doc()
+        self.deref().doc()
    }

+    #[inline]
    fn size_hint(&self) -> u32 {
-        let unboxed: &TDocSet = self.borrow();
-        unboxed.size_hint()
+        self.deref().size_hint()
    }

+    #[inline]
    fn cost(&self) -> u64 {
-        let unboxed: &TDocSet = self.borrow();
-        unboxed.cost()
+        self.deref().cost()
    }

+    #[inline]
    fn count(&mut self, alive_bitset: &AliveBitSet) -> u32 {
-        let unboxed: &mut TDocSet = self.borrow_mut();
-        unboxed.count(alive_bitset)
+        self.deref_mut().count(alive_bitset)
    }

    fn count_including_deleted(&mut self) -> u32 {
-        let unboxed: &mut TDocSet = self.borrow_mut();
-        unboxed.count_including_deleted()
+        self.deref_mut().count_including_deleted()
+    }
+
+    fn fill_bitset(&mut self, bitset: &mut BitSet) {
+        self.deref_mut().fill_bitset(bitset);
    }
 }
--- a/src/index/codec_configuration.rs
+++ b/src/index/codec_configuration.rs
@@ -0,0 +1,49 @@
+use std::borrow::Cow;
+
+use serde::{Deserialize, Serialize};
+
+use crate::codec::{Codec, StandardCodec};
+
+/// A Codec configuration is just a serializable object.
+#[derive(Serialize, Deserialize, Clone, Debug)]
+pub struct CodecConfiguration {
+    codec_id: Cow<'static, str>,
+    #[serde(default, skip_serializing_if = "serde_json::Value::is_null")]
+    props: serde_json::Value,
+}
+
+impl CodecConfiguration {
+    /// Returns true if the codec is the standard codec.
+    pub fn is_standard(&self) -> bool {
+        self.codec_id == StandardCodec::ID && self.props.is_null()
+    }
+
+    /// Creates a codec instance from the configuration.
+    ///
+    /// Returns an error if the stored codec id does not match `C::ID`.
+    pub fn to_codec<C: Codec>(&self) -> crate::Result<C> {
+        if self.codec_id != C::ID {
+            return Err(crate::TantivyError::InvalidArgument(format!(
+                "Codec id mismatch: expected {}, got {}",
+                C::ID,
+                self.codec_id
+            )));
+        }
+        C::from_json_props(&self.props)
+    }
+}
+
+impl<'a, C: Codec> From<&'a C> for CodecConfiguration {
+    fn from(codec: &'a C) -> Self {
+        CodecConfiguration {
+            codec_id: Cow::Borrowed(C::ID),
+            props: codec.to_json_props(),
+        }
+    }
+}
+
+impl Default for CodecConfiguration {
+    fn default() -> Self {
+        CodecConfiguration::from(&StandardCodec)
+    }
+}
--- a/src/index/index.rs
+++ b/src/index/index.rs
@@ -8,12 +8,14 @@ use std::thread::available_parallelism;
 use super::segment::Segment;
 use super::segment_reader::merge_field_meta_data;
 use super::{FieldMetadata, IndexSettings};
+use crate::codec::StandardCodec;
 use crate::core::{Executor, META_FILEPATH};
 use crate::directory::error::OpenReadError;
 #[cfg(feature = "mmap")]
 use crate::directory::MmapDirectory;
 use crate::directory::{Directory, ManagedDirectory, RamDirectory, INDEX_WRITER_LOCK};
 use crate::error::{DataCorruption, TantivyError};
+use crate::index::codec_configuration::CodecConfiguration;
 use crate::index::{IndexMeta, SegmentId, SegmentMeta, SegmentMetaInventory};
 use crate::indexer::index_writer::{
    IndexWriterOptions, MAX_NUM_THREAD, MEMORY_BUDGET_NUM_BYTES_MIN,
@@ -59,6 +61,7 @@ fn save_new_metas(
    schema: Schema,
    index_settings: IndexSettings,
    directory: &dyn Directory,
+    codec: CodecConfiguration,
 ) -> crate::Result<()> {
    save_metas(
        &IndexMeta {
@@ -67,6 +70,7 @@ fn save_new_metas(
            schema,
            opstamp: 0u64,
            payload: None,
+            codec,
        },
        directory,
    )?;
@@ -101,18 +105,21 @@ fn save_new_metas(
 /// };
 /// let index = Index::builder().schema(schema).settings(settings).create_in_ram();
 /// ```
-pub struct IndexBuilder {
+pub struct IndexBuilder<Codec: crate::codec::Codec = StandardCodec> {
    schema: Option<Schema>,
    index_settings: IndexSettings,
    tokenizer_manager: TokenizerManager,
    fast_field_tokenizer_manager: TokenizerManager,
+    codec: Codec,
 }
-impl Default for IndexBuilder {
+
+impl Default for IndexBuilder<StandardCodec> {
    fn default() -> Self {
        IndexBuilder::new()
    }
 }
-impl IndexBuilder {
+
+impl IndexBuilder<StandardCodec> {
    /// Creates a new `IndexBuilder`
    pub fn new() -> Self {
        Self {
@@ -120,6 +127,21 @@ impl IndexBuilder {
            index_settings: IndexSettings::default(),
            tokenizer_manager: TokenizerManager::default(),
            fast_field_tokenizer_manager: TokenizerManager::default(),
+            codec: StandardCodec,
+        }
+    }
+}
+
+impl<Codec: crate::codec::Codec> IndexBuilder<Codec> {
+    /// Sets the codec.
+    #[must_use]
+    pub fn codec<NewCodec: crate::codec::Codec>(self, codec: NewCodec) -> IndexBuilder<NewCodec> {
+        IndexBuilder {
+            schema: self.schema,
+            index_settings: self.index_settings,
+            tokenizer_manager: self.tokenizer_manager,
+            fast_field_tokenizer_manager: self.fast_field_tokenizer_manager,
+            codec,
        }
    }

@@ -154,7 +176,7 @@ impl IndexBuilder {
    /// The index will be allocated in anonymous memory.
    /// This is useful for indexing small set of documents
    /// for instances like unit test or temporary in memory index.
-    pub fn create_in_ram(self) -> Result<Index, TantivyError> {
+    pub fn create_in_ram(self) -> Result<Index<Codec>, TantivyError> {
        let ram_directory = RamDirectory::create();
        self.create(ram_directory)
    }
@@ -165,7 +187,7 @@ impl IndexBuilder {
    /// If a previous index was in this directory, it returns an
    /// [`TantivyError::IndexAlreadyExists`] error.
    #[cfg(feature = "mmap")]
-    pub fn create_in_dir<P: AsRef<Path>>(self, directory_path: P) -> crate::Result<Index> {
+    pub fn create_in_dir<P: AsRef<Path>>(self, directory_path: P) -> crate::Result<Index<Codec>> {
        let mmap_directory: Box<dyn Directory> = Box::new(MmapDirectory::open(directory_path)?);
        if Index::exists(&*mmap_directory)? {
            return Err(TantivyError::IndexAlreadyExists);
@@ -186,7 +208,7 @@ impl IndexBuilder {
        self,
        dir: impl Into<Box<dyn Directory>>,
        mem_budget: usize,
-    ) -> crate::Result<SingleSegmentIndexWriter<D>> {
+    ) -> crate::Result<SingleSegmentIndexWriter<Codec, D>> {
        let index = self.create(dir)?;
        let index_simple_writer = SingleSegmentIndexWriter::new(index, mem_budget)?;
        Ok(index_simple_writer)
@@ -202,7 +224,7 @@ impl IndexBuilder {
    /// For other unit tests, prefer the [`RamDirectory`], see:
    /// [`IndexBuilder::create_in_ram()`].
    #[cfg(feature = "mmap")]
-    pub fn create_from_tempdir(self) -> crate::Result<Index> {
+    pub fn create_from_tempdir(self) -> crate::Result<Index<Codec>> {
        let mmap_directory: Box<dyn Directory> = Box::new(MmapDirectory::create_from_tempdir()?);
        self.create(mmap_directory)
    }
@@ -215,12 +237,15 @@ impl IndexBuilder {
    }

    /// Opens or creates a new index in the provided directory
-    pub fn open_or_create<T: Into<Box<dyn Directory>>>(self, dir: T) -> crate::Result<Index> {
+    pub fn open_or_create<T: Into<Box<dyn Directory>>>(
+        self,
+        dir: T,
+    ) -> crate::Result<Index<Codec>> {
        let dir: Box<dyn Directory> = dir.into();
        if !Index::exists(&*dir)? {
            return self.create(dir);
        }
-        let mut index = Index::open(dir)?;
+        let mut index: Index<Codec> = Index::<Codec>::open_with_codec(dir)?;
        index.set_tokenizers(self.tokenizer_manager.clone());
        if index.schema() == self.get_expect_schema()? {
            Ok(index)
@@ -244,18 +269,25 @@ impl IndexBuilder {
    /// Creates a new index given an implementation of the trait `Directory`.
    ///
    /// If a directory previously existed, it will be erased.
-    fn create<T: Into<Box<dyn Directory>>>(self, dir: T) -> crate::Result<Index> {
+    pub fn create<T: Into<Box<dyn Directory>>>(self, dir: T) -> crate::Result<Index<Codec>> {
+        self.create_avoid_monomorphization(dir.into())
+    }
+
+    fn create_avoid_monomorphization(self, dir: Box<dyn Directory>) -> crate::Result<Index<Codec>> {
        self.validate()?;
-        let dir = dir.into();
        let directory = ManagedDirectory::wrap(dir)?;
+        let codec: CodecConfiguration = CodecConfiguration::from(&self.codec);
        save_new_metas(
            self.get_expect_schema()?,
            self.index_settings.clone(),
            &directory,
+            codec,
        )?;
-        let mut metas = IndexMeta::with_schema(self.get_expect_schema()?);
+        let schema = self.get_expect_schema()?;
+        let mut metas = IndexMeta::with_schema_and_codec(schema, &self.codec);
        metas.index_settings = self.index_settings;
-        let mut index = Index::open_from_metas(directory, &metas, SegmentMetaInventory::default());
+        let mut index: Index<Codec> =
+            Index::<Codec>::open_from_metas(directory, &metas, SegmentMetaInventory::default())?;
        index.set_tokenizers(self.tokenizer_manager);
        index.set_fast_field_tokenizers(self.fast_field_tokenizer_manager);
        Ok(index)
@@ -264,7 +296,7 @@ impl IndexBuilder {

 /// Search Index
 #[derive(Clone)]
-pub struct Index {
+pub struct Index<Codec: crate::codec::Codec = crate::codec::StandardCodec> {
    directory: ManagedDirectory,
    schema: Schema,
    settings: IndexSettings,
@@ -272,6 +304,7 @@ pub struct Index {
    tokenizers: TokenizerManager,
    fast_field_tokenizers: TokenizerManager,
    inventory: SegmentMetaInventory,
+    codec: Codec,
 }

 impl Index {
@@ -279,41 +312,6 @@ impl Index {
    pub fn builder() -> IndexBuilder {
        IndexBuilder::new()
    }
-    /// Examines the directory to see if it contains an index.
-    ///
-    /// Effectively, it only checks for the presence of the `meta.json` file.
-    pub fn exists(dir: &dyn Directory) -> Result<bool, OpenReadError> {
-        dir.exists(&META_FILEPATH)
-    }
-
-    /// Accessor to the search executor.
-    ///
-    /// This pool is used by default when calling `searcher.search(...)`
-    /// to perform search on the individual segments.
-    ///
-    /// By default the executor is single thread, and simply runs in the calling thread.
-    pub fn search_executor(&self) -> &Executor {
-        &self.executor
-    }
-
-    /// Replace the default single thread search executor pool
-    /// by a thread pool with a given number of threads.
-    pub fn set_multithread_executor(&mut self, num_threads: usize) -> crate::Result<()> {
-        self.executor = Executor::multi_thread(num_threads, "tantivy-search-")?;
-        Ok(())
-    }
-
-    /// Custom thread pool by a outer thread pool.
-    pub fn set_executor(&mut self, executor: Executor) {
-        self.executor = executor;
-    }
-
-    /// Replace the default single thread search executor pool
-    /// by a thread pool with as many threads as there are CPUs on the system.
-    pub fn set_default_multithread_executor(&mut self) -> crate::Result<()> {
-        let default_num_threads = available_parallelism()?.get();
-        self.set_multithread_executor(default_num_threads)
-    }

    /// Creates a new index using the [`RamDirectory`].
    ///
@@ -324,6 +322,13 @@ impl Index {
        IndexBuilder::new().schema(schema).create_in_ram().unwrap()
    }

+    /// Examines the directory to see if it contains an index.
+    ///
+    /// Effectively, it only checks for the presence of the `meta.json` file.
+    pub fn exists(directory: &dyn Directory) -> Result<bool, OpenReadError> {
+        directory.exists(&META_FILEPATH)
+    }
+
    /// Creates a new index in a given filepath.
    /// The index will use the [`MmapDirectory`].
    ///
@@ -370,20 +375,108 @@ impl Index {
        schema: Schema,
        settings: IndexSettings,
    ) -> crate::Result<Index> {
-        let dir: Box<dyn Directory> = dir.into();
+        Self::create_to_avoid_monomorphization(dir.into(), schema, settings)
+    }
+
+    fn create_to_avoid_monomorphization(
+        dir: Box<dyn Directory>,
+        schema: Schema,
+        settings: IndexSettings,
+    ) -> crate::Result<Index> {
        let mut builder = IndexBuilder::new().schema(schema);
        builder = builder.settings(settings);
        builder.create(dir)
    }

+    /// Opens the index located at the given directory path.
+    #[cfg(feature = "mmap")]
+    pub fn open_in_dir<P: AsRef<Path>>(directory_path: P) -> crate::Result<Index> {
+        Self::open_in_dir_to_avoid_monomorphization(directory_path.as_ref())
+    }
+
+    #[cfg(feature = "mmap")]
+    #[inline(never)]
+    fn open_in_dir_to_avoid_monomorphization(directory_path: &Path) -> crate::Result<Index> {
+        let mmap_directory = MmapDirectory::open(directory_path)?;
+        Index::open(mmap_directory)
+    }
+
+    /// Open the index using the provided directory
+    pub fn open<T: Into<Box<dyn Directory>>>(directory: T) -> crate::Result<Index> {
+        Index::<StandardCodec>::open_with_codec(directory.into())
+    }
+}
+
+impl<Codec: crate::codec::Codec> Index<Codec> {
+    /// Returns a version of this index with the standard codec.
+    /// This is useful when you need to pass the index to APIs that
+    /// don't care about the codec (e.g., for reading).
+    pub(crate) fn with_standard_codec(&self) -> Index<StandardCodec> {
+        Index {
+            directory: self.directory.clone(),
+            schema: self.schema.clone(),
+            settings: self.settings.clone(),
+            executor: self.executor.clone(),
+            tokenizers: self.tokenizers.clone(),
+            fast_field_tokenizers: self.fast_field_tokenizers.clone(),
+            inventory: self.inventory.clone(),
+            codec: StandardCodec,
+        }
+    }
+
+    /// Opens the index in `directory`; fails if the stored codec id does not match `Codec::ID`.
+    #[inline(never)]
+    pub fn open_with_codec(directory: Box<dyn Directory>) -> crate::Result<Index<Codec>> {
+        let directory = ManagedDirectory::wrap(directory)?;
+        let inventory = SegmentMetaInventory::default();
+        let metas = load_metas(&directory, &inventory)?;
+        let index: Index<Codec> = Index::<Codec>::open_from_metas(directory, &metas, inventory)?;
+        Ok(index)
+    }
+
+    /// Accessor to the codec.
+    pub fn codec(&self) -> &Codec {
+        &self.codec
+    }
+
+    /// Accessor to the search executor.
+    ///
+    /// This pool is used by default when calling `searcher.search(...)`
+    /// to perform search on the individual segments.
+    ///
+    /// By default the executor is single-threaded and simply runs in the calling thread.
+    pub fn search_executor(&self) -> &Executor {
+        &self.executor
+    }
+
+    /// Replace the default single thread search executor pool
+    /// by a thread pool with a given number of threads.
+    pub fn set_multithread_executor(&mut self, num_threads: usize) -> crate::Result<()> {
+        self.executor = Executor::multi_thread(num_threads, "tantivy-search-")?;
+        Ok(())
+    }
+
+    /// Replaces the search executor with a custom, externally provided executor.
+    pub fn set_executor(&mut self, executor: Executor) {
+        self.executor = executor;
+    }
+
+    /// Replace the default single thread search executor pool
+    /// by a thread pool with as many threads as there are CPUs on the system.
+    pub fn set_default_multithread_executor(&mut self) -> crate::Result<()> {
+        let default_num_threads = available_parallelism()?.get();
+        self.set_multithread_executor(default_num_threads)
+    }
+
    /// Creates a new index given a directory and an [`IndexMeta`].
-    fn open_from_metas(
+    fn open_from_metas<C: crate::codec::Codec>(
        directory: ManagedDirectory,
        metas: &IndexMeta,
        inventory: SegmentMetaInventory,
-    ) -> Index {
+    ) -> crate::Result<Index<C>> {
        let schema = metas.schema.clone();
-        Index {
+        let codec = metas.codec.to_codec::<C>()?;
+        Ok(Index {
            settings: metas.index_settings.clone(),
            directory,
            schema,
@@ -391,7 +484,8 @@ impl Index {
            fast_field_tokenizers: TokenizerManager::default(),
            executor: Executor::single_thread(),
            inventory,
-        }
+            codec,
+        })
    }

    /// Setter for the tokenizer manager.
@@ -447,7 +541,7 @@ impl Index {
    /// Create a default [`IndexReader`] for the given index.
    ///
    /// See [`Index.reader_builder()`].
-    pub fn reader(&self) -> crate::Result<IndexReader> {
+    pub fn reader(&self) -> crate::Result<IndexReader<Codec>> {
        self.reader_builder().try_into()
    }

@@ -455,17 +549,10 @@ impl Index {
    ///
    /// Most project should create at most one reader for a given index.
    /// This method is typically called only once per `Index` instance.
-    pub fn reader_builder(&self) -> IndexReaderBuilder {
+    pub fn reader_builder(&self) -> IndexReaderBuilder<Codec> {
        IndexReaderBuilder::new(self.clone())
    }

-    /// Opens a new directory from an index path.
-    #[cfg(feature = "mmap")]
-    pub fn open_in_dir<P: AsRef<Path>>(directory_path: P) -> crate::Result<Index> {
-        let mmap_directory = MmapDirectory::open(directory_path)?;
-        Index::open(mmap_directory)
-    }
-
    /// Returns the list of the segment metas tracked by the index.
    ///
    /// Such segments can of course be part of the index,
@@ -506,16 +593,6 @@ impl Index {
        self.inventory.new_segment_meta(segment_id, max_doc)
    }

-    /// Open the index using the provided directory
-    pub fn open<T: Into<Box<dyn Directory>>>(directory: T) -> crate::Result<Index> {
-        let directory = directory.into();
-        let directory = ManagedDirectory::wrap(directory)?;
-        let inventory = SegmentMetaInventory::default();
-        let metas = load_metas(&directory, &inventory)?;
-        let index = Index::open_from_metas(directory, &metas, inventory);
-        Ok(index)
-    }
-
    /// Reads the index meta file from the directory.
    pub fn load_metas(&self) -> crate::Result<IndexMeta> {
        load_metas(self.directory(), &self.inventory)
@@ -539,7 +616,7 @@ impl Index {
    pub fn writer_with_options<D: Document>(
        &self,
        options: IndexWriterOptions,
-    ) -> crate::Result<IndexWriter<D>> {
+    ) -> crate::Result<IndexWriter<Codec, D>> {
        let directory_lock = self
            .directory
            .acquire_lock(&INDEX_WRITER_LOCK)
@@ -581,7 +658,7 @@ impl Index {
        &self,
        num_threads: usize,
        overall_memory_budget_in_bytes: usize,
-    ) -> crate::Result<IndexWriter<D>> {
+    ) -> crate::Result<IndexWriter<Codec, D>> {
        let memory_arena_in_bytes_per_thread = overall_memory_budget_in_bytes / num_threads;
        let options = IndexWriterOptions::builder()
            .num_worker_threads(num_threads)
@@ -595,7 +672,7 @@ impl Index {
    /// That index writer only simply has a single thread and a memory budget of 15 MB.
    /// Using a single thread gives us a deterministic allocation of DocId.
    #[cfg(test)]
-    pub fn writer_for_tests<D: Document>(&self) -> crate::Result<IndexWriter<D>> {
+    pub fn writer_for_tests<D: Document>(&self) -> crate::Result<IndexWriter<Codec, D>> {
        self.writer_with_num_threads(1, MEMORY_BUDGET_NUM_BYTES_MIN)
    }

@@ -613,7 +690,7 @@ impl Index {
    pub fn writer<D: Document>(
        &self,
        memory_budget_in_bytes: usize,
-    ) -> crate::Result<IndexWriter<D>> {
+    ) -> crate::Result<IndexWriter<Codec, D>> {
        let mut num_threads = std::cmp::min(available_parallelism()?.get(), MAX_NUM_THREAD);
        let memory_budget_num_bytes_per_thread = memory_budget_in_bytes / num_threads;
        if memory_budget_num_bytes_per_thread < MEMORY_BUDGET_NUM_BYTES_MIN {
@@ -640,7 +717,7 @@ impl Index {
    }

    /// Returns the list of segments that are searchable
-    pub fn searchable_segments(&self) -> crate::Result<Vec<Segment>> {
+    pub fn searchable_segments(&self) -> crate::Result<Vec<Segment<Codec>>> {
        Ok(self
            .searchable_segment_metas()?
            .into_iter()
@@ -649,12 +726,12 @@ impl Index {
    }

    #[doc(hidden)]
-    pub fn segment(&self, segment_meta: SegmentMeta) -> Segment {
+    pub fn segment(&self, segment_meta: SegmentMeta) -> Segment<Codec> {
        Segment::for_index(self.clone(), segment_meta)
    }

    /// Creates a new segment.
-    pub fn new_segment(&self) -> Segment {
+    pub fn new_segment(&self) -> Segment<Codec> {
        let segment_meta = self
            .inventory
            .new_segment_meta(SegmentId::generate_random(), 0);
@@ -708,7 +785,7 @@ impl Index {
 }

 impl fmt::Debug for Index {
-    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
+    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "Index({:?})", self.directory)
    }
 }
--- a/src/index/index_meta.rs
+++ b/src/index/index_meta.rs
@@ -5,7 +5,8 @@ use std::path::PathBuf;
 use serde::{Deserialize, Serialize};

 use super::SegmentComponent;
-use crate::index::SegmentId;
+use crate::codec::Codec;
+use crate::index::{CodecConfiguration, SegmentId};
 use crate::schema::Schema;
 use crate::store::Compressor;
 use crate::{Inventory, Opstamp, TrackedObject};
@@ -286,8 +287,10 @@ pub struct IndexMeta {
    /// This payload is entirely unused by tantivy.
    #[serde(skip_serializing_if = "Option::is_none")]
    pub payload: Option<String>,
+    /// Codec configuration for the index.
+    #[serde(skip_serializing_if = "CodecConfiguration::is_standard")]
+    pub codec: CodecConfiguration,
 }
-
 #[derive(Deserialize, Debug)]
 struct UntrackedIndexMeta {
    pub segments: Vec<InnerSegmentMeta>,
@@ -297,6 +300,8 @@ struct UntrackedIndexMeta {
    pub opstamp: Opstamp,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub payload: Option<String>,
+    #[serde(default)]
+    pub codec: CodecConfiguration,
 }

 impl UntrackedIndexMeta {
@@ -311,6 +316,7 @@ impl UntrackedIndexMeta {
            schema: self.schema,
            opstamp: self.opstamp,
            payload: self.payload,
+            codec: self.codec,
        }
    }
 }
@@ -321,13 +327,14 @@ impl IndexMeta {
    ///
    /// This new index does not contains any segments.
    /// Opstamp will the value `0u64`.
-    pub fn with_schema(schema: Schema) -> IndexMeta {
+    pub fn with_schema_and_codec<C: Codec>(schema: Schema, codec: &C) -> IndexMeta {
        IndexMeta {
            index_settings: IndexSettings::default(),
            segments: vec![],
            schema,
            opstamp: 0u64,
            payload: None,
+            codec: CodecConfiguration::from(codec),
        }
    }

@@ -378,14 +385,38 @@ mod tests {
            schema,
            opstamp: 0u64,
            payload: None,
+            codec: Default::default(),
        };
-        let json = serde_json::ser::to_string(&index_metas).expect("serialization failed");
+        let json_value: serde_json::Value =
+            serde_json::to_value(&index_metas).expect("serialization failed");
        assert_eq!(
-            json,
-            r#"{"index_settings":{"docstore_compression":"none","docstore_blocksize":16384},"segments":[],"schema":[{"name":"text","type":"text","options":{"indexing":{"record":"position","fieldnorms":true,"tokenizer":"default"},"stored":false,"fast":false}}],"opstamp":0}"#
+            &json_value,
+            &serde_json::json!(
+            {
+              "index_settings": {
+                "docstore_compression": "none",
+                "docstore_blocksize": 16384
+              },
+              "segments": [],
+              "schema": [
+                {
+                  "name": "text",
+                  "type": "text",
+                  "options": {
+                    "indexing": {
+                      "record": "position",
+                      "fieldnorms": true,
+                      "tokenizer": "default"
+                    },
+                    "stored": false,
+                    "fast": false
+                  }
+                }
+              ],
+              "opstamp": 0
+            })
        );
-
-        let deser_meta: UntrackedIndexMeta = serde_json::from_str(&json).unwrap();
+        let deser_meta: UntrackedIndexMeta = serde_json::from_value(json_value).unwrap();
        assert_eq!(index_metas.index_settings, deser_meta.index_settings);
        assert_eq!(index_metas.schema, deser_meta.schema);
        assert_eq!(index_metas.opstamp, deser_meta.opstamp);
@@ -411,14 +442,39 @@ mod tests {
            schema,
            opstamp: 0u64,
            payload: None,
+            codec: Default::default(),
        };
-        let json = serde_json::ser::to_string(&index_metas).expect("serialization failed");
+        let json_value = serde_json::to_value(&index_metas).expect("serialization failed");
        assert_eq!(
-            json,
-            r#"{"index_settings":{"docstore_compression":"zstd(compression_level=4)","docstore_blocksize":1000000},"segments":[],"schema":[{"name":"text","type":"text","options":{"indexing":{"record":"position","fieldnorms":true,"tokenizer":"default"},"stored":false,"fast":false}}],"opstamp":0}"#
+            &json_value,
+            &serde_json::json!(
+                {
+                  "index_settings": {
+                    "docstore_compression": "zstd(compression_level=4)",
+                    "docstore_blocksize": 1000000
+                  },
+                  "segments": [],
+                  "schema": [
+                    {
+                      "name": "text",
+                      "type": "text",
+                      "options": {
+                        "indexing": {
+                          "record": "position",
+                          "fieldnorms": true,
+                          "tokenizer": "default"
+                        },
+                        "stored": false,
+                        "fast": false
+                      }
+                    }
+                  ],
+                  "opstamp": 0
+                }
+            )
        );

-        let deser_meta: UntrackedIndexMeta = serde_json::from_str(&json).unwrap();
+        let deser_meta: UntrackedIndexMeta = serde_json::from_value(json_value).unwrap();
        assert_eq!(index_metas.index_settings, deser_meta.index_settings);
        assert_eq!(index_metas.schema, deser_meta.schema);
        assert_eq!(index_metas.opstamp, deser_meta.opstamp);
--- a/src/index/inverted_index_reader.rs
+++ b/src/index/inverted_index_reader.rs
@@ -1,7 +1,8 @@
 use std::io;
+use std::sync::Arc;

 use common::json_path_writer::JSON_END_OF_PATH;
-use common::{BinarySerializable, ByteCount};
+use common::{BinarySerializable, ByteCount, OwnedBytes};
 #[cfg(feature = "quickwit")]
 use futures_util::{FutureExt, StreamExt, TryStreamExt};
 #[cfg(feature = "quickwit")]
@@ -9,9 +10,13 @@ use itertools::Itertools;
 #[cfg(feature = "quickwit")]
 use tantivy_fst::automaton::{AlwaysMatch, Automaton};

+use crate::codec::postings::PostingsCodec;
+use crate::codec::{Codec, ObjectSafeCodec, StandardCodec};
 use crate::directory::FileSlice;
-use crate::positions::PositionReader;
-use crate::postings::{BlockSegmentPostings, SegmentPostings, TermInfo};
+use crate::fieldnorm::FieldNormReader;
+use crate::postings::{Postings, TermInfo};
+use crate::query::term_query::TermScorer;
+use crate::query::{Bm25Weight, PhraseScorer, Scorer};
 use crate::schema::{IndexRecordOption, Term, Type};
 use crate::termdict::TermDictionary;

@@ -33,6 +38,7 @@ pub struct InvertedIndexReader {
    positions_file_slice: FileSlice,
    record_option: IndexRecordOption,
    total_num_tokens: u64,
+    codec: Arc<dyn ObjectSafeCodec>,
 }

 /// Object that records the amount of space used by a field in an inverted index.
@@ -68,6 +74,7 @@ impl InvertedIndexReader {
        postings_file_slice: FileSlice,
        positions_file_slice: FileSlice,
        record_option: IndexRecordOption,
+        codec: Arc<dyn ObjectSafeCodec>,
    ) -> io::Result<InvertedIndexReader> {
        let (total_num_tokens_slice, postings_body) = postings_file_slice.split(8);
        let total_num_tokens = u64::deserialize(&mut total_num_tokens_slice.read_bytes()?)?;
@@ -77,6 +84,7 @@ impl InvertedIndexReader {
            positions_file_slice,
            record_option,
            total_num_tokens,
+            codec,
        })
    }

@@ -89,6 +97,7 @@ impl InvertedIndexReader {
            positions_file_slice: FileSlice::empty(),
            record_option,
            total_num_tokens: 0u64,
+            codec: Arc::new(StandardCodec),
        }
    }

@@ -160,61 +169,98 @@ impl InvertedIndexReader {
        Ok(fields)
    }

-    /// Resets the block segment to another position of the postings
-    /// file.
-    ///
-    /// This is useful for enumerating through a list of terms,
-    /// and consuming the associated posting lists while avoiding
-    /// reallocating a [`BlockSegmentPostings`].
-    ///
-    /// # Warning
-    ///
-    /// This does not reset the positions list.
-    pub fn reset_block_postings_from_terminfo(
+    pub(crate) fn new_term_scorer_specialized<C: Codec>(
        &self,
        term_info: &TermInfo,
-        block_postings: &mut BlockSegmentPostings,
-    ) -> io::Result<()> {
-        let postings_slice = self
-            .postings_file_slice
-            .slice(term_info.postings_range.clone());
-        let postings_bytes = postings_slice.read_bytes()?;
-        block_postings.reset(term_info.doc_freq, postings_bytes)?;
-        Ok(())
-    }
-
-    /// Returns a block postings given a `Term`.
-    /// This method is for an advanced usage only.
-    ///
-    /// Most users should prefer using [`Self::read_postings()`] instead.
-    pub fn read_block_postings(
-        &self,
-        term: &Term,
        option: IndexRecordOption,
-    ) -> io::Result<Option<BlockSegmentPostings>> {
-        self.get_term_info(term)?
-            .map(move |term_info| self.read_block_postings_from_terminfo(&term_info, option))
-            .transpose()
+        fieldnorm_reader: FieldNormReader,
+        similarity_weight: Bm25Weight,
+        codec: &C,
+    ) -> io::Result<TermScorer<<<C as Codec>::PostingsCodec as PostingsCodec>::Postings>> {
+        let postings = self.read_postings_from_terminfo_specialized(term_info, option, codec)?;
+        let term_scorer = TermScorer::new(postings, fieldnorm_reader, similarity_weight);
+        Ok(term_scorer)
    }

-    /// Returns a block postings given a `term_info`.
-    /// This method is for an advanced usage only.
-    ///
-    /// Most users should prefer using [`Self::read_postings()`] instead.
-    pub fn read_block_postings_from_terminfo(
+    pub(crate) fn new_phrase_scorer_type_specialized<C: Codec>(
+        &self,
+        term_infos: &[(usize, TermInfo)],
+        similarity_weight_opt: Option<Bm25Weight>,
+        fieldnorm_reader: FieldNormReader,
+        slop: u32,
+        codec: &C,
+    ) -> io::Result<PhraseScorer<<<C as Codec>::PostingsCodec as PostingsCodec>::Postings>> {
+        let mut offset_and_term_postings: Vec<(
+            usize,
+            <<C as Codec>::PostingsCodec as PostingsCodec>::Postings,
+        )> = Vec::with_capacity(term_infos.len());
+        for (offset, term_info) in term_infos {
+            let postings = self.read_postings_from_terminfo_specialized(
+                term_info,
+                IndexRecordOption::WithFreqsAndPositions,
+                codec,
+            )?;
+            offset_and_term_postings.push((*offset, postings));
+        }
+        let phrase_scorer = PhraseScorer::new(
+            offset_and_term_postings,
+            similarity_weight_opt,
+            fieldnorm_reader,
+            slop,
+        );
+        Ok(phrase_scorer)
+    }
+
+    /// Build a new term scorer.
+    pub fn new_term_scorer(
        &self,
        term_info: &TermInfo,
-        requested_option: IndexRecordOption,
-    ) -> io::Result<BlockSegmentPostings> {
+        option: IndexRecordOption,
+        fieldnorm_reader: FieldNormReader,
+        similarity_weight: Bm25Weight,
+    ) -> io::Result<Box<dyn Scorer>> {
+        let term_scorer = self.codec.load_term_scorer_type_erased(
+            term_info,
+            option,
+            self,
+            fieldnorm_reader,
+            similarity_weight,
+        )?;
+        Ok(term_scorer)
+    }
+
+    /// Returns a postings object with a concrete, codec-specific type.
+    ///
+    /// This requires you to provide the actual codec.
+    pub fn read_postings_from_terminfo_specialized<C: Codec>(
+        &self,
+        term_info: &TermInfo,
+        option: IndexRecordOption,
+        codec: &C,
+    ) -> io::Result<<<C as Codec>::PostingsCodec as PostingsCodec>::Postings> {
+        let option = option.downgrade(self.record_option);
        let postings_data = self
            .postings_file_slice
-            .slice(term_info.postings_range.clone());
-        BlockSegmentPostings::open(
-            term_info.doc_freq,
-            postings_data,
-            self.record_option,
-            requested_option,
-        )
+            .slice(term_info.postings_range.clone())
+            .read_bytes()?;
+        let positions_data: Option<OwnedBytes> = if option.has_positions() {
+            let positions_data = self
+                .positions_file_slice
+                .slice(term_info.positions_range.clone())
+                .read_bytes()?;
+            Some(positions_data)
+        } else {
+            None
+        };
+        let postings: <<C as Codec>::PostingsCodec as PostingsCodec>::Postings =
+            codec.postings_codec().load_postings(
+                term_info.doc_freq,
+                postings_data,
+                self.record_option,
+                option,
+                positions_data,
+            )?;
+        Ok(postings)
    }

    /// Returns a posting object given a `term_info`.
@@ -225,25 +271,9 @@ impl InvertedIndexReader {
        &self,
        term_info: &TermInfo,
        option: IndexRecordOption,
-    ) -> io::Result<SegmentPostings> {
-        let option = option.downgrade(self.record_option);
-
-        let block_postings = self.read_block_postings_from_terminfo(term_info, option)?;
-        let position_reader = {
-            if option.has_positions() {
-                let positions_data = self
-                    .positions_file_slice
-                    .read_bytes_slice(term_info.positions_range.clone())?;
-                let position_reader = PositionReader::open(positions_data)?;
-                Some(position_reader)
-            } else {
-                None
-            }
-        };
-        Ok(SegmentPostings::from_block_postings(
-            block_postings,
-            position_reader,
-        ))
+    ) -> io::Result<Box<dyn Postings>> {
+        self.codec
+            .load_postings_type_erased(term_info, option, self)
    }

    /// Returns the total number of tokens recorded for all documents
@@ -266,7 +296,7 @@ impl InvertedIndexReader {
        &self,
        term: &Term,
        option: IndexRecordOption,
-    ) -> io::Result<Option<SegmentPostings>> {
+    ) -> io::Result<Option<Box<dyn Postings>>> {
        self.get_term_info(term)?
            .map(move |term_info| self.read_postings_from_terminfo(&term_info, option))
            .transpose()
--- a/src/index/mod.rs
+++ b/src/index/mod.rs
@@ -2,6 +2,7 @@
 //!
 //! It contains `Index` and `Segment`, where a `Index` consists of one or more `Segment`s.

+mod codec_configuration;
 mod index;
 mod index_meta;
 mod inverted_index_reader;
@@ -10,6 +11,7 @@ mod segment_component;
 mod segment_id;
 mod segment_reader;

+pub use self::codec_configuration::CodecConfiguration;
 pub use self::index::{Index, IndexBuilder};
 pub(crate) use self::index_meta::SegmentMetaInventory;
 pub use self::index_meta::{IndexMeta, IndexSettings, Order, SegmentMeta};
--- a/src/index/segment.rs
+++ b/src/index/segment.rs
@@ -2,6 +2,7 @@ use std::fmt;
 use std::path::PathBuf;

 use super::SegmentComponent;
+use crate::codec::StandardCodec;
 use crate::directory::error::{OpenReadError, OpenWriteError};
 use crate::directory::{Directory, FileSlice, WritePtr};
 use crate::index::{Index, SegmentId, SegmentMeta};
@@ -10,25 +11,25 @@ use crate::Opstamp;

 /// A segment is a piece of the index.
 #[derive(Clone)]
-pub struct Segment {
-    index: Index,
+pub struct Segment<C: crate::codec::Codec = StandardCodec> {
+    index: Index<C>,
    meta: SegmentMeta,
 }

-impl fmt::Debug for Segment {
-    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
+impl<C: crate::codec::Codec> fmt::Debug for Segment<C> {
+    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "Segment({:?})", self.id().uuid_string())
    }
 }

-impl Segment {
+impl<C: crate::codec::Codec> Segment<C> {
    /// Creates a new segment given an `Index` and a `SegmentId`
-    pub(crate) fn for_index(index: Index, meta: SegmentMeta) -> Segment {
+    pub(crate) fn for_index(index: Index<C>, meta: SegmentMeta) -> Segment<C> {
        Segment { index, meta }
    }

    /// Returns the index the segment belongs to.
-    pub fn index(&self) -> &Index {
+    pub fn index(&self) -> &Index<C> {
        &self.index
    }

@@ -46,7 +47,7 @@ impl Segment {
    ///
    /// This method is only used when updating `max_doc` from 0
    /// as we finalize a fresh new segment.
-    pub fn with_max_doc(self, max_doc: u32) -> Segment {
+    pub fn with_max_doc(self, max_doc: u32) -> Segment<C> {
        Segment {
            index: self.index,
            meta: self.meta.with_max_doc(max_doc),
@@ -55,7 +56,7 @@ impl Segment {

    #[doc(hidden)]
    #[must_use]
-    pub fn with_delete_meta(self, num_deleted_docs: u32, opstamp: Opstamp) -> Segment {
+    pub fn with_delete_meta(self, num_deleted_docs: u32, opstamp: Opstamp) -> Segment<C> {
        Segment {
            index: self.index,
            meta: self.meta.with_delete_meta(num_deleted_docs, opstamp),
--- a/src/index/segment_reader.rs
+++ b/src/index/segment_reader.rs
@@ -6,6 +6,8 @@ use common::{ByteCount, HasLen};
 use fnv::FnvHashMap;
 use itertools::Itertools;

+use crate::codec::ObjectSafeCodec;
+use crate::directory::error::OpenReadError;
 use crate::directory::{CompositeFile, FileSlice};
 use crate::error::DataCorruption;
 use crate::fastfield::{intersect_alive_bitsets, AliveBitSet, FacetReader, FastFieldReaders};
@@ -47,6 +49,7 @@ pub struct SegmentReader {
    store_file: FileSlice,
    alive_bitset_opt: Option<AliveBitSet>,
    schema: Schema,
+    codec: Arc<dyn ObjectSafeCodec>,
 }

 impl SegmentReader {
@@ -67,6 +70,11 @@ impl SegmentReader {
        &self.schema
    }

+    /// Returns the index codec.
+    pub fn codec(&self) -> &dyn ObjectSafeCodec {
+        &*self.codec
+    }
+
    /// Return the number of documents that have been
    /// deleted in the segment.
    pub fn num_deleted_docs(&self) -> DocId {
@@ -140,15 +148,16 @@ impl SegmentReader {
    }

    /// Open a new segment for reading.
-    pub fn open(segment: &Segment) -> crate::Result<SegmentReader> {
+    pub fn open<C: crate::codec::Codec>(segment: &Segment<C>) -> crate::Result<SegmentReader> {
        Self::open_with_custom_alive_set(segment, None)
    }

    /// Open a new segment for reading.
-    pub fn open_with_custom_alive_set(
-        segment: &Segment,
+    pub fn open_with_custom_alive_set<C: crate::codec::Codec>(
+        segment: &Segment<C>,
        custom_bitset: Option<AliveBitSet>,
    ) -> crate::Result<SegmentReader> {
+        let codec: Arc<dyn ObjectSafeCodec> = Arc::new(segment.index().codec().clone());
        let termdict_file = segment.open_read(SegmentComponent::Terms)?;
        let termdict_composite = CompositeFile::open(&termdict_file)?;

@@ -159,12 +168,10 @@ impl SegmentReader {
        let postings_file = segment.open_read(SegmentComponent::Postings)?;
        let postings_composite = CompositeFile::open(&postings_file)?;

-        let positions_composite = {
-            if let Ok(positions_file) = segment.open_read(SegmentComponent::Positions) {
-                CompositeFile::open(&positions_file)?
-            } else {
-                CompositeFile::empty()
-            }
+        let positions_composite = match segment.open_read(SegmentComponent::Positions) {
+            Ok(positions_file) => CompositeFile::open(&positions_file)?,
+            Err(OpenReadError::FileDoesNotExist(_)) => CompositeFile::empty(),
+            Err(open_read_error) => return Err(open_read_error.into()),
        };

        let schema = segment.schema();
@@ -204,6 +211,7 @@ impl SegmentReader {
            alive_bitset_opt,
            positions_composite,
            schema,
+            codec,
        })
    }

@@ -273,6 +281,7 @@ impl SegmentReader {
            postings_file,
            positions_file,
            record_option,
+            self.codec.clone(),
        )?);

        // by releasing the lock in between, we may end up opening the inverting index
@@ -323,7 +332,7 @@ impl SegmentReader {
                            // Without expand dots enabled dots need to be escaped.
                            let escaped_json_path = json_path.replace('.', "\\.");
                            let full_path = format!("{field_name}.{escaped_json_path}");
-                            let full_path_unescaped = format!("{}.{}", field_name, &json_path);
+                            let full_path_unescaped = format!("{}.{}", field_name, json_path);
                            map_to_canonical.insert(full_path_unescaped, full_path.to_string());
                            full_path
                        } else {
--- a/src/indexer/index_writer.rs
+++ b/src/indexer/index_writer.rs
@@ -9,6 +9,7 @@ use smallvec::smallvec;
 use super::operation::{AddOperation, UserOperation};
 use super::segment_updater::SegmentUpdater;
 use super::{AddBatch, AddBatchReceiver, AddBatchSender, PreparedCommit};
+use crate::codec::{Codec, StandardCodec};
 use crate::directory::{DirectoryLock, GarbageCollectionResult, TerminatingWrite};
 use crate::error::TantivyError;
 use crate::fastfield::write_alive_bitset;
@@ -68,12 +69,12 @@ pub struct IndexWriterOptions {
 /// indexing queue.
 /// Each indexing thread builds its own independent [`Segment`], via
 /// a `SegmentWriter` object.
-pub struct IndexWriter<D: Document = TantivyDocument> {
+pub struct IndexWriter<C: Codec = StandardCodec, D: Document = TantivyDocument> {
    // the lock is just used to bind the
    // lifetime of the lock with that of the IndexWriter.
    _directory_lock: Option<DirectoryLock>,

-    index: Index,
+    index: Index<C>,

    options: IndexWriterOptions,

@@ -82,7 +83,7 @@ pub struct IndexWriter<D: Document = TantivyDocument> {
    index_writer_status: IndexWriterStatus<D>,
    operation_sender: AddBatchSender<D>,

-    segment_updater: SegmentUpdater,
+    segment_updater: SegmentUpdater<C>,

    worker_id: usize,

@@ -128,8 +129,8 @@ fn compute_deleted_bitset(
 /// is `==` target_opstamp.
 /// For instance, there was no delete operation between the state of the `segment_entry` and
 /// the `target_opstamp`, `segment_entry` is not updated.
-pub fn advance_deletes(
-    mut segment: Segment,
+pub fn advance_deletes<C: Codec>(
+    mut segment: Segment<C>,
    segment_entry: &mut SegmentEntry,
    target_opstamp: Opstamp,
 ) -> crate::Result<()> {
@@ -179,11 +180,11 @@ pub fn advance_deletes(
    Ok(())
 }

-fn index_documents<D: Document>(
+fn index_documents<C: crate::codec::Codec, D: Document>(
    memory_budget: usize,
-    segment: Segment,
+    segment: Segment<C>,
    grouped_document_iterator: &mut dyn Iterator<Item = AddBatch<D>>,
-    segment_updater: &SegmentUpdater,
+    segment_updater: &SegmentUpdater<C>,
    mut delete_cursor: DeleteCursor,
 ) -> crate::Result<()> {
    let mut segment_writer = SegmentWriter::for_segment(memory_budget, segment.clone())?;
@@ -226,8 +227,8 @@ fn index_documents<D: Document>(
 }

 /// `doc_opstamps` is required to be non-empty.
-fn apply_deletes(
-    segment: &Segment,
+fn apply_deletes<C: crate::codec::Codec>(
+    segment: &Segment<C>,
    delete_cursor: &mut DeleteCursor,
    doc_opstamps: &[Opstamp],
 ) -> crate::Result<Option<BitSet>> {
@@ -262,7 +263,7 @@ fn apply_deletes(
    })
 }

-impl<D: Document> IndexWriter<D> {
+impl<C: Codec, D: Document> IndexWriter<C, D> {
    /// Create a new index writer. Attempts to acquire a lockfile.
    ///
    /// The lockfile should be deleted on drop, but it is possible
@@ -278,7 +279,7 @@ impl<D: Document> IndexWriter<D> {
    /// If the memory arena per thread is too small or too big, returns
    /// `TantivyError::InvalidArgument`
    pub(crate) fn new(
-        index: &Index,
+        index: &Index<C>,
        options: IndexWriterOptions,
        directory_lock: DirectoryLock,
    ) -> crate::Result<Self> {
@@ -345,7 +346,7 @@ impl<D: Document> IndexWriter<D> {
    }

    /// Accessor to the index.
-    pub fn index(&self) -> &Index {
+    pub fn index(&self) -> &Index<C> {
        &self.index
    }

@@ -393,7 +394,7 @@ impl<D: Document> IndexWriter<D> {
    /// It is safe to start writing file associated with the new `Segment`.
    /// These will not be garbage collected as long as an instance object of
    /// `SegmentMeta` object associated with the new `Segment` is "alive".
-    pub fn new_segment(&self) -> Segment {
+    pub fn new_segment(&self) -> Segment<C> {
        self.index.new_segment()
    }

@@ -615,7 +616,7 @@ impl<D: Document> IndexWriter<D> {
    /// It is also possible to add a payload to the `commit`
    /// using this API.
    /// See [`PreparedCommit::set_payload()`].
-    pub fn prepare_commit(&mut self) -> crate::Result<PreparedCommit<'_, D>> {
+    pub fn prepare_commit(&mut self) -> crate::Result<PreparedCommit<'_, C, D>> {
        // Here, because we join all of the worker threads,
        // all of the segment update for this commit have been
        // sent.
@@ -665,7 +666,7 @@ impl<D: Document> IndexWriter<D> {
        self.prepare_commit()?.commit()
    }

-    pub(crate) fn segment_updater(&self) -> &SegmentUpdater {
+    pub(crate) fn segment_updater(&self) -> &SegmentUpdater<C> {
        &self.segment_updater
    }

@@ -804,7 +805,7 @@ impl<D: Document> IndexWriter<D> {
    }
 }

-impl<D: Document> Drop for IndexWriter<D> {
+impl<C: Codec, D: Document> Drop for IndexWriter<C, D> {
    fn drop(&mut self) {
        self.segment_updater.kill();
        self.drop_sender();
--- a/src/indexer/log_merge_policy.rs
+++ b/src/indexer/log_merge_policy.rs
@@ -94,7 +94,7 @@ impl MergePolicy for LogMergePolicy {
    fn compute_merge_candidates(&self, segments: &[SegmentMeta]) -> Vec<MergeCandidate> {
        let size_sorted_segments = segments
            .iter()
-            .filter(|seg| seg.num_docs() <= (self.max_docs_before_merge as u32))
+            .filter(|seg| (seg.num_docs() as usize) <= self.max_docs_before_merge)
            .sorted_by_key(|seg| std::cmp::Reverse(seg.max_doc()))
            .collect::<Vec<&SegmentMeta>>();

@@ -372,4 +372,21 @@ mod tests {
        assert_eq!(merge_candidates[0].0.len(), 1);
        assert_eq!(merge_candidates[0].0[0], test_input[1].id());
    }
+
+    #[test]
+    fn test_max_docs_before_merge_large_value() {
+        // Regression test: (max_docs_before_merge as u32) truncates values > u32::MAX.
+        // Casting num_docs() to usize instead avoids the truncation.
+        let mut policy = LogMergePolicy::default();
+        policy.set_min_num_segments(2);
+        policy.set_max_docs_before_merge(5_000_000_000usize);
+        let test_input = vec![
+            create_random_segment_meta(100_000),
+            create_random_segment_meta(100_000),
+        ];
+        let result = policy.compute_merge_candidates(&test_input);
+        // Both segments should be eligible (100_000 < 5_000_000_000)
+        assert_eq!(result.len(), 1);
+        assert_eq!(result[0].0.len(), 2);
+    }
 }
--- a/src/indexer/merge_index_test.rs
+++ b/src/indexer/merge_index_test.rs
@@ -1,9 +1,10 @@
 #[cfg(test)]
 mod tests {
+    use crate::codec::StandardCodec;
    use crate::collector::TopDocs;
    use crate::fastfield::AliveBitSet;
    use crate::index::Index;
-    use crate::postings::Postings;
+    use crate::postings::{DocFreq, Postings};
    use crate::query::QueryParser;
    use crate::schema::{
        self, BytesOptions, Facet, FacetOptions, IndexRecordOption, NumericOptions,
@@ -121,21 +122,26 @@ mod tests {
            let my_text_field = index.schema().get_field("text_field").unwrap();
            let term_a = Term::from_field_text(my_text_field, "text");
            let inverted_index = segment_reader.inverted_index(my_text_field).unwrap();
+            let term_info = inverted_index.get_term_info(&term_a).unwrap().unwrap();
            let mut postings = inverted_index
-                .read_postings(&term_a, IndexRecordOption::WithFreqsAndPositions)
-                .unwrap()
+                .read_postings_from_terminfo_specialized(
+                    &term_info,
+                    IndexRecordOption::WithFreqsAndPositions,
+                    &StandardCodec,
+                )
                .unwrap();
-            assert_eq!(postings.doc_freq(), 2);
+            assert_eq!(postings.doc_freq(), DocFreq::Exact(2));
            let fallback_bitset = AliveBitSet::for_test_from_deleted_docs(&[0], 100);
            assert_eq!(
-                postings.doc_freq_given_deletes(
+                crate::indexer::merger::doc_freq_given_deletes(
+                    &postings,
                    segment_reader.alive_bitset().unwrap_or(&fallback_bitset)
                ),
                2
            );

            assert_eq!(postings.term_freq(), 1);
-            let mut output = vec![];
+            let mut output = Vec::new();
            postings.positions(&mut output);
            assert_eq!(output, vec![1]);
            postings.advance();
--- a/src/indexer/merger.rs
+++ b/src/indexer/merger.rs
@@ -7,6 +7,8 @@ use common::ReadOnlyBitSet;
 use itertools::Itertools;
 use measure_time::debug_time;

+use crate::codec::postings::PostingsCodec;
+use crate::codec::{Codec, StandardCodec};
 use crate::directory::WritePtr;
 use crate::docset::{DocSet, TERMINATED};
 use crate::error::DataCorruption;
@@ -15,7 +17,7 @@ use crate::fieldnorm::{FieldNormReader, FieldNormReaders, FieldNormsSerializer,
 use crate::index::{Segment, SegmentComponent, SegmentReader};
 use crate::indexer::doc_id_mapping::{MappingType, SegmentDocIdMapping};
 use crate::indexer::SegmentSerializer;
-use crate::postings::{InvertedIndexSerializer, Postings, SegmentPostings};
+use crate::postings::{InvertedIndexSerializer, Postings};
 use crate::schema::{value_type_to_column_type, Field, FieldType, Schema};
 use crate::store::StoreWriter;
 use crate::termdict::{TermMerger, TermOrdinal};
@@ -76,10 +78,11 @@ fn estimate_total_num_tokens(readers: &[SegmentReader], field: Field) -> crate::
    Ok(total_num_tokens)
 }

-pub struct IndexMerger {
+pub struct IndexMerger<C: Codec = StandardCodec> {
    schema: Schema,
    pub(crate) readers: Vec<SegmentReader>,
    max_doc: u32,
+    codec: C,
 }

 struct DeltaComputer {
@@ -144,8 +147,8 @@ fn extract_fast_field_required_columns(schema: &Schema) -> Vec<(String, ColumnTy
        .collect()
 }

-impl IndexMerger {
-    pub fn open(schema: Schema, segments: &[Segment]) -> crate::Result<IndexMerger> {
+impl<C: Codec> IndexMerger<C> {
+    pub fn open(schema: Schema, segments: &[Segment<C>]) -> crate::Result<IndexMerger<C>> {
        let alive_bitset = segments.iter().map(|_| None).collect_vec();
        Self::open_with_custom_alive_set(schema, segments, alive_bitset)
    }
@@ -162,11 +165,15 @@ impl IndexMerger {
    // This can be used to merge but also apply an additional filter.
    // One use case is demux, which is basically taking a list of
    // segments and partitions them e.g. by a value in a field.
+    //
+    // Panics if `segments` is empty.
    pub fn open_with_custom_alive_set(
        schema: Schema,
-        segments: &[Segment],
+        segments: &[Segment<C>],
        alive_bitset_opt: Vec<Option<AliveBitSet>>,
-    ) -> crate::Result<IndexMerger> {
+    ) -> crate::Result<IndexMerger<C>> {
+        assert!(!segments.is_empty());
+        let codec = segments[0].index().codec().clone();
        let mut readers = vec![];
        for (segment, new_alive_bitset_opt) in segments.iter().zip(alive_bitset_opt) {
            if segment.meta().num_docs() > 0 {
@@ -189,6 +196,7 @@ impl IndexMerger {
            schema,
            readers,
            max_doc,
+            codec,
        })
    }

@@ -287,7 +295,7 @@ impl IndexMerger {
        &self,
        indexed_field: Field,
        _field_type: &FieldType,
-        serializer: &mut InvertedIndexSerializer,
+        serializer: &mut InvertedIndexSerializer<C>,
        fieldnorm_reader: Option<FieldNormReader>,
        doc_id_mapping: &SegmentDocIdMapping,
    ) -> crate::Result<()> {
@@ -355,7 +363,10 @@ impl IndexMerger {
                         indexed. Have you modified the schema?",
        );

-        let mut segment_postings_containing_the_term: Vec<(usize, SegmentPostings)> = vec![];
+        let mut segment_postings_containing_the_term: Vec<(
+            usize,
+            <C::PostingsCodec as PostingsCodec>::Postings,
+        )> = Vec::with_capacity(self.readers.len());

        while merged_terms.advance() {
            segment_postings_containing_the_term.clear();
@@ -367,17 +378,24 @@ impl IndexMerger {
            for (segment_ord, term_info) in merged_terms.current_segment_ords_and_term_infos() {
                let segment_reader = &self.readers[segment_ord];
                let inverted_index: &InvertedIndexReader = &field_readers[segment_ord];
-                let segment_postings = inverted_index
-                    .read_postings_from_terminfo(&term_info, segment_postings_option)?;
+                let postings = inverted_index.read_postings_from_terminfo_specialized(
+                    &term_info,
+                    segment_postings_option,
+                    &self.codec,
+                )?;
                let alive_bitset_opt = segment_reader.alive_bitset();
                let doc_freq = if let Some(alive_bitset) = alive_bitset_opt {
-                    segment_postings.doc_freq_given_deletes(alive_bitset)
+                    doc_freq_given_deletes(&postings, alive_bitset)
                } else {
-                    segment_postings.doc_freq()
+                    // We need an exact document frequency here.
+                    match postings.doc_freq() {
+                        crate::postings::DocFreq::Approximate(_) => exact_doc_freq(&postings),
+                        crate::postings::DocFreq::Exact(doc_freq) => doc_freq,
+                    }
                };
                if doc_freq > 0u32 {
                    total_doc_freq += doc_freq;
-                    segment_postings_containing_the_term.push((segment_ord, segment_postings));
+                    segment_postings_containing_the_term.push((segment_ord, postings));
                }
            }

@@ -395,11 +413,7 @@ impl IndexMerger {
            assert!(!segment_postings_containing_the_term.is_empty());

            let has_term_freq = {
-                let has_term_freq = !segment_postings_containing_the_term[0]
-                    .1
-                    .block_cursor
-                    .freqs()
-                    .is_empty();
+                let has_term_freq = segment_postings_containing_the_term[0].1.has_freq();
                for (_, postings) in &segment_postings_containing_the_term[1..] {
                    // This may look at a strange way to test whether we have term freq or not.
                    // With JSON object, the schema is not sufficient to know whether a term
@@ -415,7 +429,7 @@ impl IndexMerger {
                    //
                    // Overall the reliable way to know if we have actual frequencies loaded or not
                    // is to check whether the actual decoded array is empty or not.
-                    if has_term_freq == postings.block_cursor.freqs().is_empty() {
+                    if postings.has_freq() != has_term_freq {
                        return Err(DataCorruption::comment_only(
                            "Term freqs are inconsistent across segments",
                        )
@@ -467,7 +481,7 @@ impl IndexMerger {

    fn write_postings(
        &self,
-        serializer: &mut InvertedIndexSerializer,
+        serializer: &mut InvertedIndexSerializer<C>,
        fieldnorm_readers: FieldNormReaders,
        doc_id_mapping: &SegmentDocIdMapping,
    ) -> crate::Result<()> {
@@ -525,7 +539,7 @@ impl IndexMerger {
    ///
    /// # Returns
    /// The number of documents in the resulting segment.
-    pub fn write(&self, mut serializer: SegmentSerializer) -> crate::Result<u32> {
+    pub fn write(&self, mut serializer: SegmentSerializer<C>) -> crate::Result<u32> {
        let doc_id_mapping = self.get_doc_id_from_concatenated_data()?;
        debug!("write-fieldnorms");
        if let Some(fieldnorms_serializer) = serializer.extract_fieldnorms_serializer() {
@@ -553,6 +567,43 @@ impl IndexMerger {
    }
 }

+/// Computes the number of non-deleted documents.
+///
+/// This function clones the postings and scans through them,
+/// which is a rather expensive operation.
+pub(crate) fn doc_freq_given_deletes<P: Postings + Clone>(
+    postings: &P,
+    alive_bitset: &AliveBitSet,
+) -> u32 {
+    let mut docset = postings.clone();
+    let mut doc_freq = 0;
+    loop {
+        let doc = docset.doc();
+        if doc == TERMINATED {
+            return doc_freq;
+        }
+        if alive_bitset.is_alive(doc) {
+            doc_freq += 1u32;
+        }
+        docset.advance();
+    }
+}
+
+/// If the postings object cannot report an exact document frequency,
+/// we clone it and scan through it to count the documents.
+pub(crate) fn exact_doc_freq<P: Postings + Clone>(postings: &P) -> u32 {
+    let mut docset = postings.clone();
+    let mut doc_freq = 0;
+    loop {
+        let doc = docset.doc();
+        if doc == TERMINATED {
+            return doc_freq;
+        }
+        doc_freq += 1u32;
+        docset.advance();
+    }
+}
+
 #[cfg(test)]
 mod tests {

@@ -561,12 +612,16 @@ mod tests {
    use proptest::strategy::Strategy;
    use schema::FAST;

+    use crate::codec::postings::PostingsCodec;
+    use crate::codec::standard::postings::StandardPostingsCodec;
    use crate::collector::tests::{
        BytesFastFieldTestCollector, FastFieldTestCollector, TEST_COLLECTOR_WITH_SCORE,
    };
    use crate::collector::{Count, FacetCollector};
+    use crate::fastfield::AliveBitSet;
    use crate::index::{Index, SegmentId};
    use crate::indexer::NoMergePolicy;
+    use crate::postings::{DocFreq, Postings as _};
    use crate::query::{AllQuery, BooleanQuery, EnableScoring, Scorer, TermQuery};
    use crate::schema::{
        Facet, FacetOptions, IndexRecordOption, NumericOptions, TantivyDocument, Term,
@@ -1518,10 +1573,10 @@ mod tests {
        let searcher = reader.searcher();
        let mut term_scorer = term_query
            .specialized_weight(EnableScoring::enabled_from_searcher(&searcher))?
-            .term_scorer_for_test(searcher.segment_reader(0u32), 1.0)?
+            .term_scorer_for_test(searcher.segment_reader(0u32), 1.0)
            .unwrap();
        assert_eq!(term_scorer.doc(), 0);
-        assert_nearly_equals!(term_scorer.block_max_score(), 0.0079681855);
+        assert_nearly_equals!(term_scorer.seek_block_max(0), 0.0079681855);
        assert_nearly_equals!(term_scorer.score(), 0.0079681855);
        for _ in 0..81 {
            writer.add_document(doc!(text=>"hello happy tax payer"))?;
@@ -1534,13 +1589,13 @@ mod tests {
        for segment_reader in searcher.segment_readers() {
            let mut term_scorer = term_query
                .specialized_weight(EnableScoring::enabled_from_searcher(&searcher))?
-                .term_scorer_for_test(segment_reader, 1.0)?
+                .term_scorer_for_test(segment_reader, 1.0)
                .unwrap();
            // the difference compared to before is intrinsic to the bm25 formula. no worries
            // there.
            for doc in segment_reader.doc_ids_alive() {
                assert_eq!(term_scorer.doc(), doc);
-                assert_nearly_equals!(term_scorer.block_max_score(), 0.003478312);
+                assert_nearly_equals!(term_scorer.seek_block_max(doc), 0.003478312);
                assert_nearly_equals!(term_scorer.score(), 0.003478312);
                term_scorer.advance();
            }
@@ -1560,12 +1615,12 @@ mod tests {
        let segment_reader = searcher.segment_reader(0u32);
        let mut term_scorer = term_query
            .specialized_weight(EnableScoring::enabled_from_searcher(&searcher))?
-            .term_scorer_for_test(segment_reader, 1.0)?
+            .term_scorer_for_test(segment_reader, 1.0)
            .unwrap();
        // the difference compared to before is intrinsic to the bm25 formula. no worries there.
        for doc in segment_reader.doc_ids_alive() {
            assert_eq!(term_scorer.doc(), doc);
-            assert_nearly_equals!(term_scorer.block_max_score(), 0.003478312);
+            assert_nearly_equals!(term_scorer.seek_block_max(doc), 0.003478312);
            assert_nearly_equals!(term_scorer.score(), 0.003478312);
            term_scorer.advance();
        }
@@ -1579,4 +1634,16 @@ mod tests {
        assert!(((super::MAX_DOC_LIMIT - 1) as i32) >= 0);
        assert!((super::MAX_DOC_LIMIT as i32) < 0);
    }
+
+    #[test]
+    fn test_doc_freq_given_delete() {
+        let docs =
+            <StandardPostingsCodec as PostingsCodec>::Postings::create_from_docs(&[0, 2, 10]);
+        assert_eq!(docs.doc_freq(), DocFreq::Exact(3));
+        let alive_bitset = AliveBitSet::for_test_from_deleted_docs(&[2], 12);
+        assert_eq!(super::doc_freq_given_deletes(&docs, &alive_bitset), 2);
+        let all_deleted =
+            AliveBitSet::for_test_from_deleted_docs(&[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11], 12);
+        assert_eq!(super::doc_freq_given_deletes(&docs, &all_deleted), 0);
+    }
 }
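The `doc_freq_given_deletes` helper added above can be illustrated outside of tantivy. The following is a minimal sketch, not the PR's implementation: `VecCursor` and the `&[bool]` alive mask are hypothetical stand-ins for a `Postings` implementation and `AliveBitSet`, and `TERMINATED` is redefined locally as a sentinel.

```rust
/// Sentinel returned by an exhausted cursor (local stand-in for tantivy's `TERMINATED`).
const TERMINATED: u32 = i32::MAX as u32;

/// Hypothetical minimal cursor over a sorted list of doc ids.
#[derive(Clone)]
struct VecCursor {
    docs: Vec<u32>,
    pos: usize,
}

impl VecCursor {
    fn new(docs: Vec<u32>) -> Self {
        Self { docs, pos: 0 }
    }

    /// Current doc id, or `TERMINATED` once the list is exhausted.
    fn doc(&self) -> u32 {
        self.docs.get(self.pos).copied().unwrap_or(TERMINATED)
    }

    /// Moves to the next doc id and returns it.
    fn advance(&mut self) -> u32 {
        self.pos += 1;
        self.doc()
    }
}

/// Counts the docs that are still alive. The cursor is cloned first,
/// so the caller's cursor keeps its position.
fn doc_freq_given_deletes(postings: &VecCursor, alive: &[bool]) -> u32 {
    let mut cursor = postings.clone();
    let mut doc_freq = 0u32;
    loop {
        let doc = cursor.doc();
        if doc == TERMINATED {
            return doc_freq;
        }
        if alive[doc as usize] {
            doc_freq += 1;
        }
        cursor.advance();
    }
}
```

With postings `[0, 2, 10]` and doc 2 marked as deleted, the function returns 2, which mirrors `test_doc_freq_given_delete`. Cloning costs an extra copy, but it keeps the scan from disturbing a cursor the merger is still using.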
--- a/src/indexer/prepared_commit.rs
+++ b/src/indexer/prepared_commit.rs
@@ -1,16 +1,17 @@
 use super::IndexWriter;
+use crate::codec::Codec;
 use crate::schema::document::Document;
 use crate::{FutureResult, Opstamp, TantivyDocument};

 /// A prepared commit
-pub struct PreparedCommit<'a, D: Document = TantivyDocument> {
-    index_writer: &'a mut IndexWriter<D>,
+pub struct PreparedCommit<'a, C: Codec, D: Document = TantivyDocument> {
+    index_writer: &'a mut IndexWriter<C, D>,
    payload: Option<String>,
    opstamp: Opstamp,
 }

-impl<'a, D: Document> PreparedCommit<'a, D> {
-    pub(crate) fn new(index_writer: &'a mut IndexWriter<D>, opstamp: Opstamp) -> Self {
+impl<'a, C: Codec, D: Document> PreparedCommit<'a, C, D> {
+    pub(crate) fn new(index_writer: &'a mut IndexWriter<C, D>, opstamp: Opstamp) -> Self {
        Self {
            index_writer,
            payload: None,
--- a/src/indexer/segment_serializer.rs
+++ b/src/indexer/segment_serializer.rs
@@ -8,17 +8,17 @@ use crate::store::StoreWriter;

 /// Segment serializer is in charge of laying out on disk
 /// the data accumulated and sorted by the `SegmentWriter`.
-pub struct SegmentSerializer {
-    segment: Segment,
+pub struct SegmentSerializer<C: crate::codec::Codec> {
+    segment: Segment<C>,
    pub(crate) store_writer: StoreWriter,
    fast_field_write: WritePtr,
    fieldnorms_serializer: Option<FieldNormsSerializer>,
-    postings_serializer: InvertedIndexSerializer,
+    postings_serializer: InvertedIndexSerializer<C>,
 }

-impl SegmentSerializer {
+impl<C: crate::codec::Codec> SegmentSerializer<C> {
    /// Creates a new `SegmentSerializer`.
-    pub fn for_segment(mut segment: Segment) -> crate::Result<SegmentSerializer> {
+    pub fn for_segment(mut segment: Segment<C>) -> crate::Result<SegmentSerializer<C>> {
        let settings = segment.index().settings().clone();
        let store_writer = {
            let store_write = segment.open_write(SegmentComponent::Store)?;
@@ -50,12 +50,12 @@ impl SegmentSerializer {
        self.store_writer.mem_usage()
    }

-    pub fn segment(&self) -> &Segment {
+    pub fn segment(&self) -> &Segment<C> {
        &self.segment
    }

    /// Accessor to the `PostingsSerializer`.
-    pub fn get_postings_serializer(&mut self) -> &mut InvertedIndexSerializer {
+    pub fn get_postings_serializer(&mut self) -> &mut InvertedIndexSerializer<C> {
        &mut self.postings_serializer
    }

--- a/src/indexer/segment_updater.rs
+++ b/src/indexer/segment_updater.rs
@@ -10,10 +10,13 @@ use std::sync::{Arc, RwLock};
 use rayon::{ThreadPool, ThreadPoolBuilder};

 use super::segment_manager::SegmentManager;
+use crate::codec::Codec;
 use crate::core::META_FILEPATH;
 use crate::directory::{Directory, DirectoryClone, GarbageCollectionResult};
 use crate::fastfield::AliveBitSet;
-use crate::index::{Index, IndexMeta, IndexSettings, Segment, SegmentId, SegmentMeta};
+use crate::index::{
+    CodecConfiguration, Index, IndexMeta, IndexSettings, Segment, SegmentId, SegmentMeta,
+};
 use crate::indexer::delete_queue::DeleteCursor;
 use crate::indexer::index_writer::advance_deletes;
 use crate::indexer::merge_operation::MergeOperationInventory;
@@ -61,10 +64,10 @@ pub(crate) fn save_metas(metas: &IndexMeta, directory: &dyn Directory) -> crate:
 // We voluntarily pass a merge_operation ref to guarantee that
 // the merge_operation is alive during the process
 #[derive(Clone)]
-pub(crate) struct SegmentUpdater(Arc<InnerSegmentUpdater>);
+pub(crate) struct SegmentUpdater<C: Codec>(Arc<InnerSegmentUpdater<C>>);

-impl Deref for SegmentUpdater {
-    type Target = InnerSegmentUpdater;
+impl<C: Codec> Deref for SegmentUpdater<C> {
+    type Target = InnerSegmentUpdater<C>;

    #[inline]
    fn deref(&self) -> &Self::Target {
@@ -72,8 +75,8 @@ impl Deref for SegmentUpdater {
    }
 }

-fn garbage_collect_files(
-    segment_updater: SegmentUpdater,
+fn garbage_collect_files<C: Codec>(
+    segment_updater: SegmentUpdater<C>,
 ) -> crate::Result<GarbageCollectionResult> {
    info!("Running garbage collection");
    let mut index = segment_updater.index.clone();
@@ -84,8 +87,8 @@ fn garbage_collect_files(

 /// Merges the list of segments given in `segment_entries`.
 /// This function happens in the calling thread and is computationally expensive.
-fn merge(
-    index: &Index,
+fn merge<Codec: crate::codec::Codec>(
+    index: &Index<Codec>,
    mut segment_entries: Vec<SegmentEntry>,
    target_opstamp: Opstamp,
 ) -> crate::Result<Option<SegmentEntry>> {
@@ -108,13 +111,13 @@ fn merge(

    let delete_cursor = segment_entries[0].delete_cursor().clone();

-    let segments: Vec<Segment> = segment_entries
+    let segments: Vec<Segment<Codec>> = segment_entries
        .iter()
        .map(|segment_entry| index.segment(segment_entry.meta().clone()))
        .collect();

    // An IndexMerger is like a "view" of our merged segments.
-    let merger: IndexMerger = IndexMerger::open(index.schema(), &segments[..])?;
+    let merger: IndexMerger<Codec> = IndexMerger::open(index.schema(), &segments[..])?;

    // ... we just serialize this index merger in our new segment to merge the segments.
    let segment_serializer = SegmentSerializer::for_segment(merged_segment.clone())?;
@@ -139,10 +142,10 @@ fn merge(
 /// meant to work if you have an `IndexWriter` running for the origin indices, or
 /// the destination `Index`.
 #[doc(hidden)]
-pub fn merge_indices<T: Into<Box<dyn Directory>>>(
-    indices: &[Index],
-    output_directory: T,
-) -> crate::Result<Index> {
+pub fn merge_indices<Codec: crate::codec::Codec>(
+    indices: &[Index<Codec>],
+    output_directory: Box<dyn Directory>,
+) -> crate::Result<Index<Codec>> {
    if indices.is_empty() {
        // If there are no indices to merge, there is no need to do anything.
        return Err(crate::TantivyError::InvalidArgument(
@@ -163,7 +166,7 @@ pub fn merge_indices<T: Into<Box<dyn Directory>>>(
        ));
    }

-    let mut segments: Vec<Segment> = Vec::new();
+    let mut segments: Vec<Segment<Codec>> = Vec::new();
    for index in indices {
        segments.extend(index.searchable_segments()?);
    }
@@ -185,12 +188,12 @@ pub fn merge_indices<T: Into<Box<dyn Directory>>>(
 /// meant to work if you have an `IndexWriter` running for the origin indices, or
 /// the destination `Index`.
 #[doc(hidden)]
-pub fn merge_filtered_segments<T: Into<Box<dyn Directory>>>(
-    segments: &[Segment],
+pub fn merge_filtered_segments<C: crate::codec::Codec, T: Into<Box<dyn Directory>>>(
+    segments: &[Segment<C>],
    target_settings: IndexSettings,
    filter_doc_ids: Vec<Option<AliveBitSet>>,
    output_directory: T,
-) -> crate::Result<Index> {
+) -> crate::Result<Index<C>> {
    if segments.is_empty() {
        // If there are no indices to merge, there is no need to do anything.
        return Err(crate::TantivyError::InvalidArgument(
@@ -211,14 +214,15 @@ pub fn merge_filtered_segments<T: Into<Box<dyn Directory>>>(
        ));
    }

-    let mut merged_index = Index::create(
-        output_directory,
-        target_schema.clone(),
-        target_settings.clone(),
-    )?;
+    let mut merged_index: Index<C> = Index::builder()
+        .schema(target_schema.clone())
+        .codec(segments[0].index().codec().clone())
+        .settings(target_settings.clone())
+        .create(output_directory.into())?;
+
    let merged_segment = merged_index.new_segment();
    let merged_segment_id = merged_segment.id();
-    let merger: IndexMerger =
+    let merger: IndexMerger<C> =
        IndexMerger::open_with_custom_alive_set(merged_index.schema(), segments, filter_doc_ids)?;
    let segment_serializer = SegmentSerializer::for_segment(merged_segment)?;
    let num_docs = merger.write(segment_serializer)?;
@@ -235,6 +239,7 @@ pub fn merge_filtered_segments<T: Into<Box<dyn Directory>>>(
            ))
            .trim_end()
    );
+    let codec_configuration = CodecConfiguration::from(segments[0].index().codec());

    let index_meta = IndexMeta {
        index_settings: target_settings, // index_settings of all segments should be the same
@@ -242,6 +247,7 @@ pub fn merge_filtered_segments<T: Into<Box<dyn Directory>>>(
        schema: target_schema,
        opstamp: 0u64,
        payload: Some(stats),
+        codec: codec_configuration,
    };

    // save the meta.json
@@ -250,7 +256,7 @@ pub fn merge_filtered_segments<T: Into<Box<dyn Directory>>>(
    Ok(merged_index)
 }

-pub(crate) struct InnerSegmentUpdater {
+pub(crate) struct InnerSegmentUpdater<C: Codec> {
    // we keep a copy of the current active IndexMeta to
    // avoid loading the file every time we need it in the
    // `SegmentUpdater`.
@@ -261,7 +267,7 @@ pub(crate) struct InnerSegmentUpdater {
    pool: ThreadPool,
    merge_thread_pool: ThreadPool,

-    index: Index,
+    index: Index<C>,
    segment_manager: SegmentManager,
    merge_policy: RwLock<Arc<dyn MergePolicy>>,
    killed: AtomicBool,
@@ -269,13 +275,13 @@ pub(crate) struct InnerSegmentUpdater {
    merge_operations: MergeOperationInventory,
 }

-impl SegmentUpdater {
+impl<Codec: crate::codec::Codec> SegmentUpdater<Codec> {
    pub fn create(
-        index: Index,
+        index: Index<Codec>,
        stamper: Stamper,
        delete_cursor: &DeleteCursor,
        num_merge_threads: usize,
-    ) -> crate::Result<SegmentUpdater> {
+    ) -> crate::Result<Self> {
        let segments = index.searchable_segment_metas()?;
        let segment_manager = SegmentManager::from_segments(segments, delete_cursor);
        let pool = ThreadPoolBuilder::new()
@@ -403,13 +409,16 @@ impl SegmentUpdater {
            // from the different drives.
            //
            // Segment 1 from disk 1, Segment 1 from disk 2, etc.
-            committed_segment_metas.sort_by_key(|segment_meta| -(segment_meta.max_doc() as i32));
+            committed_segment_metas
+                .sort_by_key(|segment_meta| std::cmp::Reverse(segment_meta.max_doc()));
+            let codec = CodecConfiguration::from(index.codec());
            let index_meta = IndexMeta {
                index_settings: index.settings().clone(),
                segments: committed_segment_metas,
                schema: index.schema(),
                opstamp,
                payload: commit_message,
+                codec,
            };
            // TODO add context to the error.
            save_metas(&index_meta, directory.box_clone().borrow_mut())?;
@@ -443,7 +452,7 @@ impl SegmentUpdater {
        opstamp: Opstamp,
        payload: Option<String>,
    ) -> FutureResult<Opstamp> {
-        let segment_updater: SegmentUpdater = self.clone();
+        let segment_updater: SegmentUpdater<Codec> = self.clone();
        self.schedule_task(move || {
            let segment_entries = segment_updater.purge_deletes(opstamp)?;
            segment_updater.segment_manager.commit(segment_entries);
@@ -648,9 +657,6 @@ impl SegmentUpdater {
                                    merge_operation.segment_ids(),
                                    advance_deletes_err
                                );
-                                assert!(!cfg!(test), "Merge failed.");
-
-                                // ... cancel merge
                                // `merge_operations` are tracked. As it is dropped,
                                // the segment_ids will be available again for merge.
                                return Err(advance_deletes_err);
@@ -702,9 +708,11 @@ impl SegmentUpdater {
 #[cfg(test)]
 mod tests {
    use super::merge_indices;
+    use crate::codec::StandardCodec;
    use crate::collector::TopDocs;
    use crate::directory::RamDirectory;
    use crate::fastfield::AliveBitSet;
+    use crate::index::{SegmentId, SegmentMetaInventory};
    use crate::indexer::merge_policy::tests::MergeWheneverPossible;
    use crate::indexer::merger::IndexMerger;
    use crate::indexer::segment_updater::merge_filtered_segments;
@@ -712,6 +720,22 @@ mod tests {
    use crate::schema::*;
    use crate::{Directory, DocAddress, Index, Segment};

+    #[test]
+    fn test_segment_sort_large_max_doc() {
+        // Regression test: -(max_doc as i32) overflows for max_doc >= 2^31.
+        // Using std::cmp::Reverse avoids this.
+        let inventory = SegmentMetaInventory::default();
+        let mut metas = [
+            inventory.new_segment_meta(SegmentId::generate_random(), 100),
+            inventory.new_segment_meta(SegmentId::generate_random(), (1u32 << 31) - 1),
+            inventory.new_segment_meta(SegmentId::generate_random(), 50_000),
+        ];
+        metas.sort_by_key(|m| std::cmp::Reverse(m.max_doc()));
+        assert_eq!(metas[0].max_doc(), (1u32 << 31) - 1);
+        assert_eq!(metas[1].max_doc(), 50_000);
+        assert_eq!(metas[2].max_doc(), 100);
+    }
+
    #[test]
    fn test_delete_during_merge() -> crate::Result<()> {
        let mut schema_builder = Schema::builder();
@@ -915,7 +939,7 @@ mod tests {

    #[test]
    fn test_merge_empty_indices_array() {
-        let merge_result = merge_indices(&[], RamDirectory::default());
+        let merge_result = merge_indices::<StandardCodec>(&[], Box::new(RamDirectory::default()));
        assert!(merge_result.is_err());
    }

@@ -942,7 +966,10 @@ mod tests {
        };

        // mismatched schema index list
-        let result = merge_indices(&[first_index, second_index], RamDirectory::default());
+        let result = merge_indices(
+            &[first_index, second_index],
+            Box::new(RamDirectory::default()),
+        );
        assert!(result.is_err());

        Ok(())
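The switch from `-(segment_meta.max_doc() as i32)` to `std::cmp::Reverse` in the hunks above can be seen on plain `u32` values. This is a sketch under assumptions: both function names are hypothetical, and the old key is written with `wrapping_neg` so that it imitates release-mode wrapping instead of panicking in a debug build.

```rust
use std::cmp::Reverse;

/// Descending sort with the old key, `-(max_doc as i32)`.
/// Values at or above 2^31 wrap to a negative `i32`, so their negated key
/// becomes positive and they end up sorted last instead of first.
fn sort_desc_negated_i32(max_docs: &mut [u32]) {
    max_docs.sort_by_key(|&max_doc| (max_doc as i32).wrapping_neg());
}

/// Descending sort with `Reverse`, which compares the `u32` values directly
/// and therefore involves no cast and no negation.
fn sort_desc_reverse(max_docs: &mut [u32]) {
    max_docs.sort_by_key(|&max_doc| Reverse(max_doc));
}
```

For the input `[100, 3_000_000_000, 50_000]`, the old key yields `[50_000, 100, 3_000_000_000]`, while the `Reverse` key yields `[3_000_000_000, 50_000, 100]`.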
--- a/src/indexer/segment_writer.rs
+++ b/src/indexer/segment_writer.rs
@@ -4,6 +4,7 @@ use itertools::Itertools;
 use tokenizer_api::BoxTokenStream;

 use super::operation::AddOperation;
+use crate::codec::Codec;
 use crate::fastfield::FastFieldsWriter;
 use crate::fieldnorm::{FieldNormReaders, FieldNormsWriter};
 use crate::index::{Segment, SegmentComponent};
@@ -12,7 +13,7 @@ use crate::indexer::segment_serializer::SegmentSerializer;
 use crate::json_utils::{index_json_value, IndexingPositionsPerPath};
 use crate::postings::{
    compute_table_memory_size, serialize_postings, IndexingContext, IndexingPosition,
-    PerFieldPostingsWriter, PostingsWriter,
+    PerFieldPostingsWriter, PostingsWriter, PostingsWriterEnum,
 };
 use crate::schema::document::{Document, Value};
 use crate::schema::{FieldEntry, FieldType, Schema, DATE_TIME_PRECISION_INDEXED};
@@ -45,11 +46,11 @@ fn compute_initial_table_size(per_thread_memory_budget: usize) -> crate::Result<
 ///
 /// It creates the postings lists in anonymous memory.
 /// The segment is laid on disk when the segment gets `finalized`.
-pub struct SegmentWriter {
+pub struct SegmentWriter<Codec: crate::codec::Codec> {
    pub(crate) max_doc: DocId,
    pub(crate) ctx: IndexingContext,
    pub(crate) per_field_postings_writers: PerFieldPostingsWriter,
-    pub(crate) segment_serializer: SegmentSerializer,
+    pub(crate) segment_serializer: SegmentSerializer<Codec>,
    pub(crate) fast_field_writers: FastFieldsWriter,
    pub(crate) fieldnorms_writer: FieldNormsWriter,
    pub(crate) json_path_writer: JsonPathWriter,
@@ -60,7 +61,7 @@ pub struct SegmentWriter {
    schema: Schema,
 }

-impl SegmentWriter {
+impl<Codec: crate::codec::Codec> SegmentWriter<Codec> {
    /// Creates a new `SegmentWriter`
    ///
    /// The arguments are defined as follows
@@ -70,7 +71,10 @@ impl SegmentWriter {
    ///   behavior as a memory limit.
    /// - segment: The segment being written
    /// - schema
-    pub fn for_segment(memory_budget_in_bytes: usize, segment: Segment) -> crate::Result<Self> {
+    pub fn for_segment(
+        memory_budget_in_bytes: usize,
+        segment: Segment<Codec>,
+    ) -> crate::Result<Self> {
        let schema = segment.schema();
        let tokenizer_manager = segment.index().tokenizers().clone();
        let tokenizer_manager_fast_field = segment.index().fast_field_tokenizer().clone();
@@ -169,7 +173,7 @@ impl SegmentWriter {
            }

            let (term_buffer, ctx) = (&mut self.term_buffer, &mut self.ctx);
-            let postings_writer: &mut dyn PostingsWriter =
+            let postings_writer: &mut PostingsWriterEnum =
                self.per_field_postings_writers.get_for_field_mut(field);
            term_buffer.clear_with_field(field);

@@ -386,13 +390,13 @@ impl SegmentWriter {
 /// to the `SegmentSerializer`.
 ///
 /// `doc_id_map` is used to map to the new doc_id order.
-fn remap_and_write(
+fn remap_and_write<C: Codec>(
    schema: Schema,
    per_field_postings_writers: &PerFieldPostingsWriter,
    ctx: IndexingContext,
    fast_field_writers: FastFieldsWriter,
    fieldnorms_writer: &FieldNormsWriter,
-    mut serializer: SegmentSerializer,
+    mut serializer: SegmentSerializer<C>,
 ) -> crate::Result<()> {
    debug!("remap-and-write");
    if let Some(fieldnorms_serializer) = serializer.extract_fieldnorms_serializer() {
--- a/src/indexer/single_segment_index_writer.rs
+++ b/src/indexer/single_segment_index_writer.rs
@@ -1,5 +1,7 @@
 use std::marker::PhantomData;

+use crate::codec::StandardCodec;
+use crate::index::CodecConfiguration;
 use crate::indexer::operation::AddOperation;
 use crate::indexer::segment_updater::save_metas;
 use crate::indexer::SegmentWriter;
@@ -7,22 +9,25 @@ use crate::schema::document::Document;
 use crate::{Directory, Index, IndexMeta, Opstamp, Segment, TantivyDocument};

 #[doc(hidden)]
-pub struct SingleSegmentIndexWriter<D: Document = TantivyDocument> {
-    segment_writer: SegmentWriter,
-    segment: Segment,
+pub struct SingleSegmentIndexWriter<
+    Codec: crate::codec::Codec = StandardCodec,
+    D: Document = TantivyDocument,
+> {
+    pub segment_writer: SegmentWriter<Codec>,
+    segment: Segment<Codec>,
    opstamp: Opstamp,
-    _phantom: PhantomData<D>,
+    _doc: PhantomData<D>,
 }

-impl<D: Document> SingleSegmentIndexWriter<D> {
-    pub fn new(index: Index, mem_budget: usize) -> crate::Result<Self> {
+impl<Codec: crate::codec::Codec, D: Document> SingleSegmentIndexWriter<Codec, D> {
+    pub fn new(index: Index<Codec>, mem_budget: usize) -> crate::Result<Self> {
        let segment = index.new_segment();
        let segment_writer = SegmentWriter::for_segment(mem_budget, segment.clone())?;
        Ok(Self {
            segment_writer,
            segment,
            opstamp: 0,
-            _phantom: PhantomData,
+            _doc: PhantomData,
        })
    }

@@ -37,10 +42,10 @@ impl<D: Document> SingleSegmentIndexWriter<D> {
            .add_document(AddOperation { opstamp, document })
    }

-    pub fn finalize(self) -> crate::Result<Index> {
+    pub fn finalize(self) -> crate::Result<Index<Codec>> {
        let max_doc = self.segment_writer.max_doc();
        self.segment_writer.finalize()?;
-        let segment: Segment = self.segment.with_max_doc(max_doc);
+        let segment: Segment<Codec> = self.segment.with_max_doc(max_doc);
        let index = segment.index();
        let index_meta = IndexMeta {
            index_settings: index.settings().clone(),
@@ -48,6 +53,7 @@ impl<D: Document> SingleSegmentIndexWriter<D> {
            schema: index.schema(),
            opstamp: 0,
            payload: None,
+            codec: CodecConfiguration::from(index.codec()),
        };
        save_metas(&index_meta, index.directory())?;
        index.directory().sync_directory()?;
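The commit message says that types become generic over the codec but keep a default codec type, as `SingleSegmentIndexWriter` does above. The sketch below shows that pattern in isolation. Everything in it (`Codec`, `StandardCodec`, `ToyCodec`, `Writer`) is a hypothetical stand-in and not tantivy's API.

```rust
use std::marker::PhantomData;

/// Hypothetical codec trait, reduced to a single method for illustration.
trait Codec {
    fn name(&self) -> &'static str;
}

struct StandardCodec;

impl Codec for StandardCodec {
    fn name(&self) -> &'static str {
        "standard"
    }
}

struct ToyCodec;

impl Codec for ToyCodec {
    fn name(&self) -> &'static str {
        "toy"
    }
}

/// Both type parameters have defaults, so existing code that names the type
/// as plain `Writer` keeps compiling after the codec parameter is added.
struct Writer<C: Codec = StandardCodec, D = String> {
    codec: C,
    _doc: PhantomData<D>,
}

impl<C: Codec, D> Writer<C, D> {
    fn new(codec: C) -> Self {
        Self {
            codec,
            _doc: PhantomData,
        }
    }

    fn codec_name(&self) -> &'static str {
        self.codec.name()
    }
}
```

A binding such as `let writer: Writer = Writer::new(StandardCodec);` resolves to `Writer<StandardCodec, String>`, while `Writer<ToyCodec>` opts into another codec. Defaults apply where the type is written out, not during inference, so call sites without an annotation may still need one.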