Compare commits


78 Commits

Author SHA1 Message Date
Pascal Seitz
722b6c5205 bump version 2023-10-25 20:41:07 +08:00
Pascal Seitz
0f2211ca44 increase min memory to 15MB for indexing
With tantivy 0.20 the minimum memory consumption per SegmentWriter increased to
12MB. 7MB are for the different fast field collector types (they could be
lazily created). Increase the minimum memory from 3MB to 15MB.

Change memory variable naming from arena to budget.

closes #2156
2023-10-25 20:37:47 +08:00
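
For context, a minimal sketch of how an indexing memory budget is handed to tantivy; the 15MB figure mirrors the new minimum from this commit, while the schema and field name are illustrative:

```rust
use tantivy::schema::{Schema, TEXT};
use tantivy::{doc, Index};

fn main() -> tantivy::Result<()> {
    let mut schema_builder = Schema::builder();
    let body = schema_builder.add_text_field("body", TEXT);
    let index = Index::create_in_ram(schema_builder.build());
    // The budget is split across SegmentWriters, so with a single thread
    // the whole 15MB goes to one SegmentWriter (the new minimum).
    let mut index_writer = index.writer_with_num_threads(1, 15_000_000)?;
    index_writer.add_document(doc!(body => "hello world"))?;
    index_writer.commit()?;
    Ok(())
}
```
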
PSeitz
21aabf961c Fix range query (#2226)
Fix range query end check in advance
Rename vars to reduce ambiguity
add tests

Fixes #2225
2023-10-25 20:37:36 +08:00
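
For reference, a minimal sketch of a range query over a fast field issued through the query parser; the schema and bounds are illustrative:

```rust
use tantivy::collector::Count;
use tantivy::query::QueryParser;
use tantivy::schema::{Schema, FAST, INDEXED};
use tantivy::{doc, Index};

fn main() -> tantivy::Result<()> {
    let mut schema_builder = Schema::builder();
    let id = schema_builder.add_u64_field("id", INDEXED | FAST);
    let index = Index::create_in_ram(schema_builder.build());
    let mut writer = index.writer_with_num_threads(1, 15_000_000)?;
    for i in 0..100u64 {
        writer.add_document(doc!(id => i))?;
    }
    writer.commit()?;
    // `[10 TO 20]` is the parser's inclusive range syntax; the end-bound
    // check in `advance` is what the fix above corrects.
    let query = QueryParser::for_index(&index, vec![id]).parse_query("id:[10 TO 20]")?;
    let count = index.reader()?.searcher().search(&query, &Count)?;
    assert_eq!(count, 11);
    Ok(())
}
```
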
PSeitz
49448b31c6 chore: Release (#2168)
* chore: Release

* update CHANGELOG
2023-09-01 13:58:58 +02:00
PSeitz
ebede0bed7 update CHANGELOG (#2167) 2023-08-31 10:01:44 +02:00
PSeitz
b1d8b072db add missing aggregation part 2 (#2149)
* add missing aggregation part 2

Add missing support for:
- Mixed types columns
- Key of type string on numerical fields

The special aggregation is slower than the integrated one in TermsAggregation and therefore not
chosen by default, although it can cover all use cases.

* simplify, add num_docs to empty
2023-08-31 07:55:33 +02:00
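
For reference, tantivy's aggregation requests deserialize from Elasticsearch-style JSON; a minimal sketch of a terms aggregation over such a column, assuming `serde_json` as a dependency and an illustrative field name:

```rust
use tantivy::aggregation::agg_req::Aggregations;

fn mixed_terms_request() -> serde_json::Result<Aggregations> {
    // A terms aggregation over a column that may hold mixed types.
    serde_json::from_value(serde_json::json!({
        "my_terms": { "terms": { "field": "mixed_field" } }
    }))
}
```
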
ethever.eth
ee6a7c2bbb fix a small typo (#2165)
Co-authored-by: famouscat <onismaa@gmail.com>
2023-08-30 20:14:26 +02:00
PSeitz
c4e2708901 fix clippy, fmt (#2162) 2023-08-30 08:04:26 +02:00
PSeitz
5c8cfa50eb add missing parameter for percentiles (#2157) 2023-08-29 13:04:24 +02:00
PSeitz
73cb71762f add missing parameter for stats,min,max,count,sum,avg (#2151)
* add missing parameter for stats,min,max,count,sum,avg

add missing parameter for stats,min,max,count,sum,avg
closes #1913
partially #1789

* Apply suggestions from code review

Co-authored-by: Paul Masurel <paul@quickwit.io>

---------

Co-authored-by: Paul Masurel <paul@quickwit.io>
2023-08-28 08:59:51 +02:00
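
A minimal sketch of the new `missing` parameter on a metric aggregation, mirroring the Elasticsearch-style JSON that tantivy accepts (field name and fill value illustrative):

```rust
use tantivy::aggregation::agg_req::Aggregations;

fn avg_with_missing() -> serde_json::Result<Aggregations> {
    // Documents without a value in `price` are counted as 0.0
    // instead of being skipped.
    serde_json::from_value(serde_json::json!({
        "avg_price": { "avg": { "field": "price", "missing": 0.0 } }
    }))
}
```
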
Harrison Burt
267dfe58d7 Fix testing on windows (#2155)
* Fix missing trait imports

* Fix building tests on windows

* Revert other PR change
2023-08-27 09:20:44 +09:00
Harrison Burt
131c10d318 Fix missing trait imports (#2154) 2023-08-27 09:20:26 +09:00
Chris Tam
e6cacc40a9 Remove outdated fast field documentation (#2145) 2023-08-24 07:49:49 +02:00
PSeitz
48d4847b38 Improve aggregation error message (#2150)
* Improve aggregation error message

Improve aggregation error message by wrapping the deserialization with a
custom struct. This deserialization variant is slower, since we need to
keep the deserialized data around twice with this approach.
For now the valid variants list is manually updated. This could be
replaced with a proc macro.
closes #2143

* Simpler implementation

---------

Co-authored-by: Paul Masurel <paul@quickwit.io>
2023-08-23 20:52:15 +02:00
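
A rough sketch of the pattern described above, with all names hypothetical: the request is buffered into a generic JSON value first, so that an unknown aggregation type can be reported against a manually maintained list of valid variants, at the cost of holding the data twice:

```rust
use serde_json::Value;

// Hypothetical, manually maintained list of valid variants.
const VALID_AGGS: &[&str] = &["terms", "histogram", "avg", "stats"];

fn check_agg_kind(raw: &Value) -> Result<(), String> {
    for key in raw.as_object().map(|obj| obj.keys()).into_iter().flatten() {
        if !VALID_AGGS.contains(&key.as_str()) {
            return Err(format!(
                "unknown aggregation `{key}`, expected one of {VALID_AGGS:?}"
            ));
        }
    }
    Ok(())
}
```
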
PSeitz
59460c767f delayed column opening during merge (#2132)
* lazy columnar merge

This is the first part of addressing #3633
Instead of loading all Columns into memory for the merge, only the current column_name
group is loaded. This can be done since the sstable streams the columns lexicographically.

* refactor

* add rustdoc

* replace iterator with BTreeMap
2023-08-21 08:55:35 +02:00
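
A simplified sketch of the idea, with hypothetical types: keep cheap, unopened handles grouped per column name in a `BTreeMap`, and only open the group currently being merged:

```rust
use std::collections::BTreeMap;

// Hypothetical stand-in for an unopened column handle; opening one is
// what actually loads column data into memory.
struct ColumnHandle {
    column_name: String,
}

fn group_by_name(handles: Vec<ColumnHandle>) -> BTreeMap<String, Vec<ColumnHandle>> {
    let mut groups: BTreeMap<String, Vec<ColumnHandle>> = BTreeMap::new();
    for handle in handles {
        // The sstable yields columns lexicographically, so each group is
        // contiguous and can be opened (and dropped) one at a time.
        groups
            .entry(handle.column_name.clone())
            .or_default()
            .push(handle);
    }
    groups
}
```
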
Paul Masurel
756156beaf Fix doc 2023-08-17 17:47:45 +09:00
PSeitz
480763db0d track memory arena memory usage (#2148) 2023-08-16 18:19:42 +02:00
PSeitz
62ece86f24 track ff dictionary indexing memory consumption (#2147) 2023-08-16 14:00:08 +02:00
Caleb Hattingh
52d9e6f298 Fix doc typos in count aggregation metric (#2127) 2023-08-15 08:50:23 +02:00
Caleb Hattingh
47b315ff18 doc: escape the backslash (#2144) 2023-08-14 19:10:07 +02:00
PSeitz
ed1deee902 fix sort index by date (#2124)
closes #2112
2023-08-14 17:36:52 +02:00
PSeitz
2e109018b7 add missing parameter to term agg (#2103)
* add missing parameter to term agg

* move missing handling to block accessor

* add multivalue test, fix multivalue case, add comments

* add documentation, deactivate special case

* cargo fmt

* resolve merge conflict
2023-08-14 14:22:18 +02:00
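
A minimal sketch of the `missing` parameter on a terms aggregation, again in the Elasticsearch-style JSON shape (field name and bucket key illustrative):

```rust
use tantivy::aggregation::agg_req::Aggregations;

fn terms_with_missing() -> serde_json::Result<Aggregations> {
    // Documents without a value in `tags` land in an explicit "N/A" bucket.
    serde_json::from_value(serde_json::json!({
        "tag_counts": { "terms": { "field": "tags", "missing": "N/A" } }
    }))
}
```
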
Adam Reichold
22c35b1e00 Fix explanation of boost queries seeking beyond query result. (#2142)
* Make current nightly Clippy happy.

* Fix explanation of boost queries seeking beyond query result.
2023-08-14 11:59:11 +09:00
trinity-1686a
b92082b748 implement lenient parser (#2129)
* move query parser to nom

* add support for term grouping

* initial work on infallible parser

* fmt

* add tests and fix minor parsing bugs

* address review comments

* add support for lenient queries in tantivy

* make lenient parser report errors

* allow mixing occur and bool in query
2023-08-08 15:41:29 +02:00
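
A minimal sketch of the resulting API: the lenient variant returns a best-effort query together with the errors it tolerated, instead of failing outright:

```rust
use tantivy::query::QueryParser;
use tantivy::schema::{Schema, TEXT};
use tantivy::Index;

fn main() {
    let mut schema_builder = Schema::builder();
    let title = schema_builder.add_text_field("title", TEXT);
    let index = Index::create_in_ram(schema_builder.build());
    let query_parser = QueryParser::for_index(&index, vec![title]);
    // The strict parser would reject the unbalanced parenthesis; the
    // lenient variant parses what it can and reports the rest.
    let (_query, errors) = query_parser.parse_query_lenient("title:hello AND (");
    assert!(!errors.is_empty());
}
```
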
PSeitz
c2be6603a2 alternative mixed field aggregation collection (#2135)
* alternative mixed field aggregation collection

Instead of having multiple accessors in one AggregationWithAccessor, split it into
multiple independent AggregationWithAccessor instances.

* Update src/aggregation/agg_req_with_accessor.rs

Co-authored-by: Paul Masurel <paul@quickwit.io>

---------

Co-authored-by: Paul Masurel <paul@quickwit.io>
2023-07-27 12:25:31 +02:00
Adam Reichold
c805f08ca7 Fix a few more upcoming Clippy lints (#2133) 2023-07-24 17:07:57 +09:00
Adam Reichold
ccc0335158 Minor improvements to OwnedBytes (#2134)
This makes it obvious where the `StableDerefTrait` is invoked and avoids
`transmute` when only a lifetime needs to be extended. Furthermore, it makes use
of `slice::split_at` where that seemed appropriate.
2023-07-24 17:06:33 +09:00
Adam Reichold
42acd334f4 Fixes the new deny-by-default incorrect_partial_ord_impl_on_ord_type Clippy lint (#2131) 2023-07-21 11:36:17 +09:00
Adam Reichold
820f126075 Remove support for Brotli and Snappy compression (#2123)
LZ4 provides fast and simple compression, whereas Zstd is exceptionally flexible,
so the additional support for Brotli and Snappy does not really add any distinct
functionality on top of those two algorithms.

Removing them reduces our maintenance burden and reduces the number of choices
users have to make when setting up their project based on Tantivy.
2023-07-14 16:54:59 +09:00
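
After this change the document-store choices look roughly as follows; a hedged sketch against the tantivy 0.21 API (`IndexSettings` field name assumed from that release):

```rust
use tantivy::schema::{Schema, TEXT};
use tantivy::store::Compressor;
use tantivy::{Index, IndexSettings};

fn main() -> tantivy::Result<()> {
    let mut schema_builder = Schema::builder();
    schema_builder.add_text_field("body", TEXT);
    let settings = IndexSettings {
        // Lz4, Zstd and None are the variants that remain after this commit.
        docstore_compression: Compressor::Lz4,
        ..Default::default()
    };
    let _index = Index::builder()
        .schema(schema_builder.build())
        .settings(settings)
        .create_in_ram()?;
    Ok(())
}
```
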
Adam Reichold
7e6c4a1856 Include only built-in compression algorithms as enum variants (#2121)
* Include only built-in compression algorithms as enum variants

This enables compile-time errors when a compression algorithm is requested which
is not actually enabled for the current Cargo project. The cost is that indexes
using other compression algorithms cannot even be loaded (even though they
are not fully accessible in any case).

As a drive-by, this also fixes `--no-default-features` on `cfg(unix)`.

* Provide more instructive error messages for unsupported, but not unknown compression variants.
2023-07-14 11:02:49 +09:00
Adam Reichold
5fafe4b1ab Add missing query_terms impl for TermSetQuery. (#2120) 2023-07-13 14:54:29 +02:00
PSeitz
1e7cd48cfa remove allocations in split compound words (#2080)
* remove allocations in split compound words

* clear reused data
2023-07-13 09:43:02 +09:00
dependabot[bot]
7f51d85bbd Update lru requirement from 0.10.0 to 0.11.0 (#2117)
Updates the requirements on [lru](https://github.com/jeromefroe/lru-rs) to permit the latest version.
- [Changelog](https://github.com/jeromefroe/lru-rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/jeromefroe/lru-rs/compare/0.10.0...0.11.0)

---
updated-dependencies:
- dependency-name: lru
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-07-13 09:42:21 +09:00
PSeitz
ad76e32398 Update CHANGELOG.md (#2091)
* Update CHANGELOG.md

* Update CHANGELOG.md
2023-07-11 13:58:49 +08:00
dependabot[bot]
7575f9bf1c Update itertools requirement from 0.10.3 to 0.11.0 (#2098)
Updates the requirements on [itertools](https://github.com/rust-itertools/itertools) to permit the latest version.
- [Changelog](https://github.com/rust-itertools/itertools/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rust-itertools/itertools/compare/v0.10.5...v0.11.0)

---
updated-dependencies:
- dependency-name: itertools
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-07-07 11:14:46 +02:00
Naveen Aiathurai
67bdf3f5f6 fixes order_by_u64_field and order_by_fast_field should allow sorting in ascending order #1676 (#2111)
* feat: order_by_fast_field allows sorting using parameter order

* chore: change the corresponding values to original one

* chore: fix formatting issues

* fix: first_or_default_col should also sort by order

* chore: empty doc to testcase and doctest fixes

* chore: fix failure tests

* core: add empty document without fastfield

* chore: fix fmt

* chore: change variable name
2023-07-06 05:10:10 +02:00
François Massot
3c300666ad Merge pull request #2110 from quickwit-oss/fulmicoton/dynamic-follow-up
Add dynamic filters to text analyzer builder.
2023-07-03 21:49:24 +02:00
François Massot
b91d3f6be4 Clean comment on 'TextAnalyzerBuilder::filter_dynamic' method. 2023-07-03 18:45:59 +02:00
François Massot
a8e76513bb Remove useless clone. 2023-07-03 22:05:11 +09:00
François Massot
0a23201338 Fix stackoverflow and add docs. 2023-07-03 22:05:11 +09:00
François Massot
81330aaf89 WIP 2023-07-03 22:05:10 +09:00
Paul Masurel
98a3b01992 Removing the BoxedTokenizer 2023-07-03 22:05:10 +09:00
Paul Masurel
d341520938 Dynamic follow up 2023-07-03 22:05:10 +09:00
François Massot
5c9af73e41 Followup fulmicoton poc. 2023-07-03 22:05:10 +09:00
Paul Masurel
ad4c940fa3 proof of concept for dynamic tokenizer. 2023-07-03 22:05:10 +09:00
Paul Masurel
910b0b0c61 Cargo fmt 2023-07-03 22:03:31 +09:00
PSeitz
3fef052bf1 fix flaky test (#2107)
closes #2099
2023-06-29 14:30:56 +08:00
PSeitz
040554f2f9 Update to lz4_flex 0.11 (#2106) 2023-06-29 14:16:00 +08:00
PSeitz
17186ca9c9 improve docs (#2105) 2023-06-27 13:37:14 +08:00
François Massot
212d59c9ab Merge pull request #2102 from quickwit-oss/fmassot/ngram-new-should-return-error
Ngram tokenizer now returns an error with invalid arguments.
2023-06-27 05:36:09 +02:00
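
A brief sketch of the changed constructor behavior (argument order `min_gram`, `max_gram`, `prefix_only` assumed from the tokenizer's API):

```rust
use tantivy::tokenizer::NgramTokenizer;

fn main() {
    // Valid arguments: 2..=3 grams over the whole token.
    let _tokenizer = NgramTokenizer::new(2, 3, false).expect("valid ngram bounds");
    // Invalid arguments (min_gram = 0) now surface as an Err instead of a panic.
    assert!(NgramTokenizer::new(0, 3, false).is_err());
}
```
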
dependabot[bot]
1a1f252a3f Update memmap2 requirement from 0.6.0 to 0.7.1 (#2104)
Updates the requirements on [memmap2](https://github.com/RazrFalcon/memmap2-rs) to permit the latest version.
- [Changelog](https://github.com/RazrFalcon/memmap2-rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/RazrFalcon/memmap2-rs/compare/v0.6.0...v0.7.1)

---
updated-dependencies:
- dependency-name: memmap2
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-06-27 05:15:43 +02:00
François Massot
d73706dede Ngram tokenizer now returns an error with invalid arguments. 2023-06-25 20:13:24 +02:00
PSeitz
44850e1036 move fail dep to dev only (#2094)
wasm compilation fails with `fail` as a regular (non-dev) dependency
2023-06-22 06:59:11 +02:00
Adam Reichold
3b0cbf8102 Cosmetic updates to the warmer example. (#2095)
Just some cosmetic tweaks to make the example easier on the eyes as a colleague
was staring at this for quite some time this week.
2023-06-22 11:25:01 +09:00
Adam Reichold
4aa131c3db Make TextAnalyzerBuilder publically accessible (#2097)
This way, client code can name the type to e.g. store it inside structs without
resorting to generics, and it means that its documentation is part of the crate
documentation generated by `cargo doc`.
2023-06-22 11:24:21 +09:00
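
A minimal sketch of the builder pipeline; the intermediate `TextAnalyzerBuilder` returned by `builder()` and `filter()` is the type this change makes nameable:

```rust
use tantivy::tokenizer::{LowerCaser, RemoveLongFilter, SimpleTokenizer, TextAnalyzer};

fn main() {
    let mut analyzer: TextAnalyzer = TextAnalyzer::builder(SimpleTokenizer::default())
        .filter(RemoveLongFilter::limit(40))
        .filter(LowerCaser)
        .build();
    let mut stream = analyzer.token_stream("Hello, Tantivy!");
    while stream.advance() {
        println!("{}", stream.token().text);
    }
}
```
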
Naveen Aiathurai
59962097d0 fix: #2078 return error when tokenizer not found while indexing (#2093)
* fix: #2078 return error when tokenizer not found while indexing

* chore: formatting issues

* chore: fix review comments
2023-06-16 04:33:55 +02:00
Adam Reichold
ebc78127f3 Add BytesFilterCollector to support filtering based on a bytes fast field (#2075)
* Do some Clippy- and Cargo-related boy-scouting.

* Add BytesFilterCollector to support filtering based on a bytes fast field

This is basically a copy of the existing FilterCollector but modified and
specialised to work on a bytes fast field.

* Changed semantics of filter collectors to consider multi-valued fields
2023-06-13 14:19:58 +09:00
PSeitz
8199aa7de7 bump version to 0.20.2 (#2089) 2023-06-12 18:56:54 +08:00
PSeitz
657f0cd3bd add missing Bytes validation to term_agg (#2077)
returns empty for now instead of failing like before
2023-06-12 16:38:07 +08:00
Adam Reichold
3a82ef2560 Fix is_child_of function not considering the root facet. (#2086) 2023-06-12 08:35:18 +02:00
PSeitz
3546e7fc63 small agg limit docs improvement (#2073)
small docs improvement as a follow-up on bug https://github.com/quickwit-oss/quickwit/issues/3503
2023-06-12 10:55:24 +09:00
PSeitz
862f367f9e release without Alice in Wonderland, bump version to 0.20.1 (#2087)
* Release without Alice in Wonderland

* bump version to 0.20.1
2023-06-12 10:54:03 +09:00
PSeitz
14137d91c4 Update CHANGELOG.md (#2081) 2023-06-12 10:53:40 +09:00
François Massot
924fc70cb5 Merge pull request #2088 from quickwit-oss/fmassot/align-type-priorities-for-json-numbers
Align numerical type priority order on the search side.
2023-06-11 22:04:54 +02:00
François Massot
07023948aa Add test that indexes and searches a JSON field. 2023-06-11 21:47:52 +02:00
François Massot
0cb53207ec Fix tests. 2023-06-11 12:13:35 +02:00
François Massot
17c783b4db Align numerical type priority order on the search side. 2023-06-11 11:49:27 +02:00
Harrison Burt
7220df8a09 Fix building on windows with mmap (#2070)
* Fix windows build

* Make pub

* Update docs

* Re arrange

* Fix compilation error on unix

* Fix unix borrows

* Revert "Fix unix borrows"

This reverts commit c1d94fd12b.

* Fix unix borrows and revert original change

* Fix warning

* Cleaner code.

---------

Co-authored-by: Paul Masurel <paul@quickwit.io>
2023-06-10 18:32:39 +02:00
PSeitz
e3eacb4388 release tantivy (#2083)
* prerelease

* chore: Release
2023-06-09 10:47:46 +02:00
PSeitz
fdecb79273 tokenizer-api: reduce Tokenizer overhead (#2062)
* tokenizer-api: reduce Tokenizer overhead

Previously a new `Token` was created for each text encountered, which
contains a `String::with_capacity(200)` allocation.
In the new API the token_stream gets mutable access to the tokenizer;
this allows state to be shared (in this PR the Token is shared).
Ideally the allocation for the BoxTokenStream would also be removed, but
this may require some lifetime tricks.

* simplify api

* move lowercase and ascii folding buffer to global

* empty Token text as default
2023-06-08 18:37:58 +08:00
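
The practical effect is visible in the benchmark diff further down: `token_stream` now takes `&mut self`, so callers hold a mutable analyzer; a minimal sketch:

```rust
use tantivy::tokenizer::TokenizerManager;

fn main() {
    // `get` hands back an owned analyzer that must be mutable, since
    // `token_stream` reuses the tokenizer's internal Token buffer.
    let mut tokenizer = TokenizerManager::default().get("default").unwrap();
    let mut stream = tokenizer.token_stream("shared token state");
    let mut word_count = 0;
    while stream.advance() {
        word_count += 1;
    }
    assert_eq!(word_count, 3);
}
```
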
PSeitz
27f202083c Improve Termmap Indexing Performance +~30% (#2058)
* update benchmark

* Improve Termmap Indexing Performance +~30%

This contains many small changes to improve Termmap performance.
Most notably:
* Specialized byte compare and equality versions, instead of glibc calls.
* ExpUnrolledLinkedList no longer contains inline items.

Allow hash-only comparison via the feature flag `compare_hash_only`:
64 bits should be enough with a good hash function to compare strings by
their hashes instead of comparing the strings themselves. Disabled by default.

CreateHashMap/alice/174693
                        time:   [642.23 µs 643.80 µs 645.24 µs]
                        thrpt:  [258.20 MiB/s 258.78 MiB/s 259.41 MiB/s]
                 change:
                        time:   [-14.429% -13.303% -12.348%] (p = 0.00 < 0.05)
                        thrpt:  [+14.088% +15.344% +16.862%]
                        Performance has improved.
CreateHashMap/alice_expull/174693
                        time:   [877.03 µs 880.44 µs 884.67 µs]
                        thrpt:  [188.32 MiB/s 189.22 MiB/s 189.96 MiB/s]
                 change:
                        time:   [-26.460% -26.274% -26.091%] (p = 0.00 < 0.05)
                        thrpt:  [+35.301% +35.637% +35.981%]
                        Performance has improved.
CreateHashMap/numbers_zipf/8000000
                        time:   [9.1198 ms 9.1573 ms 9.1961 ms]
                        thrpt:  [829.64 MiB/s 833.15 MiB/s 836.57 MiB/s]
                 change:
                        time:   [-35.229% -34.828% -34.384%] (p = 0.00 < 0.05)
                        thrpt:  [+52.403% +53.440% +54.390%]
                        Performance has improved.

* clippy

* add bench for ids

* inline(always) to inline whole block with bounds checks

* cleanup
2023-06-08 11:13:52 +02:00
PSeitz
ccb09aaa83 allow histogram bounds to be passed as Rfc3339 (#2076) 2023-06-08 09:07:08 +02:00
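
A hedged sketch of what this enables at the request level: histogram `hard_bounds` given as RFC 3339 strings instead of numeric values (field name, interval, and bounds all illustrative):

```rust
use tantivy::aggregation::agg_req::Aggregations;

fn histogram_with_rfc3339_bounds() -> serde_json::Result<Aggregations> {
    serde_json::from_value(serde_json::json!({
        "over_time": {
            "histogram": {
                "field": "timestamp",
                // The interval stays numeric; only the bounds accept
                // RFC 3339 strings for date fields.
                "interval": 86_400_000.0,
                "hard_bounds": {
                    "min": "2023-01-01T00:00:00Z",
                    "max": "2023-02-01T00:00:00Z"
                }
            }
        }
    }))
}
```
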
Valerii
4b7c485a08 feat: add stop words for Hungarian language (#2069) 2023-06-02 07:26:03 +02:00
PSeitz
3942fc6d2b update CHANGELOG (#2068) 2023-06-02 05:00:12 +02:00
Adam Reichold
b325d569ad Expose phrase-prefix queries via the built-in query parser (#2044)
* Expose phrase-prefix queries via the built-in query parser

This proposes the less-than-imaginative syntax `field:"phrase ter"*` to
perform a phrase prefix query against `field` using `phrase` and `ter` as the
terms. The aim of this is to make this type of query more discoverable and
simplify manual testing.

I did consider exposing the `max_expansions` parameter similar to how slop is
handled, but I think that this is rather something that should be configured via
the query parser (similar to `set_field_boost` and `set_field_fuzzy`) as
choosing it requires rather intimate knowledge of the backing index.

* Prevent construction of zero or one term phrase-prefix queries via the query parser.

* Add example using phrase-prefix search via surface API to improve feature discoverability.
2023-06-01 13:03:16 +02:00
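
The proposed syntax in use, as a minimal sketch:

```rust
use tantivy::query::QueryParser;
use tantivy::schema::{Schema, TEXT};
use tantivy::Index;

fn main() -> tantivy::Result<()> {
    let mut schema_builder = Schema::builder();
    let field = schema_builder.add_text_field("field", TEXT);
    let index = Index::create_in_ram(schema_builder.build());
    let query_parser = QueryParser::for_index(&index, vec![field]);
    // `"phrase ter"*` parses into a phrase prefix query: `phrase` as a
    // full term, `ter` as a prefix.
    let _query = query_parser.parse_query(r#"field:"phrase ter"*"#)?;
    Ok(())
}
```
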
Paul Masurel
7ee78bda52 Readding s in datetime precision variant names (#2065)
There is no clear win and it changes some serialization in quickwit.
2023-06-01 06:39:46 +02:00
Paul Masurel
184a9daa8a Cancels concurrently running actions for the same PR. (#2067) 2023-06-01 12:57:38 +09:00
Paul Masurel
47e01b345b Simplified linear probing code (#2066) 2023-06-01 04:58:42 +02:00
167 changed files with 6794 additions and 2175 deletions


@@ -6,6 +6,11 @@ on:
pull_request:
branches: [main]
# Ensures that we cancel running jobs for the same PR / same workflow.
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
coverage:
runs-on: ubuntu-latest


@@ -8,6 +8,11 @@ env:
CARGO_TERM_COLOR: always
NUM_FUNCTIONAL_TEST_ITERATIONS: 20000
# Ensures that we cancel running jobs for the same PR / same workflow.
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
test:


@@ -9,6 +9,11 @@ on:
env:
CARGO_TERM_COLOR: always
# Ensures that we cancel running jobs for the same PR / same workflow.
concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: true
jobs:
check:
@@ -48,7 +53,7 @@ jobs:
strategy:
matrix:
features: [
{ label: "all", flags: "mmap,stopwords,brotli-compression,lz4-compression,snappy-compression,zstd-compression,failpoints" },
{ label: "all", flags: "mmap,stopwords,lz4-compression,zstd-compression,failpoints" },
{ label: "quickwit", flags: "mmap,quickwit,failpoints" }
]


@@ -1,3 +1,119 @@
Tantivy 0.21
================================
#### Bugfixes
- Fix tracking of fast field memory consumption, which led to higher memory consumption than the budget allowed during indexing [#2148](https://github.com/quickwit-oss/tantivy/issues/2148) [#2147](https://github.com/quickwit-oss/tantivy/issues/2147) (@PSeitz)
- Fix a regression from 0.20 where sort index by date wasn't working anymore [#2124](https://github.com/quickwit-oss/tantivy/issues/2124)(@PSeitz)
- Fix getting the root facet on the `FacetCollector`. [#2086](https://github.com/quickwit-oss/tantivy/issues/2086)(@adamreichold)
- Align numerical type priority order of columnar and query. [#2088](https://github.com/quickwit-oss/tantivy/issues/2088)(@fmassot)
#### Breaking Changes
- Remove support for Brotli and Snappy compression [#2123](https://github.com/quickwit-oss/tantivy/issues/2123)(@adamreichold)
#### Features/Improvements
- Implement lenient query parser [#2129](https://github.com/quickwit-oss/tantivy/pull/2129)(@trinity-1686a)
- order_by_u64_field and order_by_fast_field allow sorting in ascending and descending order [#2111](https://github.com/quickwit-oss/tantivy/issues/2111)(@naveenann)
- Allow dynamic filters in text analyzer builder [#2110](https://github.com/quickwit-oss/tantivy/issues/2110)(@fulmicoton @fmassot)
- **Aggregation**
- Add missing parameter for term aggregation [#2149](https://github.com/quickwit-oss/tantivy/issues/2149)[#2103](https://github.com/quickwit-oss/tantivy/issues/2103)(@PSeitz)
- Add missing parameter for percentiles [#2157](https://github.com/quickwit-oss/tantivy/issues/2157)(@PSeitz)
- Add missing parameter for stats,min,max,count,sum,avg [#2151](https://github.com/quickwit-oss/tantivy/issues/2151)(@PSeitz)
- Improve aggregation deserialization error message [#2150](https://github.com/quickwit-oss/tantivy/issues/2150)(@PSeitz)
- Add validation for type Bytes to term_agg [#2077](https://github.com/quickwit-oss/tantivy/issues/2077)(@PSeitz)
- Alternative mixed field collection [#2135](https://github.com/quickwit-oss/tantivy/issues/2135)(@PSeitz)
- Add missing query_terms impl for TermSetQuery. [#2120](https://github.com/quickwit-oss/tantivy/issues/2120)(@adamreichold)
- Minor improvements to OwnedBytes [#2134](https://github.com/quickwit-oss/tantivy/issues/2134)(@adamreichold)
- Remove allocations in split compound words [#2080](https://github.com/quickwit-oss/tantivy/issues/2080)(@PSeitz)
- Ngram tokenizer now returns an error with invalid arguments [#2102](https://github.com/quickwit-oss/tantivy/issues/2102)(@fmassot)
- Make TextAnalyzerBuilder public [#2097](https://github.com/quickwit-oss/tantivy/issues/2097)(@adamreichold)
- Return an error when tokenizer is not found while indexing [#2093](https://github.com/quickwit-oss/tantivy/issues/2093)(@naveenann)
- Delayed column opening during merge [#2132](https://github.com/quickwit-oss/tantivy/issues/2132)(@PSeitz)
Tantivy 0.20.2
================================
- Align numerical type priority order on the search side. [#2088](https://github.com/quickwit-oss/tantivy/issues/2088) (@fmassot)
- Fix is_child_of function not considering the root facet. [#2086](https://github.com/quickwit-oss/tantivy/issues/2086) (@adamreichold)
Tantivy 0.20.1
================================
- Fix building on windows with mmap [#2070](https://github.com/quickwit-oss/tantivy/issues/2070) (@ChillFish8)
Tantivy 0.20
================================
#### Bugfixes
- Fix phrase queries with slop (slop now supports transpositions; the algorithm carries slop so far for num terms > 2) [#2031](https://github.com/quickwit-oss/tantivy/issues/2031) [#2020](https://github.com/quickwit-oss/tantivy/issues/2020) (@PSeitz)
- Handle error for exists on MMapDirectory [#1988](https://github.com/quickwit-oss/tantivy/issues/1988) (@PSeitz)
- Aggregation
- Fix min doc_count empty merge bug [#2057](https://github.com/quickwit-oss/tantivy/issues/2057) (@PSeitz)
- Fix: Sort order for term aggregations (sort order on key was inverted) [#1858](https://github.com/quickwit-oss/tantivy/issues/1858) (@PSeitz)
#### Features/Improvements
- Add PhrasePrefixQuery [#1842](https://github.com/quickwit-oss/tantivy/issues/1842) (@trinity-1686a)
- Add `coerce` option for text and numbers types (convert the value instead of returning an error during indexing) [#1904](https://github.com/quickwit-oss/tantivy/issues/1904) (@PSeitz)
- Add regex tokenizer [#1759](https://github.com/quickwit-oss/tantivy/issues/1759)(@mkleen)
- Move tokenizer API to separate crate. Having a separate crate with a stable API will allow us to use tokenizers with different tantivy versions. [#1767](https://github.com/quickwit-oss/tantivy/issues/1767) (@PSeitz)
- **Columnar crate**: New fast field handling (@fulmicoton @PSeitz) [#1806](https://github.com/quickwit-oss/tantivy/issues/1806)[#1809](https://github.com/quickwit-oss/tantivy/issues/1809)
- Support for fast fields with optional values. Previously tantivy supported only single-valued and multi-value fast fields. The encoding of optional fast fields is now very compact.
- Fast field Support for JSON (schemaless fast fields). Support multiple types on the same column. [#1876](https://github.com/quickwit-oss/tantivy/issues/1876) (@fulmicoton)
- Unified access for fast fields over different cardinalities.
- Unified storage for typed and untyped fields.
- Move fastfield codecs into columnar. [#1782](https://github.com/quickwit-oss/tantivy/issues/1782) (@fulmicoton)
- Sparse dense index for optional values [#1716](https://github.com/quickwit-oss/tantivy/issues/1716) (@PSeitz)
- Switch to nanosecond precision in DateTime fastfield [#2016](https://github.com/quickwit-oss/tantivy/issues/2016) (@PSeitz)
- **Aggregation**
- Add `date_histogram` aggregation (only `fixed_interval` for now) [#1900](https://github.com/quickwit-oss/tantivy/issues/1900) (@PSeitz)
- Add `percentiles` aggregations [#1984](https://github.com/quickwit-oss/tantivy/issues/1984) (@PSeitz)
- [**breaking**] Drop JSON support on intermediate agg result (we use postcard as format in `quickwit` to send intermediate results) [#1992](https://github.com/quickwit-oss/tantivy/issues/1992) (@PSeitz)
- Set memory limit in bytes for aggregations after which they abort (Previously there was only the bucket limit) [#1942](https://github.com/quickwit-oss/tantivy/issues/1942)[#1957](https://github.com/quickwit-oss/tantivy/issues/1957)(@PSeitz)
- Add support for u64,i64,f64 fields in term aggregation [#1883](https://github.com/quickwit-oss/tantivy/issues/1883) (@PSeitz)
- Allow histogram bounds to be passed as Rfc3339 [#2076](https://github.com/quickwit-oss/tantivy/issues/2076) (@PSeitz)
- Add count, min, max, and sum aggregations [#1794](https://github.com/quickwit-oss/tantivy/issues/1794) (@guilload)
- Switch to Aggregation without serde_untagged => better deserialization errors. [#2003](https://github.com/quickwit-oss/tantivy/issues/2003) (@PSeitz)
- Switch to ms in histogram for date type (ES compatibility) [#2045](https://github.com/quickwit-oss/tantivy/issues/2045) (@PSeitz)
- Reduce term aggregation memory consumption [#2013](https://github.com/quickwit-oss/tantivy/issues/2013) (@PSeitz)
- Reduce agg memory consumption: Replace generic aggregation collector (which has a high memory requirement per instance) in aggregation tree with optimized versions behind a trait.
- Split term collection count and sub_agg (Faster term agg with less memory consumption for cases without sub-aggs) [#1921](https://github.com/quickwit-oss/tantivy/issues/1921) (@PSeitz)
- Schemaless aggregations: In combination with stacker, tantivy now supports schemaless aggregations via the JSON type.
- Add aggregation support for JSON type [#1888](https://github.com/quickwit-oss/tantivy/issues/1888) (@PSeitz)
- Mixed types support on JSON fields in aggs [#1971](https://github.com/quickwit-oss/tantivy/issues/1971) (@PSeitz)
- Perf: Fetch blocks of vals in aggregation for all cardinality [#1950](https://github.com/quickwit-oss/tantivy/issues/1950) (@PSeitz)
- `Searcher` with disabled scoring via `EnableScoring::Disabled` [#1780](https://github.com/quickwit-oss/tantivy/issues/1780) (@shikhar)
- Enable tokenizer on json fields [#2053](https://github.com/quickwit-oss/tantivy/issues/2053) (@PSeitz)
- Enforcing "NOT" and "-" queries consistency in UserInputAst [#1609](https://github.com/quickwit-oss/tantivy/issues/1609) (@bazhenov)
- Faster indexing
- Refactor tokenization pipeline to use GATs [#1924](https://github.com/quickwit-oss/tantivy/issues/1924) (@trinity-1686a)
- Faster term hash map [#2058](https://github.com/quickwit-oss/tantivy/issues/2058)[#1940](https://github.com/quickwit-oss/tantivy/issues/1940) (@PSeitz)
- tokenizer-api: reduce Tokenizer allocation overhead [#2062](https://github.com/quickwit-oss/tantivy/issues/2062) (@PSeitz)
- Refactor vint [#2010](https://github.com/quickwit-oss/tantivy/issues/2010) (@PSeitz)
- Faster search
- Work in batches of docs on the SegmentCollector (Only for cases without score for now) [#1937](https://github.com/quickwit-oss/tantivy/issues/1937) (@PSeitz)
- Faster fast field range queries using SIMD [#1954](https://github.com/quickwit-oss/tantivy/issues/1954) (@fulmicoton)
- Improve fast field range query performance [#1864](https://github.com/quickwit-oss/tantivy/issues/1864) (@PSeitz)
- Make BM25 scoring more flexible [#1855](https://github.com/quickwit-oss/tantivy/issues/1855) (@alexcole)
- Switch from fs2 to fs4, as fs2 is now unmaintained and does not support illumos [#1944](https://github.com/quickwit-oss/tantivy/issues/1944) (@Toasterson)
- Made BooleanWeight and BoostWeight public [#1991](https://github.com/quickwit-oss/tantivy/issues/1991) (@fulmicoton)
- Make index compatible with virtual drives on Windows [#1843](https://github.com/quickwit-oss/tantivy/issues/1843) (@gyk)
- Add stop words for Hungarian language [#2069](https://github.com/quickwit-oss/tantivy/issues/2069) (@tnxbutno)
- Auto downgrade index record option, instead of vint error [#1857](https://github.com/quickwit-oss/tantivy/issues/1857) (@PSeitz)
- Enable range query on fast field for u64 compatible types [#1762](https://github.com/quickwit-oss/tantivy/issues/1762) (@PSeitz) [#1876]
- sstable
- Isolating sstable and stacker in independent crates. [#1718](https://github.com/quickwit-oss/tantivy/issues/1718) (@fulmicoton)
- New sstable format [#1943](https://github.com/quickwit-oss/tantivy/issues/1943)[#1953](https://github.com/quickwit-oss/tantivy/issues/1953) (@trinity-1686a)
- Use DeltaReader directly to implement Dictionary::ord_to_term [#1928](https://github.com/quickwit-oss/tantivy/issues/1928) (@trinity-1686a)
- Use DeltaReader directly to implement Dictionary::term_ord [#1925](https://github.com/quickwit-oss/tantivy/issues/1925) (@trinity-1686a)
- Add separate tokenizer manager for fast fields [#2019](https://github.com/quickwit-oss/tantivy/issues/2019) (@PSeitz)
- Make construction of LevenshteinAutomatonBuilder for FuzzyTermQuery instances lazy. [#1756](https://github.com/quickwit-oss/tantivy/issues/1756) (@adamreichold)
- Added support for madvise when opening an mmapped Index [#2036](https://github.com/quickwit-oss/tantivy/issues/2036) (@fulmicoton)
- Rename `DatePrecision` to `DateTimePrecision` [#2051](https://github.com/quickwit-oss/tantivy/issues/2051) (@guilload)
- Query Parser
- Quotation mark can now be used for phrase queries. [#2050](https://github.com/quickwit-oss/tantivy/issues/2050) (@fulmicoton)
- PhrasePrefixQuery is supported in the query parser via: `field:"phrase ter"*` [#2044](https://github.com/quickwit-oss/tantivy/issues/2044) (@adamreichold)
- Docs
- Update examples for literate docs [#1880](https://github.com/quickwit-oss/tantivy/issues/1880) (@PSeitz)
- Add ip field example [#1775](https://github.com/quickwit-oss/tantivy/issues/1775) (@PSeitz)
- Fix doc store cache documentation [#1821](https://github.com/quickwit-oss/tantivy/issues/1821) (@PSeitz)
- Fix BooleanQuery document [#1999](https://github.com/quickwit-oss/tantivy/issues/1999) (@RT_Enzyme)
- Update comments in the faceted search example [#1737](https://github.com/quickwit-oss/tantivy/issues/1737) (@DawChihLiou)
Tantivy 0.19
================================
#### Bugfixes


@@ -1,6 +1,6 @@
[package]
name = "tantivy"
version = "0.19.0"
version = "0.21.1"
authors = ["Paul Masurel <paul.masurel@gmail.com>"]
license = "MIT"
categories = ["database-implementations", "data-structures"]
@@ -12,6 +12,7 @@ readme = "README.md"
keywords = ["search", "information", "retrieval"]
edition = "2021"
rust-version = "1.62"
exclude = ["benches/*.json", "benches/*.txt"]
[dependencies]
oneshot = "0.1.5"
@@ -22,11 +23,9 @@ once_cell = "1.10.0"
regex = { version = "1.5.5", default-features = false, features = ["std", "unicode"] }
aho-corasick = "1.0"
tantivy-fst = "0.4.0"
memmap2 = { version = "0.6.0", optional = true }
lz4_flex = { version = "0.10", default-features = false, features = ["checked-decode"], optional = true }
brotli = { version = "3.3.4", optional = true }
memmap2 = { version = "0.7.1", optional = true }
lz4_flex = { version = "0.11", default-features = false, optional = true }
zstd = { version = "0.12", optional = true, default-features = false }
snap = { version = "1.0.5", optional = true }
tempfile = { version = "3.3.0", optional = true }
log = "0.4.16"
serde = { version = "1.0.136", features = ["derive"] }
@@ -43,25 +42,25 @@ census = "0.4.0"
rustc-hash = "1.1.0"
thiserror = "1.0.30"
htmlescape = "0.3.1"
fail = "0.5.0"
fail = { version = "0.5.0", optional = true }
murmurhash32 = "0.3.0"
time = { version = "0.3.10", features = ["serde-well-known"] }
smallvec = "1.8.0"
rayon = "1.5.2"
lru = "0.10.0"
lru = "0.11.0"
fastdivide = "0.4.0"
itertools = "0.10.3"
itertools = "0.11.0"
measure_time = "0.8.2"
async-trait = "0.1.53"
arc-swap = "1.5.0"
columnar = { version="0.1", path="./columnar", package ="tantivy-columnar" }
sstable = { version="0.1", path="./sstable", package ="tantivy-sstable", optional = true }
stacker = { version="0.1", path="./stacker", package ="tantivy-stacker" }
query-grammar = { version= "0.19.0", path="./query-grammar", package = "tantivy-query-grammar" }
tantivy-bitpacker = { version= "0.3", path="./bitpacker" }
common = { version= "0.5", path = "./common/", package = "tantivy-common" }
tokenizer-api = { version="0.1", path="./tokenizer-api", package="tantivy-tokenizer-api" }
columnar = { version= "0.2", path="./columnar", package ="tantivy-columnar" }
sstable = { version= "0.2", path="./sstable", package ="tantivy-sstable", optional = true }
stacker = { version= "0.2", path="./stacker", package ="tantivy-stacker" }
query-grammar = { version= "0.21.0", path="./query-grammar", package = "tantivy-query-grammar" }
tantivy-bitpacker = { version= "0.5", path="./bitpacker" }
common = { version= "0.6", path = "./common/", package = "tantivy-common" }
tokenizer-api = { version= "0.2", path="./tokenizer-api", package="tantivy-tokenizer-api" }
sketches-ddsketch = { version = "0.2.1", features = ["use_serde"] }
futures-util = { version = "0.3.28", optional = true }
@@ -74,15 +73,17 @@ maplit = "1.0.2"
matches = "0.1.9"
pretty_assertions = "1.2.1"
proptest = "1.0.0"
criterion = "0.5"
test-log = "0.2.10"
env_logger = "0.10.0"
pprof = { version = "0.11.0", features = ["flamegraph", "criterion"] }
futures = "0.3.21"
paste = "1.0.11"
more-asserts = "0.3.1"
rand_distr = "0.4.3"
[target.'cfg(not(windows))'.dev-dependencies]
criterion = "0.5"
pprof = { git = "https://github.com/PSeitz/pprof-rs/", rev = "53af24b", features = ["flamegraph", "criterion"] } # temp fork that works with criterion 0.5
[dev-dependencies.fail]
version = "0.5.0"
features = ["failpoints"]
@@ -106,12 +107,10 @@ default = ["mmap", "stopwords", "lz4-compression"]
mmap = ["fs4", "tempfile", "memmap2"]
stopwords = []
brotli-compression = ["brotli"]
lz4-compression = ["lz4_flex"]
snappy-compression = ["snap"]
zstd-compression = ["zstd"]
failpoints = ["fail/failpoints"]
failpoints = ["fail", "fail/failpoints"]
unstable = [] # useful for benches.
quickwit = ["sstable", "futures-util"]
@@ -129,7 +128,7 @@ members = ["query-grammar", "bitpacker", "common", "ownedbytes", "stacker", "sst
[[test]]
name = "failpoints"
path = "tests/failpoints/mod.rs"
required-features = ["fail/failpoints"]
required-features = ["failpoints"]
[[bench]]
name = "analyzer"


@@ -44,7 +44,7 @@ Details about the benchmark can be found at this [repository](https://github.com
- Single valued and multivalued u64, i64, and f64 fast fields (equivalent of doc values in Lucene)
- `&[u8]` fast fields
- Text, i64, u64, f64, dates, ip, bool, and hierarchical facet fields
- Compressed document store (LZ4, Zstd, None, Brotli, Snap)
- Compressed document store (LZ4, Zstd, None)
- Range queries
- Faceted search
- Configurable indexing (optional term frequency and position indexing)

RELEASE.md (new file, 21 lines)

@@ -0,0 +1,21 @@
# Release a new Tantivy Version
## Steps
1. Identify new packages in workspace since last release
2. Identify changed packages in workspace since last release
3. Bump version in `Cargo.toml` and their dependents for all changed packages
4. Update version of root `Cargo.toml`
5. Publish version starting with leaf nodes
6. Set git tag with new version
In conjunction with `cargo-release`, Steps 1-4 (I'm not sure if the change detection works):
Set new packages to version 0.0.0
Replace prev-tag-name
```bash
cargo release --workspace --no-publish -v --prev-tag-name 0.19 --push-remote origin minor --no-tag --execute
```
`--no-tag` is needed, or it will create tags for all the subpackages


@@ -1,23 +0,0 @@
# Appveyor configuration template for Rust using rustup for Rust installation
# https://github.com/starkat99/appveyor-rust
os: Visual Studio 2015
environment:
matrix:
- channel: stable
target: x86_64-pc-windows-msvc
install:
- appveyor DownloadFile https://win.rustup.rs/ -FileName rustup-init.exe
- rustup-init -yv --default-toolchain %channel% --default-host %target%
- set PATH=%PATH%;%USERPROFILE%\.cargo\bin
- if defined msys_bits set PATH=%PATH%;C:\msys64\mingw%msys_bits%\bin
- rustc -vV
- cargo -vV
build: false
test_script:
- REM SET RUST_LOG=tantivy,test & cargo test --all --verbose --no-default-features --features lz4-compression --features mmap
- REM SET RUST_LOG=tantivy,test & cargo test test_store --verbose --no-default-features --features lz4-compression --features snappy-compression --features brotli-compression --features mmap
- REM SET RUST_BACKTRACE=1 & cargo build --examples


@@ -1,11 +1,13 @@
use criterion::{criterion_group, criterion_main, Criterion};
use tantivy::tokenizer::TokenizerManager;
use tantivy::tokenizer::{
LowerCaser, RemoveLongFilter, SimpleTokenizer, TextAnalyzer, TokenizerManager,
};
const ALICE_TXT: &str = include_str!("alice.txt");
pub fn criterion_benchmark(c: &mut Criterion) {
let tokenizer_manager = TokenizerManager::default();
let tokenizer = tokenizer_manager.get("default").unwrap();
let mut tokenizer = tokenizer_manager.get("default").unwrap();
c.bench_function("default-tokenize-alice", |b| {
b.iter(|| {
let mut word_count = 0;
@@ -16,7 +18,26 @@ pub fn criterion_benchmark(c: &mut Criterion) {
assert_eq!(word_count, 30_731);
})
});
let mut dynamic_analyzer = TextAnalyzer::builder(SimpleTokenizer::default())
.dynamic()
.filter_dynamic(RemoveLongFilter::limit(40))
.filter_dynamic(LowerCaser)
.build();
c.bench_function("dynamic-tokenize-alice", |b| {
b.iter(|| {
let mut word_count = 0;
let mut token_stream = dynamic_analyzer.token_stream(ALICE_TXT);
while token_stream.advance() {
word_count += 1;
}
assert_eq!(word_count, 30_731);
})
});
}
criterion_group!(benches, criterion_benchmark);
criterion_group! {
name = benches;
config = Criterion::default().sample_size(200);
targets = criterion_benchmark
}
criterion_main!(benches);


@@ -1,6 +1,6 @@
[package]
name = "tantivy-bitpacker"
version = "0.3.0"
version = "0.5.0"
edition = "2021"
authors = ["Paul Masurel <paul.masurel@gmail.com>"]
license = "MIT"


@@ -64,10 +64,8 @@ fn mem_usage<T>(items: &Vec<T>) -> usize {
impl BlockedBitpacker {
pub fn new() -> Self {
let mut compressed_blocks = vec![];
compressed_blocks.resize(8, 0);
Self {
compressed_blocks,
compressed_blocks: vec![0; 8],
buffer: vec![],
offset_and_bits: vec![],
}


@@ -1,6 +1,6 @@
use std::ops::RangeInclusive;
#[cfg(any(target_arch = "x86_64"))]
#[cfg(target_arch = "x86_64")]
mod avx2;
mod scalar;

cliff.toml (new file, 90 lines)

@@ -0,0 +1,90 @@
# configuration file for git-cliff
# see https://github.com/orhun/git-cliff#configuration-file
[changelog]
# changelog header
header = """
"""
# template for the changelog body
# https://tera.netlify.app/docs/#introduction
body = """
{% if version %}\
{{ version | trim_start_matches(pat="v") }} ({{ timestamp | date(format="%Y-%m-%d") }})
==================
{% else %}\
## [unreleased]
{% endif %}\
{% for commit in commits %}
- {% if commit.breaking %}[**breaking**] {% endif %}{{ commit.message | split(pat="\n") | first | trim | upper_first }}(@{{ commit.author.name }})\
{% endfor %}
"""
# remove the leading and trailing whitespace from the template
trim = true
# changelog footer
footer = """
"""
postprocessors = [
{ pattern = 'Paul Masurel', replace = "fulmicoton"}, # replace with github user
{ pattern = 'PSeitz', replace = "PSeitz"}, # replace with github user
{ pattern = 'Adam Reichold', replace = "adamreichold"}, # replace with github user
{ pattern = 'trinity-1686a', replace = "trinity-1686a"}, # replace with github user
{ pattern = 'Michael Kleen', replace = "mkleen"}, # replace with github user
{ pattern = 'Adrien Guillo', replace = "guilload"}, # replace with github user
{ pattern = 'François Massot', replace = "fmassot"}, # replace with github user
{ pattern = 'Naveen Aiathurai', replace = "naveenann"}, # replace with github user
{ pattern = '', replace = ""}, # replace with github user
]
[git]
# parse the commits based on https://www.conventionalcommits.org
# This is required or commit.message contains the whole commit message and not just the title
conventional_commits = true
# filter out the commits that are not conventional
filter_unconventional = false
# process each line of a commit as an individual commit
split_commits = false
# regex for preprocessing the commit messages
commit_preprocessors = [
{ pattern = '\((\w+\s)?#([0-9]+)\)', replace = "[#${2}](https://github.com/quickwit-oss/tantivy/issues/${2})"}, # replace issue numbers
]
#link_parsers = [
#{ pattern = "#(\\d+)", href = "https://github.com/quickwit-oss/tantivy/pulls/$1"},
#]
# regex for parsing and grouping commits
commit_parsers = [
{ message = "^feat", group = "Features"},
{ message = "^fix", group = "Bug Fixes"},
{ message = "^doc", group = "Documentation"},
{ message = "^perf", group = "Performance"},
{ message = "^refactor", group = "Refactor"},
{ message = "^style", group = "Styling"},
{ message = "^test", group = "Testing"},
{ message = "^chore\\(release\\): prepare for", skip = true},
{ message = "(?i)clippy", skip = true},
{ message = "(?i)dependabot", skip = true},
{ message = "(?i)fmt", skip = true},
{ message = "(?i)bump", skip = true},
{ message = "(?i)readme", skip = true},
{ message = "(?i)comment", skip = true},
{ message = "(?i)spelling", skip = true},
{ message = "^chore", group = "Miscellaneous Tasks"},
{ body = ".*security", group = "Security"},
{ message = ".*", group = "Other", default_scope = "other"},
]
# protect breaking changes from being skipped due to matching a skipping commit_parser
protect_breaking_commits = false
# filter out the commits that are not matched by commit parsers
filter_commits = false
# glob pattern for matching git tags
tag_pattern = "v[0-9]*"
# regex for skipping tags
skip_tags = "v0.1.0-beta.1"
# regex for ignoring tags
ignore_tags = ""
# sort the tags topologically
topo_order = false
# sort the commits inside sections by oldest/newest order
sort_commits = "newest"
# limit the number of commits included in the changelog.
# limit_commits = 42


@@ -1,18 +1,22 @@
[package]
name = "tantivy-columnar"
version = "0.1.0"
version = "0.2.0"
edition = "2021"
license = "MIT"
homepage = "https://github.com/quickwit-oss/tantivy"
repository = "https://github.com/quickwit-oss/tantivy"
description = "column oriented storage for tantivy"
categories = ["database-implementations", "data-structures", "compression"]
[dependencies]
itertools = "0.10.5"
itertools = "0.11.0"
fnv = "1.0.7"
fastdivide = "0.4.0"
stacker = { path = "../stacker", package="tantivy-stacker"}
sstable = { path = "../sstable", package = "tantivy-sstable" }
common = { path = "../common", package = "tantivy-common" }
tantivy-bitpacker = { version= "0.3", path = "../bitpacker/" }
stacker = { version= "0.2", path = "../stacker", package="tantivy-stacker"}
sstable = { version= "0.2", path = "../sstable", package = "tantivy-sstable" }
common = { version= "0.6", path = "../common", package = "tantivy-common" }
tantivy-bitpacker = { version= "0.5", path = "../bitpacker/" }
serde = "1.0.152"
[dev-dependencies]


@@ -1,9 +1,12 @@
use std::cmp::Ordering;
use crate::{Column, DocId, RowId};
#[derive(Debug, Default, Clone)]
pub struct ColumnBlockAccessor<T> {
val_cache: Vec<T>,
docid_cache: Vec<DocId>,
missing_docids_cache: Vec<DocId>,
row_id_cache: Vec<RowId>,
}
@@ -20,6 +23,20 @@ impl<T: PartialOrd + Copy + std::fmt::Debug + Send + Sync + 'static + Default>
.values
.get_vals(&self.row_id_cache, &mut self.val_cache);
}
#[inline]
pub fn fetch_block_with_missing(&mut self, docs: &[u32], accessor: &Column<T>, missing: T) {
self.fetch_block(docs, accessor);
// We can compare docid_cache with docs to find missing docs
if docs.len() != self.docid_cache.len() || accessor.index.is_multivalue() {
self.missing_docids_cache.clear();
find_missing_docs(docs, &self.docid_cache, |doc| {
self.missing_docids_cache.push(doc);
self.val_cache.push(missing);
});
self.docid_cache
.extend_from_slice(&self.missing_docids_cache);
}
}
#[inline]
pub fn iter_vals(&self) -> impl Iterator<Item = T> + '_ {
@@ -34,3 +51,82 @@ impl<T: PartialOrd + Copy + std::fmt::Debug + Send + Sync + 'static + Default>
.zip(self.val_cache.iter().cloned())
}
}
/// Given two sorted lists of docids `docs` and `hits`, where `hits` is a subset of `docs`,
/// return all docs that are not in `hits`.
fn find_missing_docs<F>(docs: &[u32], hits: &[u32], mut callback: F)
where F: FnMut(u32) {
let mut docs_iter = docs.iter();
let mut hits_iter = hits.iter();
let mut doc = docs_iter.next();
let mut hit = hits_iter.next();
while let (Some(&current_doc), Some(&current_hit)) = (doc, hit) {
match current_doc.cmp(&current_hit) {
Ordering::Less => {
callback(current_doc);
doc = docs_iter.next();
}
Ordering::Equal => {
doc = docs_iter.next();
hit = hits_iter.next();
}
Ordering::Greater => {
hit = hits_iter.next();
}
}
}
while let Some(&current_doc) = doc {
callback(current_doc);
doc = docs_iter.next();
}
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_find_missing_docs() {
let docs: Vec<u32> = vec![1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
let hits: Vec<u32> = vec![2, 4, 6, 8, 10];
let mut missing_docs: Vec<u32> = Vec::new();
find_missing_docs(&docs, &hits, |missing_doc| {
missing_docs.push(missing_doc);
});
assert_eq!(missing_docs, vec![1, 3, 5, 7, 9]);
}
#[test]
fn test_find_missing_docs_empty() {
let docs: Vec<u32> = Vec::new();
let hits: Vec<u32> = vec![2, 4, 6, 8, 10];
let mut missing_docs: Vec<u32> = Vec::new();
find_missing_docs(&docs, &hits, |missing_doc| {
missing_docs.push(missing_doc);
});
assert_eq!(missing_docs, vec![]);
}
#[test]
fn test_find_missing_docs_all_missing() {
let docs: Vec<u32> = vec![1, 2, 3, 4, 5];
let hits: Vec<u32> = Vec::new();
let mut missing_docs: Vec<u32> = Vec::new();
find_missing_docs(&docs, &hits, |missing_doc| {
missing_docs.push(missing_doc);
});
assert_eq!(missing_docs, vec![1, 2, 3, 4, 5]);
}
}


@@ -30,6 +30,13 @@ impl fmt::Debug for BytesColumn {
}
impl BytesColumn {
pub fn empty(num_docs: u32) -> BytesColumn {
BytesColumn {
dictionary: Arc::new(Dictionary::empty()),
term_ord_column: Column::build_empty_column(num_docs),
}
}
/// Fills the given `output` buffer with the term associated to the ordinal `ord`.
///
/// Returns `false` if the term does not exist (e.g. `term_ord` is greater or equal to the
@@ -77,7 +84,7 @@ impl From<StrColumn> for BytesColumn {
}
impl StrColumn {
pub(crate) fn wrap(bytes_column: BytesColumn) -> StrColumn {
pub fn wrap(bytes_column: BytesColumn) -> StrColumn {
StrColumn(bytes_column)
}


@@ -130,7 +130,7 @@ impl<T: PartialOrd + Copy + Debug + Send + Sync + 'static> Column<T> {
.select_batch_in_place(selected_docid_range.start, doc_ids);
}
/// Fils the output vector with the (possibly multiple values that are associated_with
/// Fills the output vector with the (possibly multiple values that are associated_with
/// `row_id`.
///
/// This method clears the `output` vector.


@@ -168,8 +168,9 @@ mod tests {
)
.into();
let merged_column_index = merge_column_index(&column_indexes[..], &merge_row_order);
let SerializableColumnIndex::Multivalued(start_index_iterable) = merged_column_index
else { panic!("Excpected a multivalued index") };
let SerializableColumnIndex::Multivalued(start_index_iterable) = merged_column_index else {
panic!("Expected a multivalued index")
};
let start_indexes: Vec<RowId> = start_index_iterable.boxed_iter().collect();
assert_eq!(&start_indexes, &[0, 3, 5]);
}
@@ -200,8 +201,9 @@ mod tests {
)
.into();
let merged_column_index = merge_column_index(&column_indexes[..], &merge_row_order);
let SerializableColumnIndex::Multivalued(start_index_iterable) = merged_column_index
else { panic!("Excpected a multivalued index") };
let SerializableColumnIndex::Multivalued(start_index_iterable) = merged_column_index else {
panic!("Expected a multivalued index")
};
let start_indexes: Vec<RowId> = start_index_iterable.boxed_iter().collect();
assert_eq!(&start_indexes, &[0, 3, 5, 6]);
}


@@ -157,7 +157,13 @@ mod tests {
Cardinality::Optional,
&shuffle_merge_order,
);
let SerializableColumnIndex::Optional { non_null_row_ids, num_rows } = serializable_index else { panic!() };
let SerializableColumnIndex::Optional {
non_null_row_ids,
num_rows,
} = serializable_index
else {
panic!()
};
assert_eq!(num_rows, 2);
let non_null_rows: Vec<RowId> = non_null_row_ids.boxed_iter().collect();
assert_eq!(&non_null_rows, &[1]);


@@ -37,6 +37,10 @@ impl From<MultiValueIndex> for ColumnIndex {
}
impl ColumnIndex {
#[inline]
pub fn is_multivalue(&self) -> bool {
matches!(self, ColumnIndex::Multivalued(_))
}
// Returns the cardinality of the column index.
//
// By convention, if the column contains no docs, we consider that it is


@@ -2,7 +2,7 @@
//! # `fastfield_codecs`
//!
//! - Columnar storage of data for tantivy [`Column`].
//! - Columnar storage of data for tantivy [`crate::Column`].
//! - Encode data in different codecs.
//! - Monotonically map values to u64/u128


@@ -38,6 +38,6 @@ impl Ord for BlankRange {
}
impl PartialOrd for BlankRange {
fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
Some(self.blank_size().cmp(&other.blank_size()))
Some(self.cmp(other))
}
}


@@ -83,7 +83,8 @@ impl ColumnValues for BitpackedReader {
doc_id_range: Range<u32>,
positions: &mut Vec<u32>,
) {
let Some(transformed_range) = transform_range_before_linear_transformation(&self.stats, range)
let Some(transformed_range) =
transform_range_before_linear_transformation(&self.stats, range)
else {
positions.clear();
return;


@@ -52,8 +52,8 @@ pub enum MergeRowOrder {
/// Columnar tables are simply stacked one above the other.
/// If the i-th columnar_readers has n_rows_i rows, then
/// in the resulting columnar,
/// rows [r0..n_row_0) contains the row of columnar_readers[0], in ordder
/// rows [n_row_0..n_row_0 + n_row_1 contains the row of columnar_readers[1], in order.
/// rows [r0..n_row_0) contains the rows of `columnar_readers[0]`, in order
/// rows [n_row_0..n_row_0 + n_row_1) contains the rows of `columnar_readers[1]`, in order.
/// ..
/// No documents are deleted.
Stack(StackMergeOrder),


@@ -2,7 +2,7 @@ mod merge_dict_column;
mod merge_mapping;
mod term_merger;
use std::collections::{BTreeMap, HashMap, HashSet};
use std::collections::{BTreeMap, HashSet};
use std::io;
use std::net::Ipv6Addr;
use std::sync::Arc;
@@ -18,7 +18,8 @@ use crate::columnar::writer::CompatibleNumericalTypes;
use crate::columnar::ColumnarReader;
use crate::dynamic_column::DynamicColumn;
use crate::{
BytesColumn, Column, ColumnIndex, ColumnType, ColumnValues, NumericalType, NumericalValue,
BytesColumn, Column, ColumnIndex, ColumnType, ColumnValues, DynamicColumnHandle, NumericalType,
NumericalValue,
};
/// Column types are grouped into different categories.
@@ -28,14 +29,16 @@ use crate::{
/// In practice, today, only Numerical columns are coerced into one type.
///
/// See also [README.md].
#[derive(Copy, Clone, Eq, PartialEq, Hash, Debug)]
///
/// The ordering has to match the ordering of the variants in [ColumnType].
#[derive(Copy, Clone, Eq, PartialOrd, Ord, PartialEq, Hash, Debug)]
pub(crate) enum ColumnTypeCategory {
Bool,
Str,
Numerical,
DateTime,
Bytes,
Str,
Bool,
IpAddr,
DateTime,
}
impl From<ColumnType> for ColumnTypeCategory {
@@ -83,9 +86,20 @@ pub fn merge_columnar(
.iter()
.map(|reader| reader.num_rows())
.collect::<Vec<u32>>();
let columns_to_merge =
group_columns_for_merge(columnar_readers, required_columns, &merge_row_order)?;
for ((column_name, column_type), columns) in columns_to_merge {
for res in columns_to_merge {
let ((column_name, _column_type_category), grouped_columns) = res;
let grouped_columns = grouped_columns.open(&merge_row_order)?;
if grouped_columns.is_empty() {
continue;
}
let column_type = grouped_columns.column_type_after_merge();
let mut columns = grouped_columns.columns;
coerce_columns(column_type, &mut columns)?;
let mut column_serializer =
serializer.start_serialize_column(column_name.as_bytes(), column_type);
merge_column(
@@ -97,6 +111,7 @@ pub fn merge_columnar(
)?;
column_serializer.finalize()?;
}
serializer.finalize(merge_row_order.num_rows())?;
Ok(())
}
@@ -210,40 +225,12 @@ fn merge_column(
struct GroupedColumns {
required_column_type: Option<ColumnType>,
columns: Vec<Option<DynamicColumn>>,
column_category: ColumnTypeCategory,
}
impl GroupedColumns {
fn for_category(column_category: ColumnTypeCategory, num_columnars: usize) -> Self {
GroupedColumns {
required_column_type: None,
columns: vec![None; num_columnars],
column_category,
}
}
/// Set the dynamic column for a given columnar.
fn set_column(&mut self, columnar_id: usize, column: DynamicColumn) {
self.columns[columnar_id] = Some(column);
}
/// Force the existence of a column, as well as its type.
fn require_type(&mut self, required_type: ColumnType) -> io::Result<()> {
if let Some(existing_required_type) = self.required_column_type {
if existing_required_type == required_type {
// This was just a duplicate in the `required_columns`.
// Nothing to do.
return Ok(());
} else {
return Err(io::Error::new(
io::ErrorKind::InvalidInput,
"Required column conflicts with another required column of the same type \
category.",
));
}
}
self.required_column_type = Some(required_type);
Ok(())
/// Check if the column group can be skipped during serialization.
fn is_empty(&self) -> bool {
self.required_column_type.is_none() && self.columns.iter().all(Option::is_none)
}
/// Returns the column type after merge.
@@ -265,11 +252,76 @@ impl GroupedColumns {
}
// At the moment, only the numerical categorical column type has more than one possible
// column type.
assert_eq!(self.column_category, ColumnTypeCategory::Numerical);
assert!(self
.columns
.iter()
.flatten()
.all(|el| ColumnTypeCategory::from(el.column_type()) == ColumnTypeCategory::Numerical));
merged_numerical_columns_type(self.columns.iter().flatten()).into()
}
}
struct GroupedColumnsHandle {
required_column_type: Option<ColumnType>,
columns: Vec<Option<DynamicColumnHandle>>,
}
impl GroupedColumnsHandle {
fn new(num_columnars: usize) -> Self {
GroupedColumnsHandle {
required_column_type: None,
columns: vec![None; num_columnars],
}
}
fn open(self, merge_row_order: &MergeRowOrder) -> io::Result<GroupedColumns> {
let mut columns: Vec<Option<DynamicColumn>> = Vec::new();
for (columnar_id, column) in self.columns.iter().enumerate() {
if let Some(column) = column {
let column = column.open()?;
// We skip columns that end up with 0 documents.
// That way, we make sure they don't end up influencing the merge type or
// creating empty columns.
if is_empty_after_merge(merge_row_order, &column, columnar_id) {
columns.push(None);
} else {
columns.push(Some(column));
}
} else {
columns.push(None);
}
}
Ok(GroupedColumns {
required_column_type: self.required_column_type,
columns,
})
}
/// Set the dynamic column for a given columnar.
fn set_column(&mut self, columnar_id: usize, column: DynamicColumnHandle) {
self.columns[columnar_id] = Some(column);
}
/// Force the existence of a column, as well as its type.
fn require_type(&mut self, required_type: ColumnType) -> io::Result<()> {
if let Some(existing_required_type) = self.required_column_type {
if existing_required_type == required_type {
// This was just a duplicate in the `required_columns`.
// Nothing to do.
return Ok(());
} else {
return Err(io::Error::new(
io::ErrorKind::InvalidInput,
"Required column conflicts with another required column of the same type \
category.",
));
}
}
self.required_column_type = Some(required_type);
Ok(())
}
}
/// Returns the type of the merged numerical column.
///
/// This function picks the first numerical type out of i64, u64, f64 (order matters
@@ -293,7 +345,7 @@ fn merged_numerical_columns_type<'a>(
fn is_empty_after_merge(
merge_row_order: &MergeRowOrder,
column: &DynamicColumn,
columnar_id: usize,
columnar_ord: usize,
) -> bool {
if column.num_values() == 0u32 {
// It was empty before the merge.
@@ -305,7 +357,7 @@ fn is_empty_after_merge(
false
}
MergeRowOrder::Shuffled(shuffled) => {
if let Some(alive_bitset) = &shuffled.alive_bitsets[columnar_id] {
if let Some(alive_bitset) = &shuffled.alive_bitsets[columnar_ord] {
let column_index = column.column_index();
match column_index {
ColumnIndex::Empty { .. } => true,
@@ -348,56 +400,34 @@ fn is_empty_after_merge(
}
}
#[allow(clippy::type_complexity)]
fn group_columns_for_merge(
columnar_readers: &[&ColumnarReader],
required_columns: &[(String, ColumnType)],
merge_row_order: &MergeRowOrder,
) -> io::Result<BTreeMap<(String, ColumnType), Vec<Option<DynamicColumn>>>> {
// Each column name may have multiple types of column associated.
// For merging we are interested in the same column type category since they can be merged.
let mut columns_grouped: HashMap<(String, ColumnTypeCategory), GroupedColumns> = HashMap::new();
/// Iterates over the columns of the columnar readers, grouped by column name.
/// The key point is that `open` on the columns is done lazily, one group at a time.
fn group_columns_for_merge<'a>(
columnar_readers: &'a [&'a ColumnarReader],
required_columns: &'a [(String, ColumnType)],
_merge_row_order: &'a MergeRowOrder,
) -> io::Result<BTreeMap<(String, ColumnTypeCategory), GroupedColumnsHandle>> {
let mut columns: BTreeMap<(String, ColumnTypeCategory), GroupedColumnsHandle> = BTreeMap::new();
for &(ref column_name, column_type) in required_columns {
columns_grouped
columns
.entry((column_name.clone(), column_type.into()))
.or_insert_with(|| {
GroupedColumns::for_category(column_type.into(), columnar_readers.len())
})
.or_insert_with(|| GroupedColumnsHandle::new(columnar_readers.len()))
.require_type(column_type)?;
}
for (columnar_id, columnar_reader) in columnar_readers.iter().enumerate() {
let column_name_and_handle = columnar_reader.list_columns()?;
// We skip columns that end up with 0 documents.
// That way, we make sure they don't end up influencing the merge type or
// creating empty columns.
let column_name_and_handle = columnar_reader.iter_columns()?;
for (column_name, handle) in column_name_and_handle {
let column_category: ColumnTypeCategory = handle.column_type().into();
let column = handle.open()?;
if is_empty_after_merge(merge_row_order, &column, columnar_id) {
continue;
}
columns_grouped
columns
.entry((column_name, column_category))
.or_insert_with(|| {
GroupedColumns::for_category(column_category, columnar_readers.len())
})
.set_column(columnar_id, column);
.or_insert_with(|| GroupedColumnsHandle::new(columnar_readers.len()))
.set_column(columnar_id, handle);
}
}
let mut merge_columns: BTreeMap<(String, ColumnType), Vec<Option<DynamicColumn>>> =
Default::default();
for ((column_name, _), mut grouped_columns) in columns_grouped {
let column_type = grouped_columns.column_type_after_merge();
coerce_columns(column_type, &mut grouped_columns.columns)?;
merge_columns.insert((column_name, column_type), grouped_columns.columns);
}
Ok(merge_columns)
Ok(columns)
}
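// A usage sketch of the lazy grouping above (names taken from this diff; the
// readers and row order are assumptions):
//
// let merge_order: MergeRowOrder = StackMergeOrder::stack(&[&reader_a, &reader_b]).into();
// let groups = group_columns_for_merge(&[&reader_a, &reader_b], &[], &merge_order)?;
// for ((column_name, category), handle_group) in groups {
//     // `open` is only called here, one column-name group at a time.
//     let grouped: GroupedColumns = handle_group.open(&merge_order)?;
// }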
fn coerce_columns(

View File

@@ -1,3 +1,5 @@
use std::collections::BTreeMap;
use itertools::Itertools;
use super::*;
@@ -27,22 +29,10 @@ fn test_column_coercion_to_u64() {
let columnar2 = make_columnar("numbers", &[u64::MAX]);
let columnars = &[&columnar1, &columnar2];
let merge_order = StackMergeOrder::stack(columnars).into();
let column_map: BTreeMap<(String, ColumnType), Vec<Option<DynamicColumn>>> =
let column_map: BTreeMap<(String, ColumnTypeCategory), GroupedColumnsHandle> =
group_columns_for_merge(columnars, &[], &merge_order).unwrap();
assert_eq!(column_map.len(), 1);
assert!(column_map.contains_key(&("numbers".to_string(), ColumnType::U64)));
}
#[test]
fn test_column_no_coercion_if_all_the_same() {
let columnar1 = make_columnar("numbers", &[1u64]);
let columnar2 = make_columnar("numbers", &[2u64]);
let columnars = &[&columnar1, &columnar2];
let merge_order = StackMergeOrder::stack(columnars).into();
let column_map: BTreeMap<(String, ColumnType), Vec<Option<DynamicColumn>>> =
group_columns_for_merge(columnars, &[], &merge_order).unwrap();
assert_eq!(column_map.len(), 1);
assert!(column_map.contains_key(&("numbers".to_string(), ColumnType::U64)));
assert!(column_map.contains_key(&("numbers".to_string(), ColumnTypeCategory::Numerical)));
}
#[test]
@@ -51,24 +41,24 @@ fn test_column_coercion_to_i64() {
let columnar2 = make_columnar("numbers", &[2u64]);
let columnars = &[&columnar1, &columnar2];
let merge_order = StackMergeOrder::stack(columnars).into();
let column_map: BTreeMap<(String, ColumnType), Vec<Option<DynamicColumn>>> =
let column_map: BTreeMap<(String, ColumnTypeCategory), GroupedColumnsHandle> =
group_columns_for_merge(columnars, &[], &merge_order).unwrap();
assert_eq!(column_map.len(), 1);
assert!(column_map.contains_key(&("numbers".to_string(), ColumnType::I64)));
assert!(column_map.contains_key(&("numbers".to_string(), ColumnTypeCategory::Numerical)));
}
#[test]
fn test_impossible_coercion_returns_an_error() {
let columnar1 = make_columnar("numbers", &[u64::MAX]);
let merge_order = StackMergeOrder::stack(&[&columnar1]).into();
let group_error = group_columns_for_merge(
&[&columnar1],
&[("numbers".to_string(), ColumnType::I64)],
&merge_order,
)
.unwrap_err();
assert_eq!(group_error.kind(), io::ErrorKind::InvalidInput);
}
//#[test]
// fn test_impossible_coercion_returns_an_error() {
// let columnar1 = make_columnar("numbers", &[u64::MAX]);
// let merge_order = StackMergeOrder::stack(&[&columnar1]).into();
// let group_error = group_columns_for_merge_iter(
//&[&columnar1],
//&[("numbers".to_string(), ColumnType::I64)],
//&merge_order,
//)
//.unwrap_err();
// assert_eq!(group_error.kind(), io::ErrorKind::InvalidInput);
//}
#[test]
fn test_group_columns_with_required_column() {
@@ -76,7 +66,7 @@ fn test_group_columns_with_required_column() {
let columnar2 = make_columnar("numbers", &[2u64]);
let columnars = &[&columnar1, &columnar2];
let merge_order = StackMergeOrder::stack(columnars).into();
let column_map: BTreeMap<(String, ColumnType), Vec<Option<DynamicColumn>>> =
let column_map: BTreeMap<(String, ColumnTypeCategory), GroupedColumnsHandle> =
group_columns_for_merge(
&[&columnar1, &columnar2],
&[("numbers".to_string(), ColumnType::U64)],
@@ -84,7 +74,7 @@ fn test_group_columns_with_required_column() {
)
.unwrap();
assert_eq!(column_map.len(), 1);
assert!(column_map.contains_key(&("numbers".to_string(), ColumnType::U64)));
assert!(column_map.contains_key(&("numbers".to_string(), ColumnTypeCategory::Numerical)));
}
#[test]
@@ -93,17 +83,17 @@ fn test_group_columns_required_column_with_no_existing_columns() {
let columnar2 = make_columnar("numbers", &[2u64]);
let columnars = &[&columnar1, &columnar2];
let merge_order = StackMergeOrder::stack(columnars).into();
let column_map: BTreeMap<(String, ColumnType), Vec<Option<DynamicColumn>>> =
group_columns_for_merge(
columnars,
&[("required_col".to_string(), ColumnType::Str)],
&merge_order,
)
.unwrap();
let column_map: BTreeMap<_, _> = group_columns_for_merge(
columnars,
&[("required_col".to_string(), ColumnType::Str)],
&merge_order,
)
.unwrap();
assert_eq!(column_map.len(), 2);
let columns = column_map
.get(&("required_col".to_string(), ColumnType::Str))
.unwrap();
let columns = &column_map
.get(&("required_col".to_string(), ColumnTypeCategory::Str))
.unwrap()
.columns;
assert_eq!(columns.len(), 2);
assert!(columns[0].is_none());
assert!(columns[1].is_none());
@@ -115,7 +105,7 @@ fn test_group_columns_required_column_is_above_all_columns_have_the_same_type_ru
let columnar2 = make_columnar("numbers", &[2i64]);
let columnars = &[&columnar1, &columnar2];
let merge_order = StackMergeOrder::stack(columnars).into();
let column_map: BTreeMap<(String, ColumnType), Vec<Option<DynamicColumn>>> =
let column_map: BTreeMap<(String, ColumnTypeCategory), GroupedColumnsHandle> =
group_columns_for_merge(
columnars,
&[("numbers".to_string(), ColumnType::U64)],
@@ -123,7 +113,7 @@ fn test_group_columns_required_column_is_above_all_columns_have_the_same_type_ru
)
.unwrap();
assert_eq!(column_map.len(), 1);
assert!(column_map.contains_key(&("numbers".to_string(), ColumnType::U64)));
assert!(column_map.contains_key(&("numbers".to_string(), ColumnTypeCategory::Numerical)));
}
#[test]
@@ -132,21 +122,23 @@ fn test_missing_column() {
let columnar2 = make_columnar("numbers2", &[2u64]);
let columnars = &[&columnar1, &columnar2];
let merge_order = StackMergeOrder::stack(columnars).into();
let column_map: BTreeMap<(String, ColumnType), Vec<Option<DynamicColumn>>> =
let column_map: BTreeMap<(String, ColumnTypeCategory), GroupedColumnsHandle> =
group_columns_for_merge(columnars, &[], &merge_order).unwrap();
assert_eq!(column_map.len(), 2);
assert!(column_map.contains_key(&("numbers".to_string(), ColumnType::I64)));
assert!(column_map.contains_key(&("numbers".to_string(), ColumnTypeCategory::Numerical)));
{
let columns = column_map
.get(&("numbers".to_string(), ColumnType::I64))
.unwrap();
let columns = &column_map
.get(&("numbers".to_string(), ColumnTypeCategory::Numerical))
.unwrap()
.columns;
assert!(columns[0].is_some());
assert!(columns[1].is_none());
}
{
let columns = column_map
.get(&("numbers2".to_string(), ColumnType::U64))
.unwrap();
let columns = &column_map
.get(&("numbers2".to_string(), ColumnTypeCategory::Numerical))
.unwrap()
.columns;
assert!(columns[0].is_none());
assert!(columns[1].is_some());
}
@@ -244,7 +236,9 @@ fn test_merge_columnar_numbers() {
assert_eq!(columnar_reader.num_columns(), 1);
let cols = columnar_reader.read_columns("numbers").unwrap();
let dynamic_column = cols[0].open().unwrap();
let DynamicColumn::F64(vals) = dynamic_column else { panic!() };
let DynamicColumn::F64(vals) = dynamic_column else {
panic!()
};
assert_eq!(vals.get_cardinality(), Cardinality::Optional);
assert_eq!(vals.first(0u32), Some(-1f64));
assert_eq!(vals.first(1u32), None);
@@ -270,7 +264,9 @@ fn test_merge_columnar_texts() {
assert_eq!(columnar_reader.num_columns(), 1);
let cols = columnar_reader.read_columns("texts").unwrap();
let dynamic_column = cols[0].open().unwrap();
let DynamicColumn::Str(vals) = dynamic_column else { panic!() };
let DynamicColumn::Str(vals) = dynamic_column else {
panic!()
};
assert_eq!(vals.ords().get_cardinality(), Cardinality::Optional);
let get_str_for_ord = |ord| {
@@ -317,7 +313,9 @@ fn test_merge_columnar_byte() {
assert_eq!(columnar_reader.num_columns(), 1);
let cols = columnar_reader.read_columns("bytes").unwrap();
let dynamic_column = cols[0].open().unwrap();
let DynamicColumn::Bytes(vals) = dynamic_column else { panic!() };
let DynamicColumn::Bytes(vals) = dynamic_column else {
panic!()
};
let get_bytes_for_ord = |ord| {
let mut out = Vec::new();
vals.ord_to_bytes(ord, &mut out).unwrap();
@@ -371,7 +369,9 @@ fn test_merge_columnar_byte_with_missing() {
assert_eq!(columnar_reader.num_columns(), 2);
let cols = columnar_reader.read_columns("col").unwrap();
let dynamic_column = cols[0].open().unwrap();
let DynamicColumn::Bytes(vals) = dynamic_column else { panic!() };
let DynamicColumn::Bytes(vals) = dynamic_column else {
panic!()
};
let get_bytes_for_ord = |ord| {
let mut out = Vec::new();
vals.ord_to_bytes(ord, &mut out).unwrap();
@@ -423,7 +423,9 @@ fn test_merge_columnar_different_types() {
// numeric column
let dynamic_column = cols[0].open().unwrap();
let DynamicColumn::I64(vals) = dynamic_column else { panic!() };
let DynamicColumn::I64(vals) = dynamic_column else {
panic!()
};
assert_eq!(vals.get_cardinality(), Cardinality::Optional);
assert_eq!(vals.values_for_doc(0).collect_vec(), vec![]);
assert_eq!(vals.values_for_doc(1).collect_vec(), vec![]);
@@ -433,7 +435,9 @@ fn test_merge_columnar_different_types() {
// text column
let dynamic_column = cols[1].open().unwrap();
let DynamicColumn::Str(vals) = dynamic_column else { panic!() };
let DynamicColumn::Str(vals) = dynamic_column else {
panic!()
};
assert_eq!(vals.ords().get_cardinality(), Cardinality::Optional);
let get_str_for_ord = |ord| {
let mut out = String::new();

View File

@@ -102,30 +102,41 @@ impl ColumnarReader {
pub fn num_rows(&self) -> RowId {
self.num_rows
}
// Iterates over the columns in sorted order.
pub fn iter_columns(
&self,
) -> io::Result<impl Iterator<Item = (String, DynamicColumnHandle)> + '_> {
let mut stream = self.column_dictionary.stream()?;
Ok(std::iter::from_fn(move || {
if stream.advance() {
let key_bytes: &[u8] = stream.key();
let column_code: u8 = key_bytes.last().cloned().unwrap();
// TODO Error Handling. The API gets quite ugly when returning the error here, so
// instead we could just check the first N columns upfront.
let column_type: ColumnType = ColumnType::try_from_code(column_code)
.map_err(|_| io_invalid_data(format!("Unknown column code `{column_code}`")))
.unwrap();
let range = stream.value().clone();
let column_name =
// The last two bytes are respectively the 0u8 separator and the column_type.
String::from_utf8_lossy(&key_bytes[..key_bytes.len() - 2]).to_string();
let file_slice = self
.column_data
.slice(range.start as usize..range.end as usize);
let column_handle = DynamicColumnHandle {
file_slice,
column_type,
};
Some((column_name, column_handle))
} else {
None
}
}))
}
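// Key layout sketch, inferred from the decoding above: each dictionary key is
// `column_name` + a 0u8 separator + one column-type code byte, so e.g.
// b"price\x00\x02" would decode to ("price", <the column type with code 2>).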
// TODO Add unit tests
pub fn list_columns(&self) -> io::Result<Vec<(String, DynamicColumnHandle)>> {
let mut stream = self.column_dictionary.stream()?;
let mut results = Vec::new();
while stream.advance() {
let key_bytes: &[u8] = stream.key();
let column_code: u8 = key_bytes.last().cloned().unwrap();
let column_type: ColumnType = ColumnType::try_from_code(column_code)
.map_err(|_| io_invalid_data(format!("Unknown column code `{column_code}`")))?;
let range = stream.value().clone();
let column_name =
// The last two bytes are respectively the 0u8 separator and the column_type.
String::from_utf8_lossy(&key_bytes[..key_bytes.len() - 2]).to_string();
let file_slice = self
.column_data
.slice(range.start as usize..range.end as usize);
let column_handle = DynamicColumnHandle {
file_slice,
column_type,
};
results.push((column_name, column_handle));
}
Ok(results)
Ok(self.iter_columns()?.collect())
}
fn stream_for_column_range(&self, column_name: &str) -> sstable::StreamerBuilder<RangeSSTable> {

View File

@@ -79,7 +79,6 @@ fn mutate_or_create_column<V, TMutator>(
impl ColumnarWriter {
pub fn mem_usage(&self) -> usize {
// TODO add dictionary builders.
self.arena.mem_usage()
+ self.numerical_field_hash_map.mem_usage()
+ self.bool_field_hash_map.mem_usage()
@@ -87,6 +86,11 @@ impl ColumnarWriter {
+ self.str_field_hash_map.mem_usage()
+ self.ip_addr_field_hash_map.mem_usage()
+ self.datetime_field_hash_map.mem_usage()
+ self
.dictionaries
.iter()
.map(|dict| dict.mem_usage())
.sum::<usize>()
}
/// Returns the list of doc ids from 0..num_docs sorted by the `sort_field`
@@ -98,9 +102,15 @@ impl ColumnarWriter {
///
/// The sort applied is stable.
pub fn sort_order(&self, sort_field: &str, num_docs: RowId, reversed: bool) -> Vec<u32> {
let Some(numerical_col_writer) =
self.numerical_field_hash_map.get::<NumericalColumnWriter>(sort_field.as_bytes()) else {
return Vec::new();
let Some(numerical_col_writer) = self
.numerical_field_hash_map
.get::<NumericalColumnWriter>(sort_field.as_bytes())
.or_else(|| {
self.datetime_field_hash_map
.get::<NumericalColumnWriter>(sort_field.as_bytes())
})
else {
return Vec::new();
};
let mut symbols_buffer = Vec::new();
let mut values = Vec::new();

View File

@@ -32,6 +32,7 @@ pub struct OrderedId(pub u32);
#[derive(Default)]
pub(crate) struct DictionaryBuilder {
dict: FnvHashMap<Vec<u8>, UnorderedId>,
memory_consumption: usize,
}
impl DictionaryBuilder {
@@ -43,6 +44,8 @@ impl DictionaryBuilder {
}
let new_id = UnorderedId(self.dict.len() as u32);
self.dict.insert(term.to_vec(), new_id);
self.memory_consumption += term.len();
self.memory_consumption += 40; // Term Metadata + HashMap overhead
new_id
}
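// Back-of-envelope sketch of the accounting above: after registering the two
// terms b"red" and b"green", mem_usage() reports (3 + 40) + (5 + 40) = 88 bytes.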
@@ -63,6 +66,10 @@ impl DictionaryBuilder {
sstable_builder.finish()?;
Ok(TermIdMapping { unordered_to_ord })
}
pub(crate) fn mem_usage(&self) -> usize {
self.memory_consumption
}
}
#[cfg(test)]

View File

@@ -228,7 +228,7 @@ static_dynamic_conversions!(StrColumn, Str);
static_dynamic_conversions!(BytesColumn, Bytes);
static_dynamic_conversions!(Column<Ipv6Addr>, IpAddr);
#[derive(Clone)]
#[derive(Clone, Debug)]
pub struct DynamicColumnHandle {
pub(crate) file_slice: FileSlice,
pub(crate) column_type: ColumnType,
@@ -247,7 +247,7 @@ impl DynamicColumnHandle {
}
/// Returns the `u64` fast field reader associated with `fields` of types
/// Str, u64, i64, f64, or datetime.
/// Str, u64, i64, f64, bool, or datetime.
///
/// If not, the fastfield reader will return the u64 value associated with the original
/// FastValue.
@@ -258,9 +258,12 @@ impl DynamicColumnHandle {
let column: BytesColumn = crate::column::open_column_bytes(column_bytes)?;
Ok(Some(column.term_ord_column))
}
ColumnType::Bool => Ok(None),
ColumnType::IpAddr => Ok(None),
ColumnType::I64 | ColumnType::U64 | ColumnType::F64 | ColumnType::DateTime => {
ColumnType::Bool
| ColumnType::I64
| ColumnType::U64
| ColumnType::F64
| ColumnType::DateTime => {
let column = crate::column::open_column_u64::<u64>(column_bytes)?;
Ok(Some(column))
}

View File

@@ -57,7 +57,9 @@ fn test_dataframe_writer_bool() {
assert_eq!(cols[0].num_bytes(), 22);
assert_eq!(cols[0].column_type(), ColumnType::Bool);
let dyn_bool_col = cols[0].open().unwrap();
let DynamicColumn::Bool(bool_col) = dyn_bool_col else { panic!(); };
let DynamicColumn::Bool(bool_col) = dyn_bool_col else {
panic!();
};
let vals: Vec<Option<bool>> = (0..5).map(|row_id| bool_col.first(row_id)).collect();
assert_eq!(&vals, &[None, Some(false), None, Some(true), None,]);
}
@@ -79,7 +81,9 @@ fn test_dataframe_writer_u64_multivalued() {
assert_eq!(cols.len(), 1);
assert_eq!(cols[0].num_bytes(), 29);
let dyn_i64_col = cols[0].open().unwrap();
let DynamicColumn::I64(divisor_col) = dyn_i64_col else { panic!(); };
let DynamicColumn::I64(divisor_col) = dyn_i64_col else {
panic!();
};
assert_eq!(
divisor_col.get_cardinality(),
crate::Cardinality::Multivalued
@@ -101,7 +105,9 @@ fn test_dataframe_writer_ip_addr() {
assert_eq!(cols[0].num_bytes(), 42);
assert_eq!(cols[0].column_type(), ColumnType::IpAddr);
let dyn_bool_col = cols[0].open().unwrap();
let DynamicColumn::IpAddr(ip_col) = dyn_bool_col else { panic!(); };
let DynamicColumn::IpAddr(ip_col) = dyn_bool_col else {
panic!();
};
let vals: Vec<Option<Ipv6Addr>> = (0..5).map(|row_id| ip_col.first(row_id)).collect();
assert_eq!(
&vals,
@@ -134,7 +140,9 @@ fn test_dataframe_writer_numerical() {
// - null footer 6 bytes
assert_eq!(cols[0].num_bytes(), 33);
let column = cols[0].open().unwrap();
let DynamicColumn::I64(column_i64) = column else { panic!(); };
let DynamicColumn::I64(column_i64) = column else {
panic!();
};
assert_eq!(column_i64.index.get_cardinality(), Cardinality::Optional);
assert_eq!(column_i64.first(0), None);
assert_eq!(column_i64.first(1), Some(12i64));
@@ -198,7 +206,9 @@ fn test_dictionary_encoded_str() {
assert_eq!(columnar_reader.num_columns(), 2);
let col_handles = columnar_reader.read_columns("my.column").unwrap();
assert_eq!(col_handles.len(), 1);
let DynamicColumn::Str(str_col) = col_handles[0].open().unwrap() else { panic!(); };
let DynamicColumn::Str(str_col) = col_handles[0].open().unwrap() else {
panic!();
};
let index: Vec<Option<u64>> = (0..5).map(|row_id| str_col.ords().first(row_id)).collect();
assert_eq!(index, &[None, Some(0), None, Some(2), Some(1)]);
assert_eq!(str_col.num_rows(), 5);
@@ -230,7 +240,9 @@ fn test_dictionary_encoded_bytes() {
assert_eq!(columnar_reader.num_columns(), 2);
let col_handles = columnar_reader.read_columns("my.column").unwrap();
assert_eq!(col_handles.len(), 1);
let DynamicColumn::Bytes(bytes_col) = col_handles[0].open().unwrap() else { panic!(); };
let DynamicColumn::Bytes(bytes_col) = col_handles[0].open().unwrap() else {
panic!();
};
let index: Vec<Option<u64>> = (0..5)
.map(|row_id| bytes_col.ords().first(row_id))
.collect();
@@ -533,28 +545,36 @@ trait AssertEqualToColumnValue {
impl AssertEqualToColumnValue for bool {
fn assert_equal_to_column_value(&self, column_value: &ColumnValue) {
let ColumnValue::Bool(val) = column_value else { panic!() };
let ColumnValue::Bool(val) = column_value else {
panic!()
};
assert_eq!(self, val);
}
}
impl AssertEqualToColumnValue for Ipv6Addr {
fn assert_equal_to_column_value(&self, column_value: &ColumnValue) {
let ColumnValue::IpAddr(val) = column_value else { panic!() };
let ColumnValue::IpAddr(val) = column_value else {
panic!()
};
assert_eq!(self, val);
}
}
impl<T: Coerce + PartialEq + Debug + Into<NumericalValue>> AssertEqualToColumnValue for T {
fn assert_equal_to_column_value(&self, column_value: &ColumnValue) {
let ColumnValue::Numerical(num) = column_value else { panic!() };
let ColumnValue::Numerical(num) = column_value else {
panic!()
};
assert_eq!(self, &T::coerce(*num));
}
}
impl AssertEqualToColumnValue for DateTime {
fn assert_equal_to_column_value(&self, column_value: &ColumnValue) {
let ColumnValue::DateTime(dt) = column_value else { panic!() };
let ColumnValue::DateTime(dt) = column_value else {
panic!()
};
assert_eq!(self, dt);
}
}

View File

@@ -1,6 +1,6 @@
[package]
name = "tantivy-common"
version = "0.5.0"
version = "0.6.0"
authors = ["Paul Masurel <paul@quickwit.io>", "Pascal Seitz <pascal@quickwit.io>"]
license = "MIT"
edition = "2021"
@@ -14,7 +14,7 @@ repository = "https://github.com/quickwit-oss/tantivy"
[dependencies]
byteorder = "1.4.3"
ownedbytes = { version= "0.5", path="../ownedbytes" }
ownedbytes = { version= "0.6", path="../ownedbytes" }
async-trait = "0.1"
time = { version = "0.3.10", features = ["serde-well-known"] }
serde = { version = "1.0.136", features = ["derive"] }

View File

@@ -15,21 +15,12 @@ use time::{OffsetDateTime, PrimitiveDateTime, UtcOffset};
pub enum DateTimePrecision {
/// Second precision.
#[default]
Second,
/// Millisecond precision.
Millisecond,
/// Microsecond precision.
Microsecond,
/// Nanosecond precision.
Nanosecond,
// TODO: Remove deprecated variants after 2 releases.
#[deprecated(since = "0.20.0", note = "Use `Second` instead")]
Seconds,
#[deprecated(since = "0.20.0", note = "Use `Millisecond` instead")]
/// Millisecond precision.
Milliseconds,
#[deprecated(since = "0.20.0", note = "Use `Microsecond` instead")]
/// Microsecond precision.
Microseconds,
#[deprecated(since = "0.20.0", note = "Use `Nanosecond` instead")]
/// Nanosecond precision.
Nanoseconds,
}
@@ -156,16 +147,10 @@ impl DateTime {
/// Truncates the microseconds value to the corresponding precision.
pub fn truncate(self, precision: DateTimePrecision) -> Self {
let truncated_timestamp_nanos = match precision {
DateTimePrecision::Second | DateTimePrecision::Seconds => {
(self.timestamp_nanos / 1_000_000_000) * 1_000_000_000
}
DateTimePrecision::Millisecond | DateTimePrecision::Milliseconds => {
(self.timestamp_nanos / 1_000_000) * 1_000_000
}
DateTimePrecision::Microsecond | DateTimePrecision::Microseconds => {
(self.timestamp_nanos / 1_000) * 1_000
}
DateTimePrecision::Nanosecond | DateTimePrecision::Nanoseconds => self.timestamp_nanos,
DateTimePrecision::Seconds => (self.timestamp_nanos / 1_000_000_000) * 1_000_000_000,
DateTimePrecision::Milliseconds => (self.timestamp_nanos / 1_000_000) * 1_000_000,
DateTimePrecision::Microseconds => (self.timestamp_nanos / 1_000) * 1_000,
DateTimePrecision::Nanoseconds => self.timestamp_nanos,
};
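// Worked example (millisecond precision; the input value is an assumption):
// (1_700_000_123_456_789 / 1_000_000) * 1_000_000 = 1_700_000_123_000_000 ns,
// i.e. everything below the millisecond is zeroed by the integer division.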
Self {
timestamp_nanos: truncated_timestamp_nanos,
@@ -174,7 +159,7 @@ impl DateTime {
}
impl fmt::Debug for DateTime {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
let utc_rfc3339 = self.into_utc().format(&Rfc3339).map_err(|_| fmt::Error)?;
f.write_str(&utc_rfc3339)
}

View File

@@ -1,3 +1,4 @@
use std::fs::File;
use std::ops::{Deref, Range, RangeBounds};
use std::sync::Arc;
use std::{fmt, io};
@@ -32,6 +33,62 @@ pub trait FileHandle: 'static + Send + Sync + HasLen + fmt::Debug {
}
}
#[derive(Debug)]
/// A `File` with its length included.
pub struct WrapFile {
file: File,
len: usize,
}
impl WrapFile {
/// Creates a new WrapFile and stores its length.
pub fn new(file: File) -> io::Result<Self> {
let len = file.metadata()?.len() as usize;
Ok(WrapFile { file, len })
}
}
#[async_trait]
impl FileHandle for WrapFile {
fn read_bytes(&self, range: Range<usize>) -> io::Result<OwnedBytes> {
let file_len = self.len();
// Calculate the actual range to read, ensuring it stays within file boundaries
let start = range.start;
let end = range.end.min(file_len);
// Ensure the start is before the end of the range
if start >= end {
return Err(io::Error::new(io::ErrorKind::InvalidInput, "Invalid range"));
}
let mut buffer = vec![0; end - start];
#[cfg(unix)]
{
use std::os::unix::prelude::FileExt;
self.file.read_exact_at(&mut buffer, start as u64)?;
}
#[cfg(not(unix))]
{
use std::io::{Read, Seek};
let mut file = self.file.try_clone()?; // Clone the file to read from it separately
// Seek to the start position in the file
file.seek(io::SeekFrom::Start(start as u64))?;
// Read the data into the buffer
file.read_exact(&mut buffer)?;
}
Ok(OwnedBytes::new(buffer))
}
// TODO: implement async.
}
impl HasLen for WrapFile {
fn len(&self) -> usize {
self.len
}
}
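// Usage sketch (hypothetical path; error handling elided):
// let file = File::open("segment.data")?;
// let handle = WrapFile::new(file)?; // length is read from metadata once
// let first_kb = handle.read_bytes(0..1024.min(handle.len()))?;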
#[async_trait]
impl FileHandle for &'static [u8] {
fn read_bytes(&self, range: Range<usize>) -> io::Result<OwnedBytes> {
@@ -67,6 +124,30 @@ impl fmt::Debug for FileSlice {
}
}
impl FileSlice {
pub fn stream_file_chunks(&self) -> impl Iterator<Item = io::Result<OwnedBytes>> + '_ {
let len = self.range.end;
let mut start = self.range.start;
std::iter::from_fn(move || {
// Stream the file in chunks of up to 1MB.
const CHUNK_SIZE: usize = 1024 * 1024; // 1MB
if start < len {
let end = (start + CHUNK_SIZE).min(len);
let range = start..end;
let chunk = self.data.read_bytes(range);
start += CHUNK_SIZE;
match chunk {
Ok(chunk) => Some(Ok(chunk)),
Err(e) => Some(Err(e)),
}
} else {
None
}
})
}
}
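// Usage sketch: stream a slice without materializing all of it in memory.
// for chunk in file_slice.stream_file_chunks() {
//     let bytes: OwnedBytes = chunk?; // at most 1MB per chunk
// }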
/// Takes a range, a `RangeBounds` object, and returns
/// a `Range` that corresponds to the relative application of the
/// `RangeBounds` object to the original `Range`.

View File

@@ -27,15 +27,15 @@ pub trait GroupByIteratorExtended: Iterator {
where
Self: Sized,
F: FnMut(&Self::Item) -> K,
K: PartialEq + Copy,
Self::Item: Copy,
K: PartialEq + Clone,
Self::Item: Clone,
{
GroupByIterator::new(self, key)
}
}
impl<I: Iterator> GroupByIteratorExtended for I {}
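// Usage sketch (the extension method name is assumed to be `group_by`):
// let mut groups = [1u32, 1, 2].iter().copied().group_by(|val| *val);
// let (key, group) = groups.next().unwrap();
// assert_eq!(key, 1);
// assert_eq!(group.collect::<Vec<_>>(), vec![1, 1]);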
pub struct GroupByIterator<I, F, K: Copy>
pub struct GroupByIterator<I, F, K: Clone>
where
I: Iterator,
F: FnMut(&I::Item) -> K,
@@ -50,7 +50,7 @@ where
inner: Rc<RefCell<GroupByShared<I, F, K>>>,
}
struct GroupByShared<I, F, K: Copy>
struct GroupByShared<I, F, K: Clone>
where
I: Iterator,
F: FnMut(&I::Item) -> K,
@@ -63,7 +63,7 @@ impl<I, F, K> GroupByIterator<I, F, K>
where
I: Iterator,
F: FnMut(&I::Item) -> K,
K: Copy,
K: Clone,
{
fn new(inner: I, group_by_fn: F) -> Self {
let inner = GroupByShared {
@@ -80,28 +80,28 @@ where
impl<I, F, K> Iterator for GroupByIterator<I, F, K>
where
I: Iterator,
I::Item: Copy,
I::Item: Clone,
F: FnMut(&I::Item) -> K,
K: Copy,
K: Clone,
{
type Item = (K, GroupIterator<I, F, K>);
fn next(&mut self) -> Option<Self::Item> {
let mut inner = self.inner.borrow_mut();
let value = *inner.iter.peek()?;
let value = inner.iter.peek()?.clone();
let key = (inner.group_by_fn)(&value);
let inner = self.inner.clone();
let group_iter = GroupIterator {
inner,
group_key: key,
group_key: key.clone(),
};
Some((key, group_iter))
}
}
pub struct GroupIterator<I, F, K: Copy>
pub struct GroupIterator<I, F, K: Clone>
where
I: Iterator,
F: FnMut(&I::Item) -> K,
@@ -110,10 +110,10 @@ where
group_key: K,
}
impl<I, F, K: PartialEq + Copy> Iterator for GroupIterator<I, F, K>
impl<I, F, K: PartialEq + Clone> Iterator for GroupIterator<I, F, K>
where
I: Iterator,
I::Item: Copy,
I::Item: Clone,
F: FnMut(&I::Item) -> K,
{
type Item = I::Item;
@@ -121,7 +121,7 @@ where
fn next(&mut self) -> Option<Self::Item> {
let mut inner = self.inner.borrow_mut();
// Peek to check whether the next value belongs to the current group.
let peek_val = *inner.iter.peek()?;
let peek_val = inner.iter.peek()?.clone();
if (inner.group_by_fn)(&peek_val) == self.group_key {
inner.iter.next()
} else {

View File

@@ -221,5 +221,19 @@ fn main() -> tantivy::Result<()> {
println!("{}", schema.to_json(&retrieved_doc));
}
// We can also get an explanation to understand
// how a found document got its score.
let query = query_parser.parse_query("title:sea^20 body:whale^70")?;
let (_score, doc_address) = searcher
.search(&query, &TopDocs::with_limit(1))?
.into_iter()
.next()
.unwrap();
let explanation = query.explain(&searcher, doc_address)?;
println!("{}", explanation.to_pretty_json());
Ok(())
}

View File

@@ -53,7 +53,7 @@ fn main() -> tantivy::Result<()> {
// this will store tokens of 3 characters each
index
.tokenizers()
.register("ngram3", NgramTokenizer::new(3, 3, false));
.register("ngram3", NgramTokenizer::new(3, 3, false).unwrap());
// To insert document we need an index writer.
// There must be only one writer at a time.

View File

@@ -13,7 +13,7 @@ fn main() -> tantivy::Result<()> {
let opts = DateOptions::from(INDEXED)
.set_stored()
.set_fast()
.set_precision(tantivy::DateTimePrecision::Second);
.set_precision(tantivy::DateTimePrecision::Seconds);
// Add `occurred_at` date field type
let occurred_at = schema_builder.add_date_field("occurred_at", opts);
let event_type = schema_builder.add_text_field("event", STRING | STORED);

View File

@@ -0,0 +1,79 @@
use tantivy::collector::TopDocs;
use tantivy::query::QueryParser;
use tantivy::schema::*;
use tantivy::{doc, Index, ReloadPolicy, Result};
use tempfile::TempDir;
fn main() -> Result<()> {
let index_path = TempDir::new()?;
let mut schema_builder = Schema::builder();
schema_builder.add_text_field("title", TEXT | STORED);
schema_builder.add_text_field("body", TEXT);
let schema = schema_builder.build();
let title = schema.get_field("title").unwrap();
let body = schema.get_field("body").unwrap();
let index = Index::create_in_dir(&index_path, schema)?;
let mut index_writer = index.writer(50_000_000)?;
index_writer.add_document(doc!(
title => "The Old Man and the Sea",
body => "He was an old man who fished alone in a skiff in the Gulf Stream and he had gone \
eighty-four days now without taking a fish.",
))?;
index_writer.add_document(doc!(
title => "Of Mice and Men",
body => "A few miles south of Soledad, the Salinas River drops in close to the hillside \
bank and runs deep and green. The water is warm too, for it has slipped twinkling \
over the yellow sands in the sunlight before reaching the narrow pool. On one \
side of the river the golden foothill slopes curve up to the strong and rocky \
Gabilan Mountains, but on the valley side the water is lined with trees—willows \
fresh and green with every spring, carrying in their lower leaf junctures the \
debris of the winters flooding; and sycamores with mottled, white, recumbent \
limbs and branches that arch over the pool"
))?;
// Multivalued fields just need to be repeated.
index_writer.add_document(doc!(
title => "Frankenstein",
title => "The Modern Prometheus",
body => "You will rejoice to hear that no disaster has accompanied the commencement of an \
enterprise which you have regarded with such evil forebodings. I arrived here \
yesterday, and my first task is to assure my dear sister of my welfare and \
increasing confidence in the success of my undertaking."
))?;
index_writer.commit()?;
let reader = index
.reader_builder()
.reload_policy(ReloadPolicy::OnCommit)
.try_into()?;
let searcher = reader.searcher();
let query_parser = QueryParser::for_index(&index, vec![title, body]);
// This will match documents containing the phrase "in the"
// followed by some word starting with "su",
// i.e. it will match "in the sunlight" and "in the success",
// but not "in the Gulf Stream".
let query = query_parser.parse_query("\"in the su\"*")?;
let top_docs = searcher.search(&query, &TopDocs::with_limit(10))?;
let mut titles = top_docs
.into_iter()
.map(|(_score, doc_address)| {
let doc = searcher.doc(doc_address)?;
let title = doc.get_first(title).unwrap().as_text().unwrap().to_owned();
Ok(title)
})
.collect::<Result<Vec<_>>>()?;
titles.sort_unstable();
assert_eq!(titles, ["Frankenstein", "Of Mice and Men"]);
Ok(())
}

View File

@@ -17,7 +17,8 @@ use tantivy::{doc, Index, ReloadPolicy};
use tempfile::TempDir;
fn pre_tokenize_text(text: &str) -> Vec<Token> {
let mut token_stream = SimpleTokenizer.token_stream(text);
let mut tokenizer = SimpleTokenizer::default();
let mut token_stream = tokenizer.token_stream(text);
let mut tokens = vec![];
while token_stream.advance() {
tokens.push(token_stream.token().clone());

View File

@@ -50,7 +50,7 @@ fn main() -> tantivy::Result<()> {
// This tokenizer lowers all of the text (to help with stop word matching)
// then removes all instances of `the` and `and` from the corpus
let tokenizer = TextAnalyzer::builder(SimpleTokenizer)
let tokenizer = TextAnalyzer::builder(SimpleTokenizer::default())
.filter(LowerCaser)
.filter(StopWordFilter::remove(vec![
"the".to_string(),

View File

@@ -6,12 +6,14 @@ use tantivy::collector::TopDocs;
use tantivy::query::QueryParser;
use tantivy::schema::{Schema, FAST, TEXT};
use tantivy::{
doc, DocAddress, DocId, Index, IndexReader, Opstamp, Searcher, SearcherGeneration, SegmentId,
SegmentReader, Warmer,
doc, DocAddress, DocId, Index, Opstamp, Searcher, SearcherGeneration, SegmentId, SegmentReader,
Warmer,
};
// This example shows how warmers can be used to
// load a values from an external sources using the Warmer API.
// load values from external sources and
// tie their lifecycle to that of the index segments
// using the Warmer API.
//
// In this example, we assume an e-commerce search engine.
@@ -23,9 +25,11 @@ pub trait PriceFetcher: Send + Sync + 'static {
fn fetch_prices(&self, product_ids: &[ProductId]) -> Vec<Price>;
}
type SegmentKey = (SegmentId, Option<Opstamp>);
struct DynamicPriceColumn {
field: String,
price_cache: RwLock<HashMap<(SegmentId, Option<Opstamp>), Arc<Vec<Price>>>>,
price_cache: RwLock<HashMap<SegmentKey, Arc<Vec<Price>>>>,
price_fetcher: Box<dyn PriceFetcher>,
}
@@ -46,7 +50,6 @@ impl DynamicPriceColumn {
impl Warmer for DynamicPriceColumn {
fn warm(&self, searcher: &Searcher) -> tantivy::Result<()> {
for segment in searcher.segment_readers() {
let key = (segment.segment_id(), segment.delete_opstamp());
let product_id_reader = segment
.fast_fields()
.u64(&self.field)?
@@ -55,37 +58,40 @@ impl Warmer for DynamicPriceColumn {
.doc_ids_alive()
.map(|doc| product_id_reader.get_val(doc))
.collect();
let mut prices_it = self.price_fetcher.fetch_prices(&product_ids).into_iter();
let mut price_vals: Vec<Price> = Vec::new();
for doc in 0..segment.max_doc() {
if segment.is_deleted(doc) {
price_vals.push(0);
} else {
price_vals.push(prices_it.next().unwrap())
}
}
let mut prices = self.price_fetcher.fetch_prices(&product_ids).into_iter();
let prices: Vec<Price> = (0..segment.max_doc())
.map(|doc| {
if !segment.is_deleted(doc) {
prices.next().unwrap()
} else {
0
}
})
.collect();
let key = (segment.segment_id(), segment.delete_opstamp());
self.price_cache
.write()
.unwrap()
.insert(key, Arc::new(price_vals));
.insert(key, Arc::new(prices));
}
Ok(())
}
fn garbage_collect(&self, live_generations: &[&SearcherGeneration]) {
let live_segment_id_and_delete_ops: HashSet<(SegmentId, Option<Opstamp>)> =
live_generations
.iter()
.flat_map(|gen| gen.segments())
.map(|(&segment_id, &opstamp)| (segment_id, opstamp))
.collect();
let mut price_cache_wrt = self.price_cache.write().unwrap();
// let price_cache = std::mem::take(&mut *price_cache_wrt);
// Drain would be nicer here.
*price_cache_wrt = std::mem::take(&mut *price_cache_wrt)
.into_iter()
.filter(|(seg_id_and_op, _)| !live_segment_id_and_delete_ops.contains(seg_id_and_op))
let live_keys: HashSet<SegmentKey> = live_generations
.iter()
.flat_map(|gen| gen.segments())
.map(|(&segment_id, &opstamp)| (segment_id, opstamp))
.collect();
self.price_cache
.write()
.unwrap()
.retain(|key, _| live_keys.contains(key));
}
}
@@ -100,17 +106,17 @@ pub struct ExternalPriceTable {
impl ExternalPriceTable {
pub fn update_price(&self, product_id: ProductId, price: Price) {
let mut prices_wrt = self.prices.write().unwrap();
prices_wrt.insert(product_id, price);
self.prices.write().unwrap().insert(product_id, price);
}
}
impl PriceFetcher for ExternalPriceTable {
fn fetch_prices(&self, product_ids: &[ProductId]) -> Vec<Price> {
let prices_read = self.prices.read().unwrap();
let prices = self.prices.read().unwrap();
product_ids
.iter()
.map(|product_id| prices_read.get(product_id).cloned().unwrap_or(0))
.map(|product_id| prices.get(product_id).cloned().unwrap_or(0))
.collect()
}
}
@@ -137,17 +143,14 @@ fn main() -> tantivy::Result<()> {
const SNEAKERS: ProductId = 23222;
let index = Index::create_in_ram(schema);
let mut writer = index.writer_with_num_threads(1, 10_000_000)?;
let mut writer = index.writer_with_num_threads(1, 15_000_000)?;
writer.add_document(doc!(product_id=>OLIVE_OIL, text=>"cooking olive oil from greece"))?;
writer.add_document(doc!(product_id=>GLOVES, text=>"kitchen gloves, perfect for cooking"))?;
writer.add_document(doc!(product_id=>SNEAKERS, text=>"uber sweet sneakers"))?;
writer.commit()?;
let warmers: Vec<Weak<dyn Warmer>> = vec![Arc::downgrade(
&(price_dynamic_column.clone() as Arc<dyn Warmer>),
)];
let reader: IndexReader = index.reader_builder().warmers(warmers).try_into()?;
reader.reload()?;
let warmers = vec![Arc::downgrade(&price_dynamic_column) as Weak<dyn Warmer>];
let reader = index.reader_builder().warmers(warmers).try_into()?;
let query_parser = QueryParser::for_index(&index, vec![text]);
let query = query_parser.parse_query("cooking")?;

View File

@@ -1,7 +1,7 @@
[package]
authors = ["Paul Masurel <paul@quickwit.io>", "Pascal Seitz <pascal@quickwit.io>"]
name = "ownedbytes"
version = "0.5.0"
version = "0.6.0"
edition = "2021"
description = "Expose data as static slice"
license = "MIT"

View File

@@ -1,7 +1,7 @@
use std::convert::TryInto;
use std::ops::{Deref, Range};
use std::sync::Arc;
use std::{fmt, io, mem};
use std::{fmt, io};
pub use stable_deref_trait::StableDeref;
@@ -26,8 +26,8 @@ impl OwnedBytes {
data_holder: T,
) -> OwnedBytes {
let box_stable_deref = Arc::new(data_holder);
let bytes: &[u8] = box_stable_deref.as_ref();
let data = unsafe { mem::transmute::<_, &'static [u8]>(bytes.deref()) };
let bytes: &[u8] = box_stable_deref.deref();
let data = unsafe { &*(bytes as *const [u8]) };
OwnedBytes {
data,
box_stable_deref,
@@ -57,6 +57,12 @@ impl OwnedBytes {
self.data.len()
}
/// Returns true iff this `OwnedBytes` is empty.
#[inline]
pub fn is_empty(&self) -> bool {
self.data.is_empty()
}
/// Splits the OwnedBytes into two OwnedBytes `(left, right)`.
///
/// Left will hold `split_len` bytes.
@@ -68,13 +74,14 @@ impl OwnedBytes {
#[inline]
#[must_use]
pub fn split(self, split_len: usize) -> (OwnedBytes, OwnedBytes) {
let (left_data, right_data) = self.data.split_at(split_len);
let right_box_stable_deref = self.box_stable_deref.clone();
let left = OwnedBytes {
data: &self.data[..split_len],
data: left_data,
box_stable_deref: self.box_stable_deref,
};
let right = OwnedBytes {
data: &self.data[split_len..],
data: right_data,
box_stable_deref: right_box_stable_deref,
};
(left, right)
@@ -99,55 +106,45 @@ impl OwnedBytes {
///
/// `self` is truncated to `split_len`, left with the remaining bytes.
pub fn split_off(&mut self, split_len: usize) -> OwnedBytes {
let (left, right) = self.data.split_at(split_len);
let right_box_stable_deref = self.box_stable_deref.clone();
let right_piece = OwnedBytes {
data: &self.data[split_len..],
data: right,
box_stable_deref: right_box_stable_deref,
};
self.data = &self.data[..split_len];
self.data = left;
right_piece
}
/// Returns true iff this `OwnedBytes` is empty.
#[inline]
pub fn is_empty(&self) -> bool {
self.as_slice().is_empty()
}
/// Drops the left most `advance_len` bytes.
#[inline]
pub fn advance(&mut self, advance_len: usize) {
self.data = &self.data[advance_len..]
pub fn advance(&mut self, advance_len: usize) -> &[u8] {
let (data, rest) = self.data.split_at(advance_len);
self.data = rest;
data
}
/// Reads an `u8` from the `OwnedBytes` and advances by one byte.
#[inline]
pub fn read_u8(&mut self) -> u8 {
assert!(!self.is_empty());
let byte = self.as_slice()[0];
self.advance(1);
byte
self.advance(1)[0]
}
/// Reads an `u64` encoded as little-endian from the `OwnedBytes` and advance by 8 bytes.
#[inline]
pub fn read_u64(&mut self) -> u64 {
assert!(self.len() > 7);
let octlet: [u8; 8] = self.as_slice()[..8].try_into().unwrap();
self.advance(8);
u64::from_le_bytes(octlet)
fn read_n<const N: usize>(&mut self) -> [u8; N] {
self.advance(N).try_into().unwrap()
}
/// Reads an `u32` encoded as little-endian from the `OwnedBytes` and advances by 4 bytes.
#[inline]
pub fn read_u32(&mut self) -> u32 {
assert!(self.len() > 3);
u32::from_le_bytes(self.read_n())
}
let quad: [u8; 4] = self.as_slice()[..4].try_into().unwrap();
self.advance(4);
u32::from_le_bytes(quad)
/// Reads an `u64` encoded as little-endian from the `OwnedBytes` and advances by 8 bytes.
#[inline]
pub fn read_u64(&mut self) -> u64 {
u64::from_le_bytes(self.read_n())
}
}
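// Usage sketch: little-endian reads consume the buffer in place.
// let mut bytes = OwnedBytes::new(vec![1u8, 0, 0, 0, 0, 0, 0, 0]);
// assert_eq!(bytes.read_u64(), 1);
// assert!(bytes.is_empty());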
@@ -201,32 +198,33 @@ impl Deref for OwnedBytes {
}
}
impl AsRef<[u8]> for OwnedBytes {
#[inline]
fn as_ref(&self) -> &[u8] {
self.as_slice()
}
}
impl io::Read for OwnedBytes {
#[inline]
fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
let read_len = {
let data = self.as_slice();
if data.len() >= buf.len() {
let buf_len = buf.len();
buf.copy_from_slice(&data[..buf_len]);
buf.len()
} else {
let data_len = data.len();
buf[..data_len].copy_from_slice(data);
data_len
}
};
self.advance(read_len);
Ok(read_len)
let data_len = self.data.len();
let buf_len = buf.len();
if data_len >= buf_len {
let data = self.advance(buf_len);
buf.copy_from_slice(data);
Ok(buf_len)
} else {
buf[..data_len].copy_from_slice(self.data);
self.data = &[];
Ok(data_len)
}
}
#[inline]
fn read_to_end(&mut self, buf: &mut Vec<u8>) -> io::Result<usize> {
let read_len = {
let data = self.as_slice();
buf.extend(data);
data.len()
};
self.advance(read_len);
buf.extend(self.data);
let read_len = self.data.len();
self.data = &[];
Ok(read_len)
}
#[inline]
@@ -242,13 +240,6 @@ impl io::Read for OwnedBytes {
}
}
impl AsRef<[u8]> for OwnedBytes {
#[inline]
fn as_ref(&self) -> &[u8] {
self.as_slice()
}
}
#[cfg(test)]
mod tests {
use std::io::{self, Read};

View File

@@ -1,6 +1,6 @@
[package]
name = "tantivy-query-grammar"
version = "0.19.0"
version = "0.21.0"
authors = ["Paul Masurel <paul.masurel@gmail.com>"]
license = "MIT"
categories = ["database-implementations", "data-structures"]
@@ -12,6 +12,4 @@ keywords = ["search", "information", "retrieval"]
edition = "2021"
[dependencies]
combine = {version="4", default-features=false, features=[] }
once_cell = "1.7.2"
regex ={ version = "1.5.4", default-features = false, features = ["std", "unicode"] }
nom = "7"

View File

@@ -0,0 +1,353 @@
//! nom combinators for infallible operations
use std::convert::Infallible;
use nom::{AsChar, IResult, InputLength, InputTakeAtPosition};
pub(crate) type ErrorList = Vec<LenientErrorInternal>;
pub(crate) type JResult<I, O> = IResult<I, (O, ErrorList), Infallible>;
/// An error, with an end-of-string based offset
#[derive(Debug)]
pub(crate) struct LenientErrorInternal {
pub pos: usize,
pub message: String,
}
/// A recoverable error and the position it happened at
#[derive(Debug, PartialEq)]
pub struct LenientError {
pub pos: usize,
pub message: String,
}
impl LenientError {
pub(crate) fn from_internal(internal: LenientErrorInternal, str_len: usize) -> LenientError {
LenientError {
pos: str_len - internal.pos,
message: internal.message,
}
}
}
fn unwrap_infallible<T>(res: Result<T, nom::Err<Infallible>>) -> T {
match res {
Ok(val) => val,
Err(_) => unreachable!(),
}
}
// when rfcs#1733 gets stabilized, this can make things clearer
// trait InfallibleParser<I, O> = nom::Parser<I, (O, ErrorList), std::convert::Infallible>;
/// A variant of the classical `opt` parser, except it returns an infallible error type.
///
/// It's less generic than the original to ease type resolution in the rest of the code.
pub(crate) fn opt_i<I: Clone, O, F>(mut f: F) -> impl FnMut(I) -> JResult<I, Option<O>>
where F: nom::Parser<I, O, nom::error::Error<I>> {
move |input: I| {
let i = input.clone();
match f.parse(input) {
Ok((i, o)) => Ok((i, (Some(o), Vec::new()))),
Err(_) => Ok((i, (None, Vec::new()))),
}
}
}
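// Usage sketch (crate-internal; the parsed input is an assumption):
// let (rest, (sign, _errs)) =
//     unwrap_infallible(opt_i(nom::character::complete::char('-'))("-7"));
// assert_eq!((rest, sign), ("7", Some('-')));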
pub(crate) fn opt_i_err<'a, I: Clone + InputLength, O, F>(
mut f: F,
message: impl ToString + 'a,
) -> impl FnMut(I) -> JResult<I, Option<O>> + 'a
where
F: nom::Parser<I, O, nom::error::Error<I>> + 'a,
{
move |input: I| {
let i = input.clone();
match f.parse(input) {
Ok((i, o)) => Ok((i, (Some(o), Vec::new()))),
Err(_) => {
let errs = vec![LenientErrorInternal {
pos: i.input_len(),
message: message.to_string(),
}];
Ok((i, (None, errs)))
}
}
}
}
pub(crate) fn space0_infallible<T>(input: T) -> JResult<T, T>
where
T: InputTakeAtPosition + Clone,
<T as InputTakeAtPosition>::Item: AsChar + Clone,
{
opt_i(nom::character::complete::space0)(input)
.map(|(left, (spaces, errors))| (left, (spaces.expect("space0 can't fail"), errors)))
}
pub(crate) fn space1_infallible<T>(input: T) -> JResult<T, Option<T>>
where
T: InputTakeAtPosition + Clone + InputLength,
<T as InputTakeAtPosition>::Item: AsChar + Clone,
{
opt_i(nom::character::complete::space1)(input).map(|(left, (spaces, mut errors))| {
if spaces.is_none() {
errors.push(LenientErrorInternal {
pos: left.input_len(),
message: "missing space".to_string(),
})
}
(left, (spaces, errors))
})
}
pub(crate) fn fallible<I, O, E: nom::error::ParseError<I>, F>(
mut f: F,
) -> impl FnMut(I) -> IResult<I, O, E>
where F: nom::Parser<I, (O, ErrorList), Infallible> {
use nom::Err;
move |input: I| match f.parse(input) {
Ok((input, (output, _err))) => Ok((input, output)),
Err(Err::Incomplete(needed)) => Err(Err::Incomplete(needed)),
Err(Err::Error(val)) | Err(Err::Failure(val)) => match val {},
}
}
pub(crate) fn delimited_infallible<I, O1, O2, O3, F, G, H>(
mut first: F,
mut second: G,
mut third: H,
) -> impl FnMut(I) -> JResult<I, O2>
where
F: nom::Parser<I, (O1, ErrorList), Infallible>,
G: nom::Parser<I, (O2, ErrorList), Infallible>,
H: nom::Parser<I, (O3, ErrorList), Infallible>,
{
move |input: I| {
let (input, (_, mut err)) = first.parse(input)?;
let (input, (o2, mut err2)) = second.parse(input)?;
err.append(&mut err2);
let (input, (_, mut err3)) = third.parse(input)?;
err.append(&mut err3);
Ok((input, (o2, err)))
}
}
// Parse nothing. Just a lazy way to not implement terminated/preceded and use delimited instead
pub(crate) fn nothing(i: &str) -> JResult<&str, ()> {
Ok((i, ((), Vec::new())))
}
pub(crate) trait TupleInfallible<I, O> {
/// Parses the input and returns a tuple of results of each parser.
fn parse(&mut self, input: I) -> JResult<I, O>;
}
impl<Input, Output, F: nom::Parser<Input, (Output, ErrorList), Infallible>>
TupleInfallible<Input, (Output,)> for (F,)
{
fn parse(&mut self, input: Input) -> JResult<Input, (Output,)> {
self.0.parse(input).map(|(i, (o, e))| (i, ((o,), e)))
}
}
// these macros are heavily copied from nom, with some minor adaptations for our type
macro_rules! tuple_trait(
($name1:ident $ty1:ident, $name2: ident $ty2:ident, $($name:ident $ty:ident),*) => (
tuple_trait!(__impl $name1 $ty1, $name2 $ty2; $($name $ty),*);
);
(__impl $($name:ident $ty: ident),+; $name1:ident $ty1:ident, $($name2:ident $ty2:ident),*) => (
tuple_trait_impl!($($name $ty),+);
tuple_trait!(__impl $($name $ty),+ , $name1 $ty1; $($name2 $ty2),*);
);
(__impl $($name:ident $ty: ident),+; $name1:ident $ty1:ident) => (
tuple_trait_impl!($($name $ty),+);
tuple_trait_impl!($($name $ty),+, $name1 $ty1);
);
);
macro_rules! tuple_trait_impl(
($($name:ident $ty: ident),+) => (
impl<
Input: Clone, $($ty),+ ,
$($name: nom::Parser<Input, ($ty, ErrorList), Infallible>),+
> TupleInfallible<Input, ( $($ty),+ )> for ( $($name),+ ) {
fn parse(&mut self, input: Input) -> JResult<Input, ( $($ty),+ )> {
let mut error_list = Vec::new();
tuple_trait_inner!(0, self, input, (), error_list, $($name)+)
}
}
);
);
macro_rules! tuple_trait_inner(
($it:tt, $self:expr, $input:expr, (), $error_list:expr, $head:ident $($id:ident)+) => ({
let (i, (o, mut err)) = $self.$it.parse($input.clone())?;
$error_list.append(&mut err);
succ!($it, tuple_trait_inner!($self, i, ( o ), $error_list, $($id)+))
});
($it:tt, $self:expr, $input:expr, ($($parsed:tt)*), $error_list:expr, $head:ident $($id:ident)+) => ({
let (i, (o, mut err)) = $self.$it.parse($input.clone())?;
$error_list.append(&mut err);
succ!($it, tuple_trait_inner!($self, i, ($($parsed)* , o), $error_list, $($id)+))
});
($it:tt, $self:expr, $input:expr, ($($parsed:tt)*), $error_list:expr, $head:ident) => ({
let (i, (o, mut err)) = $self.$it.parse($input.clone())?;
$error_list.append(&mut err);
Ok((i, (($($parsed)* , o), $error_list)))
});
);
macro_rules! succ (
(0, $submac:ident ! ($($rest:tt)*)) => ($submac!(1, $($rest)*));
(1, $submac:ident ! ($($rest:tt)*)) => ($submac!(2, $($rest)*));
(2, $submac:ident ! ($($rest:tt)*)) => ($submac!(3, $($rest)*));
(3, $submac:ident ! ($($rest:tt)*)) => ($submac!(4, $($rest)*));
(4, $submac:ident ! ($($rest:tt)*)) => ($submac!(5, $($rest)*));
(5, $submac:ident ! ($($rest:tt)*)) => ($submac!(6, $($rest)*));
(6, $submac:ident ! ($($rest:tt)*)) => ($submac!(7, $($rest)*));
(7, $submac:ident ! ($($rest:tt)*)) => ($submac!(8, $($rest)*));
(8, $submac:ident ! ($($rest:tt)*)) => ($submac!(9, $($rest)*));
(9, $submac:ident ! ($($rest:tt)*)) => ($submac!(10, $($rest)*));
(10, $submac:ident ! ($($rest:tt)*)) => ($submac!(11, $($rest)*));
(11, $submac:ident ! ($($rest:tt)*)) => ($submac!(12, $($rest)*));
(12, $submac:ident ! ($($rest:tt)*)) => ($submac!(13, $($rest)*));
(13, $submac:ident ! ($($rest:tt)*)) => ($submac!(14, $($rest)*));
(14, $submac:ident ! ($($rest:tt)*)) => ($submac!(15, $($rest)*));
(15, $submac:ident ! ($($rest:tt)*)) => ($submac!(16, $($rest)*));
(16, $submac:ident ! ($($rest:tt)*)) => ($submac!(17, $($rest)*));
(17, $submac:ident ! ($($rest:tt)*)) => ($submac!(18, $($rest)*));
(18, $submac:ident ! ($($rest:tt)*)) => ($submac!(19, $($rest)*));
(19, $submac:ident ! ($($rest:tt)*)) => ($submac!(20, $($rest)*));
(20, $submac:ident ! ($($rest:tt)*)) => ($submac!(21, $($rest)*));
);
tuple_trait!(FnA A, FnB B, FnC C, FnD D, FnE E, FnF F, FnG G, FnH H, FnI I, FnJ J, FnK K, FnL L,
FnM M, FnN N, FnO O, FnP P, FnQ Q, FnR R, FnS S, FnT T, FnU U);
// Special case: implement `TupleInfallible` for `()`, the unit type.
// This can come up in macros which accept a variable number of arguments.
// Literally, `()` is an empty tuple, so it should simply parse nothing.
impl<I> TupleInfallible<I, ()> for () {
fn parse(&mut self, input: I) -> JResult<I, ()> {
Ok((input, ((), Vec::new())))
}
}
pub(crate) fn tuple_infallible<I, O, List: TupleInfallible<I, O>>(
mut l: List,
) -> impl FnMut(I) -> JResult<I, O> {
move |i: I| l.parse(i)
}
pub(crate) fn separated_list_infallible<I, O, O2, F, G>(
mut sep: G,
mut f: F,
) -> impl FnMut(I) -> JResult<I, Vec<O>>
where
I: Clone + InputLength,
F: nom::Parser<I, (O, ErrorList), Infallible>,
G: nom::Parser<I, (O2, ErrorList), Infallible>,
{
move |i: I| {
let mut res: Vec<O> = Vec::new();
let mut errors: ErrorList = Vec::new();
let (mut i, (o, mut err)) = unwrap_infallible(f.parse(i.clone()));
errors.append(&mut err);
res.push(o);
loop {
let (i_sep_parsed, (_, mut err_sep)) = unwrap_infallible(sep.parse(i.clone()));
let len_before = i_sep_parsed.input_len();
let (i_elem_parsed, (o, mut err_elem)) =
unwrap_infallible(f.parse(i_sep_parsed.clone()));
// infinite loop check: the parser must always consume
// if we consumed nothing here, don't produce an element.
if i_elem_parsed.input_len() == len_before {
return Ok((i, (res, errors)));
}
res.push(o);
errors.append(&mut err_sep);
errors.append(&mut err_elem);
i = i_elem_parsed;
}
}
}
pub(crate) trait Alt<I, O> {
/// Tests each parser in the tuple and returns the result of the first one that succeeds
fn choice(&mut self, input: I) -> Option<JResult<I, O>>;
}
macro_rules! alt_trait(
($first_cond:ident $first:ident, $($id_cond:ident $id: ident),+) => (
alt_trait!(__impl $first_cond $first; $($id_cond $id),+);
);
(__impl $($current_cond:ident $current:ident),*; $head_cond:ident $head:ident, $($id_cond:ident $id:ident),+) => (
alt_trait_impl!($($current_cond $current),*);
alt_trait!(__impl $($current_cond $current,)* $head_cond $head; $($id_cond $id),+);
);
(__impl $($current_cond:ident $current:ident),*; $head_cond:ident $head:ident) => (
alt_trait_impl!($($current_cond $current),*);
alt_trait_impl!($($current_cond $current,)* $head_cond $head);
);
);
macro_rules! alt_trait_impl(
($($id_cond:ident $id:ident),+) => (
impl<
Input: Clone, Output,
$(
// () are to make things easier on me, but I'm not entirely sure whether we can do better
// with rule E0207
$id_cond: nom::Parser<Input, (), ()>,
$id: nom::Parser<Input, (Output, ErrorList), Infallible>
),+
> Alt<Input, Output> for ( $(($id_cond, $id),)+ ) {
fn choice(&mut self, input: Input) -> Option<JResult<Input, Output>> {
match self.0.0.parse(input.clone()) {
Err(_) => alt_trait_inner!(1, self, input, $($id_cond $id),+),
Ok((input_left, _)) => Some(self.0.1.parse(input_left)),
}
}
}
);
);
macro_rules! alt_trait_inner(
($it:tt, $self:expr, $input:expr, $head_cond:ident $head:ident, $($id_cond:ident $id:ident),+) => (
match $self.$it.0.parse($input.clone()) {
Err(_) => succ!($it, alt_trait_inner!($self, $input, $($id_cond $id),+)),
Ok((input_left, _)) => Some($self.$it.1.parse(input_left)),
}
);
($it:tt, $self:expr, $input:expr, $head_cond:ident $head:ident) => (
None
);
);
alt_trait!(A1 A, B1 B, C1 C, D1 D, E1 E, F1 F, G1 G, H1 H, I1 I, J1 J, K1 K,
L1 L, M1 M, N1 N, O1 O, P1 P, Q1 Q, R1 R, S1 S, T1 T, U1 U);
/// An `alt()`-like combinator. For each branch, it first tries a fallible parser, which either
/// commits to this branch or tells it to check the next branch, and then executes the infallible
/// parser that follows.
///
/// In case no branch matches, the default (infallible) parser is executed.
pub(crate) fn alt_infallible<I: Clone, O, F, List: Alt<I, O>>(
mut l: List,
mut default: F,
) -> impl FnMut(I) -> JResult<I, O>
where
F: nom::Parser<I, (O, ErrorList), Infallible>,
{
move |i: I| l.choice(i.clone()).unwrap_or_else(|| default.parse(i))
}

View File

@@ -1,19 +1,26 @@
#![allow(clippy::derive_partial_eq_without_eq)]
mod infallible;
mod occur;
mod query_grammar;
mod user_input_ast;
use combine::parser::Parser;
pub use crate::infallible::LenientError;
pub use crate::occur::Occur;
use crate::query_grammar::parse_to_ast;
use crate::query_grammar::{parse_to_ast, parse_to_ast_lenient};
pub use crate::user_input_ast::{
Delimiter, UserInputAst, UserInputBound, UserInputLeaf, UserInputLiteral,
};
pub struct Error;
/// Parse a query
pub fn parse_query(query: &str) -> Result<UserInputAst, Error> {
let (user_input_ast, _remaining) = parse_to_ast().parse(query).map_err(|_| Error)?;
let (_remaining, user_input_ast) = parse_to_ast(query).map_err(|_| Error)?;
Ok(user_input_ast)
}
/// Parse a query, trying to recover from syntax errors, and giving hints toward fixing errors.
pub fn parse_query_lenient(query: &str) -> (UserInputAst, Vec<LenientError>) {
parse_to_ast_lenient(query)
}
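// Usage sketch: the lenient variant always returns an AST, plus hints.
// let (ast, errors) = parse_query_lenient("title:hello AND");
// for err in &errors {
//     println!("pos {}: {}", err.pos, err.message);
// }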

File diff suppressed because it is too large

View File

@@ -3,7 +3,7 @@ use std::fmt::{Debug, Formatter};
use crate::Occur;
#[derive(PartialEq)]
#[derive(PartialEq, Clone)]
pub enum UserInputLeaf {
Literal(UserInputLiteral),
All,
@@ -18,6 +18,28 @@ pub enum UserInputLeaf {
},
}
impl UserInputLeaf {
pub(crate) fn set_field(self, field: Option<String>) -> Self {
match self {
UserInputLeaf::Literal(mut literal) => {
literal.field_name = field;
UserInputLeaf::Literal(literal)
}
UserInputLeaf::All => UserInputLeaf::All,
UserInputLeaf::Range {
field: _,
lower,
upper,
} => UserInputLeaf::Range {
field,
lower,
upper,
},
UserInputLeaf::Set { field: _, elements } => UserInputLeaf::Set { field, elements },
}
}
}
impl Debug for UserInputLeaf {
fn fmt(&self, formatter: &mut Formatter) -> Result<(), fmt::Error> {
match self {
@@ -28,6 +50,7 @@ impl Debug for UserInputLeaf {
ref upper,
} => {
if let Some(ref field) = field {
// TODO properly escape field (in case of \")
write!(formatter, "\"{field}\":")?;
}
lower.display_lower(formatter)?;
@@ -37,6 +60,7 @@ impl Debug for UserInputLeaf {
}
UserInputLeaf::Set { field, elements } => {
if let Some(ref field) = field {
// TODO properly escape field (in case of \")
write!(formatter, "\"{field}\": ")?;
}
write!(formatter, "IN [")?;
@@ -44,6 +68,7 @@ impl Debug for UserInputLeaf {
if i != 0 {
write!(formatter, " ")?;
}
// TODO properly escape element
write!(formatter, "\"{text}\"")?;
}
write!(formatter, "]")
@@ -60,38 +85,45 @@ pub enum Delimiter {
None,
}
#[derive(PartialEq)]
#[derive(PartialEq, Clone)]
pub struct UserInputLiteral {
pub field_name: Option<String>,
pub phrase: String,
pub delimiter: Delimiter,
pub slop: u32,
pub prefix: bool,
}
impl fmt::Debug for UserInputLiteral {
fn fmt(&self, formatter: &mut fmt::Formatter) -> Result<(), fmt::Error> {
if let Some(ref field) = self.field_name {
// TODO properly escape field (in case of \")
write!(formatter, "\"{field}\":")?;
}
match self.delimiter {
Delimiter::SingleQuotes => {
// TODO properly escape element (in case of \')
write!(formatter, "'{}'", self.phrase)?;
}
Delimiter::DoubleQuotes => {
// TODO properly escape element (in case of \")
write!(formatter, "\"{}\"", self.phrase)?;
}
Delimiter::None => {
// TODO properly escape element
write!(formatter, "{}", self.phrase)?;
}
}
if self.slop > 0 {
write!(formatter, "~{}", self.slop)?;
} else if self.prefix {
write!(formatter, "*")?;
}
Ok(())
}
}
#[derive(PartialEq)]
#[derive(PartialEq, Debug, Clone)]
pub enum UserInputBound {
Inclusive(String),
Exclusive(String),
@@ -101,6 +133,7 @@ pub enum UserInputBound {
impl UserInputBound {
fn display_lower(&self, formatter: &mut fmt::Formatter) -> Result<(), fmt::Error> {
match *self {
// TODO properly escape word if required
UserInputBound::Inclusive(ref word) => write!(formatter, "[\"{word}\""),
UserInputBound::Exclusive(ref word) => write!(formatter, "{{\"{word}\""),
UserInputBound::Unbounded => write!(formatter, "{{\"*\""),
@@ -109,6 +142,7 @@ impl UserInputBound {
fn display_upper(&self, formatter: &mut fmt::Formatter) -> Result<(), fmt::Error> {
match *self {
// TODO properly escape word if required
UserInputBound::Inclusive(ref word) => write!(formatter, "\"{word}\"]"),
UserInputBound::Exclusive(ref word) => write!(formatter, "\"{word}\"}}"),
UserInputBound::Unbounded => write!(formatter, "\"*\"}}"),
@@ -124,6 +158,7 @@ impl UserInputBound {
}
}
#[derive(PartialEq, Clone)]
pub enum UserInputAst {
Clause(Vec<(Option<Occur>, UserInputAst)>),
Leaf(Box<UserInputLeaf>),
@@ -193,6 +228,7 @@ impl fmt::Debug for UserInputAst {
match *self {
UserInputAst::Clause(ref subqueries) => {
if subqueries.is_empty() {
// TODO this will break ast reserialization, is writing "( )" enough?
write!(formatter, "<emptyclause>")?;
} else {
write!(formatter, "(")?;

View File

@@ -60,6 +60,8 @@ impl AggregationLimits {
/// *bucket_limit*
/// Limits the maximum number of buckets returned from an aggregation request.
/// bucket_limit will default to `DEFAULT_BUCKET_LIMIT` (65000)
///
/// Note: The returned instance contains an Arc-shared counter to track memory consumption.
pub fn new(memory_limit: Option<u64>, bucket_limit: Option<u32>) -> Self {
Self {
memory_consumption: Default::default(),
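// Added usage sketch (values illustrative, not recommendations):
//   let limits = AggregationLimits::new(Some(200 * 1024 * 1024), None); // ~200 MB budget, default bucket cap
//   let worker_limits = limits.clone(); // clones share the same Arc counter, so tracking spans threads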

View File

@@ -44,22 +44,49 @@ use super::metric::{
/// The key is the user defined name of the aggregation.
pub type Aggregations = HashMap<String, Aggregation>;
#[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
/// Aggregation request.
///
/// An aggregation is either a bucket or a metric.
#[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
#[serde(try_from = "AggregationForDeserialization")]
pub struct Aggregation {
/// The aggregation variant, which can be either a bucket or a metric.
#[serde(flatten)]
pub agg: AggregationVariants,
/// The sub_aggregations, only valid for bucket type aggregations. Each bucket will aggregate
/// on the document set in the bucket.
#[serde(rename = "aggs")]
#[serde(default)]
#[serde(skip_serializing_if = "Aggregations::is_empty")]
pub sub_aggregation: Aggregations,
}
/// In order to display a proper error message, we cannot rely on flattening
/// the JSON enum. Instead we introduce an intermediary struct to separate
/// the aggregation from the sub-aggregation.
#[derive(Deserialize)]
struct AggregationForDeserialization {
#[serde(flatten)]
pub aggs_remaining_json: serde_json::Value,
#[serde(rename = "aggs")]
#[serde(default)]
pub sub_aggregation: Aggregations,
}
impl TryFrom<AggregationForDeserialization> for Aggregation {
type Error = serde_json::Error;
fn try_from(value: AggregationForDeserialization) -> serde_json::Result<Self> {
let AggregationForDeserialization {
aggs_remaining_json,
sub_aggregation,
} = value;
let agg: AggregationVariants = serde_json::from_value(aggs_remaining_json)?;
Ok(Aggregation {
agg,
sub_aggregation,
})
}
}
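The same trick in isolation, as a minimal self-contained sketch with hypothetical `Request`/`Raw` types (not tantivy's): buffering the flattened remainder into a `serde_json::Value` and converting via `TryFrom` preserves serde's precise "unknown variant" message, where flattening the enum directly would only report "no variant of enum ... found in flattened data".
use serde::Deserialize;
#[derive(Deserialize, Debug)]
#[serde(rename_all = "lowercase")]
enum Variant {
    Min { field: String },
}
#[derive(Deserialize, Debug)]
#[serde(try_from = "Raw")]
struct Request {
    variant: Variant,
}
#[derive(Deserialize)]
struct Raw {
    #[serde(flatten)]
    remaining: serde_json::Value,
}
impl TryFrom<Raw> for Request {
    type Error = serde_json::Error;
    fn try_from(raw: Raw) -> serde_json::Result<Self> {
        // Deserializing from the buffered Value keeps serde's detailed
        // error, e.g. `unknown variant `nosuchagg`, expected `min``.
        Ok(Request {
            variant: serde_json::from_value(raw.remaining)?,
        })
    }
}
Feeding it {"nosuchagg": {}} then yields an error naming the unknown variant, which is exactly what the updated test further below asserts.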
impl Aggregation {
pub(crate) fn sub_aggregation(&self) -> &Aggregations {
&self.sub_aggregation
@@ -123,7 +150,8 @@ pub enum AggregationVariants {
}
impl AggregationVariants {
fn get_fast_field_name(&self) -> &str {
/// Returns the name of the field used by the aggregation.
pub fn get_fast_field_name(&self) -> &str {
match self {
AggregationVariants::Terms(terms) => terms.field.as_str(),
AggregationVariants::Range(range) => range.field.as_str(),

View File

@@ -13,6 +13,7 @@ use super::metric::{
};
use super::segment_agg_result::AggregationLimits;
use super::VecWithNames;
use crate::aggregation::{f64_to_fastfield_u64, Key};
use crate::SegmentReader;
#[derive(Default)]
@@ -35,96 +36,230 @@ pub struct AggregationWithAccessor {
/// based on search terms. That is not the case currently, but eventually this needs to be
/// Option or moved.
pub(crate) accessor: Column<u64>,
/// The missing value to insert, as a u64, for the `missing` parameter use case.
pub(crate) missing_value_for_accessor: Option<u64>,
pub(crate) str_dict_column: Option<StrColumn>,
pub(crate) field_type: ColumnType,
/// In case there are multiple types of fast fields, e.g. string and numeric.
/// Only used for term aggregations currently.
pub(crate) accessor2: Option<(Column<u64>, ColumnType)>,
pub(crate) sub_aggregation: AggregationsWithAccessor,
pub(crate) limits: ResourceLimitGuard,
pub(crate) column_block_accessor: ColumnBlockAccessor<u64>,
/// Used for the missing term aggregation, which checks all columns for existence.
/// By convention the missing aggregation is chosen when this property is set
/// (instead of being set in `agg`).
/// If this needs to be used by other aggregations, we need to refactor this.
pub(crate) accessors: Vec<Column<u64>>,
pub(crate) agg: Aggregation,
}
impl AggregationWithAccessor {
/// May return multiple accessors if the aggregation is e.g. on mixed field types.
fn try_from_agg(
agg: &Aggregation,
sub_aggregation: &Aggregations,
reader: &SegmentReader,
limits: AggregationLimits,
) -> crate::Result<AggregationWithAccessor> {
let mut str_dict_column = None;
let mut accessor2 = None;
) -> crate::Result<Vec<AggregationWithAccessor>> {
let add_agg_with_accessor = |accessor: Column<u64>,
column_type: ColumnType,
aggs: &mut Vec<AggregationWithAccessor>|
-> crate::Result<()> {
let res = AggregationWithAccessor {
accessor,
accessors: Vec::new(),
field_type: column_type,
sub_aggregation: get_aggs_with_segment_accessor_and_validate(
sub_aggregation,
reader,
&limits,
)?,
agg: agg.clone(),
limits: limits.new_guard(),
missing_value_for_accessor: None,
str_dict_column: None,
column_block_accessor: Default::default(),
};
aggs.push(res);
Ok(())
};
let mut res: Vec<AggregationWithAccessor> = Vec::new();
use AggregationVariants::*;
let (accessor, field_type) = match &agg.agg {
match &agg.agg {
Range(RangeAggregation {
field: field_name, ..
}) => get_ff_reader(reader, field_name, Some(get_numeric_or_date_column_types()))?,
}) => {
let (accessor, column_type) =
get_ff_reader(reader, field_name, Some(get_numeric_or_date_column_types()))?;
add_agg_with_accessor(accessor, column_type, &mut res)?;
}
Histogram(HistogramAggregation {
field: field_name, ..
}) => get_ff_reader(reader, field_name, Some(get_numeric_or_date_column_types()))?,
}) => {
let (accessor, column_type) =
get_ff_reader(reader, field_name, Some(get_numeric_or_date_column_types()))?;
add_agg_with_accessor(accessor, column_type, &mut res)?;
}
DateHistogram(DateHistogramAggregationReq {
field: field_name, ..
}) => get_ff_reader(reader, field_name, Some(get_numeric_or_date_column_types()))?,
Terms(TermsAggregation {
field: field_name, ..
}) => {
str_dict_column = reader.fast_fields().str(field_name)?;
let (accessor, column_type) =
get_ff_reader(reader, field_name, Some(get_numeric_or_date_column_types()))?;
add_agg_with_accessor(accessor, column_type, &mut res)?;
}
Terms(TermsAggregation {
field: field_name,
missing,
..
}) => {
let str_dict_column = reader.fast_fields().str(field_name)?;
let allowed_column_types = [
ColumnType::I64,
ColumnType::U64,
ColumnType::F64,
ColumnType::Bytes,
ColumnType::Str,
// ColumnType::Bool Unsupported
// ColumnType::IpAddr Unsupported
// ColumnType::DateTime Unsupported
];
let mut columns =
get_all_ff_reader(reader, field_name, Some(&allowed_column_types))?;
let first = columns.pop().unwrap();
accessor2 = columns.pop();
first
}
Average(AverageAggregation { field: field_name })
| Count(CountAggregation { field: field_name })
| Max(MaxAggregation { field: field_name })
| Min(MinAggregation { field: field_name })
| Stats(StatsAggregation { field: field_name })
| Sum(SumAggregation { field: field_name }) => {
let (accessor, field_type) =
get_ff_reader(reader, field_name, Some(get_numeric_or_date_column_types()))?;
(accessor, field_type)
// In case the column is empty we want the shim column to match the missing type
let fallback_type = missing
.as_ref()
.map(|missing| match missing {
Key::Str(_) => ColumnType::Str,
Key::F64(_) => ColumnType::F64,
})
.unwrap_or(ColumnType::U64);
let column_and_types = get_all_ff_reader_or_empty(
reader,
field_name,
Some(&allowed_column_types),
fallback_type,
)?;
let missing_and_more_than_one_col = column_and_types.len() > 1 && missing.is_some();
let text_on_non_text_col = column_and_types.len() == 1
&& column_and_types[0].1.numerical_type().is_some()
&& missing
.as_ref()
.map(|m| matches!(m, Key::Str(_)))
.unwrap_or(false);
let use_special_missing_agg = missing_and_more_than_one_col || text_on_non_text_col;
if use_special_missing_agg {
let column_and_types =
get_all_ff_reader_or_empty(reader, field_name, None, fallback_type)?;
let accessors: Vec<Column> =
column_and_types.iter().map(|(a, _)| a.clone()).collect();
let agg_with_acc = AggregationWithAccessor {
missing_value_for_accessor: None,
accessor: accessors[0].clone(),
accessors,
field_type: ColumnType::U64,
sub_aggregation: get_aggs_with_segment_accessor_and_validate(
sub_aggregation,
reader,
&limits,
)?,
agg: agg.clone(),
str_dict_column: str_dict_column.clone(),
limits: limits.new_guard(),
column_block_accessor: Default::default(),
};
res.push(agg_with_acc);
}
for (accessor, column_type) in column_and_types {
let missing_value_term_agg = if use_special_missing_agg {
None
} else {
missing.clone()
};
let missing_value_for_accessor =
if let Some(missing) = missing_value_term_agg.as_ref() {
get_missing_val(column_type, missing, agg.agg.get_fast_field_name())?
} else {
None
};
let agg = AggregationWithAccessor {
missing_value_for_accessor,
accessor,
accessors: Vec::new(),
field_type: column_type,
sub_aggregation: get_aggs_with_segment_accessor_and_validate(
sub_aggregation,
reader,
&limits,
)?,
agg: agg.clone(),
str_dict_column: str_dict_column.clone(),
limits: limits.new_guard(),
column_block_accessor: Default::default(),
};
res.push(agg);
}
}
Average(AverageAggregation {
field: field_name, ..
})
| Count(CountAggregation {
field: field_name, ..
})
| Max(MaxAggregation {
field: field_name, ..
})
| Min(MinAggregation {
field: field_name, ..
})
| Stats(StatsAggregation {
field: field_name, ..
})
| Sum(SumAggregation {
field: field_name, ..
}) => {
let (accessor, column_type) =
get_ff_reader(reader, field_name, Some(get_numeric_or_date_column_types()))?;
add_agg_with_accessor(accessor, column_type, &mut res)?;
}
Percentiles(percentiles) => {
let (accessor, field_type) = get_ff_reader(
let (accessor, column_type) = get_ff_reader(
reader,
percentiles.field_name(),
Some(get_numeric_or_date_column_types()),
)?;
(accessor, field_type)
add_agg_with_accessor(accessor, column_type, &mut res)?;
}
};
let sub_aggregation = sub_aggregation.clone();
Ok(AggregationWithAccessor {
accessor,
accessor2,
field_type,
sub_aggregation: get_aggs_with_segment_accessor_and_validate(
&sub_aggregation,
reader,
&limits,
)?,
agg: agg.clone(),
str_dict_column,
limits: limits.new_guard(),
column_block_accessor: Default::default(),
})
Ok(res)
}
}
fn get_missing_val(
column_type: ColumnType,
missing: &Key,
field_name: &str,
) -> crate::Result<Option<u64>> {
let missing_val = match missing {
Key::Str(_) if column_type == ColumnType::Str => Some(u64::MAX),
// Allow fallback to number on text fields
Key::F64(_) if column_type == ColumnType::Str => Some(u64::MAX),
Key::F64(val) if column_type.numerical_type().is_some() => {
f64_to_fastfield_u64(*val, &column_type)
}
_ => {
return Err(crate::TantivyError::InvalidArgument(format!(
"Missing value {:?} for field {} is not supported for column type {:?}",
missing, field_name, column_type
)));
}
};
Ok(missing_val)
}
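// Hedged trace of the mapping above (field name "f" is illustrative):
//   get_missing_val(ColumnType::Str, &Key::Str("NULL".into()), "f") -> Ok(Some(u64::MAX))
//     u64::MAX is a placeholder term ordinal; the terms collector below swaps it for the
//     user-supplied missing key when it encounters term_id == u64::MAX.
//   get_missing_val(ColumnType::F64, &Key::F64(1337.0), "f")
//     -> Ok(f64_to_fastfield_u64(1337.0, &ColumnType::F64))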
fn get_numeric_or_date_column_types() -> &'static [ColumnType] {
&[
ColumnType::F64,
@@ -141,15 +276,15 @@ pub(crate) fn get_aggs_with_segment_accessor_and_validate(
) -> crate::Result<AggregationsWithAccessor> {
let mut aggss = Vec::new();
for (key, agg) in aggs.iter() {
aggss.push((
key.to_string(),
AggregationWithAccessor::try_from_agg(
agg,
agg.sub_aggregation(),
reader,
limits.clone(),
)?,
));
let aggs = AggregationWithAccessor::try_from_agg(
agg,
agg.sub_aggregation(),
reader,
limits.clone(),
)?;
for agg in aggs {
aggss.push((key.to_string(), agg));
}
}
Ok(AggregationsWithAccessor::from_data(
VecWithNames::from_entries(aggss),
@@ -177,19 +312,17 @@ fn get_ff_reader(
/// Get all fast field readers, or an empty column as default.
///
/// Is guaranteed to return at least one column.
fn get_all_ff_reader(
fn get_all_ff_reader_or_empty(
reader: &SegmentReader,
field_name: &str,
allowed_column_types: Option<&[ColumnType]>,
fallback_type: ColumnType,
) -> crate::Result<Vec<(columnar::Column<u64>, ColumnType)>> {
let ff_fields = reader.fast_fields();
let mut ff_field_with_type =
ff_fields.u64_lenient_for_type_all(allowed_column_types, field_name)?;
if ff_field_with_type.is_empty() {
ff_field_with_type.push((
Column::build_empty_column(reader.num_docs()),
ColumnType::U64,
));
ff_field_with_type.push((Column::build_empty_column(reader.num_docs()), fallback_type));
}
Ok(ff_field_with_type)
}

View File

@@ -558,10 +558,10 @@ fn test_aggregation_invalid_requests() -> crate::Result<()> {
assert_eq!(agg_req_1.is_err(), true);
// TODO: This should list valid values
assert_eq!(
agg_req_1.unwrap_err().to_string(),
"no variant of enum AggregationVariants found in flattened data"
);
assert!(agg_req_1
.unwrap_err()
.to_string()
.contains("unknown variant `doesnotmatchanyagg`, expected one of"));
// TODO: This should return an error
// let agg_res = avg_on_field("not_exist_field").unwrap_err();

View File

@@ -604,6 +604,42 @@ mod tests {
});
assert_eq!(res, expected_res);
}
{
// 1day + hard_bounds as Rfc3339
let elasticsearch_compatible_json = json!(
{
"sales_over_time": {
"date_histogram": {
"field": "date",
"fixed_interval": "1d",
"hard_bounds": {
"min": "2015-01-02T00:00:00Z",
"max": "2015-01-02T12:00:00Z"
}
}
}
}
);
let agg_req: Aggregations = serde_json::from_str(
&serde_json::to_string(&elasticsearch_compatible_json).unwrap(),
)
.unwrap();
let res = exec_request(agg_req, &index).unwrap();
let expected_res = json!({
"sales_over_time" : {
"buckets": [
{
"doc_count": 1,
"key": 1420156800000.0,
"key_as_string": "2015-01-02T00:00:00Z"
}
]
}
});
assert_eq!(res, expected_res);
}
}
#[test]
fn histogram_test_invalid_req() {

View File

@@ -177,11 +177,38 @@ impl HistogramAggregation {
#[derive(Clone, Copy, Debug, PartialEq, Serialize, Deserialize)]
pub struct HistogramBounds {
/// The lower bounds.
#[serde(deserialize_with = "deserialize_date_or_num")]
pub min: f64,
/// The upper bounds.
#[serde(deserialize_with = "deserialize_date_or_num")]
pub max: f64,
}
fn deserialize_date_or_num<'de, D>(deserializer: D) -> Result<f64, D::Error>
where D: serde::Deserializer<'de> {
let value: serde_json::Value = Deserialize::deserialize(deserializer)?;
// Check if the value is a string representing an Rfc3339 formatted date
if let serde_json::Value::String(date_str) = value {
// Parse the Rfc3339 formatted date string into a DateTime<Utc>
let date =
time::OffsetDateTime::parse(&date_str, &time::format_description::well_known::Rfc3339)
.map_err(|_| serde::de::Error::custom("Invalid Rfc3339 formatted date"))?;
let milliseconds: i64 = (date.unix_timestamp_nanos() / 1_000_000)
.try_into()
.map_err(|_| serde::de::Error::custom(format!("{date_str} out of allowed range")))?;
// Return the milliseconds as f64
Ok(milliseconds as f64)
} else {
// The value is not a string, so assume it's a regular f64 number
value
.as_f64()
.ok_or_else(|| serde::de::Error::custom("Invalid number format"))
}
}
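// For example, with the hard_bounds test data later in this changeset, both inputs land in
// the same f64 space:
//   "2015-01-02T00:00:00Z" -> 1420156800000.0 (epoch milliseconds)
//   1420156800000.0        -> 1420156800000.0 (plain numbers pass through unchanged)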
impl Display for HistogramBounds {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
f.write_fmt(format_args!("[{},{}]", self.min, self.max))
@@ -324,6 +351,7 @@ impl SegmentHistogramCollector {
let buckets_mem = self.buckets.memory_consumption();
self_mem + sub_aggs_mem + buckets_mem
}
/// Converts the collector result into an intermediate bucket result.
pub fn into_intermediate_bucket_result(
self,
agg_with_accessor: &AggregationWithAccessor,
@@ -426,15 +454,12 @@ fn intermediate_buckets_to_final_buckets_fill_gaps(
let final_buckets: Vec<BucketEntry> = buckets
.into_iter()
.merge_join_by(
fill_gaps_buckets.into_iter(),
|existing_bucket, fill_gaps_bucket| {
existing_bucket
.key
.partial_cmp(fill_gaps_bucket)
.unwrap_or(Ordering::Equal)
},
)
.merge_join_by(fill_gaps_buckets, |existing_bucket, fill_gaps_bucket| {
existing_bucket
.key
.partial_cmp(fill_gaps_bucket)
.unwrap_or(Ordering::Equal)
})
.map(|either| match either {
// Ignore the generated bucket
itertools::EitherOrBoth::Both(existing, _) => existing,

View File

@@ -15,19 +15,25 @@
//! Results of final buckets are [`BucketResult`](super::agg_result::BucketResult).
//! Results of intermediate buckets are
//! [`IntermediateBucketResult`](super::intermediate_agg_result::IntermediateBucketResult)
//!
//! ## Supported Bucket Aggregations
//! - [Histogram](HistogramAggregation)
//! - [DateHistogram](DateHistogramAggregationReq)
//! - [Range](RangeAggregation)
//! - [Terms](TermsAggregation)
mod histogram;
mod range;
mod term_agg;
mod term_missing_agg;
use std::collections::HashMap;
pub(crate) use histogram::SegmentHistogramCollector;
pub use histogram::*;
pub(crate) use range::SegmentRangeCollector;
pub use range::*;
use serde::{de, Deserialize, Deserializer, Serialize, Serializer};
pub use term_agg::*;
pub use term_missing_agg::*;
/// Order for buckets in a bucket aggregation.
#[derive(Clone, Copy, Debug, PartialEq, Serialize, Deserialize, Default)]

View File

@@ -262,7 +262,7 @@ impl SegmentRangeCollector {
pub(crate) fn from_req_and_validate(
req: &RangeAggregation,
sub_aggregation: &mut AggregationsWithAccessor,
limits: &mut ResourceLimitGuard,
limits: &ResourceLimitGuard,
field_type: ColumnType,
accessor_idx: usize,
) -> crate::Result<Self> {
@@ -465,7 +465,7 @@ mod tests {
SegmentRangeCollector::from_req_and_validate(
&req,
&mut Default::default(),
&mut AggregationLimits::default().new_guard(),
&AggregationLimits::default().new_guard(),
field_type,
0,
)

View File

@@ -1,6 +1,6 @@
use std::fmt::Debug;
use columnar::ColumnType;
use columnar::{BytesColumn, ColumnType, StrColumn};
use rustc_hash::FxHashMap;
use serde::{Deserialize, Serialize};
@@ -9,7 +9,6 @@ use crate::aggregation::agg_limits::MemoryConsumption;
use crate::aggregation::agg_req_with_accessor::{
AggregationWithAccessor, AggregationsWithAccessor,
};
use crate::aggregation::f64_from_fastfield_u64;
use crate::aggregation::intermediate_agg_result::{
IntermediateAggregationResult, IntermediateAggregationResults, IntermediateBucketResult,
IntermediateKey, IntermediateTermBucketEntry, IntermediateTermBucketResult,
@@ -17,6 +16,7 @@ use crate::aggregation::intermediate_agg_result::{
use crate::aggregation::segment_agg_result::{
build_segment_agg_collector, SegmentAggregationCollector,
};
use crate::aggregation::{f64_from_fastfield_u64, Key};
use crate::error::DataCorruption;
use crate::TantivyError;
@@ -146,6 +146,28 @@ pub struct TermsAggregation {
/// { "average_price": "asc" }
#[serde(skip_serializing_if = "Option::is_none", default)]
pub order: Option<CustomOrder>,
/// The missing parameter defines how documents that are missing a value should be treated.
/// By default they will be ignored, but it is also possible to treat them as if they had a
/// value. Example in JSON format:
/// { "missing": "NO_DATA" }
///
/// # Internal
///
/// Internally, `missing` requires some specialized handling in some scenarios.
///
/// Simple Case:
/// In the simplest case, we can just put the missing value in the term map and use that. In
/// case of text we put the special value u64::MAX and replace it at the end with the actual
/// missing value, when loading the text.
/// Special Case 1:
/// If we have multiple columns on one field, we need a union of the indices of both columns
/// to find docids without a value. That requires the special missing aggregation.
/// Special Case 2:
/// If the key is of type text and the column is numerical, we also need to use the special
/// missing aggregation, since there is no mechanism in the numerical column to add text.
#[serde(skip_serializing_if = "Option::is_none", default)]
pub missing: Option<Key>,
}
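// Hedged request sketch using `missing` (aggregation and field names illustrative):
//   { "my_terms": { "terms": { "field": "product", "missing": "NO_DATA" } } }
// Documents without a value in `product` are then counted under the "NO_DATA" bucket.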
/// Same as TermsAggregation, but with populated defaults.
@@ -176,6 +198,7 @@ pub(crate) struct TermsAggregationInternal {
pub min_doc_count: u64,
pub order: CustomOrder,
pub missing: Option<Key>,
}
impl TermsAggregationInternal {
@@ -195,6 +218,7 @@ impl TermsAggregationInternal {
.unwrap_or_else(|| order == CustomOrder::default()),
min_doc_count: req.min_doc_count.unwrap_or(1),
order,
missing: req.missing.clone(),
}
}
}
@@ -224,110 +248,6 @@ impl TermBuckets {
}
}
/// The composite collector is used when we have different types under one field, to support a term
/// aggregation on both.
#[derive(Clone, Debug)]
pub struct SegmentTermCollectorComposite {
term_agg1: SegmentTermCollector, // field type 1, e.g. strings
term_agg2: SegmentTermCollector, // field type 2, e.g. u64
accessor_idx: usize,
}
impl SegmentAggregationCollector for SegmentTermCollectorComposite {
fn add_intermediate_aggregation_result(
self: Box<Self>,
agg_with_accessor: &AggregationsWithAccessor,
results: &mut IntermediateAggregationResults,
) -> crate::Result<()> {
let name = agg_with_accessor.aggs.keys[self.accessor_idx].to_string();
let agg_with_accessor = &agg_with_accessor.aggs.values[self.accessor_idx];
let bucket = self
.term_agg1
.into_intermediate_bucket_result(agg_with_accessor)?;
results.push(
name.to_string(),
IntermediateAggregationResult::Bucket(bucket),
)?;
let bucket = self
.term_agg2
.into_intermediate_bucket_result(agg_with_accessor)?;
results.push(name, IntermediateAggregationResult::Bucket(bucket))?;
Ok(())
}
#[inline]
fn collect(
&mut self,
doc: crate::DocId,
agg_with_accessor: &mut AggregationsWithAccessor,
) -> crate::Result<()> {
self.term_agg1.collect_block(&[doc], agg_with_accessor)?;
self.swap_accessor(&mut agg_with_accessor.aggs.values[self.accessor_idx]);
self.term_agg2.collect_block(&[doc], agg_with_accessor)?;
self.swap_accessor(&mut agg_with_accessor.aggs.values[self.accessor_idx]);
Ok(())
}
#[inline]
fn collect_block(
&mut self,
docs: &[crate::DocId],
agg_with_accessor: &mut AggregationsWithAccessor,
) -> crate::Result<()> {
self.term_agg1.collect_block(docs, agg_with_accessor)?;
self.swap_accessor(&mut agg_with_accessor.aggs.values[self.accessor_idx]);
self.term_agg2.collect_block(docs, agg_with_accessor)?;
self.swap_accessor(&mut agg_with_accessor.aggs.values[self.accessor_idx]);
Ok(())
}
fn flush(&mut self, agg_with_accessor: &mut AggregationsWithAccessor) -> crate::Result<()> {
self.term_agg1.flush(agg_with_accessor)?;
self.swap_accessor(&mut agg_with_accessor.aggs.values[self.accessor_idx]);
self.term_agg2.flush(agg_with_accessor)?;
self.swap_accessor(&mut agg_with_accessor.aggs.values[self.accessor_idx]);
Ok(())
}
}
impl SegmentTermCollectorComposite {
/// Swaps the accessor and field type with the second accessor and field type.
/// This way we can use the same code for both aggregations.
fn swap_accessor(&self, aggregations: &mut AggregationWithAccessor) {
if let Some(accessor) = aggregations.accessor2.as_mut() {
std::mem::swap(&mut accessor.0, &mut aggregations.accessor);
std::mem::swap(&mut accessor.1, &mut aggregations.field_type);
}
}
pub(crate) fn from_req_and_validate(
req: &TermsAggregation,
sub_aggregations: &mut AggregationsWithAccessor,
field_type: ColumnType,
field_type2: ColumnType,
accessor_idx: usize,
) -> crate::Result<Self> {
Ok(Self {
term_agg1: SegmentTermCollector::from_req_and_validate(
req,
sub_aggregations,
field_type,
accessor_idx,
)?,
term_agg2: SegmentTermCollector::from_req_and_validate(
req,
sub_aggregations,
field_type2,
accessor_idx,
)?,
accessor_idx,
})
}
}
/// The collector puts values from the fast field into the correct buckets and does a conversion to
/// the correct datatype.
#[derive(Clone, Debug)]
@@ -379,9 +299,16 @@ impl SegmentAggregationCollector for SegmentTermCollector {
let mem_pre = self.get_memory_consumption();
bucket_agg_accessor
.column_block_accessor
.fetch_block(docs, &bucket_agg_accessor.accessor);
if let Some(missing) = bucket_agg_accessor.missing_value_for_accessor {
bucket_agg_accessor
.column_block_accessor
.fetch_block_with_missing(docs, &bucket_agg_accessor.accessor, missing);
} else {
bucket_agg_accessor
.column_block_accessor
.fetch_block(docs, &bucket_agg_accessor.accessor);
}
for term_id in bucket_agg_accessor.column_block_accessor.iter_vals() {
let entry = self.term_buckets.entries.entry(term_id).or_default();
*entry += 1;
@@ -428,6 +355,12 @@ impl SegmentTermCollector {
field_type: ColumnType,
accessor_idx: usize,
) -> crate::Result<Self> {
if field_type == ColumnType::Bytes || field_type == ColumnType::Bool {
return Err(TantivyError::InvalidArgument(format!(
"terms aggregation is not supported for column type {:?}",
field_type
)));
}
let term_buckets = TermBuckets::default();
if let Some(custom_order) = req.order.as_ref() {
@@ -537,19 +470,42 @@ impl SegmentTermCollector {
let term_dict = agg_with_accessor
.str_dict_column
.as_ref()
.expect("internal error: term dictionary not found for term aggregation");
.cloned()
.unwrap_or_else(|| {
StrColumn::wrap(BytesColumn::empty(agg_with_accessor.accessor.num_docs()))
});
let mut buffer = String::new();
for (term_id, doc_count) in entries {
if !term_dict.ord_to_str(term_id, &mut buffer)? {
return Err(TantivyError::InternalError(format!(
"Couldn't find term_id {term_id} in dict"
)));
}
let intermediate_entry = into_intermediate_bucket_entry(term_id, doc_count)?;
dict.insert(IntermediateKey::Str(buffer.to_string()), intermediate_entry);
// Special case for missing key
if term_id == u64::MAX {
let missing_key = self
.req
.missing
.as_ref()
.expect("Found placeholder term_id but `missing` is None");
match missing_key {
Key::Str(missing) => {
buffer.clear();
buffer.push_str(missing);
dict.insert(
IntermediateKey::Str(buffer.to_string()),
intermediate_entry,
);
}
Key::F64(val) => {
buffer.push_str(&val.to_string());
dict.insert(IntermediateKey::F64(*val), intermediate_entry);
}
}
} else {
if !term_dict.ord_to_str(term_id, &mut buffer)? {
return Err(TantivyError::InternalError(format!(
"Couldn't find term_id {term_id} in dict"
)));
}
dict.insert(IntermediateKey::Str(buffer.to_string()), intermediate_entry);
}
}
if self.req.min_doc_count == 0 {
// TODO: Handle rev streaming for descending sorting by keys
@@ -1315,6 +1271,7 @@ mod tests {
];
let index = get_test_index_from_terms(false, &terms_per_segment)?;
assert_eq!(index.searchable_segments().unwrap().len(), 2);
let agg_req: Aggregations = serde_json::from_value(json!({
"my_texts": {
@@ -1498,6 +1455,362 @@ mod tests {
.unwrap();
assert_eq!(agg_req, agg_req_deser);
Ok(())
}
#[test]
fn terms_empty_json() -> crate::Result<()> {
let mut schema_builder = Schema::builder();
let json = schema_builder.add_json_field("json", FAST);
let schema = schema_builder.build();
let index = Index::create_in_ram(schema);
let mut index_writer = index.writer_for_tests().unwrap();
// => Segment with empty json
index_writer.add_document(doc!()).unwrap();
index_writer.commit().unwrap();
// => Segment with json, but no field partially_empty
index_writer
.add_document(doc!(json => json!({"different_field": "blue"})))
.unwrap();
index_writer.commit().unwrap();
//// => Segment with field partially_empty
index_writer
.add_document(doc!(json => json!({"partially_empty": "blue"})))
.unwrap();
index_writer.add_document(doc!())?;
index_writer.commit().unwrap();
let agg_req: Aggregations = serde_json::from_value(json!({
"my_texts": {
"terms": {
"field": "json.partially_empty"
},
}
}))
.unwrap();
let res = exec_request_with_query(agg_req, &index, None)?;
assert_eq!(res["my_texts"]["buckets"][0]["key"], "blue");
assert_eq!(res["my_texts"]["buckets"][0]["doc_count"], 1);
assert_eq!(res["my_texts"]["buckets"][1], serde_json::Value::Null);
assert_eq!(res["my_texts"]["sum_other_doc_count"], 0);
assert_eq!(res["my_texts"]["doc_count_error_upper_bound"], 0);
Ok(())
}
#[test]
fn terms_aggregation_bytes() -> crate::Result<()> {
let mut schema_builder = Schema::builder();
let bytes_field = schema_builder.add_bytes_field("bytes", FAST);
let index = Index::create_in_ram(schema_builder.build());
{
let mut index_writer = index.writer_with_num_threads(1, 20_000_000)?;
index_writer.set_merge_policy(Box::new(NoMergePolicy));
index_writer.add_document(doc!(
bytes_field => vec![1,2,3],
))?;
index_writer.commit()?;
}
let agg_req: Aggregations = serde_json::from_value(json!({
"my_texts": {
"terms": {
"field": "bytes"
},
}
}))
.unwrap();
let res = exec_request_with_query(agg_req, &index, None)?;
// TODO: Returning an error would be better instead of an empty result, since this is not a
// JSON field
assert_eq!(
res["my_texts"]["buckets"][0]["key"],
serde_json::Value::Null
);
assert_eq!(res["my_texts"]["sum_other_doc_count"], 0);
assert_eq!(res["my_texts"]["doc_count_error_upper_bound"], 0);
Ok(())
}
#[test]
fn terms_aggregation_missing_multi_value() -> crate::Result<()> {
let mut schema_builder = Schema::builder();
let text_field = schema_builder.add_text_field("text", FAST);
let id_field = schema_builder.add_u64_field("id", FAST);
let index = Index::create_in_ram(schema_builder.build());
{
let mut index_writer = index.writer_with_num_threads(1, 20_000_000)?;
index_writer.set_merge_policy(Box::new(NoMergePolicy));
index_writer.add_document(doc!(
text_field => "Hello Hello",
text_field => "Hello Hello",
id_field => 1u64,
id_field => 1u64,
))?;
// Missing
index_writer.add_document(doc!())?;
index_writer.add_document(doc!(
text_field => "Hello Hello",
))?;
index_writer.add_document(doc!(
text_field => "Hello Hello",
))?;
index_writer.commit()?;
// Empty segment special case
index_writer.add_document(doc!())?;
index_writer.commit()?;
// Full segment special case
index_writer.add_document(doc!(
text_field => "Hello Hello",
id_field => 1u64,
))?;
index_writer.commit()?;
}
let agg_req: Aggregations = serde_json::from_value(json!({
"my_texts": {
"terms": {
"field": "text",
"missing": "Empty"
},
},
"my_texts2": {
"terms": {
"field": "text",
"missing": 1337
},
},
"my_ids": {
"terms": {
"field": "id",
"missing": 1337
},
}
}))
.unwrap();
let res = exec_request_with_query(agg_req, &index, None)?;
// text field
assert_eq!(res["my_texts"]["buckets"][0]["key"], "Hello Hello");
assert_eq!(res["my_texts"]["buckets"][0]["doc_count"], 5);
assert_eq!(res["my_texts"]["buckets"][1]["key"], "Empty");
assert_eq!(res["my_texts"]["buckets"][1]["doc_count"], 2);
assert_eq!(
res["my_texts"]["buckets"][2]["key"],
serde_json::Value::Null
);
// text field with number as missing fallback
assert_eq!(res["my_texts2"]["buckets"][0]["key"], "Hello Hello");
assert_eq!(res["my_texts2"]["buckets"][0]["doc_count"], 5);
assert_eq!(res["my_texts2"]["buckets"][1]["key"], 1337.0);
assert_eq!(res["my_texts2"]["buckets"][1]["doc_count"], 2);
assert_eq!(
res["my_texts2"]["buckets"][2]["key"],
serde_json::Value::Null
);
assert_eq!(res["my_texts"]["sum_other_doc_count"], 0);
assert_eq!(res["my_texts"]["doc_count_error_upper_bound"], 0);
// id field
assert_eq!(res["my_ids"]["buckets"][0]["key"], 1337.0);
assert_eq!(res["my_ids"]["buckets"][0]["doc_count"], 4);
assert_eq!(res["my_ids"]["buckets"][1]["key"], 1.0);
assert_eq!(res["my_ids"]["buckets"][1]["doc_count"], 3);
assert_eq!(res["my_ids"]["buckets"][2]["key"], serde_json::Value::Null);
Ok(())
}
#[test]
fn terms_aggregation_missing_simple_id() -> crate::Result<()> {
let mut schema_builder = Schema::builder();
let id_field = schema_builder.add_u64_field("id", FAST);
let index = Index::create_in_ram(schema_builder.build());
{
let mut index_writer = index.writer_with_num_threads(1, 20_000_000)?;
index_writer.set_merge_policy(Box::new(NoMergePolicy));
index_writer.add_document(doc!(
id_field => 1u64,
))?;
// Missing
index_writer.add_document(doc!())?;
index_writer.add_document(doc!())?;
index_writer.commit()?;
}
let agg_req: Aggregations = serde_json::from_value(json!({
"my_ids": {
"terms": {
"field": "id",
"missing": 1337
},
}
}))
.unwrap();
let res = exec_request_with_query(agg_req, &index, None)?;
// id field
assert_eq!(res["my_ids"]["buckets"][0]["key"], 1337.0);
assert_eq!(res["my_ids"]["buckets"][0]["doc_count"], 2);
assert_eq!(res["my_ids"]["buckets"][1]["key"], 1.0);
assert_eq!(res["my_ids"]["buckets"][1]["doc_count"], 1);
assert_eq!(res["my_ids"]["buckets"][2]["key"], serde_json::Value::Null);
Ok(())
}
#[test]
fn terms_aggregation_missing1() -> crate::Result<()> {
let mut schema_builder = Schema::builder();
let text_field = schema_builder.add_text_field("text", FAST);
let id_field = schema_builder.add_u64_field("id", FAST);
let index = Index::create_in_ram(schema_builder.build());
{
let mut index_writer = index.writer_with_num_threads(1, 20_000_000)?;
index_writer.set_merge_policy(Box::new(NoMergePolicy));
index_writer.add_document(doc!(
text_field => "Hello Hello",
id_field => 1u64,
))?;
// Missing
index_writer.add_document(doc!())?;
index_writer.add_document(doc!(
text_field => "Hello Hello",
))?;
index_writer.add_document(doc!(
text_field => "Hello Hello",
))?;
index_writer.commit()?;
// Empty segment special case
index_writer.add_document(doc!())?;
index_writer.commit()?;
// Full segment special case
index_writer.add_document(doc!(
text_field => "Hello Hello",
id_field => 1u64,
))?;
index_writer.commit()?;
}
let agg_req: Aggregations = serde_json::from_value(json!({
"my_texts": {
"terms": {
"field": "text",
"missing": "Empty"
},
},
"my_texts2": {
"terms": {
"field": "text",
"missing": 1337
},
},
"my_ids": {
"terms": {
"field": "id",
"missing": 1337
},
}
}))
.unwrap();
let res = exec_request_with_query(agg_req, &index, None)?;
// text field
assert_eq!(res["my_texts"]["buckets"][0]["key"], "Hello Hello");
assert_eq!(res["my_texts"]["buckets"][0]["doc_count"], 4);
assert_eq!(res["my_texts"]["buckets"][1]["key"], "Empty");
assert_eq!(res["my_texts"]["buckets"][1]["doc_count"], 2);
assert_eq!(
res["my_texts"]["buckets"][2]["key"],
serde_json::Value::Null
);
// text field with number as missing fallback
assert_eq!(res["my_texts2"]["buckets"][0]["key"], "Hello Hello");
assert_eq!(res["my_texts2"]["buckets"][0]["doc_count"], 4);
assert_eq!(res["my_texts2"]["buckets"][1]["key"], 1337.0);
assert_eq!(res["my_texts2"]["buckets"][1]["doc_count"], 2);
assert_eq!(
res["my_texts2"]["buckets"][2]["key"],
serde_json::Value::Null
);
assert_eq!(res["my_texts"]["sum_other_doc_count"], 0);
assert_eq!(res["my_texts"]["doc_count_error_upper_bound"], 0);
// id field
assert_eq!(res["my_ids"]["buckets"][0]["key"], 1337.0);
assert_eq!(res["my_ids"]["buckets"][0]["doc_count"], 4);
assert_eq!(res["my_ids"]["buckets"][1]["key"], 1.0);
assert_eq!(res["my_ids"]["buckets"][1]["doc_count"], 2);
assert_eq!(res["my_ids"]["buckets"][2]["key"], serde_json::Value::Null);
Ok(())
}
#[test]
fn terms_aggregation_missing_empty() -> crate::Result<()> {
let mut schema_builder = Schema::builder();
schema_builder.add_text_field("text", FAST);
schema_builder.add_u64_field("id", FAST);
let index = Index::create_in_ram(schema_builder.build());
{
let mut index_writer = index.writer_with_num_threads(1, 20_000_000)?;
index_writer.set_merge_policy(Box::new(NoMergePolicy));
// Empty segment special case
index_writer.add_document(doc!())?;
index_writer.commit()?;
}
let agg_req: Aggregations = serde_json::from_value(json!({
"my_texts": {
"terms": {
"field": "text",
"missing": "Empty"
},
},
"my_texts2": {
"terms": {
"field": "text",
"missing": 1337
},
},
"my_ids": {
"terms": {
"field": "id",
"missing": 1337
},
}
}))
.unwrap();
let res = exec_request_with_query(agg_req, &index, None)?;
// text field
assert_eq!(res["my_texts"]["buckets"][0]["key"], "Empty");
assert_eq!(res["my_texts"]["buckets"][0]["doc_count"], 1);
assert_eq!(
res["my_texts"]["buckets"][1]["key"],
serde_json::Value::Null
);
// text field with number as missing fallback
assert_eq!(res["my_texts2"]["buckets"][0]["key"], 1337.0);
assert_eq!(res["my_texts2"]["buckets"][0]["doc_count"], 1);
assert_eq!(
res["my_texts2"]["buckets"][1]["key"],
serde_json::Value::Null
);
assert_eq!(res["my_texts"]["sum_other_doc_count"], 0);
assert_eq!(res["my_texts"]["doc_count_error_upper_bound"], 0);
// id field
assert_eq!(res["my_ids"]["buckets"][0]["key"], 1337.0);
assert_eq!(res["my_ids"]["buckets"][0]["doc_count"], 1);
assert_eq!(res["my_ids"]["buckets"][1]["key"], serde_json::Value::Null);
Ok(())
}
}

View File

@@ -0,0 +1,476 @@
use rustc_hash::FxHashMap;
use crate::aggregation::agg_req_with_accessor::AggregationsWithAccessor;
use crate::aggregation::intermediate_agg_result::{
IntermediateAggregationResult, IntermediateAggregationResults, IntermediateBucketResult,
IntermediateKey, IntermediateTermBucketEntry, IntermediateTermBucketResult,
};
use crate::aggregation::segment_agg_result::{
build_segment_agg_collector, SegmentAggregationCollector,
};
/// The specialized missing term aggregation.
#[derive(Default, Debug, Clone)]
pub struct TermMissingAgg {
missing_count: u32,
accessor_idx: usize,
sub_agg: Option<Box<dyn SegmentAggregationCollector>>,
}
impl TermMissingAgg {
pub(crate) fn new(
accessor_idx: usize,
sub_aggregations: &mut AggregationsWithAccessor,
) -> crate::Result<Self> {
let has_sub_aggregations = !sub_aggregations.is_empty();
let sub_agg = if has_sub_aggregations {
let sub_aggregation = build_segment_agg_collector(sub_aggregations)?;
Some(sub_aggregation)
} else {
None
};
Ok(Self {
accessor_idx,
sub_agg,
..Default::default()
})
}
}
impl SegmentAggregationCollector for TermMissingAgg {
fn add_intermediate_aggregation_result(
self: Box<Self>,
agg_with_accessor: &AggregationsWithAccessor,
results: &mut IntermediateAggregationResults,
) -> crate::Result<()> {
let name = agg_with_accessor.aggs.keys[self.accessor_idx].to_string();
let agg_with_accessor = &agg_with_accessor.aggs.values[self.accessor_idx];
let term_agg = agg_with_accessor
.agg
.agg
.as_term()
.expect("TermMissingAgg collector must be term agg req");
let missing = term_agg
.missing
.as_ref()
.expect("TermMissingAgg collector, but no missing found in agg req")
.clone();
let mut entries: FxHashMap<IntermediateKey, IntermediateTermBucketEntry> =
Default::default();
let mut missing_entry = IntermediateTermBucketEntry {
doc_count: self.missing_count,
sub_aggregation: Default::default(),
};
if let Some(sub_agg) = self.sub_agg {
let mut res = IntermediateAggregationResults::default();
sub_agg.add_intermediate_aggregation_result(
&agg_with_accessor.sub_aggregation,
&mut res,
)?;
missing_entry.sub_aggregation = res;
}
entries.insert(missing.into(), missing_entry);
let bucket = IntermediateBucketResult::Terms(IntermediateTermBucketResult {
entries,
sum_other_doc_count: 0,
doc_count_error_upper_bound: 0,
});
results.push(name, IntermediateAggregationResult::Bucket(bucket))?;
Ok(())
}
fn collect(
&mut self,
doc: crate::DocId,
agg_with_accessor: &mut AggregationsWithAccessor,
) -> crate::Result<()> {
let agg = &mut agg_with_accessor.aggs.values[self.accessor_idx];
let has_value = agg.accessors.iter().any(|acc| acc.index.has_value(doc));
if !has_value {
self.missing_count += 1;
if let Some(sub_agg) = self.sub_agg.as_mut() {
sub_agg.collect(doc, &mut agg.sub_aggregation)?;
}
}
Ok(())
}
fn collect_block(
&mut self,
docs: &[crate::DocId],
agg_with_accessor: &mut AggregationsWithAccessor,
) -> crate::Result<()> {
for doc in docs {
self.collect(*doc, agg_with_accessor)?;
}
Ok(())
}
}
#[cfg(test)]
mod tests {
use crate::aggregation::agg_req::Aggregations;
use crate::aggregation::tests::exec_request_with_query;
use crate::schema::{Schema, FAST};
use crate::Index;
#[test]
fn terms_aggregation_missing_mixed_type_mult_seg_sub_agg() -> crate::Result<()> {
let mut schema_builder = Schema::builder();
let json = schema_builder.add_json_field("json", FAST);
let score = schema_builder.add_f64_field("score", FAST);
let schema = schema_builder.build();
let index = Index::create_in_ram(schema);
let mut index_writer = index.writer_for_tests().unwrap();
// => Segment with all values numeric
index_writer
.add_document(doc!(score => 1.0, json => json!({"mixed_type": 10.0})))
.unwrap();
index_writer.add_document(doc!(score => 5.0))?;
// index_writer.commit().unwrap();
//// => Segment with all values text
index_writer
.add_document(doc!(score => 1.0, json => json!({"mixed_type": "blue"})))
.unwrap();
index_writer.add_document(doc!(score => 5.0))?;
// index_writer.commit().unwrap();
// => Segment with mixed values
index_writer.add_document(doc!(json => json!({"mixed_type": "red"})))?;
index_writer.add_document(doc!(json => json!({"mixed_type": -20.5})))?;
index_writer.add_document(doc!(json => json!({"mixed_type": true})))?;
index_writer.add_document(doc!(score => 5.0))?;
index_writer.commit().unwrap();
let agg_req: Aggregations = serde_json::from_value(json!({
"replace_null": {
"terms": {
"field": "json.mixed_type",
"missing": "NULL"
},
"aggs": {
"sum_score": {
"sum": {
"field": "score"
}
}
}
},
}))
.unwrap();
let res = exec_request_with_query(agg_req, &index, None)?;
// text field
assert_eq!(res["replace_null"]["buckets"][0]["key"], "NULL");
assert_eq!(res["replace_null"]["buckets"][0]["doc_count"], 3);
assert_eq!(
res["replace_null"]["buckets"][0]["sum_score"]["value"],
15.0
);
assert_eq!(res["replace_null"]["sum_other_doc_count"], 0);
assert_eq!(res["replace_null"]["doc_count_error_upper_bound"], 0);
Ok(())
}
#[test]
fn terms_aggregation_missing_mixed_type_sub_agg_reg1() -> crate::Result<()> {
let mut schema_builder = Schema::builder();
let json = schema_builder.add_json_field("json", FAST);
let score = schema_builder.add_f64_field("score", FAST);
let schema = schema_builder.build();
let index = Index::create_in_ram(schema);
let mut index_writer = index.writer_for_tests().unwrap();
// => Segment with all values numeric
index_writer.add_document(doc!(score => 1.0, json => json!({"mixed_type": 10.0})))?;
index_writer.add_document(doc!(score => 5.0))?;
index_writer.add_document(doc!(score => 5.0))?;
index_writer.commit().unwrap();
let agg_req: Aggregations = serde_json::from_value(json!({
"replace_null": {
"terms": {
"field": "json.mixed_type",
"missing": "NULL"
},
"aggs": {
"sum_score": {
"sum": {
"field": "score"
}
}
}
},
}))
.unwrap();
let res = exec_request_with_query(agg_req, &index, None)?;
// text field
assert_eq!(res["replace_null"]["buckets"][0]["key"], "NULL");
assert_eq!(res["replace_null"]["buckets"][0]["doc_count"], 2);
assert_eq!(
res["replace_null"]["buckets"][0]["sum_score"]["value"],
10.0
);
assert_eq!(res["replace_null"]["sum_other_doc_count"], 0);
assert_eq!(res["replace_null"]["doc_count_error_upper_bound"], 0);
Ok(())
}
#[test]
fn terms_aggregation_missing_mult_seg_empty() -> crate::Result<()> {
let mut schema_builder = Schema::builder();
let score = schema_builder.add_f64_field("score", FAST);
let schema = schema_builder.build();
let index = Index::create_in_ram(schema);
let mut index_writer = index.writer_for_tests().unwrap();
index_writer.add_document(doc!(score => 5.0))?;
index_writer.commit().unwrap();
index_writer.add_document(doc!(score => 5.0))?;
index_writer.commit().unwrap();
index_writer.add_document(doc!(score => 5.0))?;
index_writer.commit().unwrap();
let agg_req: Aggregations = serde_json::from_value(json!({
"replace_null": {
"terms": {
"field": "json.mixed_type",
"missing": "NULL"
},
"aggs": {
"sum_score": {
"sum": {
"field": "score"
}
}
}
},
}))
.unwrap();
let res = exec_request_with_query(agg_req, &index, None)?;
// text field
assert_eq!(res["replace_null"]["buckets"][0]["key"], "NULL");
assert_eq!(res["replace_null"]["buckets"][0]["doc_count"], 3);
assert_eq!(
res["replace_null"]["buckets"][0]["sum_score"]["value"],
15.0
);
assert_eq!(res["replace_null"]["sum_other_doc_count"], 0);
assert_eq!(res["replace_null"]["doc_count_error_upper_bound"], 0);
Ok(())
}
#[test]
fn terms_aggregation_missing_single_seg_empty() -> crate::Result<()> {
let mut schema_builder = Schema::builder();
let score = schema_builder.add_f64_field("score", FAST);
let schema = schema_builder.build();
let index = Index::create_in_ram(schema);
let mut index_writer = index.writer_for_tests().unwrap();
index_writer.add_document(doc!(score => 5.0))?;
index_writer.add_document(doc!(score => 5.0))?;
index_writer.add_document(doc!(score => 5.0))?;
index_writer.commit().unwrap();
let agg_req: Aggregations = serde_json::from_value(json!({
"replace_null": {
"terms": {
"field": "json.mixed_type",
"missing": "NULL"
},
"aggs": {
"sum_score": {
"sum": {
"field": "score"
}
}
}
},
}))
.unwrap();
let res = exec_request_with_query(agg_req, &index, None)?;
// text field
assert_eq!(res["replace_null"]["buckets"][0]["key"], "NULL");
assert_eq!(res["replace_null"]["buckets"][0]["doc_count"], 3);
assert_eq!(
res["replace_null"]["buckets"][0]["sum_score"]["value"],
15.0
);
assert_eq!(res["replace_null"]["sum_other_doc_count"], 0);
assert_eq!(res["replace_null"]["doc_count_error_upper_bound"], 0);
Ok(())
}
#[test]
fn terms_aggregation_missing_mixed_type_mult_seg() -> crate::Result<()> {
let mut schema_builder = Schema::builder();
let json = schema_builder.add_json_field("json", FAST);
let schema = schema_builder.build();
let index = Index::create_in_ram(schema);
let mut index_writer = index.writer_for_tests().unwrap();
// => Segment with all values numeric
index_writer
.add_document(doc!(json => json!({"mixed_type": 10.0})))
.unwrap();
index_writer.add_document(doc!())?;
index_writer.commit().unwrap();
//// => Segment with all values text
index_writer
.add_document(doc!(json => json!({"mixed_type": "blue"})))
.unwrap();
index_writer.add_document(doc!())?;
index_writer.commit().unwrap();
// => Segment with mixed values
index_writer
.add_document(doc!(json => json!({"mixed_type": "red"})))
.unwrap();
index_writer
.add_document(doc!(json => json!({"mixed_type": -20.5})))
.unwrap();
index_writer
.add_document(doc!(json => json!({"mixed_type": true})))
.unwrap();
index_writer.add_document(doc!())?;
index_writer.commit().unwrap();
let agg_req: Aggregations = serde_json::from_value(json!({
"replace_null": {
"terms": {
"field": "json.mixed_type",
"missing": "NULL"
},
},
"replace_num": {
"terms": {
"field": "json.mixed_type",
"missing": 1337
},
},
}))
.unwrap();
let res = exec_request_with_query(agg_req, &index, None)?;
// text field
assert_eq!(res["replace_null"]["buckets"][0]["key"], "NULL");
assert_eq!(res["replace_null"]["buckets"][0]["doc_count"], 3);
assert_eq!(res["replace_num"]["buckets"][0]["key"], 1337.0);
assert_eq!(res["replace_num"]["buckets"][0]["doc_count"], 3);
assert_eq!(res["replace_null"]["sum_other_doc_count"], 0);
assert_eq!(res["replace_null"]["doc_count_error_upper_bound"], 0);
Ok(())
}
#[test]
fn terms_aggregation_missing_str_on_numeric_field() -> crate::Result<()> {
let mut schema_builder = Schema::builder();
let json = schema_builder.add_json_field("json", FAST);
let schema = schema_builder.build();
let index = Index::create_in_ram(schema);
let mut index_writer = index.writer_for_tests().unwrap();
// => Segment with all values numeric
index_writer
.add_document(doc!(json => json!({"mixed_type": 10.0})))
.unwrap();
index_writer.add_document(doc!())?;
index_writer.add_document(doc!())?;
index_writer
.add_document(doc!(json => json!({"mixed_type": -20.5})))
.unwrap();
index_writer.add_document(doc!())?;
index_writer.commit().unwrap();
let agg_req: Aggregations = serde_json::from_value(json!({
"replace_null": {
"terms": {
"field": "json.mixed_type",
"missing": "NULL"
},
},
}))
.unwrap();
let res = exec_request_with_query(agg_req, &index, None)?;
// text field
assert_eq!(res["replace_null"]["buckets"][0]["key"], "NULL");
assert_eq!(res["replace_null"]["buckets"][0]["doc_count"], 3);
assert_eq!(res["replace_null"]["sum_other_doc_count"], 0);
assert_eq!(res["replace_null"]["doc_count_error_upper_bound"], 0);
Ok(())
}
#[test]
fn terms_aggregation_missing_mixed_type_one_seg() -> crate::Result<()> {
let mut schema_builder = Schema::builder();
let json = schema_builder.add_json_field("json", FAST);
let schema = schema_builder.build();
let index = Index::create_in_ram(schema);
let mut index_writer = index.writer_for_tests().unwrap();
// => Segment with all values numeric
index_writer
.add_document(doc!(json => json!({"mixed_type": 10.0})))
.unwrap();
index_writer.add_document(doc!())?;
//// => Segment with all values text
index_writer
.add_document(doc!(json => json!({"mixed_type": "blue"})))
.unwrap();
index_writer.add_document(doc!())?;
// => Segment with mixed values
index_writer
.add_document(doc!(json => json!({"mixed_type": "red"})))
.unwrap();
index_writer
.add_document(doc!(json => json!({"mixed_type": -20.5})))
.unwrap();
index_writer
.add_document(doc!(json => json!({"mixed_type": true})))
.unwrap();
index_writer.add_document(doc!())?;
index_writer.commit().unwrap();
let agg_req: Aggregations = serde_json::from_value(json!({
"replace_null": {
"terms": {
"field": "json.mixed_type",
"missing": "NULL"
},
},
}))
.unwrap();
let res = exec_request_with_query(agg_req, &index, None)?;
// text field
assert_eq!(res["replace_null"]["buckets"][0]["key"], "NULL");
assert_eq!(res["replace_null"]["buckets"][0]["doc_count"], 3);
assert_eq!(res["replace_null"]["sum_other_doc_count"], 0);
assert_eq!(res["replace_null"]["doc_count_error_upper_bound"], 0);
Ok(())
}
}

View File

@@ -111,9 +111,6 @@ impl IntermediateAggregationResults {
}
/// Convert intermediate result and its aggregation request to the final result.
///
/// Internal function; AggregationsInternal is used instead of Aggregations, since it is
/// optimized for internal processing by splitting metrics and buckets into separate groups.
pub(crate) fn into_final_result_internal(
self,
req: &Aggregations,
@@ -121,7 +118,14 @@ impl IntermediateAggregationResults {
) -> crate::Result<AggregationResults> {
let mut results: FxHashMap<String, AggregationResult> = FxHashMap::default();
for (key, agg_res) in self.aggs_res.into_iter() {
let req = req.get(key.as_str()).unwrap();
let req = req.get(key.as_str()).unwrap_or_else(|| {
panic!(
"Could not find key {:?} in request keys {:?}. This probably means that \
add_intermediate_aggregation_result passed the wrong agg object.",
key,
req.keys().collect::<Vec<_>>()
)
});
results.insert(key, agg_res.into_final_result(req, limits)?);
}
// Handle empty results
@@ -463,7 +467,7 @@ impl IntermediateBucketResult {
let buckets: Result<Vec<IntermediateHistogramBucketEntry>, TantivyError> =
buckets_left
.drain(..)
.merge_join_by(buckets_right.into_iter(), |left, right| {
.merge_join_by(buckets_right, |left, right| {
left.key.partial_cmp(&right.key).unwrap_or(Ordering::Equal)
})
.map(|either| match either {

View File

@@ -20,12 +20,21 @@ use super::{IntermediateStats, SegmentStatsCollector};
pub struct AverageAggregation {
/// The field name to compute the average on.
pub field: String,
/// The missing parameter defines how documents that are missing a value should be treated.
/// By default they will be ignored, but it is also possible to treat them as if they had a
/// value. Example in JSON format:
/// { "field": "my_numbers", "missing": 10.0 }
#[serde(default)]
pub missing: Option<f64>,
}
impl AverageAggregation {
/// Creates a new [`AverageAggregation`] instance from a field name.
pub fn from_field_name(field_name: String) -> Self {
Self { field: field_name }
Self {
field: field_name,
missing: None,
}
}
/// Returns the field name the aggregation is computed on.
pub fn field_name(&self) -> &str {

View File

@@ -18,14 +18,23 @@ use super::{IntermediateStats, SegmentStatsCollector};
/// ```
#[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
pub struct CountAggregation {
/// The field name to compute the minimum on.
/// The field name to compute the count on.
pub field: String,
/// The missing parameter defines how documents that are missing a value should be treated.
/// By default they will be ignored, but it is also possible to treat them as if they had a
/// value. Example in JSON format:
/// { "field": "my_numbers", "missing": 10.0 }
#[serde(default)]
pub missing: Option<f64>,
}
impl CountAggregation {
/// Creates a new [`CountAggregation`] instance from a field name.
pub fn from_field_name(field_name: String) -> Self {
Self { field: field_name }
Self {
field: field_name,
missing: None,
}
}
/// Returns the field name the aggregation is computed on.
pub fn field_name(&self) -> &str {
@@ -51,7 +60,7 @@ impl IntermediateCount {
pub fn merge_fruits(&mut self, other: IntermediateCount) {
self.stats.merge_fruits(other.stats);
}
/// Computes the final minimum value.
/// Computes the final count value.
pub fn finalize(&self) -> Option<f64> {
Some(self.stats.finalize().count as f64)
}

View File

@@ -20,12 +20,21 @@ use super::{IntermediateStats, SegmentStatsCollector};
pub struct MaxAggregation {
/// The field name to compute the maximum on.
pub field: String,
/// The missing parameter defines how documents that are missing a value should be treated.
/// By default they will be ignored, but it is also possible to treat them as if they had a
/// value. Example in JSON format:
/// { "field": "my_numbers", "missing": 10.0 }
#[serde(default)]
pub missing: Option<f64>,
}
impl MaxAggregation {
/// Creates a new [`MaxAggregation`] instance from a field name.
pub fn from_field_name(field_name: String) -> Self {
Self { field: field_name }
Self {
field: field_name,
missing: None,
}
}
/// Returns the field name the aggregation is computed on.
pub fn field_name(&self) -> &str {
@@ -56,3 +65,55 @@ impl IntermediateMax {
self.stats.finalize().max
}
}
#[cfg(test)]
mod tests {
use crate::aggregation::agg_req::Aggregations;
use crate::aggregation::tests::exec_request_with_query;
use crate::schema::{Schema, FAST};
use crate::Index;
#[test]
fn test_max_agg_with_missing() -> crate::Result<()> {
let mut schema_builder = Schema::builder();
let json = schema_builder.add_json_field("json", FAST);
let schema = schema_builder.build();
let index = Index::create_in_ram(schema);
let mut index_writer = index.writer_for_tests().unwrap();
// => Segment with empty json
index_writer.add_document(doc!()).unwrap();
index_writer.commit().unwrap();
// => Segment with json, but no field partially_empty
index_writer
.add_document(doc!(json => json!({"different_field": "blue"})))
.unwrap();
index_writer.commit().unwrap();
//// => Segment with field partially_empty
index_writer
.add_document(doc!(json => json!({"partially_empty": 10.0})))
.unwrap();
index_writer.add_document(doc!())?;
index_writer.commit().unwrap();
let agg_req: Aggregations = serde_json::from_value(json!({
"my_stats": {
"max": {
"field": "json.partially_empty",
"missing": 100.0,
}
}
}))
.unwrap();
let res = exec_request_with_query(agg_req, &index, None)?;
assert_eq!(
res["my_stats"],
json!({
"value": 100.0,
})
);
Ok(())
}
}

View File

@@ -20,12 +20,21 @@ use super::{IntermediateStats, SegmentStatsCollector};
pub struct MinAggregation {
/// The field name to compute the minimum on.
pub field: String,
/// The missing parameter defines how documents that are missing a value should be treated.
/// By default they will be ignored, but it is also possible to treat them as if they had a
/// value. Example in JSON format:
/// { "field": "my_numbers", "missing": 10.0 }
#[serde(default)]
pub missing: Option<f64>,
}
impl MinAggregation {
/// Creates a new [`MinAggregation`] instance from a field name.
pub fn from_field_name(field_name: String) -> Self {
Self { field: field_name }
Self {
field: field_name,
missing: None,
}
}
/// Returns the field name the aggregation is computed on.
pub fn field_name(&self) -> &str {

View File

@@ -6,6 +6,15 @@
//! Some aggregations output a single numeric metric (e.g. Average) and are called
//! single-value numeric metrics aggregations; others generate multiple metrics (e.g. Stats) and
//! are called multi-value numeric metrics aggregations.
//!
//! ## Supported Metric Aggregations
//! - [Average](AverageAggregation)
//! - [Stats](StatsAggregation)
//! - [Min](MinAggregation)
//! - [Max](MaxAggregation)
//! - [Sum](SumAggregation)
//! - [Count](CountAggregation)
//! - [Percentiles](PercentilesAggregationReq)
mod average;
mod count;

View File

@@ -11,7 +11,7 @@ use crate::aggregation::intermediate_agg_result::{
IntermediateAggregationResult, IntermediateAggregationResults, IntermediateMetricResult,
};
use crate::aggregation::segment_agg_result::SegmentAggregationCollector;
use crate::aggregation::{f64_from_fastfield_u64, AggregationError};
use crate::aggregation::{f64_from_fastfield_u64, f64_to_fastfield_u64, AggregationError};
use crate::{DocId, TantivyError};
/// # Percentiles
@@ -80,6 +80,12 @@ pub struct PercentilesAggregationReq {
/// Whether to return the percentiles as a hash map
#[serde(default = "default_as_true")]
pub keyed: bool,
/// The missing parameter defines how documents that are missing a value should be treated.
/// By default, they are ignored, but it is also possible to treat them as if they had a
/// value. Example in JSON format:
/// { "field": "my_numbers", "missing": 10.0 }
#[serde(skip_serializing_if = "Option::is_none", default)]
pub missing: Option<f64>,
}
fn default_percentiles() -> &'static [f64] {
&[1.0, 5.0, 25.0, 50.0, 75.0, 95.0, 99.0]
@@ -95,6 +101,7 @@ impl PercentilesAggregationReq {
field: field_name,
percents: None,
keyed: default_as_true(),
missing: None,
}
}
/// Returns the field name the aggregation is computed on.
@@ -127,6 +134,7 @@ pub(crate) struct SegmentPercentilesCollector {
pub(crate) percentiles: PercentilesCollector,
pub(crate) accessor_idx: usize,
val_cache: Vec<u64>,
missing: Option<u64>,
}
#[derive(Clone, Serialize, Deserialize)]
@@ -227,11 +235,16 @@ impl SegmentPercentilesCollector {
accessor_idx: usize,
) -> crate::Result<Self> {
req.validate()?;
let missing = req
.missing
.and_then(|val| f64_to_fastfield_u64(val, &field_type));
Ok(Self {
field_type,
percentiles: PercentilesCollector::new(),
accessor_idx,
val_cache: Default::default(),
missing,
})
}
#[inline]
@@ -240,9 +253,17 @@ impl SegmentPercentilesCollector {
docs: &[DocId],
agg_accessor: &mut AggregationWithAccessor,
) {
agg_accessor
.column_block_accessor
.fetch_block(docs, &agg_accessor.accessor);
if let Some(missing) = self.missing.as_ref() {
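// fetch_block_with_missing substitutes the configured missing value for docs
// that have no value in the column, so every doc contributes to the percentiles.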
agg_accessor.column_block_accessor.fetch_block_with_missing(
docs,
&agg_accessor.accessor,
*missing,
);
} else {
agg_accessor
.column_block_accessor
.fetch_block(docs, &agg_accessor.accessor);
}
for val in agg_accessor.column_block_accessor.iter_vals() {
let val1 = f64_from_fastfield_u64(val, &self.field_type);
@@ -277,9 +298,22 @@ impl SegmentAggregationCollector for SegmentPercentilesCollector {
) -> crate::Result<()> {
let field = &agg_with_accessor.aggs.values[self.accessor_idx].accessor;
for val in field.values_for_doc(doc) {
let val1 = f64_from_fastfield_u64(val, &self.field_type);
self.percentiles.collect(val1);
if let Some(missing) = self.missing {
let mut has_val = false;
for val in field.values_for_doc(doc) {
let val1 = f64_from_fastfield_u64(val, &self.field_type);
self.percentiles.collect(val1);
has_val = true;
}
if !has_val {
self.percentiles
.collect(f64_from_fastfield_u64(missing, &self.field_type));
}
} else {
for val in field.values_for_doc(doc) {
let val1 = f64_from_fastfield_u64(val, &self.field_type);
self.percentiles.collect(val1);
}
}
Ok(())
@@ -309,10 +343,12 @@ mod tests {
use crate::aggregation::agg_req::Aggregations;
use crate::aggregation::agg_result::AggregationResults;
use crate::aggregation::tests::{
get_test_index_from_values, get_test_index_from_values_and_terms,
exec_request_with_query, get_test_index_from_values, get_test_index_from_values_and_terms,
};
use crate::aggregation::AggregationCollector;
use crate::query::AllQuery;
use crate::schema::{Schema, FAST};
use crate::Index;
#[test]
fn test_aggregation_percentiles_empty_index() -> crate::Result<()> {
@@ -463,7 +499,7 @@ mod tests {
fn test_aggregation_percentiles(merge_segments: bool) -> crate::Result<()> {
use rand_distr::Distribution;
let num_values_in_segment = vec![100, 30_000, 8000];
let num_values_in_segment = [100, 30_000, 8000];
let lg_norm = rand_distr::LogNormal::new(2.996f64, 0.979f64).unwrap();
let mut rng = StdRng::from_seed([1u8; 32]);
@@ -545,4 +581,110 @@ mod tests {
Ok(())
}
#[test]
fn test_percentiles_missing_sub_agg() -> crate::Result<()> {
// This test verifies the `collect` method (in contrast to `collect_block`), which is
// called when the sub-aggregations are flushed.
let mut schema_builder = Schema::builder();
let text_field = schema_builder.add_text_field("texts", FAST);
let score_field_f64 = schema_builder.add_f64_field("score", FAST);
let schema = schema_builder.build();
let index = Index::create_in_ram(schema);
{
let mut index_writer = index.writer_for_tests()?;
// writing the segment
index_writer.add_document(doc!(
score_field_f64 => 10.0f64,
text_field => "a"
))?;
index_writer.add_document(doc!(
score_field_f64 => 10.0f64,
text_field => "a"
))?;
index_writer.add_document(doc!(text_field => "a"))?;
index_writer.commit()?;
}
let agg_req: Aggregations = {
serde_json::from_value(json!({
"range_with_stats": {
"terms": {
"field": "texts"
},
"aggs": {
"percentiles": {
"percentiles": {
"field": "score",
"missing": 5.0
}
}
}
}
}))
.unwrap()
};
let res = exec_request_with_query(agg_req, &index, None)?;
assert_eq!(res["range_with_stats"]["buckets"][0]["doc_count"], 3);
assert_eq!(
res["range_with_stats"]["buckets"][0]["percentiles"]["values"]["1.0"],
5.0028295751107414
);
assert_eq!(
res["range_with_stats"]["buckets"][0]["percentiles"]["values"]["99.0"],
10.07469668951144
);
Ok(())
}
#[test]
fn test_percentiles_missing() -> crate::Result<()> {
let mut schema_builder = Schema::builder();
let text_field = schema_builder.add_text_field("texts", FAST);
let score_field_f64 = schema_builder.add_f64_field("score", FAST);
let schema = schema_builder.build();
let index = Index::create_in_ram(schema);
{
let mut index_writer = index.writer_for_tests()?;
// writing the segment
index_writer.add_document(doc!(
score_field_f64 => 10.0f64,
text_field => "a"
))?;
index_writer.add_document(doc!(
score_field_f64 => 10.0f64,
text_field => "a"
))?;
index_writer.add_document(doc!(text_field => "a"))?;
index_writer.commit()?;
}
let agg_req: Aggregations = {
serde_json::from_value(json!({
"percentiles": {
"percentiles": {
"field": "score",
"missing": 5.0
}
}
}))
.unwrap()
};
let res = exec_request_with_query(agg_req, &index, None)?;
assert_eq!(res["percentiles"]["values"]["1.0"], 5.0028295751107414);
assert_eq!(res["percentiles"]["values"]["99.0"], 10.07469668951144);
Ok(())
}
}

View File

@@ -5,11 +5,11 @@ use super::*;
use crate::aggregation::agg_req_with_accessor::{
AggregationWithAccessor, AggregationsWithAccessor,
};
use crate::aggregation::f64_from_fastfield_u64;
use crate::aggregation::intermediate_agg_result::{
IntermediateAggregationResult, IntermediateAggregationResults, IntermediateMetricResult,
};
use crate::aggregation::segment_agg_result::SegmentAggregationCollector;
use crate::aggregation::{f64_from_fastfield_u64, f64_to_fastfield_u64};
use crate::{DocId, TantivyError};
/// A multi-value metric aggregation that computes a collection of statistics on numeric values that
@@ -29,12 +29,21 @@ use crate::{DocId, TantivyError};
pub struct StatsAggregation {
/// The field name to compute the stats on.
pub field: String,
/// The missing parameter defines how documents that are missing a value should be treated.
/// By default, they are ignored, but it is also possible to treat them as if they had a
/// value. Example in JSON format:
/// { "field": "my_numbers", "missing": 10.0 }
#[serde(default)]
pub missing: Option<f64>,
}
impl StatsAggregation {
/// Creates a new [`StatsAggregation`] instance from a field name.
pub fn from_field_name(field_name: String) -> Self {
StatsAggregation { field: field_name }
StatsAggregation {
field: field_name,
missing: None,
}
}
/// Returns the field name the aggregation is computed on.
pub fn field_name(&self) -> &str {
@@ -153,6 +162,7 @@ pub(crate) enum SegmentStatsType {
#[derive(Clone, Debug, PartialEq)]
pub(crate) struct SegmentStatsCollector {
missing: Option<u64>,
field_type: ColumnType,
pub(crate) collecting_for: SegmentStatsType,
pub(crate) stats: IntermediateStats,
@@ -165,12 +175,15 @@ impl SegmentStatsCollector {
field_type: ColumnType,
collecting_for: SegmentStatsType,
accessor_idx: usize,
missing: Option<f64>,
) -> Self {
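// Convert the f64 missing value to the column's u64 fast field representation
// once up front, so the per-document collection loop stays in u64 space.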
let missing = missing.and_then(|val| f64_to_fastfield_u64(val, &field_type));
Self {
field_type,
collecting_for,
stats: IntermediateStats::default(),
accessor_idx,
missing,
val_cache: Default::default(),
}
}
@@ -180,10 +193,17 @@ impl SegmentStatsCollector {
docs: &[DocId],
agg_accessor: &mut AggregationWithAccessor,
) {
agg_accessor
.column_block_accessor
.fetch_block(docs, &agg_accessor.accessor);
if let Some(missing) = self.missing.as_ref() {
agg_accessor.column_block_accessor.fetch_block_with_missing(
docs,
&agg_accessor.accessor,
*missing,
);
} else {
agg_accessor
.column_block_accessor
.fetch_block(docs, &agg_accessor.accessor);
}
for val in agg_accessor.column_block_accessor.iter_vals() {
let val1 = f64_from_fastfield_u64(val, &self.field_type);
self.stats.collect(val1);
@@ -234,10 +254,22 @@ impl SegmentAggregationCollector for SegmentStatsCollector {
agg_with_accessor: &mut AggregationsWithAccessor,
) -> crate::Result<()> {
let field = &agg_with_accessor.aggs.values[self.accessor_idx].accessor;
for val in field.values_for_doc(doc) {
let val1 = f64_from_fastfield_u64(val, &self.field_type);
self.stats.collect(val1);
if let Some(missing) = self.missing {
let mut has_val = false;
for val in field.values_for_doc(doc) {
let val1 = f64_from_fastfield_u64(val, &self.field_type);
self.stats.collect(val1);
has_val = true;
}
if !has_val {
self.stats
.collect(f64_from_fastfield_u64(missing, &self.field_type));
}
} else {
for val in field.values_for_doc(doc) {
let val1 = f64_from_fastfield_u64(val, &self.field_type);
self.stats.collect(val1);
}
}
Ok(())
@@ -262,11 +294,13 @@ mod tests {
use crate::aggregation::agg_req::{Aggregation, Aggregations};
use crate::aggregation::agg_result::AggregationResults;
use crate::aggregation::tests::{get_test_index_2_segments, get_test_index_from_values};
use crate::aggregation::tests::{
exec_request_with_query, get_test_index_2_segments, get_test_index_from_values,
};
use crate::aggregation::AggregationCollector;
use crate::query::{AllQuery, TermQuery};
use crate::schema::IndexRecordOption;
use crate::Term;
use crate::schema::{IndexRecordOption, Schema, FAST};
use crate::{Index, Term};
#[test]
fn test_aggregation_stats_empty_index() -> crate::Result<()> {
@@ -453,4 +487,159 @@ mod tests {
Ok(())
}
#[test]
fn test_stats_json() -> crate::Result<()> {
let mut schema_builder = Schema::builder();
let json = schema_builder.add_json_field("json", FAST);
let schema = schema_builder.build();
let index = Index::create_in_ram(schema);
let mut index_writer = index.writer_for_tests().unwrap();
// => Segment with empty json
index_writer.add_document(doc!()).unwrap();
index_writer.commit().unwrap();
// => Segment with json, but no field partially_empty
index_writer
.add_document(doc!(json => json!({"different_field": "blue"})))
.unwrap();
index_writer.commit().unwrap();
// => Segment with field partially_empty
index_writer
.add_document(doc!(json => json!({"partially_empty": 10.0})))
.unwrap();
index_writer.add_document(doc!())?;
index_writer.commit().unwrap();
let agg_req: Aggregations = serde_json::from_value(json!({
"my_stats": {
"stats": {
"field": "json.partially_empty"
},
}
}))
.unwrap();
let res = exec_request_with_query(agg_req, &index, None)?;
assert_eq!(
res["my_stats"],
json!({
"avg": 10.0,
"count": 1,
"max": 10.0,
"min": 10.0,
"sum": 10.0
})
);
Ok(())
}
#[test]
fn test_stats_json_missing() -> crate::Result<()> {
let mut schema_builder = Schema::builder();
let json = schema_builder.add_json_field("json", FAST);
let schema = schema_builder.build();
let index = Index::create_in_ram(schema);
let mut index_writer = index.writer_for_tests().unwrap();
// => Segment with empty json
index_writer.add_document(doc!()).unwrap();
index_writer.commit().unwrap();
// => Segment with json, but no field partially_empty
index_writer
.add_document(doc!(json => json!({"different_field": "blue"})))
.unwrap();
index_writer.commit().unwrap();
// => Segment with field partially_empty
index_writer
.add_document(doc!(json => json!({"partially_empty": 10.0})))
.unwrap();
index_writer.add_document(doc!())?;
index_writer.commit().unwrap();
let agg_req: Aggregations = serde_json::from_value(json!({
"my_stats": {
"stats": {
"field": "json.partially_empty",
"missing": 0.0
},
}
}))
.unwrap();
let res = exec_request_with_query(agg_req, &index, None)?;
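// Four docs total: one with 10.0 and three treated as the missing value 0.0,
// hence count=4, sum=10.0, avg=2.5, min=0.0, max=10.0.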
assert_eq!(
res["my_stats"],
json!({
"avg": 2.5,
"count": 4,
"max": 10.0,
"min": 0.0,
"sum": 10.0
})
);
Ok(())
}
#[test]
fn test_stats_json_missing_sub_agg() -> crate::Result<()> {
// This test verifies the `collect` method (in contrast to `collect_block`), which is
// called when the sub-aggregations are flushed.
let mut schema_builder = Schema::builder();
let text_field = schema_builder.add_text_field("texts", FAST);
let score_field_f64 = schema_builder.add_f64_field("score", FAST);
let schema = schema_builder.build();
let index = Index::create_in_ram(schema);
{
let mut index_writer = index.writer_for_tests()?;
// writing the segment
index_writer.add_document(doc!(
score_field_f64 => 10.0f64,
text_field => "a"
))?;
index_writer.add_document(doc!(text_field => "a"))?;
index_writer.commit()?;
}
let agg_req: Aggregations = {
serde_json::from_value(json!({
"range_with_stats": {
"terms": {
"field": "texts"
},
"aggs": {
"my_stats": {
"stats": {
"field": "score",
"missing": 0.0
}
}
}
}
}))
.unwrap()
};
let res = exec_request_with_query(agg_req, &index, None)?;
assert_eq!(
res["range_with_stats"]["buckets"][0]["my_stats"]["count"],
2
);
assert_eq!(
res["range_with_stats"]["buckets"][0]["my_stats"]["min"],
0.0
);
assert_eq!(
res["range_with_stats"]["buckets"][0]["my_stats"]["avg"],
5.0
);
Ok(())
}
}

View File

@@ -20,12 +20,21 @@ use super::{IntermediateStats, SegmentStatsCollector};
pub struct SumAggregation {
/// The field name to compute the sum on.
pub field: String,
/// The missing parameter defines how documents that are missing a value should be treated.
/// By default, they are ignored, but it is also possible to treat them as if they had a
/// value. Example in JSON format:
/// { "field": "my_numbers", "missing": 10.0 }
#[serde(default)]
pub missing: Option<f64>,
}
impl SumAggregation {
/// Creates a new [`SumAggregation`] instance from a field name.
pub fn from_field_name(field_name: String) -> Self {
Self { field: field_name }
Self {
field: field_name,
missing: None,
}
}
/// Returns the field name the aggregation is computed on.
pub fn field_name(&self) -> &str {

View File

@@ -15,7 +15,7 @@ use super::metric::{
SegmentPercentilesCollector, SegmentStatsCollector, SegmentStatsType, StatsAggregation,
SumAggregation,
};
use crate::aggregation::bucket::SegmentTermCollectorComposite;
use crate::aggregation::bucket::TermMissingAgg;
pub(crate) trait SegmentAggregationCollector: CollectorClone + Debug {
fn add_intermediate_aggregation_result(
@@ -82,29 +82,24 @@ pub(crate) fn build_single_agg_segment_collector(
use AggregationVariants::*;
match &req.agg.agg {
Terms(terms_req) => {
if let Some(acc2) = req.accessor2.as_ref() {
Ok(Box::new(
SegmentTermCollectorComposite::from_req_and_validate(
terms_req,
&mut req.sub_aggregation,
req.field_type,
acc2.1,
accessor_idx,
)?,
))
} else {
if req.accessors.is_empty() {
Ok(Box::new(SegmentTermCollector::from_req_and_validate(
terms_req,
&mut req.sub_aggregation,
req.field_type,
accessor_idx,
)?))
} else {
Ok(Box::new(TermMissingAgg::new(
accessor_idx,
&mut req.sub_aggregation,
)?))
}
}
Range(range_req) => Ok(Box::new(SegmentRangeCollector::from_req_and_validate(
range_req,
&mut req.sub_aggregation,
&mut req.limits,
&req.limits,
req.field_type,
accessor_idx,
)?)),
@@ -120,35 +115,43 @@ pub(crate) fn build_single_agg_segment_collector(
req.field_type,
accessor_idx,
)?)),
Average(AverageAggregation { .. }) => Ok(Box::new(SegmentStatsCollector::from_req(
req.field_type,
SegmentStatsType::Average,
accessor_idx,
))),
Count(CountAggregation { .. }) => Ok(Box::new(SegmentStatsCollector::from_req(
Average(AverageAggregation { missing, .. }) => {
Ok(Box::new(SegmentStatsCollector::from_req(
req.field_type,
SegmentStatsType::Average,
accessor_idx,
*missing,
)))
}
Count(CountAggregation { missing, .. }) => Ok(Box::new(SegmentStatsCollector::from_req(
req.field_type,
SegmentStatsType::Count,
accessor_idx,
*missing,
))),
Max(MaxAggregation { .. }) => Ok(Box::new(SegmentStatsCollector::from_req(
Max(MaxAggregation { missing, .. }) => Ok(Box::new(SegmentStatsCollector::from_req(
req.field_type,
SegmentStatsType::Max,
accessor_idx,
*missing,
))),
Min(MinAggregation { .. }) => Ok(Box::new(SegmentStatsCollector::from_req(
Min(MinAggregation { missing, .. }) => Ok(Box::new(SegmentStatsCollector::from_req(
req.field_type,
SegmentStatsType::Min,
accessor_idx,
*missing,
))),
Stats(StatsAggregation { .. }) => Ok(Box::new(SegmentStatsCollector::from_req(
Stats(StatsAggregation { missing, .. }) => Ok(Box::new(SegmentStatsCollector::from_req(
req.field_type,
SegmentStatsType::Stats,
accessor_idx,
*missing,
))),
Sum(SumAggregation { .. }) => Ok(Box::new(SegmentStatsCollector::from_req(
Sum(SumAggregation { missing, .. }) => Ok(Box::new(SegmentStatsCollector::from_req(
req.field_type,
SegmentStatsType::Sum,
accessor_idx,
*missing,
))),
Percentiles(percentiles_req) => Ok(Box::new(
SegmentPercentilesCollector::from_req_and_validate(

View File

@@ -16,7 +16,7 @@ use crate::{DocId, Score, SegmentOrdinal, SegmentReader};
/// let schema = schema_builder.build();
/// let index = Index::create_in_ram(schema);
///
/// let mut index_writer = index.writer(3_000_000).unwrap();
/// let mut index_writer = index.writer(15_000_000).unwrap();
/// index_writer.add_document(doc!(title => "The Name of the Wind")).unwrap();
/// index_writer.add_document(doc!(title => "The Diary of Muadib")).unwrap();
/// index_writer.add_document(doc!(title => "A Dairy Cow")).unwrap();

View File

@@ -89,7 +89,7 @@ fn facet_depth(facet_bytes: &[u8]) -> usize {
/// let schema = schema_builder.build();
/// let index = Index::create_in_ram(schema);
/// {
/// let mut index_writer = index.writer(3_000_000)?;
/// let mut index_writer = index.writer(15_000_000)?;
/// // a document can be associated with any number of facets
/// index_writer.add_document(doc!(
/// title => "The Name of the Wind",
@@ -161,6 +161,21 @@ fn facet_depth(facet_bytes: &[u8]) -> usize {
/// ]);
/// }
///
/// {
/// let mut facet_collector = FacetCollector::for_field("facet");
/// facet_collector.add_facet("/");
/// let facet_counts = searcher.search(&AllQuery, &facet_collector)?;
///
/// // This lists all of the facet counts
/// let facets: Vec<(&Facet, u64)> = facet_counts
/// .get("/")
/// .collect();
/// assert_eq!(facets, vec![
/// (&Facet::from("/category"), 4),
/// (&Facet::from("/lang"), 4)
/// ]);
/// }
///
/// Ok(())
/// }
/// # assert!(example().is_ok());
@@ -285,6 +300,9 @@ fn is_child_facet(parent_facet: &[u8], possible_child_facet: &[u8]) -> bool {
if !possible_child_facet.starts_with(parent_facet) {
return false;
}
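// The empty parent facet (the root) is an ancestor of every facet; the
// separator check below would otherwise reject its children.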
if parent_facet.is_empty() {
return true;
}
possible_child_facet.get(parent_facet.len()).copied() == Some(0u8)
}
@@ -789,6 +807,15 @@ mod tests {
);
Ok(())
}
#[test]
fn is_child_facet() {
assert!(super::is_child_facet(&b"foo"[..], &b"foo\0bar"[..]));
assert!(super::is_child_facet(&b""[..], &b"foo\0bar"[..]));
assert!(super::is_child_facet(&b""[..], &b"foo"[..]));
assert!(!super::is_child_facet(&b"foo\0bar"[..], &b"foo"[..]));
assert!(!super::is_child_facet(&b"foo"[..], &b"foobar\0baz"[..]));
}
}
#[cfg(all(test, feature = "unstable"))]

View File

@@ -6,36 +6,39 @@
//
// Of course, you can have a look at the tantivy's built-in collectors
// such as the `CountCollector` for more examples.
// ---
// Importing tantivy...
use std::fmt::Debug;
use std::marker::PhantomData;
use std::sync::Arc;
use columnar::{ColumnValues, DynamicColumn, HasAssociatedColumnType};
use columnar::{BytesColumn, Column, DynamicColumn, HasAssociatedColumnType};
use crate::collector::{Collector, SegmentCollector};
use crate::schema::Field;
use crate::{Score, SegmentReader, TantivyError};
use crate::{DocId, Score, SegmentReader, TantivyError};
/// The `FilterCollector` filters docs using a fast field value and a predicate.
/// Only the documents for which the predicate returned "true" will be passed on to the next
/// collector.
///
/// Only the documents containing at least one value for which the predicate returns `true`
/// will be passed on to the next collector.
///
/// In other words,
/// - documents with no values are filtered out.
/// - documents with several values are accepted if at least one value matches the predicate.
///
///
/// ```rust
/// use tantivy::collector::{TopDocs, FilterCollector};
/// use tantivy::query::QueryParser;
/// use tantivy::schema::{Schema, TEXT, INDEXED, FAST};
/// use tantivy::schema::{Schema, TEXT, FAST};
/// use tantivy::{doc, DocAddress, Index};
///
/// # fn main() -> tantivy::Result<()> {
/// let mut schema_builder = Schema::builder();
/// let title = schema_builder.add_text_field("title", TEXT);
/// let price = schema_builder.add_u64_field("price", INDEXED | FAST);
/// let price = schema_builder.add_u64_field("price", FAST);
/// let schema = schema_builder.build();
/// let index = Index::create_in_ram(schema);
///
/// let mut index_writer = index.writer_with_num_threads(1, 10_000_000)?;
/// let mut index_writer = index.writer_with_num_threads(1, 20_000_000)?;
/// index_writer.add_document(doc!(title => "The Name of the Wind", price => 30_200u64))?;
/// index_writer.add_document(doc!(title => "The Diary of Muadib", price => 29_240u64))?;
/// index_writer.add_document(doc!(title => "A Dairy Cow", price => 21_240u64))?;
@@ -47,20 +50,24 @@ use crate::{Score, SegmentReader, TantivyError};
///
/// let query_parser = QueryParser::for_index(&index, vec![title]);
/// let query = query_parser.parse_query("diary")?;
/// let no_filter_collector = FilterCollector::new(price, &|value: u64| value > 20_120u64, TopDocs::with_limit(2));
/// let no_filter_collector = FilterCollector::new(price, |value: u64| value > 20_120u64, TopDocs::with_limit(2));
/// let top_docs = searcher.search(&query, &no_filter_collector)?;
///
/// assert_eq!(top_docs.len(), 1);
/// assert_eq!(top_docs[0].1, DocAddress::new(0, 1));
///
/// let filter_all_collector: FilterCollector<_, _, u64> = FilterCollector::new(price, &|value| value < 5u64, TopDocs::with_limit(2));
/// let filter_all_collector: FilterCollector<_, _, u64> = FilterCollector::new(price, |value| value < 5u64, TopDocs::with_limit(2));
/// let filtered_top_docs = searcher.search(&query, &filter_all_collector)?;
///
/// assert_eq!(filtered_top_docs.len(), 0);
/// # Ok(())
/// # }
/// ```
pub struct FilterCollector<TCollector, TPredicate, TPredicateValue: Default>
///
/// Note that this is limited to fast fields which implement the
/// [`FastValue`][crate::fastfield::FastValue] trait, e.g. `u64` but not `&[u8]`.
/// To filter based on a bytes fast field, use a [`BytesFilterCollector`] instead.
pub struct FilterCollector<TCollector, TPredicate, TPredicateValue>
where TPredicate: 'static + Clone
{
field: Field,
@@ -69,19 +76,15 @@ where TPredicate: 'static + Clone
t_predicate_value: PhantomData<TPredicateValue>,
}
impl<TCollector, TPredicate, TPredicateValue: Default>
impl<TCollector, TPredicate, TPredicateValue>
FilterCollector<TCollector, TPredicate, TPredicateValue>
where
TCollector: Collector + Send + Sync,
TPredicate: Fn(TPredicateValue) -> bool + Send + Sync + Clone,
{
/// Create a new FilterCollector.
pub fn new(
field: Field,
predicate: TPredicate,
collector: TCollector,
) -> FilterCollector<TCollector, TPredicate, TPredicateValue> {
FilterCollector {
/// Create a new `FilterCollector`.
pub fn new(field: Field, predicate: TPredicate, collector: TCollector) -> Self {
Self {
field,
predicate,
collector,
@@ -90,7 +93,7 @@ where
}
}
impl<TCollector, TPredicate, TPredicateValue: Default> Collector
impl<TCollector, TPredicate, TPredicateValue> Collector
for FilterCollector<TCollector, TPredicate, TPredicateValue>
where
TCollector: Collector + Send + Sync,
@@ -98,8 +101,6 @@ where
TPredicateValue: HasAssociatedColumnType,
DynamicColumn: Into<Option<columnar::Column<TPredicateValue>>>,
{
// That's the type of our result.
// Our standard deviation will be a float.
type Fruit = TCollector::Fruit;
type Child = FilterSegmentCollector<TCollector::Child, TPredicate, TPredicateValue>;
@@ -108,7 +109,7 @@ where
&self,
segment_local_id: u32,
segment_reader: &SegmentReader,
) -> crate::Result<FilterSegmentCollector<TCollector::Child, TPredicate, TPredicateValue>> {
) -> crate::Result<Self::Child> {
let schema = segment_reader.schema();
let field_entry = schema.get_field_entry(self.field);
if !field_entry.is_fast() {
@@ -118,16 +119,16 @@ where
)));
}
let fast_field_reader = segment_reader
let column_opt = segment_reader
.fast_fields()
.column_first_or_default(schema.get_field_name(self.field))?;
.column_opt(field_entry.name())?;
let segment_collector = self
.collector
.for_segment(segment_local_id, segment_reader)?;
Ok(FilterSegmentCollector {
fast_field_reader,
column_opt,
segment_collector,
predicate: self.predicate.clone(),
t_predicate_value: PhantomData,
@@ -146,35 +147,208 @@ where
}
}
pub struct FilterSegmentCollector<TSegmentCollector, TPredicate, TPredicateValue>
where
TPredicate: 'static,
DynamicColumn: Into<Option<columnar::Column<TPredicateValue>>>,
{
fast_field_reader: Arc<dyn ColumnValues<TPredicateValue>>,
pub struct FilterSegmentCollector<TSegmentCollector, TPredicate, TPredicateValue> {
column_opt: Option<Column<TPredicateValue>>,
segment_collector: TSegmentCollector,
predicate: TPredicate,
t_predicate_value: PhantomData<TPredicateValue>,
}
impl<TSegmentCollector, TPredicate, TPredicateValue>
FilterSegmentCollector<TSegmentCollector, TPredicate, TPredicateValue>
where
TPredicateValue: PartialOrd + Copy + Debug + Send + Sync + 'static,
TPredicate: 'static + Fn(TPredicateValue) -> bool + Send + Sync,
{
#[inline]
fn accept_document(&self, doc_id: DocId) -> bool {
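// A document is accepted if any one of its fast field values matches the
// predicate; docs without the column or without values are filtered out.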
if let Some(column) = &self.column_opt {
for val in column.values_for_doc(doc_id) {
if (self.predicate)(val) {
return true;
}
}
}
false
}
}
impl<TSegmentCollector, TPredicate, TPredicateValue> SegmentCollector
for FilterSegmentCollector<TSegmentCollector, TPredicate, TPredicateValue>
where
TSegmentCollector: SegmentCollector,
TPredicateValue: HasAssociatedColumnType,
TPredicate: 'static + Fn(TPredicateValue) -> bool + Send + Sync,
DynamicColumn: Into<Option<columnar::Column<TPredicateValue>>>,
TPredicate: 'static + Fn(TPredicateValue) -> bool + Send + Sync, /* DynamicColumn: Into<Option<columnar::Column<TPredicateValue>>> */
{
type Fruit = TSegmentCollector::Fruit;
fn collect(&mut self, doc: u32, score: Score) {
let value = self.fast_field_reader.get_val(doc);
if (self.predicate)(value) {
self.segment_collector.collect(doc, score)
if self.accept_document(doc) {
self.segment_collector.collect(doc, score);
}
}
fn harvest(self) -> <TSegmentCollector as SegmentCollector>::Fruit {
fn harvest(self) -> TSegmentCollector::Fruit {
self.segment_collector.harvest()
}
}
/// A variant of the [`FilterCollector`] specialized for bytes fast fields, i.e.
/// it transparently wraps an inner [`Collector`] but filters documents
/// based on the result of applying the predicate to the bytes fast field.
///
/// A document is accepted if and only if the predicate returns `true` for at least one value.
///
/// In other words,
/// - documents with no values are filtered out.
/// - documents with several values are accepted if at least one value matches the predicate.
///
/// ```rust
/// use tantivy::collector::{TopDocs, BytesFilterCollector};
/// use tantivy::query::QueryParser;
/// use tantivy::schema::{Schema, TEXT, FAST};
/// use tantivy::{doc, DocAddress, Index};
///
/// # fn main() -> tantivy::Result<()> {
/// let mut schema_builder = Schema::builder();
/// let title = schema_builder.add_text_field("title", TEXT);
/// let barcode = schema_builder.add_bytes_field("barcode", FAST);
/// let schema = schema_builder.build();
/// let index = Index::create_in_ram(schema);
///
/// let mut index_writer = index.writer_with_num_threads(1, 20_000_000)?;
/// index_writer.add_document(doc!(title => "The Name of the Wind", barcode => &b"010101"[..]))?;
/// index_writer.add_document(doc!(title => "The Diary of Muadib", barcode => &b"110011"[..]))?;
/// index_writer.add_document(doc!(title => "A Dairy Cow", barcode => &b"110111"[..]))?;
/// index_writer.add_document(doc!(title => "The Diary of a Young Girl", barcode => &b"011101"[..]))?;
/// index_writer.add_document(doc!(title => "Bridget Jones's Diary"))?;
/// index_writer.commit()?;
///
/// let reader = index.reader()?;
/// let searcher = reader.searcher();
///
/// let query_parser = QueryParser::for_index(&index, vec![title]);
/// let query = query_parser.parse_query("diary")?;
/// let filter_collector = BytesFilterCollector::new(barcode, |bytes: &[u8]| bytes.starts_with(b"01"), TopDocs::with_limit(2));
/// let top_docs = searcher.search(&query, &filter_collector)?;
///
/// assert_eq!(top_docs.len(), 1);
/// assert_eq!(top_docs[0].1, DocAddress::new(0, 3));
/// # Ok(())
/// # }
/// ```
pub struct BytesFilterCollector<TCollector, TPredicate>
where TPredicate: 'static + Clone
{
field: Field,
collector: TCollector,
predicate: TPredicate,
}
impl<TCollector, TPredicate> BytesFilterCollector<TCollector, TPredicate>
where
TCollector: Collector + Send + Sync,
TPredicate: Fn(&[u8]) -> bool + Send + Sync + Clone,
{
/// Create a new `BytesFilterCollector`.
pub fn new(field: Field, predicate: TPredicate, collector: TCollector) -> Self {
Self {
field,
predicate,
collector,
}
}
}
impl<TCollector, TPredicate> Collector for BytesFilterCollector<TCollector, TPredicate>
where
TCollector: Collector + Send + Sync,
TPredicate: 'static + Fn(&[u8]) -> bool + Send + Sync + Clone,
{
type Fruit = TCollector::Fruit;
type Child = BytesFilterSegmentCollector<TCollector::Child, TPredicate>;
fn for_segment(
&self,
segment_local_id: u32,
segment_reader: &SegmentReader,
) -> crate::Result<Self::Child> {
let schema = segment_reader.schema();
let field_name = schema.get_field_name(self.field);
let column_opt = segment_reader.fast_fields().bytes(field_name)?;
let segment_collector = self
.collector
.for_segment(segment_local_id, segment_reader)?;
Ok(BytesFilterSegmentCollector {
column_opt,
segment_collector,
predicate: self.predicate.clone(),
buffer: Vec::new(),
})
}
fn requires_scoring(&self) -> bool {
self.collector.requires_scoring()
}
fn merge_fruits(
&self,
segment_fruits: Vec<<TCollector::Child as SegmentCollector>::Fruit>,
) -> crate::Result<TCollector::Fruit> {
self.collector.merge_fruits(segment_fruits)
}
}
pub struct BytesFilterSegmentCollector<TSegmentCollector, TPredicate>
where TPredicate: 'static
{
column_opt: Option<BytesColumn>,
segment_collector: TSegmentCollector,
predicate: TPredicate,
buffer: Vec<u8>,
}
impl<TSegmentCollector, TPredicate> BytesFilterSegmentCollector<TSegmentCollector, TPredicate>
where
TSegmentCollector: SegmentCollector,
TPredicate: 'static + Fn(&[u8]) -> bool + Send + Sync,
{
#[inline]
fn accept_document(&mut self, doc_id: DocId) -> bool {
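// Resolve each term ordinal to its bytes, reusing one buffer across values,
// and accept the document on the first value matching the predicate.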
if let Some(column) = &self.column_opt {
for ord in column.term_ords(doc_id) {
self.buffer.clear();
let found = column.ord_to_bytes(ord, &mut self.buffer).unwrap_or(false);
if found && (self.predicate)(&self.buffer) {
return true;
}
}
}
false
}
}
impl<TSegmentCollector, TPredicate> SegmentCollector
for BytesFilterSegmentCollector<TSegmentCollector, TPredicate>
where
TSegmentCollector: SegmentCollector,
TPredicate: 'static + Fn(&[u8]) -> bool + Send + Sync,
{
type Fruit = TSegmentCollector::Fruit;
fn collect(&mut self, doc: u32, score: Score) {
if self.accept_document(doc) {
self.segment_collector.collect(doc, score);
}
}
fn harvest(self) -> TSegmentCollector::Fruit {
self.segment_collector.harvest()
}
}

View File

@@ -233,7 +233,7 @@ mod tests {
let val_field = schema_builder.add_i64_field("val_field", FAST);
let schema = schema_builder.build();
let index = Index::create_in_ram(schema);
let mut writer = index.writer_with_num_threads(1, 4_000_000)?;
let mut writer = index.writer_for_tests()?;
writer.add_document(doc!(val_field=>12i64))?;
writer.add_document(doc!(val_field=>-30i64))?;
writer.add_document(doc!(val_field=>-12i64))?;
@@ -255,7 +255,7 @@ mod tests {
let val_field = schema_builder.add_i64_field("val_field", FAST);
let schema = schema_builder.build();
let index = Index::create_in_ram(schema);
let mut writer = index.writer_with_num_threads(1, 4_000_000)?;
let mut writer = index.writer_for_tests()?;
writer.add_document(doc!(val_field=>12i64))?;
writer.commit()?;
writer.add_document(doc!(val_field=>-30i64))?;
@@ -280,7 +280,7 @@ mod tests {
let date_field = schema_builder.add_date_field("date_field", FAST);
let schema = schema_builder.build();
let index = Index::create_in_ram(schema);
let mut writer = index.writer_with_num_threads(1, 4_000_000)?;
let mut writer = index.writer_for_tests()?;
writer.add_document(doc!(date_field=>DateTime::from_primitive(Date::from_calendar_date(1982, Month::September, 17)?.with_hms(0, 0, 0)?)))?;
writer.add_document(
doc!(date_field=>DateTime::from_primitive(Date::from_calendar_date(1986, Month::March, 9)?.with_hms(0, 0, 0)?)),

View File

@@ -44,7 +44,7 @@
//! # let title = schema_builder.add_text_field("title", TEXT);
//! # let schema = schema_builder.build();
//! # let index = Index::create_in_ram(schema);
//! # let mut index_writer = index.writer(3_000_000)?;
//! # let mut index_writer = index.writer(15_000_000)?;
//! # index_writer.add_document(doc!(
//! # title => "The Name of the Wind",
//! # ))?;
@@ -112,7 +112,7 @@ mod docset_collector;
pub use self::docset_collector::DocSetCollector;
mod filter_collector_wrapper;
pub use self::filter_collector_wrapper::FilterCollector;
pub use self::filter_collector_wrapper::{BytesFilterCollector, FilterCollector};
/// `Fruit` is the type for the result of our collection.
/// e.g. `usize` for the `Count` collector.

View File

@@ -120,7 +120,7 @@ impl<TFruit: Fruit> FruitHandle<TFruit> {
/// let title = schema_builder.add_text_field("title", TEXT);
/// let schema = schema_builder.build();
/// let index = Index::create_in_ram(schema);
/// let mut index_writer = index.writer(3_000_000)?;
/// let mut index_writer = index.writer(15_000_000)?;
/// index_writer.add_document(doc!(title => "The Name of the Wind"))?;
/// index_writer.add_document(doc!(title => "The Diary of Muadib"))?;
/// index_writer.add_document(doc!(title => "A Dairy Cow"))?;

View File

@@ -26,7 +26,7 @@ pub fn test_filter_collector() -> crate::Result<()> {
let schema = schema_builder.build();
let index = Index::create_in_ram(schema);
let mut index_writer = index.writer_with_num_threads(1, 10_000_000)?;
let mut index_writer = index.writer_with_num_threads(1, 20_000_000)?;
index_writer.add_document(doc!(title => "The Name of the Wind", price => 30_200u64, date => DateTime::from_utc(OffsetDateTime::parse("1898-04-09T00:00:00+00:00", &Rfc3339).unwrap())))?;
index_writer.add_document(doc!(title => "The Diary of Muadib", price => 29_240u64, date => DateTime::from_utc(OffsetDateTime::parse("2020-04-09T00:00:00+00:00", &Rfc3339).unwrap())))?;
index_writer.add_document(doc!(title => "The Diary of Anne Frank", price => 18_240u64, date => DateTime::from_utc(OffsetDateTime::parse("2019-04-20T00:00:00+00:00", &Rfc3339).unwrap())))?;

View File

@@ -14,7 +14,7 @@ use crate::collector::{
};
use crate::fastfield::{FastFieldNotAvailableError, FastValue};
use crate::query::Weight;
use crate::{DocAddress, DocId, Score, SegmentOrdinal, SegmentReader, TantivyError};
use crate::{DocAddress, DocId, Order, Score, SegmentOrdinal, SegmentReader, TantivyError};
struct FastFieldConvertCollector<
TCollector: Collector<Fruit = Vec<(u64, DocAddress)>>,
@@ -23,6 +23,7 @@ struct FastFieldConvertCollector<
pub collector: TCollector,
pub field: String,
pub fast_value: std::marker::PhantomData<TFastValue>,
order: Order,
}
impl<TCollector, TFastValue> Collector for FastFieldConvertCollector<TCollector, TFastValue>
@@ -70,7 +71,13 @@ where
let raw_result = self.collector.merge_fruits(segment_fruits)?;
let transformed_result = raw_result
.into_iter()
.map(|(score, doc_address)| (TFastValue::from_u64(score), doc_address))
.map(|(score, doc_address)| {
if self.order.is_desc() {
(TFastValue::from_u64(score), doc_address)
} else {
(TFastValue::from_u64(u64::MAX - score), doc_address)
}
})
.collect::<Vec<_>>();
Ok(transformed_result)
}
@@ -98,7 +105,7 @@ where
/// let schema = schema_builder.build();
/// let index = Index::create_in_ram(schema);
///
/// let mut index_writer = index.writer_with_num_threads(1, 10_000_000)?;
/// let mut index_writer = index.writer_with_num_threads(1, 20_000_000)?;
/// index_writer.add_document(doc!(title => "The Name of the Wind"))?;
/// index_writer.add_document(doc!(title => "The Diary of Muadib"))?;
/// index_writer.add_document(doc!(title => "A Dairy Cow"))?;
@@ -131,16 +138,23 @@ impl fmt::Debug for TopDocs {
struct ScorerByFastFieldReader {
sort_column: Arc<dyn ColumnValues<u64>>,
order: Order,
}
impl CustomSegmentScorer<u64> for ScorerByFastFieldReader {
fn score(&mut self, doc: DocId) -> u64 {
self.sort_column.get_val(doc)
let value = self.sort_column.get_val(doc);
if self.order.is_desc() {
value
} else {
u64::MAX - value
}
}
}
struct ScorerByField {
field: String,
order: Order,
}
impl CustomScorer<u64> for ScorerByField {
@@ -157,8 +171,13 @@ impl CustomScorer<u64> for ScorerByField {
sort_column_opt.ok_or_else(|| FastFieldNotAvailableError {
field_name: self.field.clone(),
})?;
let mut default_value = 0u64;
if self.order.is_asc() {
default_value = u64::MAX;
}
Ok(ScorerByFastFieldReader {
sort_column: sort_column.first_or_default_col(0u64),
sort_column: sort_column.first_or_default_col(default_value),
order: self.order.clone(),
})
}
}
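Ascending order is implemented as a monotone inversion rather than a separate collector: under `Order::Asc` the segment scorer maps each value v to `u64::MAX - v`, so the existing top-K-largest machinery surfaces the smallest values, and `merge_fruits` applies the same inversion to recover the original value. A minimal, self-contained sketch of the invariant (names here are illustrative, not tantivy API):

/// Inverting through u64::MAX reverses the ordering without overflow.
fn invert(v: u64) -> u64 {
    u64::MAX - v
}

fn main() {
    assert!(invert(12) > invert(64)); // smaller values now rank higher
    assert_eq!(invert(invert(12)), 12); // merging un-inverts losslessly
    // A doc missing the field defaults to u64::MAX under ascending order,
    // so its inverted key is 0 and it ranks behind every real value.
    assert_eq!(invert(u64::MAX), 0);
}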
@@ -191,7 +210,7 @@ impl TopDocs {
/// let schema = schema_builder.build();
/// let index = Index::create_in_ram(schema);
///
/// let mut index_writer = index.writer_with_num_threads(1, 10_000_000)?;
/// let mut index_writer = index.writer_with_num_threads(1, 20_000_000)?;
/// index_writer.add_document(doc!(title => "The Name of the Wind"))?;
/// index_writer.add_document(doc!(title => "The Diary of Muadib"))?;
/// index_writer.add_document(doc!(title => "A Dairy Cow"))?;
@@ -230,7 +249,7 @@ impl TopDocs {
///
/// ```rust
/// # use tantivy::schema::{Schema, FAST, TEXT};
/// # use tantivy::{doc, Index, DocAddress};
/// # use tantivy::{doc, Index, DocAddress, Order};
/// # use tantivy::query::{Query, QueryParser};
/// use tantivy::Searcher;
/// use tantivy::collector::TopDocs;
@@ -242,7 +261,7 @@ impl TopDocs {
/// # let schema = schema_builder.build();
/// #
/// # let index = Index::create_in_ram(schema);
/// # let mut index_writer = index.writer_with_num_threads(1, 10_000_000)?;
/// # let mut index_writer = index.writer_with_num_threads(1, 20_000_000)?;
/// # index_writer.add_document(doc!(title => "The Name of the Wind", rating => 92u64))?;
/// # index_writer.add_document(doc!(title => "The Diary of Muadib", rating => 97u64))?;
/// # index_writer.add_document(doc!(title => "A Dairy Cow", rating => 63u64))?;
@@ -268,7 +287,7 @@ impl TopDocs {
/// // Note the `rating_field` needs to be a FAST field here.
/// let top_books_by_rating = TopDocs
/// ::with_limit(10)
/// .order_by_u64_field("rating");
/// .order_by_fast_field("rating", Order::Desc);
///
/// // ... and here are our documents. Note this is a simple vec.
/// // The `u64` in the pair is the value of our fast field for
@@ -288,13 +307,15 @@ impl TopDocs {
///
/// To comfortably work with `u64`s, `i64`s, `f64`s, or `date`s, please refer to
/// the [.order_by_fast_field(...)](TopDocs::order_by_fast_field) method.
pub fn order_by_u64_field(
fn order_by_u64_field(
self,
field: impl ToString,
order: Order,
) -> impl Collector<Fruit = Vec<(u64, DocAddress)>> {
CustomScoreTopCollector::new(
ScorerByField {
field: field.to_string(),
order,
},
self.0.into_tscore(),
)
@@ -316,7 +337,7 @@ impl TopDocs {
///
/// ```rust
/// # use tantivy::schema::{Schema, FAST, TEXT};
/// # use tantivy::{doc, Index, DocAddress};
/// # use tantivy::{doc, Index, DocAddress,Order};
/// # use tantivy::query::{Query, AllQuery};
/// use tantivy::Searcher;
/// use tantivy::collector::TopDocs;
@@ -328,7 +349,7 @@ impl TopDocs {
/// # let schema = schema_builder.build();
/// #
/// # let index = Index::create_in_ram(schema);
/// # let mut index_writer = index.writer_with_num_threads(1, 10_000_000)?;
/// # let mut index_writer = index.writer_with_num_threads(1, 20_000_000)?;
/// # index_writer.add_document(doc!(title => "MadCow Inc.", revenue => 92_000_000i64))?;
/// # index_writer.add_document(doc!(title => "Zozo Cow KKK", revenue => 119_000_000i64))?;
/// # index_writer.add_document(doc!(title => "Declining Cow", revenue => -63_000_000i64))?;
@@ -354,7 +375,7 @@ impl TopDocs {
/// // type `sort_by_field`. revenue_field here is a FAST i64 field.
/// let top_company_by_revenue = TopDocs
/// ::with_limit(2)
/// .order_by_fast_field("revenue");
/// .order_by_fast_field("revenue", Order::Desc);
///
/// // ... and here are our documents. Note this is a simple vec.
/// // The `i64` in the pair is the value of our fast field for
@@ -372,15 +393,17 @@ impl TopDocs {
pub fn order_by_fast_field<TFastValue>(
self,
fast_field: impl ToString,
order: Order,
) -> impl Collector<Fruit = Vec<(TFastValue, DocAddress)>>
where
TFastValue: FastValue,
{
let u64_collector = self.order_by_u64_field(fast_field.to_string());
let u64_collector = self.order_by_u64_field(fast_field.to_string(), order.clone());
FastFieldConvertCollector {
collector: u64_collector,
field: fast_field.to_string(),
fast_value: PhantomData,
order,
}
}
@@ -426,7 +449,7 @@ impl TopDocs {
/// fn create_index() -> tantivy::Result<Index> {
/// let schema = create_schema();
/// let index = Index::create_in_ram(schema);
/// let mut index_writer = index.writer_with_num_threads(1, 10_000_000)?;
/// let mut index_writer = index.writer_with_num_threads(1, 20_000_000)?;
/// let product_name = index.schema().get_field("product_name").unwrap();
/// let popularity: Field = index.schema().get_field("popularity").unwrap();
/// index_writer.add_document(doc!(product_name => "The Diary of Muadib", popularity => 1u64))?;
@@ -533,7 +556,7 @@ impl TopDocs {
/// # fn main() -> tantivy::Result<()> {
/// # let schema = create_schema();
/// # let index = Index::create_in_ram(schema);
/// # let mut index_writer = index.writer_with_num_threads(1, 10_000_000)?;
/// # let mut index_writer = index.writer_with_num_threads(1, 20_000_000)?;
/// # let product_name = index.schema().get_field("product_name").unwrap();
/// #
/// let popularity: Field = index.schema().get_field("popularity").unwrap();
@@ -721,7 +744,7 @@ mod tests {
use crate::schema::{Field, Schema, FAST, STORED, TEXT};
use crate::time::format_description::well_known::Rfc3339;
use crate::time::OffsetDateTime;
use crate::{DateTime, DocAddress, DocId, Index, IndexWriter, Score, SegmentReader};
use crate::{DateTime, DocAddress, DocId, Index, IndexWriter, Order, Score, SegmentReader};
fn make_index() -> crate::Result<Index> {
let mut schema_builder = Schema::builder();
@@ -729,7 +752,7 @@ mod tests {
let schema = schema_builder.build();
let index = Index::create_in_ram(schema);
// writing the segment
let mut index_writer = index.writer_with_num_threads(1, 10_000_000)?;
let mut index_writer = index.writer_with_num_threads(1, 20_000_000)?;
index_writer.add_document(doc!(text_field=>"Hello happy tax payer."))?;
index_writer.add_document(doc!(text_field=>"Droopy says hello happy tax payer"))?;
index_writer.add_document(doc!(text_field=>"I like Droopy"))?;
@@ -882,7 +905,7 @@ mod tests {
});
let searcher = index.reader()?.searcher();
let top_collector = TopDocs::with_limit(4).order_by_u64_field(SIZE);
let top_collector = TopDocs::with_limit(4).order_by_u64_field(SIZE, Order::Desc);
let top_docs: Vec<(u64, DocAddress)> = searcher.search(&query, &top_collector)?;
assert_eq!(
&top_docs[..],
@@ -921,7 +944,7 @@ mod tests {
))?;
index_writer.commit()?;
let searcher = index.reader()?.searcher();
let top_collector = TopDocs::with_limit(3).order_by_fast_field("birthday");
let top_collector = TopDocs::with_limit(3).order_by_fast_field("birthday", Order::Desc);
let top_docs: Vec<(DateTime, DocAddress)> = searcher.search(&AllQuery, &top_collector)?;
assert_eq!(
&top_docs[..],
@@ -951,7 +974,7 @@ mod tests {
))?;
index_writer.commit()?;
let searcher = index.reader()?.searcher();
let top_collector = TopDocs::with_limit(3).order_by_fast_field("altitude");
let top_collector = TopDocs::with_limit(3).order_by_fast_field("altitude", Order::Desc);
let top_docs: Vec<(i64, DocAddress)> = searcher.search(&AllQuery, &top_collector)?;
assert_eq!(
&top_docs[..],
@@ -981,7 +1004,7 @@ mod tests {
))?;
index_writer.commit()?;
let searcher = index.reader()?.searcher();
let top_collector = TopDocs::with_limit(3).order_by_fast_field("altitude");
let top_collector = TopDocs::with_limit(3).order_by_fast_field("altitude", Order::Desc);
let top_docs: Vec<(f64, DocAddress)> = searcher.search(&AllQuery, &top_collector)?;
assert_eq!(
&top_docs[..],
@@ -1009,7 +1032,7 @@ mod tests {
.unwrap();
});
let searcher = index.reader().unwrap().searcher();
let top_collector = TopDocs::with_limit(4).order_by_u64_field("missing_field");
let top_collector = TopDocs::with_limit(4).order_by_u64_field("missing_field", Order::Desc);
let segment_reader = searcher.segment_reader(0u32);
top_collector
.for_segment(0, segment_reader)
@@ -1027,7 +1050,7 @@ mod tests {
index_writer.commit()?;
let searcher = index.reader()?.searcher();
let segment = searcher.segment_reader(0);
let top_collector = TopDocs::with_limit(4).order_by_u64_field(SIZE);
let top_collector = TopDocs::with_limit(4).order_by_u64_field(SIZE, Order::Desc);
let err = top_collector.for_segment(0, segment).err().unwrap();
assert!(matches!(err, crate::TantivyError::InvalidArgument(_)));
Ok(())
@@ -1044,7 +1067,7 @@ mod tests {
index_writer.commit()?;
let searcher = index.reader()?.searcher();
let segment = searcher.segment_reader(0);
let top_collector = TopDocs::with_limit(4).order_by_fast_field::<i64>(SIZE);
let top_collector = TopDocs::with_limit(4).order_by_fast_field::<i64>(SIZE, Order::Desc);
let err = top_collector.for_segment(0, segment).err().unwrap();
assert!(
matches!(err, crate::TantivyError::SchemaError(msg) if msg == "Field \"size\" is not a fast field.")
@@ -1099,11 +1122,57 @@ mod tests {
mut doc_adder: impl FnMut(&mut IndexWriter),
) -> (Index, Box<dyn Query>) {
let index = Index::create_in_ram(schema);
let mut index_writer = index.writer_with_num_threads(1, 10_000_000).unwrap();
let mut index_writer = index.writer_with_num_threads(1, 15_000_000).unwrap();
doc_adder(&mut index_writer);
index_writer.commit().unwrap();
let query_parser = QueryParser::for_index(&index, vec![query_field]);
let query = query_parser.parse_query(query).unwrap();
(index, query)
}
#[test]
fn test_fast_field_ascending_order() -> crate::Result<()> {
let mut schema_builder = Schema::builder();
let title = schema_builder.add_text_field(TITLE, TEXT);
let size = schema_builder.add_u64_field(SIZE, FAST);
let schema = schema_builder.build();
let (index, query) = index("beer", title, schema, |index_writer| {
index_writer
.add_document(doc!(
title => "bottle of beer",
size => 12u64,
))
.unwrap();
index_writer
.add_document(doc!(
title => "growler of beer",
size => 64u64,
))
.unwrap();
index_writer
.add_document(doc!(
title => "pint of beer",
size => 16u64,
))
.unwrap();
index_writer
.add_document(doc!(
title => "empty beer",
))
.unwrap();
});
let searcher = index.reader()?.searcher();
let top_collector = TopDocs::with_limit(4).order_by_fast_field(SIZE, Order::Asc);
let top_docs: Vec<(u64, DocAddress)> = searcher.search(&query, &top_collector)?;
assert_eq!(
&top_docs[..],
&[
(12, DocAddress::new(0, 0)),
(16, DocAddress::new(0, 2)),
(64, DocAddress::new(0, 1)),
(18446744073709551615, DocAddress::new(0, 3)),
]
);
Ok(())
}
}

View File

@@ -16,7 +16,7 @@ use crate::directory::error::OpenReadError;
use crate::directory::MmapDirectory;
use crate::directory::{Directory, ManagedDirectory, RamDirectory, INDEX_WRITER_LOCK};
use crate::error::{DataCorruption, TantivyError};
use crate::indexer::index_writer::{MAX_NUM_THREAD, MEMORY_ARENA_NUM_BYTES_MIN};
use crate::indexer::index_writer::{MAX_NUM_THREAD, MEMORY_BUDGET_NUM_BYTES_MIN};
use crate::indexer::segment_updater::save_metas;
use crate::reader::{IndexReader, IndexReaderBuilder};
use crate::schema::{Field, FieldType, Schema};
@@ -523,9 +523,9 @@ impl Index {
/// - `num_threads` defines the number of indexing workers that
/// should work at the same time.
///
/// - `overall_memory_arena_in_bytes` sets the amount of memory
/// - `overall_memory_budget_in_bytes` sets the amount of memory
/// allocated for all indexing threads.
/// Each thread will receive a budget of `overall_memory_arena_in_bytes / num_threads`.
/// Each thread will receive a budget of `overall_memory_budget_in_bytes / num_threads`.
///
/// # Errors
/// If the lockfile already exists, returns `Error::DirectoryLockBusy` or an `Error::IoError`.
@@ -534,7 +534,7 @@ impl Index {
pub fn writer_with_num_threads(
&self,
num_threads: usize,
overall_memory_arena_in_bytes: usize,
overall_memory_budget_in_bytes: usize,
) -> crate::Result<IndexWriter> {
let directory_lock = self
.directory
@@ -550,7 +550,7 @@ impl Index {
),
)
})?;
let memory_arena_in_bytes_per_thread = overall_memory_arena_in_bytes / num_threads;
let memory_arena_in_bytes_per_thread = overall_memory_budget_in_bytes / num_threads;
IndexWriter::new(
self,
num_threads,
@@ -561,11 +561,11 @@ impl Index {
/// Helper to create an index writer for tests.
///
/// That index writer only simply has a single thread and a memory arena of 10 MB.
/// That index writer simply has a single thread and a memory budget of 15 MB.
/// Using a single thread gives us a deterministic allocation of DocId.
#[cfg(test)]
pub fn writer_for_tests(&self) -> crate::Result<IndexWriter> {
self.writer_with_num_threads(1, 10_000_000)
self.writer_with_num_threads(1, 15_000_000)
}
/// Creates a multithreaded writer
@@ -579,13 +579,13 @@ impl Index {
/// If the lockfile already exists, returns `Error::FileAlreadyExists`.
/// If the memory arena per thread is too small or too big, returns
/// `TantivyError::InvalidArgument`
pub fn writer(&self, memory_arena_num_bytes: usize) -> crate::Result<IndexWriter> {
pub fn writer(&self, memory_budget_in_bytes: usize) -> crate::Result<IndexWriter> {
let mut num_threads = std::cmp::min(num_cpus::get(), MAX_NUM_THREAD);
let memory_arena_num_bytes_per_thread = memory_arena_num_bytes / num_threads;
if memory_arena_num_bytes_per_thread < MEMORY_ARENA_NUM_BYTES_MIN {
num_threads = (memory_arena_num_bytes / MEMORY_ARENA_NUM_BYTES_MIN).max(1);
let memory_budget_num_bytes_per_thread = memory_budget_in_bytes / num_threads;
if memory_budget_num_bytes_per_thread < MEMORY_BUDGET_NUM_BYTES_MIN {
num_threads = (memory_budget_in_bytes / MEMORY_BUDGET_NUM_BYTES_MIN).max(1);
}
self.writer_with_num_threads(num_threads, memory_arena_num_bytes)
self.writer_with_num_threads(num_threads, memory_budget_in_bytes)
}
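A quick worked example of the thread-count fallback above, assuming the 15 MB per-thread minimum (`MEMORY_BUDGET_NUM_BYTES_MIN`) used throughout this changeset; `effective_threads` is a hypothetical stand-in for the logic in `writer`, not tantivy API:

// Sketch of the budget-based thread adjustment in Index::writer.
const MEMORY_BUDGET_NUM_BYTES_MIN: usize = 15_000_000;

fn effective_threads(budget: usize, num_cpus: usize) -> usize {
    let mut num_threads = num_cpus;
    if budget / num_threads < MEMORY_BUDGET_NUM_BYTES_MIN {
        num_threads = (budget / MEMORY_BUDGET_NUM_BYTES_MIN).max(1);
    }
    num_threads
}

fn main() {
    // 60 MB over 8 cores is 7.5 MB/thread, below the minimum: drop to 4 threads.
    assert_eq!(effective_threads(60_000_000, 8), 4);
    // A tiny budget still gets one thread.
    assert_eq!(effective_threads(10_000_000, 8), 1);
}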
/// Accessor to the index settings

View File

@@ -410,7 +410,9 @@ mod tests {
use super::IndexMeta;
use crate::core::index_meta::UntrackedIndexMeta;
use crate::schema::{Schema, TEXT};
use crate::store::{Compressor, ZstdCompressor};
use crate::store::Compressor;
#[cfg(feature = "zstd-compression")]
use crate::store::ZstdCompressor;
use crate::{IndexSettings, IndexSortByField, Order};
#[test]
@@ -446,6 +448,7 @@ mod tests {
}
#[test]
#[cfg(feature = "zstd-compression")]
fn test_serialize_metas_zstd_compressor() {
let schema = {
let mut schema_builder = Schema::builder();
@@ -482,13 +485,14 @@ mod tests {
}
#[test]
#[cfg(all(feature = "lz4-compression", feature = "zstd-compression"))]
fn test_serialize_metas_invalid_comp() {
let json = r#"{"index_settings":{"sort_by_field":{"field":"text","order":"Asc"},"docstore_compression":"zsstd","docstore_blocksize":1000000},"segments":[],"schema":[{"name":"text","type":"text","options":{"indexing":{"record":"position","fieldnorms":true,"tokenizer":"default"},"stored":false,"fast":false}}],"opstamp":0}"#;
let err = serde_json::from_str::<UntrackedIndexMeta>(json).unwrap_err();
assert_eq!(
err.to_string(),
"unknown variant `zsstd`, expected one of `none`, `lz4`, `brotli`, `snappy`, `zstd`, \
"unknown variant `zsstd`, expected one of `none`, `lz4`, `zstd`, \
`zstd(compression_level=5)` at line 1 column 96"
.to_string()
);
@@ -502,6 +506,20 @@ mod tests {
);
}
#[test]
#[cfg(not(feature = "zstd-compression"))]
fn test_serialize_metas_unsupported_comp() {
let json = r#"{"index_settings":{"sort_by_field":{"field":"text","order":"Asc"},"docstore_compression":"zstd","docstore_blocksize":1000000},"segments":[],"schema":[{"name":"text","type":"text","options":{"indexing":{"record":"position","fieldnorms":true,"tokenizer":"default"},"stored":false,"fast":false}}],"opstamp":0}"#;
let err = serde_json::from_str::<UntrackedIndexMeta>(json).unwrap_err();
assert_eq!(
err.to_string(),
"unsupported variant `zstd`, please enable Tantivy's `zstd-compression` feature at \
line 1 column 95"
.to_string()
);
}
#[test]
#[cfg(feature = "lz4-compression")]
fn test_index_settings_default() {

View File

@@ -60,14 +60,14 @@ impl IndexingPositionsPerPath {
fn get_position(&mut self, term: &Term) -> &mut IndexingPosition {
self.positions_per_path
.entry(murmurhash2(term.serialized_term()))
.or_insert_with(Default::default)
.or_default()
}
}
pub(crate) fn index_json_values<'a>(
doc: DocId,
json_values: impl Iterator<Item = crate::Result<&'a serde_json::Map<String, serde_json::Value>>>,
text_analyzer: &TextAnalyzer,
text_analyzer: &mut TextAnalyzer,
expand_dots_enabled: bool,
term_buffer: &mut Term,
postings_writer: &mut dyn PostingsWriter,
@@ -93,7 +93,7 @@ pub(crate) fn index_json_values<'a>(
fn index_json_object(
doc: DocId,
json_value: &serde_json::Map<String, serde_json::Value>,
text_analyzer: &TextAnalyzer,
text_analyzer: &mut TextAnalyzer,
json_term_writer: &mut JsonTermWriter,
postings_writer: &mut dyn PostingsWriter,
ctx: &mut IndexingContext,
@@ -117,7 +117,7 @@ fn index_json_object(
fn index_json_value(
doc: DocId,
json_value: &serde_json::Value,
text_analyzer: &TextAnalyzer,
text_analyzer: &mut TextAnalyzer,
json_term_writer: &mut JsonTermWriter,
postings_writer: &mut dyn PostingsWriter,
ctx: &mut IndexingContext,
@@ -212,12 +212,12 @@ pub fn convert_to_fast_value_and_get_term(
DateTime::from_utc(dt_utc),
));
}
if let Ok(u64_val) = str::parse::<u64>(phrase) {
return Some(set_fastvalue_and_get_term(json_term_writer, u64_val));
}
if let Ok(i64_val) = str::parse::<i64>(phrase) {
return Some(set_fastvalue_and_get_term(json_term_writer, i64_val));
}
if let Ok(u64_val) = str::parse::<u64>(phrase) {
return Some(set_fastvalue_and_get_term(json_term_writer, u64_val));
}
if let Ok(f64_val) = str::parse::<f64>(phrase) {
return Some(set_fastvalue_and_get_term(json_term_writer, f64_val));
}
@@ -239,7 +239,7 @@ pub(crate) fn set_fastvalue_and_get_term<T: FastValue>(
pub(crate) fn set_string_and_get_terms(
json_term_writer: &mut JsonTermWriter,
value: &str,
text_analyzer: &TextAnalyzer,
text_analyzer: &mut TextAnalyzer,
) -> Vec<(usize, Term)> {
let mut positions_and_terms = Vec::<(usize, Term)>::new();
json_term_writer.close_path_and_set_type(Type::Str);
@@ -259,7 +259,7 @@ pub(crate) fn set_string_and_get_terms(
/// Writes a value of a JSON field to a `Term`.
/// The Term format is as follows:
/// [JSON_TYPE][JSON_PATH][JSON_END_OF_PATH][VALUE_BYTES]
/// `[JSON_TYPE][JSON_PATH][JSON_END_OF_PATH][VALUE_BYTES]`
pub struct JsonTermWriter<'a> {
term_buffer: &'a mut Term,
path_stack: Vec<usize>,
@@ -619,21 +619,21 @@ mod tests {
#[test]
fn test_split_json_path_escaped_dot() {
let json_path = split_json_path(r#"toto\.titi"#);
let json_path = split_json_path(r"toto\.titi");
assert_eq!(&json_path, &["toto.titi"]);
let json_path_2 = split_json_path(r#"k8s\.container\.name"#);
let json_path_2 = split_json_path(r"k8s\.container\.name");
assert_eq!(&json_path_2, &["k8s.container.name"]);
}
#[test]
fn test_split_json_path_escaped_backslash() {
let json_path = split_json_path(r#"toto\\titi"#);
assert_eq!(&json_path, &[r#"toto\titi"#]);
let json_path = split_json_path(r"toto\\titi");
assert_eq!(&json_path, &[r"toto\titi"]);
}
#[test]
fn test_split_json_path_escaped_normal_letter() {
let json_path = split_json_path(r#"toto\titi"#);
let json_path = split_json_path(r"toto\titi");
assert_eq!(&json_path, &[r#"tototiti"#]);
}
}
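
The `&TextAnalyzer` → `&mut TextAnalyzer` change that runs through this file (and several below) reflects `token_stream` now taking `&mut self`, so analyzers can reuse internal buffers between calls. A minimal caller sketch under the new signature, assuming tantivy 0.21:

```rust
use tantivy::tokenizer::{SimpleTokenizer, TextAnalyzer, TokenStream};

fn count_tokens(text: &str) -> usize {
    // The analyzer must now be bound mutably to call `token_stream`.
    let mut analyzer = TextAnalyzer::builder(SimpleTokenizer::default()).build();
    let mut stream = analyzer.token_stream(text);
    let mut count = 0;
    while stream.advance() {
        count += 1;
    }
    count
}
```
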


@@ -2,8 +2,6 @@ use std::collections::HashMap;
use std::sync::{Arc, RwLock};
use std::{fmt, io};
use fail::fail_point;
use crate::core::{InvertedIndexReader, Segment, SegmentComponent, SegmentId};
use crate::directory::{CompositeFile, FileSlice};
use crate::error::DataCorruption;
@@ -151,7 +149,7 @@ impl SegmentReader {
let store_file = segment.open_read(SegmentComponent::Store)?;
fail_point!("SegmentReader::open#middle");
crate::fail_point!("SegmentReader::open#middle");
let postings_file = segment.open_read(SegmentComponent::Postings)?;
let postings_composite = CompositeFile::open(&postings_file)?;


@@ -283,7 +283,7 @@ fn test_single_segment_index_writer() -> crate::Result<()> {
let directory = RamDirectory::default();
let mut single_segment_index_writer = Index::builder()
.schema(schema)
.single_segment_index_writer(directory, 10_000_000)?;
.single_segment_index_writer(directory, 15_000_000)?;
for _ in 0..10 {
let doc = doc!(text_field=>"hello");
single_segment_index_writer.add_document(doc)?;


@@ -73,7 +73,7 @@ impl From<io::Error> for TryAcquireLockError {
fn try_acquire_lock(
filepath: &Path,
directory: &mut dyn Directory,
directory: &dyn Directory,
) -> Result<DirectoryLock, TryAcquireLockError> {
let mut write = directory.open_write(filepath).map_err(|e| match e {
OpenWriteError::FileAlreadyExists(_) => TryAcquireLockError::FileExists,
@@ -191,10 +191,10 @@ pub trait Directory: DirectoryClone + fmt::Debug + Send + Sync + 'static {
///
/// The method is blocking or not depending on the [`Lock`] object.
fn acquire_lock(&self, lock: &Lock) -> Result<DirectoryLock, LockError> {
let mut box_directory = self.box_clone();
let box_directory = self.box_clone();
let mut retry_policy = retry_policy(lock.is_blocking);
loop {
match try_acquire_lock(&lock.filepath, &mut *box_directory) {
match try_acquire_lock(&lock.filepath, &*box_directory) {
Ok(result) => {
return Ok(result);
}


@@ -1,10 +1,10 @@
use std::collections::HashMap;
use std::fmt;
use std::fs::{self, File, OpenOptions};
use std::io::{self, BufWriter, Read, Seek, Write};
use std::ops::Deref;
use std::path::{Path, PathBuf};
use std::sync::{Arc, RwLock, Weak};
use std::{fmt, result};
use common::StableDeref;
use fs4::FileExt;
@@ -21,6 +21,7 @@ use crate::directory::{
AntiCallToken, Directory, DirectoryLock, FileHandle, Lock, OwnedBytes, TerminatingWrite,
WatchCallback, WatchHandle, WritePtr,
};
#[cfg(unix)]
use crate::Advice;
pub type ArcBytes = Arc<dyn Deref<Target = [u8]> + Send + Sync + 'static>;
@@ -33,10 +34,7 @@ pub(crate) fn make_io_err(msg: String) -> io::Error {
/// Returns `None` iff the file exists, can be read, but is empty (and hence
/// cannot be mmapped)
fn open_mmap(
full_path: &Path,
madvice_opt: Option<Advice>,
) -> result::Result<Option<Mmap>, OpenReadError> {
fn open_mmap(full_path: &Path) -> Result<Option<Mmap>, OpenReadError> {
let file = File::open(full_path).map_err(|io_err| {
if io_err.kind() == io::ErrorKind::NotFound {
OpenReadError::FileDoesNotExist(full_path.to_path_buf())
@@ -59,9 +57,7 @@ fn open_mmap(
.map(Some)
.map_err(|io_err| OpenReadError::wrap_io_error(io_err, full_path.to_path_buf()))
}?;
if let (Some(mmap), Some(madvice)) = (&mmap_opt, madvice_opt) {
let _ = mmap.advise(madvice);
}
Ok(mmap_opt)
}
@@ -83,18 +79,25 @@ pub struct CacheInfo {
struct MmapCache {
counters: CacheCounters,
cache: HashMap<PathBuf, WeakArcBytes>,
#[cfg(unix)]
madvice_opt: Option<Advice>,
}
impl MmapCache {
fn new(madvice_opt: Option<Advice>) -> MmapCache {
fn new() -> MmapCache {
MmapCache {
counters: CacheCounters::default(),
cache: HashMap::default(),
madvice_opt,
#[cfg(unix)]
madvice_opt: None,
}
}
#[cfg(unix)]
fn set_advice(&mut self, madvice: Advice) {
self.madvice_opt = Some(madvice);
}
fn get_info(&self) -> CacheInfo {
let paths: Vec<PathBuf> = self.cache.keys().cloned().collect();
CacheInfo {
@@ -115,6 +118,16 @@ impl MmapCache {
}
}
fn open_mmap_impl(&self, full_path: &Path) -> Result<Option<Mmap>, OpenReadError> {
let mmap_opt = open_mmap(full_path)?;
#[cfg(unix)]
if let (Some(mmap), Some(madvice)) = (mmap_opt.as_ref(), self.madvice_opt) {
// We ignore madvise errors.
let _ = mmap.advise(madvice);
}
Ok(mmap_opt)
}
// Returns None if the file exists but has a len of 0 (and hence is not mmappable).
fn get_mmap(&mut self, full_path: &Path) -> Result<Option<ArcBytes>, OpenReadError> {
if let Some(mmap_weak) = self.cache.get(full_path) {
@@ -125,7 +138,7 @@ impl MmapCache {
}
self.cache.remove(full_path);
self.counters.miss += 1;
let mmap_opt = open_mmap(full_path, self.madvice_opt)?;
let mmap_opt = self.open_mmap_impl(full_path)?;
Ok(mmap_opt.map(|mmap| {
let mmap_arc: ArcBytes = Arc::new(mmap);
let mmap_weak = Arc::downgrade(&mmap_arc);
@@ -160,13 +173,9 @@ struct MmapDirectoryInner {
}
impl MmapDirectoryInner {
fn new(
root_path: PathBuf,
temp_directory: Option<TempDir>,
madvice_opt: Option<Advice>,
) -> MmapDirectoryInner {
fn new(root_path: PathBuf, temp_directory: Option<TempDir>) -> MmapDirectoryInner {
MmapDirectoryInner {
mmap_cache: RwLock::new(MmapCache::new(madvice_opt)),
mmap_cache: RwLock::new(MmapCache::new()),
_temp_directory: temp_directory,
watcher: FileWatcher::new(&root_path.join(*META_FILEPATH)),
root_path,
@@ -185,12 +194,8 @@ impl fmt::Debug for MmapDirectory {
}
impl MmapDirectory {
fn new(
root_path: PathBuf,
temp_directory: Option<TempDir>,
madvice_opt: Option<Advice>,
) -> MmapDirectory {
let inner = MmapDirectoryInner::new(root_path, temp_directory, madvice_opt);
fn new(root_path: PathBuf, temp_directory: Option<TempDir>) -> MmapDirectory {
let inner = MmapDirectoryInner::new(root_path, temp_directory);
MmapDirectory {
inner: Arc::new(inner),
}
@@ -206,29 +211,33 @@ impl MmapDirectory {
Ok(MmapDirectory::new(
tempdir.path().to_path_buf(),
Some(tempdir),
None,
))
}
/// Opens a MmapDirectory in a directory, with a given access pattern.
///
/// This is only supported on unix platforms.
#[cfg(unix)]
pub fn open_with_madvice(
directory_path: impl AsRef<Path>,
madvice: Advice,
) -> Result<MmapDirectory, OpenDirectoryError> {
let dir = Self::open_impl_to_avoid_monomorphization(directory_path.as_ref())?;
dir.inner.mmap_cache.write().unwrap().set_advice(madvice);
Ok(dir)
}
/// Opens a MmapDirectory in a directory.
///
/// Returns an error if the `directory_path` does not
/// exist or if it is not a directory.
pub fn open<P: AsRef<Path>>(directory_path: P) -> Result<MmapDirectory, OpenDirectoryError> {
Self::open_with_access_pattern_impl(directory_path.as_ref(), None)
pub fn open(directory_path: impl AsRef<Path>) -> Result<MmapDirectory, OpenDirectoryError> {
Self::open_impl_to_avoid_monomorphization(directory_path.as_ref())
}
/// Opens a MmapDirectory in a directory, with a given access pattern.
pub fn open_with_madvice<P: AsRef<Path>>(
directory_path: P,
madvice: Advice,
) -> Result<MmapDirectory, OpenDirectoryError> {
Self::open_with_access_pattern_impl(directory_path.as_ref(), Some(madvice))
}
fn open_with_access_pattern_impl(
#[inline(never)]
fn open_impl_to_avoid_monomorphization(
directory_path: &Path,
madvice_opt: Option<Advice>,
) -> Result<MmapDirectory, OpenDirectoryError> {
if !directory_path.exists() {
return Err(OpenDirectoryError::DoesNotExist(PathBuf::from(
@@ -256,7 +265,7 @@ impl MmapDirectory {
directory_path,
)));
}
Ok(MmapDirectory::new(canonical_path, None, madvice_opt))
Ok(MmapDirectory::new(canonical_path, None))
}
/// Joins a relative_path to the directory `root_path`
@@ -365,7 +374,7 @@ pub(crate) fn atomic_write(path: &Path, content: &[u8]) -> io::Result<()> {
}
impl Directory for MmapDirectory {
fn get_file_handle(&self, path: &Path) -> result::Result<Arc<dyn FileHandle>, OpenReadError> {
fn get_file_handle(&self, path: &Path) -> Result<Arc<dyn FileHandle>, OpenReadError> {
debug!("Open Read {:?}", path);
let full_path = self.resolve_path(path);
@@ -388,7 +397,7 @@ impl Directory for MmapDirectory {
/// Any entry associated with the path in the mmap will be
/// removed before the file is deleted.
fn delete(&self, path: &Path) -> result::Result<(), DeleteError> {
fn delete(&self, path: &Path) -> Result<(), DeleteError> {
let full_path = self.resolve_path(path);
fs::remove_file(full_path).map_err(|e| {
if e.kind() == io::ErrorKind::NotFound {


@@ -5,7 +5,6 @@ use std::sync::{Arc, RwLock};
use std::{fmt, result};
use common::HasLen;
use fail::fail_point;
use super::FileHandle;
use crate::core::META_FILEPATH;
@@ -184,7 +183,7 @@ impl Directory for RamDirectory {
}
fn delete(&self, path: &Path) -> result::Result<(), DeleteError> {
fail_point!("RamDirectory::delete", |_| {
crate::fail_point!("RamDirectory::delete", |_| {
Err(DeleteError::IoError {
io_error: Arc::new(io::Error::from(io::ErrorKind::Other)),
filepath: path.to_path_buf(),


@@ -686,12 +686,12 @@ mod tests {
let mut schema_builder = Schema::builder();
let date_field = schema_builder.add_date_field(
"date",
DateOptions::from(FAST).set_precision(DateTimePrecision::Nanosecond),
DateOptions::from(FAST).set_precision(DateTimePrecision::Nanoseconds),
);
let multi_date_field = schema_builder.add_date_field(
"multi_date",
DateOptions::default()
.set_precision(DateTimePrecision::Nanosecond)
.set_precision(DateTimePrecision::Nanoseconds)
.set_fast(),
);
let schema = schema_builder.build();
@@ -862,9 +862,9 @@ mod tests {
#[test]
pub fn test_gcd_date() {
let size_prec_sec = test_gcd_date_with_codec(DateTimePrecision::Second);
let size_prec_sec = test_gcd_date_with_codec(DateTimePrecision::Seconds);
assert!((1000 * 13 / 8..100 + 1000 * 13 / 8).contains(&size_prec_sec.get_bytes())); // 13 bits per val = ceil(log_2(number of seconds in 2 hours))
let size_prec_micros = test_gcd_date_with_codec(DateTimePrecision::Microsecond);
let size_prec_micros = test_gcd_date_with_codec(DateTimePrecision::Microseconds);
assert!((1000 * 33 / 8..100 + 1000 * 33 / 8).contains(&size_prec_micros.get_bytes()));
// 33 bits per val = ceil(log_2(number of microseconds in 2 hours))
@@ -939,7 +939,7 @@ mod tests {
.unwrap()
.first_or_default_col(0);
let numbers = vec![100, 200, 300];
let numbers = [100, 200, 300];
let test_range = |range: RangeInclusive<u64>| {
let expexted_count = numbers.iter().filter(|num| range.contains(num)).count();
let mut vec = vec![];
@@ -1013,7 +1013,7 @@ mod tests {
.unwrap()
.first_or_default_col(0);
let numbers = vec![1000, 1001, 1003];
let numbers = [1000, 1001, 1003];
let test_range = |range: RangeInclusive<u64>| {
let expexted_count = numbers.iter().filter(|num| range.contains(num)).count();
let mut vec = vec![];
@@ -1098,7 +1098,7 @@ mod tests {
.unwrap()
.is_none());
let column = fast_field_reader
.column_opt::<i64>(r#"json.attr\.age"#)
.column_opt::<i64>(r"json.attr\.age")
.unwrap()
.unwrap();
let vals: Vec<i64> = column.values_for_doc(0u32).collect();
@@ -1208,7 +1208,7 @@ mod tests {
let ff_tokenizer_manager = TokenizerManager::default();
ff_tokenizer_manager.register(
"custom_lowercase",
TextAnalyzer::builder(RawTokenizer)
TextAnalyzer::builder(RawTokenizer::default())
.filter(LowerCaser)
.build(),
);
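
The precision variants used above were renamed from singular to plural in this release. A small schema sketch with the new names, assuming tantivy 0.21:

```rust
use tantivy::schema::{DateOptions, DateTimePrecision, Schema, FAST};

fn schema_with_millis_date() -> Schema {
    let mut schema_builder = Schema::builder();
    // Variants are now plural: Seconds, Milliseconds, Microseconds,
    // Nanoseconds (previously Second, Millisecond, and so on).
    schema_builder.add_date_field(
        "timestamp",
        DateOptions::from(FAST).set_precision(DateTimePrecision::Milliseconds),
    );
    schema_builder.build()
}
```
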


@@ -88,7 +88,7 @@ impl FastFieldReaders {
let Some((field, path)): Option<(Field, &str)> = self
.schema
.find_field_with_default(field_name, default_field_opt)
else{
else {
return Ok(None);
};
let field_entry: &FieldEntry = self.schema.get_field_entry(field);
@@ -120,7 +120,8 @@ impl FastFieldReaders {
T: HasAssociatedColumnType,
DynamicColumn: Into<Option<Column<T>>>,
{
let Some(dynamic_column_handle) = self.dynamic_column_handle(field_name, T::column_type())?
let Some(dynamic_column_handle) =
self.dynamic_column_handle(field_name, T::column_type())?
else {
return Ok(None);
};
@@ -196,7 +197,8 @@ impl FastFieldReaders {
/// Returns a `str` column.
pub fn str(&self, field_name: &str) -> crate::Result<Option<StrColumn>> {
let Some(dynamic_column_handle) = self.dynamic_column_handle(field_name, ColumnType::Str)?
let Some(dynamic_column_handle) =
self.dynamic_column_handle(field_name, ColumnType::Str)?
else {
return Ok(None);
};
@@ -206,7 +208,8 @@ impl FastFieldReaders {
/// Returns a `bytes` column.
pub fn bytes(&self, field_name: &str) -> crate::Result<Option<BytesColumn>> {
let Some(dynamic_column_handle) = self.dynamic_column_handle(field_name, ColumnType::Bytes)?
let Some(dynamic_column_handle) =
self.dynamic_column_handle(field_name, ColumnType::Bytes)?
else {
return Ok(None);
};
@@ -273,7 +276,7 @@ impl FastFieldReaders {
}
/// Returns the all-`u64` column used to represent any `u64`-mapped type (String/Bytes term
/// ids, i64, u64, f64, DateTime).
/// ids, i64, u64, f64, bool, DateTime).
///
/// In case of JSON, there may be two columns: one for terms and one for numerical types. (This
/// may change later to 3 types if JSON handles DateTime)


@@ -147,7 +147,7 @@ impl FastFieldsWriter {
}
Value::Str(text_val) => {
if let Some(tokenizer) =
&self.per_field_tokenizer[field_value.field().field_id() as usize]
&mut self.per_field_tokenizer[field_value.field().field_id() as usize]
{
let mut token_stream = tokenizer.token_stream(text_val);
token_stream.process(&mut |token: &Token| {
@@ -202,7 +202,7 @@ impl FastFieldsWriter {
self.json_path_buffer.push_str(field_name);
let text_analyzer =
&self.per_field_tokenizer[field_value.field().field_id() as usize];
&mut self.per_field_tokenizer[field_value.field().field_id() as usize];
record_json_obj_to_columnar_writer(
doc_id,
@@ -263,7 +263,7 @@ fn record_json_obj_to_columnar_writer(
remaining_depth_limit: usize,
json_path_buffer: &mut String,
columnar_writer: &mut columnar::ColumnarWriter,
tokenizer: &Option<TextAnalyzer>,
tokenizer: &mut Option<TextAnalyzer>,
) {
for (key, child) in json_obj {
let len_path = json_path_buffer.len();
@@ -302,7 +302,7 @@ fn record_json_value_to_columnar_writer(
mut remaining_depth_limit: usize,
json_path_writer: &mut String,
columnar_writer: &mut columnar::ColumnarWriter,
tokenizer: &Option<TextAnalyzer>,
tokenizer: &mut Option<TextAnalyzer>,
) {
if remaining_depth_limit == 0 {
return;
@@ -321,7 +321,7 @@ fn record_json_value_to_columnar_writer(
}
}
serde_json::Value::String(text) => {
if let Some(text_analyzer) = tokenizer {
if let Some(text_analyzer) = tokenizer.as_mut() {
let mut token_stream = text_analyzer.token_stream(text);
token_stream.process(&mut |token| {
columnar_writer.record_str(doc, json_path_writer.as_str(), &token.text);
@@ -379,7 +379,7 @@ mod tests {
JSON_DEPTH_LIMIT,
&mut json_path,
&mut columnar_writer,
&None,
&mut None,
);
}
let mut buffer = Vec::new();


@@ -2,6 +2,7 @@ use std::collections::HashSet;
use rand::{thread_rng, Rng};
use crate::indexer::index_writer::MEMORY_BUDGET_NUM_BYTES_MIN;
use crate::schema::*;
use crate::{doc, schema, Index, IndexSettings, IndexSortByField, Order, Searcher};
@@ -30,7 +31,7 @@ fn test_functional_store() -> crate::Result<()> {
let mut rng = thread_rng();
let mut index_writer = index.writer_with_num_threads(3, 12_000_000)?;
let mut index_writer = index.writer_with_num_threads(3, MEMORY_BUDGET_NUM_BYTES_MIN)?;
let mut doc_set: Vec<u64> = Vec::new();


@@ -152,8 +152,11 @@ pub(crate) fn get_doc_id_mapping_from_field(
#[cfg(test)]
mod tests_indexsorting {
use common::DateTime;
use crate::collector::TopDocs;
use crate::indexer::doc_id_mapping::DocIdMapping;
use crate::indexer::NoMergePolicy;
use crate::query::QueryParser;
use crate::schema::{Schema, *};
use crate::{DocAddress, Index, IndexSettings, IndexSortByField, Order};
@@ -444,48 +447,93 @@ mod tests_indexsorting {
Ok(())
}
// #[test]
// fn test_sort_index_fast_field() -> crate::Result<()> {
// let index = create_test_index(
// Some(IndexSettings {
// sort_by_field: Some(IndexSortByField {
// field: "my_number".to_string(),
// order: Order::Asc,
// }),
// ..Default::default()
// }),
// get_text_options(),
// )?;
// assert_eq!(
// index.settings().sort_by_field.as_ref().unwrap().field,
// "my_number".to_string()
// );
#[test]
fn test_sort_index_fast_field() -> crate::Result<()> {
let index = create_test_index(
Some(IndexSettings {
sort_by_field: Some(IndexSortByField {
field: "my_number".to_string(),
order: Order::Asc,
}),
..Default::default()
}),
get_text_options(),
)?;
assert_eq!(
index.settings().sort_by_field.as_ref().unwrap().field,
"my_number".to_string()
);
// let searcher = index.reader()?.searcher();
// assert_eq!(searcher.segment_readers().len(), 1);
// let segment_reader = searcher.segment_reader(0);
// let fast_fields = segment_reader.fast_fields();
// let my_number = index.schema().get_field("my_number").unwrap();
let searcher = index.reader()?.searcher();
assert_eq!(searcher.segment_readers().len(), 1);
let segment_reader = searcher.segment_reader(0);
let fast_fields = segment_reader.fast_fields();
// let fast_field = fast_fields.u64(my_number).unwrap();
// assert_eq!(fast_field.get_val(0), 10u64);
// assert_eq!(fast_field.get_val(1), 20u64);
// assert_eq!(fast_field.get_val(2), 30u64);
let fast_field = fast_fields
.u64("my_number")
.unwrap()
.first_or_default_col(999);
assert_eq!(fast_field.get_val(0), 10u64);
assert_eq!(fast_field.get_val(1), 20u64);
assert_eq!(fast_field.get_val(2), 30u64);
// let multi_numbers = index.schema().get_field("multi_numbers").unwrap();
// let multifield = fast_fields.u64s(multi_numbers).unwrap();
// let mut vals = vec![];
// multifield.get_vals(0u32, &mut vals);
// assert_eq!(vals, &[] as &[u64]);
// let mut vals = vec![];
// multifield.get_vals(1u32, &mut vals);
// assert_eq!(vals, &[5, 6]);
let multifield = fast_fields.u64("multi_numbers").unwrap();
let vals: Vec<u64> = multifield.values_for_doc(0u32).collect();
assert_eq!(vals, &[] as &[u64]);
let vals: Vec<_> = multifield.values_for_doc(1u32).collect();
assert_eq!(vals, &[5, 6]);
// let mut vals = vec![];
// multifield.get_vals(2u32, &mut vals);
// assert_eq!(vals, &[3]);
// Ok(())
// }
let vals: Vec<_> = multifield.values_for_doc(2u32).collect();
assert_eq!(vals, &[3]);
Ok(())
}
#[test]
fn test_with_sort_by_date_field() -> crate::Result<()> {
let mut schema_builder = Schema::builder();
let date_field = schema_builder.add_date_field("date", INDEXED | STORED | FAST);
let schema = schema_builder.build();
let settings = IndexSettings {
sort_by_field: Some(IndexSortByField {
field: "date".to_string(),
order: Order::Desc,
}),
..Default::default()
};
let index = Index::builder()
.schema(schema)
.settings(settings)
.create_in_ram()?;
let mut index_writer = index.writer_for_tests()?;
index_writer.set_merge_policy(Box::new(NoMergePolicy));
index_writer.add_document(doc!(
date_field => DateTime::from_timestamp_secs(1000),
))?;
index_writer.add_document(doc!(
date_field => DateTime::from_timestamp_secs(999),
))?;
index_writer.add_document(doc!(
date_field => DateTime::from_timestamp_secs(1001),
))?;
index_writer.commit()?;
let searcher = index.reader()?.searcher();
assert_eq!(searcher.segment_readers().len(), 1);
let segment_reader = searcher.segment_reader(0);
let fast_fields = segment_reader.fast_fields();
let fast_field = fast_fields
.date("date")
.unwrap()
.first_or_default_col(DateTime::from_timestamp_secs(0));
assert_eq!(fast_field.get_val(0), DateTime::from_timestamp_secs(1001));
assert_eq!(fast_field.get_val(1), DateTime::from_timestamp_secs(1000));
assert_eq!(fast_field.get_val(2), DateTime::from_timestamp_secs(999));
Ok(())
}
#[test]
fn test_doc_mapping() {


@@ -27,9 +27,9 @@ use crate::{FutureResult, Opstamp};
// Size of the margin for the `memory_arena`. A segment is closed when the remaining memory
// in the `memory_arena` goes below MARGIN_IN_BYTES.
pub const MARGIN_IN_BYTES: usize = 1_000_000;
// We impose the memory per thread to be at least 3 MB.
pub const MEMORY_ARENA_NUM_BYTES_MIN: usize = ((MARGIN_IN_BYTES as u32) * 3u32) as usize;
pub const MEMORY_ARENA_NUM_BYTES_MAX: usize = u32::MAX as usize - MARGIN_IN_BYTES;
// We impose the memory per thread to be at least 15 MB, as the baseline consumption is 12MB.
pub const MEMORY_BUDGET_NUM_BYTES_MIN: usize = ((MARGIN_IN_BYTES as u32) * 15u32) as usize;
pub const MEMORY_BUDGET_NUM_BYTES_MAX: usize = u32::MAX as usize - MARGIN_IN_BYTES;
// We impose the number of index writer threads to be at most this.
pub const MAX_NUM_THREAD: usize = 8;
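
In practice the raised floor means writer budgets that used to be accepted now return an error. A sketch of the observable behavior, assuming tantivy 0.21:

```rust
use tantivy::schema::{Schema, TEXT};
use tantivy::{doc, Index};

fn main() -> tantivy::Result<()> {
    let mut schema_builder = Schema::builder();
    let text = schema_builder.add_text_field("text", TEXT);
    let index = Index::create_in_ram(schema_builder.build());
    // Below the new 15 MB per-thread floor: rejected with InvalidArgument.
    assert!(index.writer_with_num_threads(1, 3_000_000).is_err());
    // At or above the floor: accepted.
    let mut writer = index.writer_with_num_threads(1, 15_000_000)?;
    writer.add_document(doc!(text => "hello"))?;
    writer.commit()?;
    Ok(())
}
```
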
@@ -57,7 +57,8 @@ pub struct IndexWriter {
index: Index,
memory_arena_in_bytes_per_thread: usize,
// The memory budget per thread, after which a commit is triggered.
memory_budget_in_bytes_per_thread: usize,
workers_join_handle: Vec<JoinHandle<crate::Result<()>>>,
@@ -167,7 +168,7 @@ fn index_documents(
memory_budget: usize,
segment: Segment,
grouped_document_iterator: &mut dyn Iterator<Item = AddBatch>,
segment_updater: &mut SegmentUpdater,
segment_updater: &SegmentUpdater,
mut delete_cursor: DeleteCursor,
) -> crate::Result<()> {
let mut segment_writer = SegmentWriter::for_segment(memory_budget, segment.clone())?;
@@ -264,19 +265,19 @@ impl IndexWriter {
pub(crate) fn new(
index: &Index,
num_threads: usize,
memory_arena_in_bytes_per_thread: usize,
memory_budget_in_bytes_per_thread: usize,
directory_lock: DirectoryLock,
) -> crate::Result<IndexWriter> {
if memory_arena_in_bytes_per_thread < MEMORY_ARENA_NUM_BYTES_MIN {
if memory_budget_in_bytes_per_thread < MEMORY_BUDGET_NUM_BYTES_MIN {
let err_msg = format!(
"The memory arena in bytes per thread needs to be at least \
{MEMORY_ARENA_NUM_BYTES_MIN}."
{MEMORY_BUDGET_NUM_BYTES_MIN}."
);
return Err(TantivyError::InvalidArgument(err_msg));
}
if memory_arena_in_bytes_per_thread >= MEMORY_ARENA_NUM_BYTES_MAX {
if memory_budget_in_bytes_per_thread >= MEMORY_BUDGET_NUM_BYTES_MAX {
let err_msg = format!(
"The memory arena in bytes per thread cannot exceed {MEMORY_ARENA_NUM_BYTES_MAX}"
"The memory arena in bytes per thread cannot exceed {MEMORY_BUDGET_NUM_BYTES_MAX}"
);
return Err(TantivyError::InvalidArgument(err_msg));
}
@@ -295,7 +296,7 @@ impl IndexWriter {
let mut index_writer = IndexWriter {
_directory_lock: Some(directory_lock),
memory_arena_in_bytes_per_thread,
memory_budget_in_bytes_per_thread,
index: index.clone(),
index_writer_status: IndexWriterStatus::from(document_receiver),
operation_sender: document_sender,
@@ -392,11 +393,11 @@ impl IndexWriter {
let document_receiver_clone = self.operation_receiver()?;
let index_writer_bomb = self.index_writer_status.create_bomb();
let mut segment_updater = self.segment_updater.clone();
let segment_updater = self.segment_updater.clone();
let mut delete_cursor = self.delete_queue.cursor();
let mem_budget = self.memory_arena_in_bytes_per_thread;
let mem_budget = self.memory_budget_in_bytes_per_thread;
let index = self.index.clone();
let join_handle: JoinHandle<crate::Result<()>> = thread::Builder::new()
.name(format!("thrd-tantivy-index{}", self.worker_id))
@@ -428,7 +429,7 @@ impl IndexWriter {
mem_budget,
index.new_segment(),
&mut document_iterator,
&mut segment_updater,
&segment_updater,
delete_cursor.clone(),
)?;
}
@@ -554,7 +555,7 @@ impl IndexWriter {
let new_index_writer: IndexWriter = IndexWriter::new(
&self.index,
self.num_threads,
self.memory_arena_in_bytes_per_thread,
self.memory_budget_in_bytes_per_thread,
directory_lock,
)?;
@@ -810,6 +811,7 @@ mod tests {
use crate::collector::TopDocs;
use crate::directory::error::LockError;
use crate::error::*;
use crate::indexer::index_writer::MEMORY_BUDGET_NUM_BYTES_MIN;
use crate::indexer::NoMergePolicy;
use crate::query::{BooleanQuery, Occur, Query, QueryParser, TermQuery};
use crate::schema::{
@@ -941,7 +943,7 @@ mod tests {
fn test_empty_operations_group() {
let schema_builder = schema::Schema::builder();
let index = Index::create_in_ram(schema_builder.build());
let index_writer = index.writer(3_000_000).unwrap();
let index_writer = index.writer_for_tests().unwrap();
let operations1 = vec![];
let batch_opstamp1 = index_writer.run(operations1).unwrap();
assert_eq!(batch_opstamp1, 0u64);
@@ -954,8 +956,8 @@ mod tests {
fn test_lockfile_stops_duplicates() {
let schema_builder = schema::Schema::builder();
let index = Index::create_in_ram(schema_builder.build());
let _index_writer = index.writer(3_000_000).unwrap();
match index.writer(3_000_000) {
let _index_writer = index.writer_for_tests().unwrap();
match index.writer_for_tests() {
Err(TantivyError::LockFailure(LockError::LockBusy, _)) => {}
_ => panic!("Expected a `LockFailure` error"),
}
@@ -979,7 +981,7 @@ mod tests {
fn test_set_merge_policy() {
let schema_builder = schema::Schema::builder();
let index = Index::create_in_ram(schema_builder.build());
let index_writer = index.writer(3_000_000).unwrap();
let index_writer = index.writer_for_tests().unwrap();
assert_eq!(
format!("{:?}", index_writer.get_merge_policy()),
"LogMergePolicy { min_num_segments: 8, max_docs_before_merge: 10000000, \
@@ -998,11 +1000,11 @@ mod tests {
let schema_builder = schema::Schema::builder();
let index = Index::create_in_ram(schema_builder.build());
{
let _index_writer = index.writer(3_000_000).unwrap();
let _index_writer = index.writer_for_tests().unwrap();
// the lock should be released when the
// index_writer leaves the scope.
}
let _index_writer_two = index.writer(3_000_000).unwrap();
let _index_writer_two = index.writer_for_tests().unwrap();
}
#[test]
@@ -1022,7 +1024,7 @@ mod tests {
{
// writing the segment
let mut index_writer = index.writer(3_000_000)?;
let mut index_writer = index.writer_for_tests()?;
index_writer.add_document(doc!(text_field=>"a"))?;
index_writer.rollback()?;
assert_eq!(index_writer.commit_opstamp(), 0u64);
@@ -1054,7 +1056,7 @@ mod tests {
reader.searcher().doc_freq(&term_a).unwrap()
};
// writing the segment
let mut index_writer = index.writer(12_000_000).unwrap();
let mut index_writer = index.writer_for_tests().unwrap();
index_writer.add_document(doc!(text_field=>"a"))?;
index_writer.commit()?;
// this should create 1 segment
@@ -1094,7 +1096,7 @@ mod tests {
reader.searcher().doc_freq(&term_a).unwrap()
};
// writing the segment
let mut index_writer = index.writer(12_000_000).unwrap();
let mut index_writer = index.writer_for_tests().unwrap();
index_writer.add_document(doc!(text_field=>"a"))?;
index_writer.commit()?;
index_writer.add_document(doc!(text_field=>"a"))?;
@@ -1140,7 +1142,7 @@ mod tests {
reader.searcher().doc_freq(&term_a).unwrap()
};
// writing the segment
let mut index_writer = index.writer(12_000_000).unwrap();
let mut index_writer = index.writer(MEMORY_BUDGET_NUM_BYTES_MIN).unwrap();
// create 8 segments with 100 tiny docs
for _doc in 0..100 {
index_writer.add_document(doc!(text_field=>"a"))?;
@@ -1196,7 +1198,8 @@ mod tests {
{
// writing the segment
let mut index_writer = index.writer_with_num_threads(4, 12_000_000)?;
let mut index_writer =
index.writer_with_num_threads(4, MEMORY_BUDGET_NUM_BYTES_MIN * 4)?;
// create 8 segments with 100 tiny docs
for _doc in 0..100 {
index_writer.add_document(doc!(text_field => "a"))?;
@@ -1245,7 +1248,9 @@ mod tests {
let term = Term::from_field_text(text_field, s);
searcher.doc_freq(&term).unwrap()
};
let mut index_writer = index.writer_with_num_threads(4, 12_000_000).unwrap();
let mut index_writer = index
.writer_with_num_threads(4, MEMORY_BUDGET_NUM_BYTES_MIN * 4)
.unwrap();
let add_tstamp = index_writer.add_document(doc!(text_field => "a")).unwrap();
let commit_tstamp = index_writer.commit().unwrap();
@@ -1262,7 +1267,9 @@ mod tests {
let mut schema_builder = schema::Schema::builder();
let text_field = schema_builder.add_text_field("text", TEXT);
let index = Index::create_in_ram(schema_builder.build());
let mut index_writer = index.writer_with_num_threads(4, 12_000_000).unwrap();
let mut index_writer = index
.writer_with_num_threads(4, MEMORY_BUDGET_NUM_BYTES_MIN * 4)
.unwrap();
let add_tstamp = index_writer.add_document(doc!(text_field => "a")).unwrap();
@@ -1311,7 +1318,9 @@ mod tests {
let text_field = schema_builder.add_text_field("text", TEXT);
let index = Index::create_in_ram(schema_builder.build());
// writing the segment
let mut index_writer = index.writer_with_num_threads(4, 12_000_000).unwrap();
let mut index_writer = index
.writer_with_num_threads(4, MEMORY_BUDGET_NUM_BYTES_MIN * 4)
.unwrap();
let res = index_writer.delete_all_documents();
assert!(res.is_ok());
@@ -1338,7 +1347,9 @@ mod tests {
let mut schema_builder = schema::Schema::builder();
let text_field = schema_builder.add_text_field("text", TEXT);
let index = Index::create_in_ram(schema_builder.build());
let mut index_writer = index.writer_with_num_threads(4, 12_000_000).unwrap();
let mut index_writer = index
.writer_with_num_threads(4, MEMORY_BUDGET_NUM_BYTES_MIN * 4)
.unwrap();
// add one simple doc
assert!(index_writer.add_document(doc!(text_field => "a")).is_ok());
@@ -1371,7 +1382,9 @@ mod tests {
fn test_delete_all_documents_empty_index() {
let schema_builder = schema::Schema::builder();
let index = Index::create_in_ram(schema_builder.build());
let mut index_writer = index.writer_with_num_threads(4, 12_000_000).unwrap();
let mut index_writer = index
.writer_with_num_threads(4, MEMORY_BUDGET_NUM_BYTES_MIN * 4)
.unwrap();
let clear = index_writer.delete_all_documents();
let commit = index_writer.commit();
assert!(clear.is_ok());
@@ -1382,7 +1395,9 @@ mod tests {
fn test_delete_all_documents_index_twice() {
let schema_builder = schema::Schema::builder();
let index = Index::create_in_ram(schema_builder.build());
let mut index_writer = index.writer_with_num_threads(4, 12_000_000).unwrap();
let mut index_writer = index
.writer_with_num_threads(4, MEMORY_BUDGET_NUM_BYTES_MIN * 4)
.unwrap();
let clear = index_writer.delete_all_documents();
let commit = index_writer.commit();
assert!(clear.is_ok());
@@ -1688,7 +1703,8 @@ mod tests {
let old_reader = index.reader()?;
let id_exists = |id| id % 3 != 0; // 0 does not exist
// Every 3rd doc has only id field
let id_is_full_doc = |id| id % 3 != 0;
let multi_text_field_text1 = "test1 test2 test3 test1 test2 test3";
// rotate left
@@ -1704,7 +1720,7 @@ mod tests {
let facet = Facet::from(&("/cola/".to_string() + &id.to_string()));
let ip = ip_from_id(id);
if !id_exists(id) {
if !id_is_full_doc(id) {
// every 3rd doc has no ip field
index_writer.add_document(doc!(
id_field=>id,
@@ -1824,7 +1840,7 @@ mod tests {
let num_docs_with_values = expected_ids_and_num_occurrences
.iter()
.filter(|(id, _id_occurrences)| id_exists(**id))
.filter(|(id, _id_occurrences)| id_is_full_doc(**id))
.map(|(_, id_occurrences)| *id_occurrences as usize)
.sum::<usize>();
@@ -1848,7 +1864,7 @@ mod tests {
if force_end_merge && num_segments_before_merge > 1 && num_segments_after_merge == 1 {
let mut expected_multi_ips: Vec<_> = id_list
.iter()
.filter(|id| id_exists(**id))
.filter(|id| id_is_full_doc(**id))
.flat_map(|id| vec![ip_from_id(*id), ip_from_id(*id)])
.collect();
assert_eq!(num_ips, expected_multi_ips.len() as u32);
@@ -1886,7 +1902,7 @@ mod tests {
let expected_ips = expected_ids_and_num_occurrences
.keys()
.flat_map(|id| {
if !id_exists(*id) {
if !id_is_full_doc(*id) {
None
} else {
Some(Ipv6Addr::from_u128(*id as u128))
@@ -1898,7 +1914,7 @@ mod tests {
let expected_ips = expected_ids_and_num_occurrences
.keys()
.filter_map(|id| {
if !id_exists(*id) {
if !id_is_full_doc(*id) {
None
} else {
Some(Ipv6Addr::from_u128(*id as u128))
@@ -1933,7 +1949,7 @@ mod tests {
let id = id_reader.first(doc).unwrap();
let vals: Vec<u64> = ff_reader.values_for_doc(doc).collect();
if id_exists(id) {
if id_is_full_doc(id) {
assert_eq!(vals.len(), 2);
assert_eq!(vals[0], vals[1]);
assert!(expected_ids_and_num_occurrences.contains_key(&vals[0]));
@@ -1943,7 +1959,7 @@ mod tests {
}
let bool_vals: Vec<bool> = bool_ff_reader.values_for_doc(doc).collect();
if id_exists(id) {
if id_is_full_doc(id) {
assert_eq!(bool_vals.len(), 2);
assert_ne!(bool_vals[0], bool_vals[1]);
} else {
@@ -1972,7 +1988,7 @@ mod tests {
.as_u64()
.unwrap();
assert!(expected_ids_and_num_occurrences.contains_key(&id));
if id_exists(id) {
if id_is_full_doc(id) {
let id2 = store_reader
.get(doc_id)
.unwrap()
@@ -2019,7 +2035,7 @@ mod tests {
let (existing_id, count) = (*id, *count);
let get_num_hits = |field| do_search(&existing_id.to_string(), field).len() as u64;
assert_eq!(get_num_hits(id_field), count);
if !id_exists(existing_id) {
if !id_is_full_doc(existing_id) {
continue;
}
assert_eq!(get_num_hits(text_field), count);
@@ -2069,7 +2085,7 @@ mod tests {
//
for (existing_id, count) in &expected_ids_and_num_occurrences {
let (existing_id, count) = (*existing_id, *count);
if !id_exists(existing_id) {
if !id_is_full_doc(existing_id) {
continue;
}
let do_search_ip_field = |term: &str| do_search(term, ip_field).len() as u64;
@@ -2086,34 +2102,84 @@ mod tests {
}
}
// assert data is like expected
// Range query
//
for (existing_id, count) in expected_ids_and_num_occurrences.iter().take(10) {
let (existing_id, count) = (*existing_id, *count);
if !id_exists(existing_id) {
continue;
}
let gen_query_inclusive = |field: &str, from: Ipv6Addr, to: Ipv6Addr| {
format!("{}:[{} TO {}]", field, &from.to_string(), &to.to_string())
// Take half as sample
let mut sample: Vec<_> = expected_ids_and_num_occurrences.iter().collect();
sample.sort_by_key(|(k, _num_occurences)| *k);
// sample.truncate(sample.len() / 2);
if !sample.is_empty() {
let (left_sample, right_sample) = sample.split_at(sample.len() / 2);
let expected_count = |sample: &[(&u64, &u64)]| {
sample
.iter()
.filter(|(id, _)| id_is_full_doc(**id))
.map(|(_id, num_occurences)| **num_occurences)
.sum::<u64>()
};
let ip = ip_from_id(existing_id);
fn gen_query_inclusive<T1: ToString, T2: ToString>(
field: &str,
from: T1,
to: T2,
) -> String {
format!("{}:[{} TO {}]", field, &from.to_string(), &to.to_string())
}
let do_search_ip_field = |term: &str| do_search(term, ip_field).len() as u64;
// Range query on single value field
let query = gen_query_inclusive("ip", ip, ip);
assert_eq!(do_search_ip_field(&query), count);
// Query first half
if !left_sample.is_empty() {
let expected_count = expected_count(left_sample);
// Range query on multi value field
let query = gen_query_inclusive("ips", ip, ip);
let start_range = *left_sample[0].0;
let end_range = *left_sample.last().unwrap().0;
let query = gen_query_inclusive("id_opt", start_range, end_range);
assert_eq!(do_search(&query, id_opt_field).len() as u64, expected_count);
assert_eq!(do_search_ip_field(&query), count);
// Range query on ip field
let ip1 = ip_from_id(start_range);
let ip2 = ip_from_id(end_range);
let do_search_ip_field = |term: &str| do_search(term, ip_field).len() as u64;
let query = gen_query_inclusive("ip", ip1, ip2);
assert_eq!(do_search_ip_field(&query), expected_count);
let query = gen_query_inclusive("ip", "*", ip2);
assert_eq!(do_search_ip_field(&query), expected_count);
// Range query on multi value field
let query = gen_query_inclusive("ips", ip1, ip2);
assert_eq!(do_search_ip_field(&query), expected_count);
let query = gen_query_inclusive("ips", "*", ip2);
assert_eq!(do_search_ip_field(&query), expected_count);
}
// Query second half
if !right_sample.is_empty() {
let expected_count = expected_count(right_sample);
let start_range = *right_sample[0].0;
let end_range = *right_sample.last().unwrap().0;
// Range query on id opt field
let query =
gen_query_inclusive("id_opt", start_range.to_string(), end_range.to_string());
assert_eq!(do_search(&query, id_opt_field).len() as u64, expected_count);
// Range query on ip field
let ip1 = ip_from_id(start_range);
let ip2 = ip_from_id(end_range);
let do_search_ip_field = |term: &str| do_search(term, ip_field).len() as u64;
let query = gen_query_inclusive("ip", ip1, ip2);
assert_eq!(do_search_ip_field(&query), expected_count);
let query = gen_query_inclusive("ip", ip1, "*");
assert_eq!(do_search_ip_field(&query), expected_count);
// Range query on multi value field
let query = gen_query_inclusive("ips", ip1, ip2);
assert_eq!(do_search_ip_field(&query), expected_count);
let query = gen_query_inclusive("ips", ip1, "*");
assert_eq!(do_search_ip_field(&query), expected_count);
}
}
// ip range query on fast field
//
for (existing_id, count) in expected_ids_and_num_occurrences.iter().take(10) {
let (existing_id, count) = (*existing_id, *count);
if !id_exists(existing_id) {
if !id_is_full_doc(existing_id) {
continue;
}
let gen_query_inclusive = |field: &str, from: Ipv6Addr, to: Ipv6Addr| {
@@ -2141,7 +2207,7 @@ mod tests {
.first_or_default_col(9999);
for doc_id in segment_reader.doc_ids_alive() {
let id = ff_reader.get_val(doc_id);
if !id_exists(id) {
if !id_is_full_doc(id) {
continue;
}
let facet_ords: Vec<u64> = facet_reader.facet_ords(doc_id).collect();
@@ -2179,6 +2245,12 @@ mod tests {
Ok(index)
}
#[test]
fn test_fast_field_range() {
let ops: Vec<_> = (0..1000).map(|id| IndexingOp::AddDoc { id }).collect();
assert!(test_operation_strategy(&ops, false, true).is_ok());
}
#[test]
fn test_sort_index_on_opt_field_regression() {
assert!(test_operation_strategy(
@@ -2426,6 +2498,13 @@ mod tests {
test_operation_strategy(&ops[..], false, true).unwrap();
}
#[test]
fn test_merge_regression_1() {
use IndexingOp::*;
let ops = &[AddDoc { id: 15 }, Commit, AddDoc { id: 9 }, Commit, Merge];
test_operation_strategy(&ops[..], false, true).unwrap();
}
#[test]
fn test_range_query_bug_1() {
use IndexingOp::*;


@@ -178,7 +178,7 @@ impl IndexMerger {
alive_bitset_opt: Vec<Option<AliveBitSet>>,
) -> crate::Result<IndexMerger> {
let mut readers = vec![];
for (segment, new_alive_bitset_opt) in segments.iter().zip(alive_bitset_opt.into_iter()) {
for (segment, new_alive_bitset_opt) in segments.iter().zip(alive_bitset_opt) {
if segment.meta().num_docs() > 0 {
let reader =
SegmentReader::open_with_custom_alive_set(segment, new_alive_bitset_opt)?;


@@ -89,7 +89,7 @@ mod tests_mmap {
let parse_query = QueryParser::for_index(&index, Vec::new());
{
let query = parse_query
.parse_query(r#"json.k8s\.container\.name:prometheus"#)
.parse_query(r"json.k8s\.container\.name:prometheus")
.unwrap();
let num_docs = searcher.search(&query, &Count).unwrap();
assert_eq!(num_docs, 1);
@@ -127,7 +127,7 @@ mod tests_mmap {
}
{
let query = parse_query
.parse_query(r#"json.k8s\.container\.name:prometheus"#)
.parse_query(r"json.k8s\.container\.name:prometheus")
.unwrap();
let num_docs = searcher.search(&query, &Count).unwrap();
assert_eq!(num_docs, 1);


@@ -6,7 +6,6 @@ use std::path::PathBuf;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, RwLock};
use fail::fail_point;
use rayon::{ThreadPool, ThreadPoolBuilder};
use super::segment_manager::SegmentManager;
@@ -43,7 +42,7 @@ pub(crate) fn save_metas(metas: &IndexMeta, directory: &dyn Directory) -> crate:
let mut buffer = serde_json::to_vec_pretty(metas)?;
// Just adding a new line at the end of the buffer.
writeln!(&mut buffer)?;
fail_point!("save_metas", |msg| Err(crate::TantivyError::from(
crate::fail_point!("save_metas", |msg| Err(crate::TantivyError::from(
std::io::Error::new(
std::io::ErrorKind::Other,
msg.unwrap_or_else(|| "Undefined".to_string())


@@ -1,5 +1,6 @@
use columnar::MonotonicallyMappableToU64;
use itertools::Itertools;
use tokenizer_api::BoxTokenStream;
use super::doc_id_mapping::{get_doc_id_mapping_from_field, DocIdMapping};
use super::operation::AddOperation;
@@ -15,7 +16,7 @@ use crate::postings::{
use crate::schema::{FieldEntry, FieldType, Schema, Term, Value, DATE_TIME_PRECISION_INDEXED};
use crate::store::{StoreReader, StoreWriter};
use crate::tokenizer::{FacetTokenizer, PreTokenizedStream, TextAnalyzer, Tokenizer};
use crate::{DocId, Document, Opstamp, SegmentComponent};
use crate::{DocId, Document, Opstamp, SegmentComponent, TantivyError};
/// Computes the initial size of the hash table.
///
@@ -25,6 +26,8 @@ use crate::{DocId, Document, Opstamp, SegmentComponent};
fn compute_initial_table_size(per_thread_memory_budget: usize) -> crate::Result<usize> {
let table_memory_upper_bound = per_thread_memory_budget / 3;
(10..20) // We cap it at 2^19 = 512K capacity.
// TODO: There are cases where this limit causes a
// reallocation in the hashmap. Check if this affects performance.
.map(|power| 1 << power)
.take_while(|capacity| compute_table_memory_size(*capacity) < table_memory_upper_bound)
.last()
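
The sizing rule reads more easily outside the iterator chain. A standalone sketch, where the per-slot cost is an assumed stand-in for the real `compute_table_memory_size`:

```rust
const BYTES_PER_BUCKET: usize = 8; // assumed cost per hash-table slot

fn initial_table_size(per_thread_memory_budget: usize) -> Option<usize> {
    let table_memory_upper_bound = per_thread_memory_budget / 3;
    (10..20) // capacities 2^10 through 2^19, i.e. capped at 512K entries
        .map(|power| 1usize << power)
        .take_while(|capacity| capacity * BYTES_PER_BUCKET < table_memory_upper_bound)
        .last() // the largest capacity that still fits the bound
}
```
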
@@ -98,14 +101,18 @@ impl SegmentWriter {
}
_ => None,
};
text_options
.and_then(|text_index_option| {
let tokenizer_name = &text_index_option.tokenizer();
tokenizer_manager.get(tokenizer_name)
})
.unwrap_or_default()
let tokenizer_name = text_options
.map(|text_index_option| text_index_option.tokenizer())
.unwrap_or("default");
tokenizer_manager.get(tokenizer_name).ok_or_else(|| {
TantivyError::SchemaError(format!(
"Error getting tokenizer for field: {}",
field_entry.name()
))
})
})
.collect();
.collect::<Result<Vec<_>, _>>()?;
Ok(SegmentWriter {
max_doc: 0,
ctx: IndexingContext::new(table_size),
@@ -185,10 +192,11 @@ impl SegmentWriter {
match field_entry.field_type() {
FieldType::Facet(_) => {
let mut facet_tokenizer = FacetTokenizer::default(); // this can be global
for value in values {
let facet = value.as_facet().ok_or_else(make_schema_error)?;
let facet_str = facet.encoded_str();
let mut facet_tokenizer = FacetTokenizer.token_stream(facet_str);
let mut facet_tokenizer = facet_tokenizer.token_stream(facet_str);
let mut indexing_position = IndexingPosition::default();
postings_writer.index_text(
doc_id,
@@ -204,11 +212,11 @@ impl SegmentWriter {
for value in values {
let mut token_stream = match value {
Value::PreTokStr(tok_str) => {
PreTokenizedStream::from(tok_str.clone()).into()
BoxTokenStream::new(PreTokenizedStream::from(tok_str.clone()))
}
Value::Str(ref text) => {
let text_analyzer =
&self.per_field_text_analyzers[field.field_id() as usize];
&mut self.per_field_text_analyzers[field.field_id() as usize];
text_analyzer.token_stream(text)
}
_ => {
@@ -304,7 +312,8 @@ impl SegmentWriter {
}
}
FieldType::JsonObject(json_options) => {
let text_analyzer = &self.per_field_text_analyzers[field.field_id() as usize];
let text_analyzer =
&mut self.per_field_text_analyzers[field.field_id() as usize];
let json_values_it =
values.map(|value| value.as_json().ok_or_else(make_schema_error));
index_json_values(
@@ -436,7 +445,9 @@ fn remap_and_write(
#[cfg(test)]
mod tests {
use std::path::Path;
use std::path::{Path, PathBuf};
use tempfile::TempDir;
use super::compute_initial_table_size;
use crate::collector::Count;
@@ -444,7 +455,9 @@ mod tests {
use crate::directory::RamDirectory;
use crate::postings::TermInfo;
use crate::query::PhraseQuery;
use crate::schema::{IndexRecordOption, Schema, Type, STORED, STRING, TEXT};
use crate::schema::{
IndexRecordOption, Schema, TextFieldIndexing, TextOptions, Type, STORED, STRING, TEXT,
};
use crate::store::{Compressor, StoreReader, StoreWriter};
use crate::time::format_description::well_known::Rfc3339;
use crate::time::OffsetDateTime;
@@ -457,7 +470,7 @@ mod tests {
fn test_hashmap_size() {
assert_eq!(compute_initial_table_size(100_000).unwrap(), 1 << 11);
assert_eq!(compute_initial_table_size(1_000_000).unwrap(), 1 << 14);
assert_eq!(compute_initial_table_size(10_000_000).unwrap(), 1 << 18);
assert_eq!(compute_initial_table_size(15_000_000).unwrap(), 1 << 18);
assert_eq!(compute_initial_table_size(1_000_000_000).unwrap(), 1 << 19);
assert_eq!(compute_initial_table_size(4_000_000_000).unwrap(), 1 << 19);
}
@@ -898,4 +911,32 @@ mod tests {
postings.positions(&mut positions);
assert_eq!(positions, &[4]); //< as opposed to 3 if we had a position length of 1.
}
#[test]
fn test_show_error_when_tokenizer_not_registered() {
let text_field_indexing = TextFieldIndexing::default()
.set_tokenizer("custom_en")
.set_index_option(IndexRecordOption::WithFreqsAndPositions);
let text_options = TextOptions::default()
.set_indexing_options(text_field_indexing)
.set_stored();
let mut schema_builder = Schema::builder();
schema_builder.add_text_field("title", text_options);
let schema = schema_builder.build();
let tempdir = TempDir::new().unwrap();
let tempdir_path = PathBuf::from(tempdir.path());
Index::create_in_dir(&tempdir_path, schema).unwrap();
let index = Index::open_in_dir(tempdir_path).unwrap();
let schema = index.schema();
let mut index_writer = index.writer(50_000_000).unwrap();
let title = schema.get_field("title").unwrap();
let mut document = Document::default();
document.add_text(title, "The Old Man and the Sea");
index_writer.add_document(document).unwrap();
let error = index_writer.commit().unwrap_err();
assert_eq!(
error.to_string(),
"Schema error: 'Error getting tokenizer for field: title'"
);
}
}


@@ -101,6 +101,7 @@ mod test {
use super::Stamper;
#[allow(clippy::redundant_clone)]
#[test]
fn test_stamper() {
let stamper = Stamper::new(7u64);
@@ -116,6 +117,7 @@ mod test {
assert_eq!(stamper.stamp(), 15u64);
}
#[allow(clippy::redundant_clone)]
#[test]
fn test_stamper_revert() {
let stamper = Stamper::new(7u64);


@@ -191,6 +191,7 @@ pub use crate::schema::{DateOptions, DateTimePrecision, Document, Term};
/// Index format version.
const INDEX_FORMAT_VERSION: u32 = 5;
#[cfg(all(feature = "mmap", unix))]
pub use memmap2::Advice;
/// Structure version for the index.
@@ -298,9 +299,39 @@ pub struct DocAddress {
pub doc_id: DocId,
}
#[macro_export]
/// Enable a fail point if the `failpoints` feature is enabled.
macro_rules! fail_point {
($name:expr) => {{
#[cfg(feature = "failpoints")]
{
fail::eval($name, |_| {
panic!("Return is not supported for the fail point \"{}\"", $name);
});
}
}};
($name:expr, $e:expr) => {{
#[cfg(feature = "failpoints")]
{
if let Some(res) = fail::eval($name, $e) {
return res;
}
}
}};
($name:expr, $cond:expr, $e:expr) => {{
#[cfg(feature = "failpoints")]
{
if $cond {
fail::fail_point!($name, $e);
}
}
}};
}
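
A hypothetical call site for the macro above (the point name and function are made up for illustration). With `--features failpoints`, a test can trigger the injected error via `fail::cfg("write_segment", "return(disk full)")`; without the feature, the macro expands to nothing:

```rust
fn write_segment() -> std::io::Result<()> {
    // Inside the tantivy crate this is invoked as `crate::fail_point!`.
    crate::fail_point!("write_segment", |msg: Option<String>| {
        Err(std::io::Error::new(
            std::io::ErrorKind::Other,
            msg.unwrap_or_else(|| "injected failure".to_string()),
        ))
    });
    Ok(())
}
```
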
#[cfg(test)]
pub mod tests {
use common::{BinarySerializable, FixedSize};
use query_grammar::{UserInputAst, UserInputLeaf, UserInputLiteral};
use rand::distributions::{Bernoulli, Uniform};
use rand::rngs::StdRng;
use rand::{Rng, SeedableRng};
@@ -856,6 +887,95 @@ pub mod tests {
Ok(())
}
#[test]
fn test_searcher_on_json_field_with_type_inference() {
// When indexing and searching a json value, we infer its type.
// This test aims to check that type inference is consistent between indexing and search.
// Inference order is date, i64, u64, f64, bool.
let mut schema_builder = Schema::builder();
let json_field = schema_builder.add_json_field("json", STORED | TEXT);
let schema = schema_builder.build();
let json_val: serde_json::Map<String, serde_json::Value> = serde_json::from_str(
r#"{
"signed": 2,
"float": 2.0,
"unsigned": 10000000000000,
"date": "1985-04-12T23:20:50.52Z",
"bool": true
}"#,
)
.unwrap();
let doc = doc!(json_field=>json_val);
let index = Index::create_in_ram(schema);
let mut writer = index.writer_for_tests().unwrap();
writer.add_document(doc).unwrap();
writer.commit().unwrap();
let reader = index.reader().unwrap();
let searcher = reader.searcher();
let get_doc_ids = |user_input_literal: UserInputLiteral| {
let query_parser = crate::query::QueryParser::for_index(&index, Vec::new());
let query = query_parser
.build_query_from_user_input_ast(UserInputAst::from(UserInputLeaf::Literal(
user_input_literal,
)))
.unwrap();
searcher
.search(&query, &TEST_COLLECTOR_WITH_SCORE)
.map(|topdocs| topdocs.docs().to_vec())
.unwrap()
};
{
let user_input_literal = UserInputLiteral {
field_name: Some("json.signed".to_string()),
phrase: "2".to_string(),
delimiter: crate::query_grammar::Delimiter::None,
slop: 0,
prefix: false,
};
assert_eq!(get_doc_ids(user_input_literal), vec![DocAddress::new(0, 0)]);
}
{
let user_input_literal = UserInputLiteral {
field_name: Some("json.float".to_string()),
phrase: "2.0".to_string(),
delimiter: crate::query_grammar::Delimiter::None,
slop: 0,
prefix: false,
};
assert_eq!(get_doc_ids(user_input_literal), vec![DocAddress::new(0, 0)]);
}
{
let user_input_literal = UserInputLiteral {
field_name: Some("json.date".to_string()),
phrase: "1985-04-12T23:20:50.52Z".to_string(),
delimiter: crate::query_grammar::Delimiter::None,
slop: 0,
prefix: false,
};
assert_eq!(get_doc_ids(user_input_literal), vec![DocAddress::new(0, 0)]);
}
{
let user_input_literal = UserInputLiteral {
field_name: Some("json.unsigned".to_string()),
phrase: "10000000000000".to_string(),
delimiter: crate::query_grammar::Delimiter::None,
slop: 0,
prefix: false,
};
assert_eq!(get_doc_ids(user_input_literal), vec![DocAddress::new(0, 0)]);
}
{
let user_input_literal = UserInputLiteral {
field_name: Some("json.bool".to_string()),
phrase: "true".to_string(),
delimiter: crate::query_grammar::Delimiter::None,
slop: 0,
prefix: false,
};
assert_eq!(get_doc_ids(user_input_literal), vec![DocAddress::new(0, 0)]);
}
}
#[test]
fn test_doc_macro() {
let mut schema_builder = Schema::builder();


@@ -119,7 +119,7 @@ pub mod tests {
serializer.close_term()?;
serializer.close()?;
let position_delta = OwnedBytes::new(positions_buffer);
let mut output_delta_pos_buffer = vec![0u32; 5];
let mut output_delta_pos_buffer = [0u32; 5];
let mut position_reader = PositionReader::open(position_delta)?;
position_reader.read(0, &mut output_delta_pos_buffer[..]);
assert_eq!(


@@ -162,7 +162,7 @@ pub mod tests {
let index = Index::create_in_ram(schema);
index
.tokenizers()
.register("simple_no_truncation", SimpleTokenizer);
.register("simple_no_truncation", SimpleTokenizer::default());
let reader = index.reader()?;
let mut index_writer = index.writer_for_tests()?;
@@ -194,7 +194,7 @@ pub mod tests {
let index = Index::create_in_ram(schema);
index
.tokenizers()
.register("simple_no_truncation", SimpleTokenizer);
.register("simple_no_truncation", SimpleTokenizer::default());
let reader = index.reader()?;
let mut index_writer = index.writer_for_tests()?;
@@ -225,7 +225,7 @@ pub mod tests {
{
let mut segment_writer =
SegmentWriter::for_segment(3_000_000, segment.clone()).unwrap();
SegmentWriter::for_segment(15_000_000, segment.clone()).unwrap();
{
// checking that position works if the field has two values
let op = AddOperation {


@@ -2,7 +2,6 @@ use std::cmp::Ordering;
use std::io::{self, Write};
use common::{BinarySerializable, CountingWriter, VInt};
use fail::fail_point;
use super::TermInfo;
use crate::core::Segment;
@@ -205,7 +204,7 @@ impl<'a> FieldSerializer<'a> {
/// If the current block is incomplete, it needs to be encoded
/// using `VInt` encoding.
pub fn close_term(&mut self) -> io::Result<()> {
fail_point!("FieldSerializer::close_term", |msg: Option<String>| {
crate::fail_point!("FieldSerializer::close_term", |msg: Option<String>| {
Err(io::Error::new(io::ErrorKind::Other, format!("{msg:?}")))
});
if self.term_open {

Some files were not shown because too many files have changed in this diff.