skip estimate phase for merge multivalue index

Precompute stats for the merged multivalue index and disable Line encoding for multivalue indices. Together, these two changes allow skipping the first estimation pass, which makes merges on multivalue indices up to 2x faster.

This change may decrease compression, since Line compresses very well for documents that have a fixed number of values per doc. The Line codec should be replaced.

```
merge_multi_and_multi          Avg: 22.7880ms (-47.15%)    Median: 22.5469ms (-47.38%)    [22.3691ms .. 25.8392ms]
merge_dense_and_dense          Avg: 14.4398ms (+2.18%)     Median: 14.2465ms (+0.74%)     [14.1620ms .. 16.1270ms]
merge_sparse_and_sparse        Avg: 10.6559ms (+1.10%)     Median: 10.6318ms (+0.91%)     [10.5527ms .. 11.2848ms]
merge_sparse_and_dense         Avg: 12.4886ms (+1.52%)     Median: 12.4044ms (+0.84%)     [12.3261ms .. 13.9439ms]
merge_multi_and_dense          Avg: 25.6686ms (-45.56%)    Median: 25.4851ms (-45.84%)    [25.1618ms .. 27.6226ms]
merge_multi_and_sparse         Avg: 24.3278ms (-47.00%)    Median: 24.1917ms (-47.34%)    [23.7159ms .. 27.0513ms]
```
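
As a rough illustration of why precomputed stats can replace an estimation pass, here is a minimal sketch. It assumes the multivalue index is stored as a column of start offsets, with one more entry than there are documents. `OffsetStats`, `MultiValueIndex`, `precompute_merged_stats`, and `merge_offsets` are hypothetical names made up for this example, not tantivy's actual API. Under that assumption, the min, max, and row count of the merged offsets column follow from per-segment totals alone, so the values need to be iterated only once, when they are written.

```rust
/// Hypothetical summary of an offsets column (not tantivy's actual type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct OffsetStats {
    min_value: u64,
    max_value: u64,
    num_rows: u32,
}

/// Hypothetical per-segment multivalue index: `offsets[i]..offsets[i + 1]`
/// is the value range of document `i`, so `offsets.len() == num_docs + 1`.
/// Assumption: `offsets` is never empty.
struct MultiValueIndex {
    offsets: Vec<u64>,
}

impl MultiValueIndex {
    fn num_docs(&self) -> u32 {
        (self.offsets.len() - 1) as u32
    }

    fn num_values(&self) -> u64 {
        self.offsets[self.offsets.len() - 1] - self.offsets[0]
    }
}

/// Derives the stats of the merged offsets column from per-segment totals,
/// without visiting the individual offsets.
fn precompute_merged_stats(segments: &[MultiValueIndex]) -> OffsetStats {
    let num_docs: u32 = segments.iter().map(|seg| seg.num_docs()).sum();
    let num_values: u64 = segments.iter().map(|seg| seg.num_values()).sum();
    OffsetStats {
        min_value: 0,
        max_value: num_values,
        num_rows: num_docs + 1,
    }
}

/// The single remaining pass: concatenates the segments' offsets, rebasing
/// each segment onto the number of values written so far.
fn merge_offsets(segments: &[MultiValueIndex]) -> Vec<u64> {
    let mut merged = vec![0u64];
    let mut end = 0u64;
    for seg in segments {
        for window in seg.offsets.windows(2) {
            end += window[1] - window[0];
            merged.push(end);
        }
    }
    merged
}

fn main() {
    let segments = vec![
        MultiValueIndex { offsets: vec![0, 2, 5] },
        MultiValueIndex { offsets: vec![0, 1, 1, 4] },
    ];
    let stats = precompute_merged_stats(&segments);
    let merged = merge_offsets(&segments);
    // The precomputed stats agree with what a scan over `merged` would find.
    assert_eq!(stats.min_value, merged[0]);
    assert_eq!(stats.max_value, *merged.last().unwrap());
    assert_eq!(stats.num_rows as usize, merged.len());
    println!("{stats:?}");
}
```

In this sketch the stats are available before any offset is written, so a serializer with a fixed codec could size its output up front. A codec that has to be chosen by inspecting the data would still need a first pass, which may be why the commit pairs the precomputed stats with disabling Line encoding.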
fix compiler warning, cleanup (#2393)
2026-06-10 20:40:42 +00:00 · 2024-06-11 20:22:00 +08:00 · 2024-06-11 16:03:50 +08:00 · 2024-06-11 16:02:57 +08:00 · 2024-06-10 11:19:01 +02:00 · 2024-06-10 16:26:16 +08:00
149 changed files with 5045 additions and 2473 deletions
--- a/.github/workflows/coverage.yml
+++ b/.github/workflows/coverage.yml
@@ -15,11 +15,11 @@ jobs:
    steps:
      - uses: actions/checkout@v4
      - name: Install Rust
-        run: rustup toolchain install nightly-2023-09-10 --profile minimal --component llvm-tools-preview
+        run: rustup toolchain install nightly-2024-04-10 --profile minimal --component llvm-tools-preview
      - uses: Swatinem/rust-cache@v2
      - uses: taiki-e/install-action@cargo-llvm-cov
      - name: Generate code coverage
-        run: cargo +nightly-2023-09-10 llvm-cov --all-features --workspace --doctests --lcov --output-path lcov.info
+        run: cargo +nightly-2024-04-10 llvm-cov --all-features --workspace --doctests --lcov --output-path lcov.info
      - name: Upload coverage to Codecov
        uses: codecov/codecov-action@v3
        continue-on-error: true
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,3 +1,65 @@
+Tantivy 0.22
+================================
+
+Tantivy 0.22 will be able to read indices created with Tantivy 0.21.
+
+#### Bugfixes
+- Fix null byte handling in JSON paths (null bytes in json keys caused panic during indexing) [#2345](https://github.com/quickwit-oss/tantivy/pull/2345)(@PSeitz)
+- Fix bug that can cause `get_docids_for_value_range` to panic. [#2295](https://github.com/quickwit-oss/tantivy/pull/2295)(@fulmicoton)
+- Avoid 1 document indices by increase min memory to 15MB for indexing [#2176](https://github.com/quickwit-oss/tantivy/pull/2176)(@PSeitz)
+- Fix merge panic for JSON fields [#2284](https://github.com/quickwit-oss/tantivy/pull/2284)(@PSeitz)
+- Fix bug occuring when merging JSON object indexed with positions. [#2253](https://github.com/quickwit-oss/tantivy/pull/2253)(@fulmicoton)
+- Fix empty DateHistogram gap bug [#2183](https://github.com/quickwit-oss/tantivy/pull/2183)(@PSeitz)
+- Fix range query end check (fields with less than 1 value per doc are affected) [#2226](https://github.com/quickwit-oss/tantivy/pull/2226)(@PSeitz)
+- Handle exclusive out of bounds ranges on fastfield range queries [#2174](https://github.com/quickwit-oss/tantivy/pull/2174)(@PSeitz)
+
+#### Breaking API Changes
+- rename ReloadPolicy onCommit to onCommitWithDelay [#2235](https://github.com/quickwit-oss/tantivy/pull/2235)(@giovannicuccu)
+- Move exports from the root into modules [#2220](https://github.com/quickwit-oss/tantivy/pull/2220)(@PSeitz)
+- Accept field name instead of `Field` in FilterCollector [#2196](https://github.com/quickwit-oss/tantivy/pull/2196)(@PSeitz)
+- remove deprecated IntOptions and DateTime [#2353](https://github.com/quickwit-oss/tantivy/pull/2353)(@PSeitz)
+
+#### Features/Improvements
+- Tantivy documents as a trait: Index data directly without converting to tantivy types first [#2071](https://github.com/quickwit-oss/tantivy/pull/2071)(@ChillFish8)
+- encode some part of posting list as -1 instead of direct values (smaller inverted indices) [#2185](https://github.com/quickwit-oss/tantivy/pull/2185)(@trinity-1686a)
+- **Aggregation**
+  - Support to deserialize f64 from string [#2311](https://github.com/quickwit-oss/tantivy/pull/2311)(@PSeitz)
+  - Add a top_hits aggregator [#2198](https://github.com/quickwit-oss/tantivy/pull/2198)(@ditsuke)
+  - Support bool type in term aggregation [#2318](https://github.com/quickwit-oss/tantivy/pull/2318)(@PSeitz)
+  - Support ip adresses in term aggregation [#2319](https://github.com/quickwit-oss/tantivy/pull/2319)(@PSeitz)
+  - Support date type in term aggregation [#2172](https://github.com/quickwit-oss/tantivy/pull/2172)(@PSeitz)
+  - Support escaped dot when addressing field [#2250](https://github.com/quickwit-oss/tantivy/pull/2250)(@PSeitz)
+
+- Add ExistsQuery to check documents that have a value [#2160](https://github.com/quickwit-oss/tantivy/pull/2160)(@imotov)
+- Expose TopDocs::order_by_u64_field again [#2282](https://github.com/quickwit-oss/tantivy/pull/2282)(@ditsuke)
+
+- **Memory/Performance**
+  - Faster TopN: replace BinaryHeap with TopNComputer [#2186](https://github.com/quickwit-oss/tantivy/pull/2186)(@PSeitz)
+  - reduce number of allocations during indexing [#2257](https://github.com/quickwit-oss/tantivy/pull/2257)(@PSeitz)
+  - Less Memory while indexing: docid deltas while indexing [#2249](https://github.com/quickwit-oss/tantivy/pull/2249)(@PSeitz)
+  - Faster indexing: use term hashmap in fastfield [#2243](https://github.com/quickwit-oss/tantivy/pull/2243)(@PSeitz)
+  - term hashmap remove copy in is_empty, unused unordered_id [#2229](https://github.com/quickwit-oss/tantivy/pull/2229)(@PSeitz)
+  - add method to fetch block of first values in columnar [#2330](https://github.com/quickwit-oss/tantivy/pull/2330)(@PSeitz)
+  - Faster aggregations: add fast path for full columns in fetch_block [#2328](https://github.com/quickwit-oss/tantivy/pull/2328)(@PSeitz)
+  - Faster sstable loading: use fst for sstable index [#2268](https://github.com/quickwit-oss/tantivy/pull/2268)(@trinity-1686a)
+
+- **QueryParser**
+  - allow newline where we allow space in query parser [#2302](https://github.com/quickwit-oss/tantivy/pull/2302)(@trinity-1686a)
+  - allow some mixing of occur and bool in strict query parser [#2323](https://github.com/quickwit-oss/tantivy/pull/2323)(@trinity-1686a)
+  - handle * inside term in lenient query parser [#2228](https://github.com/quickwit-oss/tantivy/pull/2228)(@trinity-1686a)
+  - add support for exists query syntax in query parser [#2170](https://github.com/quickwit-oss/tantivy/pull/2170)(@trinity-1686a)
+- Add shared search executor [#2312](https://github.com/quickwit-oss/tantivy/pull/2312)(@MochiXu)
+- Truncate keys to u16::MAX in term hashmap [#2299](https://github.com/quickwit-oss/tantivy/pull/2299)(@PSeitz)
+- report if a term matched when warming up posting list [#2309](https://github.com/quickwit-oss/tantivy/pull/2309)(@trinity-1686a)
+- Support json fields in FuzzyTermQuery [#2173](https://github.com/quickwit-oss/tantivy/pull/2173)(@PingXia-at)
+- Read list of fields encoded in term dictionary for JSON fields [#2184](https://github.com/quickwit-oss/tantivy/pull/2184)(@PSeitz)
+- add collect_block to BoxableSegmentCollector [#2331](https://github.com/quickwit-oss/tantivy/pull/2331)(@PSeitz)
+- expose collect_block buffer size [#2326](https://github.com/quickwit-oss/tantivy/pull/2326)(@PSeitz)
+- Forward regex parser errors [#2288](https://github.com/quickwit-oss/tantivy/pull/2288)(@adamreichold)
+- Make FacetCounts defaultable and cloneable. [#2322](https://github.com/quickwit-oss/tantivy/pull/2322)(@adamreichold)
+- Derive Debug for SchemaBuilder [#2254](https://github.com/quickwit-oss/tantivy/pull/2254)(@GodTamIt)
+- add missing inlines to tantivy options [#2245](https://github.com/quickwit-oss/tantivy/pull/2245)(@PSeitz)
+
 Tantivy 0.21.1
 ================================
 #### Bugfixes
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -1,6 +1,6 @@
 [package]
 name = "tantivy"
-version = "0.22.0-dev"
+version = "0.23.0"
 authors = ["Paul Masurel <paul.masurel@gmail.com>"]
 license = "MIT"
 categories = ["database-implementations", "data-structures"]
@@ -11,16 +11,19 @@ repository = "https://github.com/quickwit-oss/tantivy"
 readme = "README.md"
 keywords = ["search", "information", "retrieval"]
 edition = "2021"
-rust-version = "1.62"
+rust-version = "1.63"
 exclude = ["benches/*.json", "benches/*.txt"]

 [dependencies]
-oneshot = "0.1.5"
+oneshot = "0.1.7"
 base64 = "0.22.0"
 byteorder = "1.4.3"
 crc32fast = "1.3.2"
 once_cell = "1.10.0"
-regex = { version = "1.5.5", default-features = false, features = ["std", "unicode"] }
+regex = { version = "1.5.5", default-features = false, features = [
+    "std",
+    "unicode",
+] }
 aho-corasick = "1.0"
 tantivy-fst = "0.5"
 memmap2 = { version = "0.9.0", optional = true }
@@ -30,14 +33,15 @@ tempfile = { version = "3.3.0", optional = true }
 log = "0.4.16"
 serde = { version = "1.0.136", features = ["derive"] }
 serde_json = "1.0.79"
-num_cpus = "1.13.1"
 fs4 = { version = "0.8.0", optional = true }
 levenshtein_automata = "0.2.1"
 uuid = { version = "1.0.0", features = ["v4", "serde"] }
 crossbeam-channel = "0.5.4"
 rust-stemmers = "1.2.0"
 downcast-rs = "1.2.0"
-bitpacking = { version = "0.9.2", default-features = false, features = ["bitpacker4x"] }
+bitpacking = { version = "0.9.2", default-features = false, features = [
+    "bitpacker4x",
+] }
 census = "0.4.2"
 rustc-hash = "1.1.0"
 thiserror = "1.0.30"
@@ -48,18 +52,18 @@ smallvec = "1.8.0"
 rayon = "1.5.2"
 lru = "0.12.0"
 fastdivide = "0.4.0"
-itertools = "0.12.0"
+itertools = "0.13.0"
 measure_time = "0.8.2"
 arc-swap = "1.5.0"

-columnar = { version= "0.2", path="./columnar", package ="tantivy-columnar" }
-sstable = { version= "0.2", path="./sstable", package ="tantivy-sstable", optional = true }
-stacker = { version= "0.2", path="./stacker", package ="tantivy-stacker" }
-query-grammar = { version= "0.21.0", path="./query-grammar", package = "tantivy-query-grammar" }
-tantivy-bitpacker = { version= "0.5", path="./bitpacker" }
-common = { version= "0.6", path = "./common/", package = "tantivy-common" }
-tokenizer-api = { version= "0.2", path="./tokenizer-api", package="tantivy-tokenizer-api" }
-sketches-ddsketch = { version = "0.2.1", features = ["use_serde"] }
+columnar = { version = "0.3", path = "./columnar", package = "tantivy-columnar" }
+sstable = { version = "0.3", path = "./sstable", package = "tantivy-sstable", optional = true }
+stacker = { version = "0.3", path = "./stacker", package = "tantivy-stacker" }
+query-grammar = { version = "0.22.0", path = "./query-grammar", package = "tantivy-query-grammar" }
+tantivy-bitpacker = { version = "0.6", path = "./bitpacker" }
+common = { version = "0.7", path = "./common/", package = "tantivy-common" }
+tokenizer-api = { version = "0.3", path = "./tokenizer-api", package = "tantivy-tokenizer-api" }
+sketches-ddsketch = { version = "0.3.0", features = ["use_serde"] }
 futures-util = { version = "0.3.28", optional = true }
 fnv = "1.0.7"

@@ -67,6 +71,7 @@ fnv = "1.0.7"
 winapi = "0.3.9"

 [dev-dependencies]
+binggan = "0.8.0"
 rand = "0.8.5"
 maplit = "1.0.2"
 matches = "0.1.9"
@@ -78,6 +83,9 @@ paste = "1.0.11"
 more-asserts = "0.3.1"
 rand_distr = "0.4.3"
 time = { version = "0.3.10", features = ["serde-well-known", "macros"] }
+postcard = { version = "1.0.4", features = [
+  "use-std",
+], default-features = false }

 [target.'cfg(not(windows))'.dev-dependencies]
 criterion = { version = "0.5", default-features = false }
@@ -109,17 +117,26 @@ lz4-compression = ["lz4_flex"]
 zstd-compression = ["zstd"]

 failpoints = ["fail", "fail/failpoints"]
-unstable = [] # useful for benches.
+unstable = []                            # useful for benches.

 quickwit = ["sstable", "futures-util"]

-# Compares only the hash of a string when indexing data. 
+# Compares only the hash of a string when indexing data.
 # Increases indexing speed, but may lead to extremely rare missing terms, when there's a hash collision.
 # Uses 64bit ahash.
 compare_hash_only = ["stacker/compare_hash_only"]

 [workspace]
-members = ["query-grammar", "bitpacker", "common", "ownedbytes", "stacker", "sstable", "tokenizer-api", "columnar"]
+members = [
+    "query-grammar",
+    "bitpacker",
+    "common",
+    "ownedbytes",
+    "stacker",
+    "sstable",
+    "tokenizer-api",
+    "columnar",
+]

 # Following the "fail" crate best practises, we isolate
 # tests that define specific behavior in fail check points
@@ -140,3 +157,7 @@ harness = false
 [[bench]]
 name = "index-bench"
 harness = false
+
+[[bench]]
+name = "agg_bench"
+harness = false
--- a/benches/agg_bench.rs
+++ b/benches/agg_bench.rs
@@ -0,0 +1,419 @@
+use binggan::{black_box, InputGroup, PeakMemAlloc, INSTRUMENTED_SYSTEM};
+use rand::prelude::SliceRandom;
+use rand::rngs::StdRng;
+use rand::{Rng, SeedableRng};
+use rand_distr::Distribution;
+use serde_json::json;
+use tantivy::aggregation::agg_req::Aggregations;
+use tantivy::aggregation::AggregationCollector;
+use tantivy::query::{AllQuery, TermQuery};
+use tantivy::schema::{IndexRecordOption, Schema, TextFieldIndexing, FAST, STRING};
+use tantivy::{doc, Index, Term};
+
+#[global_allocator]
+pub static GLOBAL: &PeakMemAlloc<std::alloc::System> = &INSTRUMENTED_SYSTEM;
+
+/// Mini macro to register a function via its name
+/// runner.register("average_u64", move |index| average_u64(index));
+macro_rules! register {
+    ($runner:expr, $func:ident) => {
+        $runner.register(stringify!($func), move |index| $func(index))
+    };
+}
+
+fn main() {
+    let inputs = vec![
+        ("full", get_test_index_bench(Cardinality::Full).unwrap()),
+        (
+            "dense",
+            get_test_index_bench(Cardinality::OptionalDense).unwrap(),
+        ),
+        (
+            "sparse",
+            get_test_index_bench(Cardinality::OptionalSparse).unwrap(),
+        ),
+        (
+            "multivalue",
+            get_test_index_bench(Cardinality::Multivalued).unwrap(),
+        ),
+    ];
+
+    bench_agg(InputGroup::new_with_inputs(inputs));
+}
+
+fn bench_agg(mut group: InputGroup<Index>) {
+    group.set_alloc(GLOBAL); // Set the peak mem allocator. This will enable peak memory reporting.
+    register!(group, average_u64);
+    register!(group, average_f64);
+    register!(group, average_f64_u64);
+    register!(group, stats_f64);
+    register!(group, extendedstats_f64);
+    register!(group, percentiles_f64);
+    register!(group, terms_few);
+    register!(group, terms_many);
+    register!(group, terms_many_order_by_term);
+    register!(group, terms_many_with_top_hits);
+    register!(group, terms_many_with_avg_sub_agg);
+    register!(group, terms_many_json_mixed_type_with_sub_agg_card);
+    register!(group, range_agg);
+    register!(group, range_agg_with_avg_sub_agg);
+    register!(group, range_agg_with_term_agg_few);
+    register!(group, range_agg_with_term_agg_many);
+    register!(group, histogram);
+    register!(group, histogram_hard_bounds);
+    register!(group, histogram_with_avg_sub_agg);
+    register!(group, avg_and_range_with_avg_sub_agg);
+
+    group.run();
+}
+
+fn exec_term_with_agg(index: &Index, agg_req: serde_json::Value) {
+    let agg_req: Aggregations = serde_json::from_value(agg_req).unwrap();
+
+    let reader = index.reader().unwrap();
+    let text_field = reader.searcher().schema().get_field("text").unwrap();
+    let term_query = TermQuery::new(
+        Term::from_field_text(text_field, "cool"),
+        IndexRecordOption::Basic,
+    );
+    let collector = get_collector(agg_req);
+    let searcher = reader.searcher();
+    black_box(searcher.search(&term_query, &collector).unwrap());
+}
+
+fn average_u64(index: &Index) {
+    let agg_req = json!({
+        "average": { "avg": { "field": "score", } }
+    });
+    exec_term_with_agg(index, agg_req)
+}
+fn average_f64(index: &Index) {
+    let agg_req = json!({
+        "average": { "avg": { "field": "score_f64", } }
+    });
+    exec_term_with_agg(index, agg_req)
+}
+fn average_f64_u64(index: &Index) {
+    let agg_req = json!({
+        "average_f64": { "avg": { "field": "score_f64" } },
+        "average": { "avg": { "field": "score" } },
+    });
+    exec_term_with_agg(index, agg_req)
+}
+fn stats_f64(index: &Index) {
+    let agg_req = json!({
+        "average_f64": { "stats": { "field": "score_f64", } }
+    });
+    exec_term_with_agg(index, agg_req)
+}
+fn extendedstats_f64(index: &Index) {
+    let agg_req = json!({
+        "extendedstats_f64": { "extended_stats": { "field": "score_f64", } }
+    });
+    exec_term_with_agg(index, agg_req)
+}
+fn percentiles_f64(index: &Index) {
+    let agg_req = json!({
+      "mypercentiles": {
+        "percentiles": {
+          "field": "score_f64",
+          "percents": [ 95, 99, 99.9 ]
+        }
+      }
+    });
+    execute_agg(index, agg_req);
+}
+fn terms_few(index: &Index) {
+    let agg_req = json!({
+        "my_texts": { "terms": { "field": "text_few_terms" } },
+    });
+    execute_agg(index, agg_req);
+}
+fn terms_many(index: &Index) {
+    let agg_req = json!({
+        "my_texts": { "terms": { "field": "text_many_terms" } },
+    });
+    execute_agg(index, agg_req);
+}
+fn terms_many_order_by_term(index: &Index) {
+    let agg_req = json!({
+        "my_texts": { "terms": { "field": "text_many_terms", "order": { "_key": "desc" } } },
+    });
+    execute_agg(index, agg_req);
+}
+fn terms_many_with_top_hits(index: &Index) {
+    let agg_req = json!({
+        "my_texts": {
+            "terms": { "field": "text_many_terms" },
+            "aggs": {
+                "top_hits": { "top_hits":
+                    {
+                        "sort": [
+                            { "score": "desc" }
+                        ],
+                        "size": 2,
+                        "doc_value_fields": ["score_f64"]
+                    }
+                }
+            }
+        },
+    });
+    execute_agg(index, agg_req);
+}
+fn terms_many_with_avg_sub_agg(index: &Index) {
+    let agg_req = json!({
+        "my_texts": {
+            "terms": { "field": "text_many_terms" },
+            "aggs": {
+                "average_f64": { "avg": { "field": "score_f64" } }
+            }
+        },
+    });
+    execute_agg(index, agg_req);
+}
+fn terms_many_json_mixed_type_with_sub_agg_card(index: &Index) {
+    let agg_req = json!({
+        "my_texts": {
+            "terms": { "field": "json.mixed_type" },
+            "aggs": {
+                "average_f64": { "avg": { "field": "score_f64" } }
+            }
+        },
+    });
+    execute_agg(index, agg_req);
+}
+
+fn execute_agg(index: &Index, agg_req: serde_json::Value) {
+    let agg_req: Aggregations = serde_json::from_value(agg_req).unwrap();
+    let collector = get_collector(agg_req);
+
+    let reader = index.reader().unwrap();
+    let searcher = reader.searcher();
+    black_box(searcher.search(&AllQuery, &collector).unwrap());
+}
+fn range_agg(index: &Index) {
+    let agg_req = json!({
+        "range_f64": { "range": { "field": "score_f64", "ranges": [
+            { "from": 3, "to": 7000 },
+            { "from": 7000, "to": 20000 },
+            { "from": 20000, "to": 30000 },
+            { "from": 30000, "to": 40000 },
+            { "from": 40000, "to": 50000 },
+            { "from": 50000, "to": 60000 }
+        ] } },
+    });
+    execute_agg(index, agg_req);
+}
+fn range_agg_with_avg_sub_agg(index: &Index) {
+    let agg_req = json!({
+        "rangef64": {
+            "range": {
+                "field": "score_f64",
+                "ranges": [
+                    { "from": 3, "to": 7000 },
+                    { "from": 7000, "to": 20000 },
+                    { "from": 20000, "to": 30000 },
+                    { "from": 30000, "to": 40000 },
+                    { "from": 40000, "to": 50000 },
+                    { "from": 50000, "to": 60000 }
+                ]
+            },
+            "aggs": {
+                "average_f64": { "avg": { "field": "score_f64" } }
+            }
+        },
+    });
+    execute_agg(index, agg_req);
+}
+
+fn range_agg_with_term_agg_few(index: &Index) {
+    let agg_req = json!({
+        "rangef64": {
+            "range": {
+                "field": "score_f64",
+                "ranges": [
+                    { "from": 3, "to": 7000 },
+                    { "from": 7000, "to": 20000 },
+                    { "from": 20000, "to": 30000 },
+                    { "from": 30000, "to": 40000 },
+                    { "from": 40000, "to": 50000 },
+                    { "from": 50000, "to": 60000 }
+                ]
+            },
+            "aggs": {
+                "my_texts": { "terms": { "field": "text_few_terms" } },
+            }
+        },
+    });
+    execute_agg(index, agg_req);
+}
+fn range_agg_with_term_agg_many(index: &Index) {
+    let agg_req = json!({
+        "rangef64": {
+            "range": {
+                "field": "score_f64",
+                "ranges": [
+                    { "from": 3, "to": 7000 },
+                    { "from": 7000, "to": 20000 },
+                    { "from": 20000, "to": 30000 },
+                    { "from": 30000, "to": 40000 },
+                    { "from": 40000, "to": 50000 },
+                    { "from": 50000, "to": 60000 }
+                ]
+            },
+            "aggs": {
+                "my_texts": { "terms": { "field": "text_many_terms" } },
+            }
+        },
+    });
+    execute_agg(index, agg_req);
+}
+fn histogram(index: &Index) {
+    let agg_req = json!({
+        "rangef64": {
+            "histogram": {
+                "field": "score_f64",
+                "interval": 100 // 1000 buckets
+            },
+        }
+    });
+    execute_agg(index, agg_req);
+}
+fn histogram_hard_bounds(index: &Index) {
+    let agg_req = json!({
+        "rangef64": { "histogram": { "field": "score_f64", "interval": 100, "hard_bounds": { "min": 1000, "max": 300000 } } },
+    });
+    execute_agg(index, agg_req);
+}
+fn histogram_with_avg_sub_agg(index: &Index) {
+    let agg_req = json!({
+        "rangef64": {
+            "histogram": { "field": "score_f64", "interval": 100 },
+            "aggs": {
+                "average_f64": { "avg": { "field": "score_f64" } }
+            }
+        }
+    });
+    execute_agg(index, agg_req);
+}
+fn avg_and_range_with_avg_sub_agg(index: &Index) {
+    let agg_req = json!({
+        "rangef64": {
+            "range": {
+                "field": "score_f64",
+                "ranges": [
+                    { "from": 3, "to": 7000 },
+                    { "from": 7000, "to": 20000 },
+                    { "from": 20000, "to": 60000 }
+                ]
+            },
+            "aggs": {
+                "average_in_range": { "avg": { "field": "score" } }
+            }
+        },
+        "average": { "avg": { "field": "score" } }
+    });
+    execute_agg(index, agg_req);
+}
+
+#[derive(Clone, Copy, Hash, Default, Debug, PartialEq, Eq, PartialOrd, Ord)]
+enum Cardinality {
+    /// All documents contain exactly one value.
+    /// `Full` is the default for auto-detecting the Cardinality, since it is the most strict.
+    #[default]
+    Full = 0,
+    /// All documents contain at most one value.
+    OptionalDense = 1,
+    /// All documents may contain any number of values.
+    Multivalued = 2,
+    /// 1 / 20 documents has a value
+    OptionalSparse = 3,
+}
+
+fn get_collector(agg_req: Aggregations) -> AggregationCollector {
+    AggregationCollector::from_aggs(agg_req, Default::default())
+}
+
+fn get_test_index_bench(cardinality: Cardinality) -> tantivy::Result<Index> {
+    let mut schema_builder = Schema::builder();
+    let text_fieldtype = tantivy::schema::TextOptions::default()
+        .set_indexing_options(
+            TextFieldIndexing::default().set_index_option(IndexRecordOption::WithFreqs),
+        )
+        .set_stored();
+    let text_field = schema_builder.add_text_field("text", text_fieldtype);
+    let json_field = schema_builder.add_json_field("json", FAST);
+    let text_field_many_terms = schema_builder.add_text_field("text_many_terms", STRING | FAST);
+    let text_field_few_terms = schema_builder.add_text_field("text_few_terms", STRING | FAST);
+    let score_fieldtype = tantivy::schema::NumericOptions::default().set_fast();
+    let score_field = schema_builder.add_u64_field("score", score_fieldtype.clone());
+    let score_field_f64 = schema_builder.add_f64_field("score_f64", score_fieldtype.clone());
+    let score_field_i64 = schema_builder.add_i64_field("score_i64", score_fieldtype);
+    let index = Index::create_from_tempdir(schema_builder.build())?;
+    let few_terms_data = ["INFO", "ERROR", "WARN", "DEBUG"];
+
+    let lg_norm = rand_distr::LogNormal::new(2.996f64, 0.979f64).unwrap();
+
+    let many_terms_data = (0..150_000)
+        .map(|num| format!("author{num}"))
+        .collect::<Vec<_>>();
+    {
+        let mut rng = StdRng::from_seed([1u8; 32]);
+        let mut index_writer = index.writer_with_num_threads(1, 200_000_000)?;
+        // To make the different test cases comparable we just change one doc to force the
+        // cardinality
+        if cardinality == Cardinality::OptionalDense {
+            index_writer.add_document(doc!())?;
+        }
+        if cardinality == Cardinality::Multivalued {
+            index_writer.add_document(doc!(
+                json_field => json!({"mixed_type": 10.0}),
+                json_field => json!({"mixed_type": 10.0}),
+                text_field => "cool",
+                text_field => "cool",
+                text_field_many_terms => "cool",
+                text_field_many_terms => "cool",
+                text_field_few_terms => "cool",
+                text_field_few_terms => "cool",
+                score_field => 1u64,
+                score_field => 1u64,
+                score_field_f64 => lg_norm.sample(&mut rng),
+                score_field_f64 => lg_norm.sample(&mut rng),
+                score_field_i64 => 1i64,
+                score_field_i64 => 1i64,
+            ))?;
+        }
+        let mut doc_with_value = 1_000_000;
+        if cardinality == Cardinality::OptionalSparse {
+            doc_with_value /= 20;
+        }
+        let _val_max = 1_000_000.0;
+        for _ in 0..doc_with_value {
+            let val: f64 = rng.gen_range(0.0..1_000_000.0);
+            let json = if rng.gen_bool(0.1) {
+                // 10% are numeric values
+                json!({ "mixed_type": val })
+            } else {
+                json!({"mixed_type": many_terms_data.choose(&mut rng).unwrap().to_string()})
+            };
+            index_writer.add_document(doc!(
+                text_field => "cool",
+                json_field => json,
+                text_field_many_terms => many_terms_data.choose(&mut rng).unwrap().to_string(),
+                text_field_few_terms => few_terms_data.choose(&mut rng).unwrap().to_string(),
+                score_field => val as u64,
+                score_field_f64 => lg_norm.sample(&mut rng),
+                score_field_i64 => val as i64,
+            ))?;
+            if cardinality == Cardinality::OptionalSparse {
+                for _ in 0..20 {
+                    index_writer.add_document(doc!(text_field => "cool"))?;
+                }
+            }
+        }
+        // writing the segment
+        index_writer.commit()?;
+    }
+
+    Ok(index)
+}
--- a/benches/index-bench.rs
+++ b/benches/index-bench.rs
@@ -18,7 +18,7 @@ fn benchmark(
        benchmark_dynamic_json(b, input, schema, commit, parse_json)
    } else {
        _benchmark(b, input, schema, commit, parse_json, |schema, doc_json| {
-            TantivyDocument::parse_json(&schema, doc_json).unwrap()
+            TantivyDocument::parse_json(schema, doc_json).unwrap()
        })
    }
 }
@@ -90,8 +90,7 @@ fn benchmark_dynamic_json(
 ) {
    let json_field = schema.get_field("json").unwrap();
    _benchmark(b, input, schema, commit, parse_json, |_schema, doc_json| {
-        let json_val: serde_json::Map<String, serde_json::Value> =
-            serde_json::from_str(doc_json).unwrap();
+        let json_val: serde_json::Value = serde_json::from_str(doc_json).unwrap();
        tantivy::doc!(json_field=>json_val)
    })
 }
@@ -138,15 +137,16 @@ pub fn hdfs_index_benchmark(c: &mut Criterion) {
    for (prefix, schema, is_dynamic) in benches {
        for commit in [false, true] {
            let suffix = if commit { "with-commit" } else { "no-commit" };
-            for parse_json in [false] {
+            {
+                let parse_json = false;
                // for parse_json in [false, true] {
                let suffix = if parse_json {
-                    format!("{}-with-json-parsing", suffix)
+                    format!("{suffix}-with-json-parsing")
                } else {
-                    format!("{}", suffix)
+                    suffix.to_string()
                };

-                let bench_name = format!("{}{}", prefix, suffix);
+                let bench_name = format!("{prefix}{suffix}");
                group.bench_function(bench_name, |b| {
                    benchmark(b, HDFS_LOGS, schema.clone(), commit, parse_json, is_dynamic)
                });
--- a/bitpacker/Cargo.toml
+++ b/bitpacker/Cargo.toml
@@ -1,6 +1,6 @@
 [package]
 name = "tantivy-bitpacker"
-version = "0.5.0"
+version = "0.6.0"
 edition = "2021"
 authors = ["Paul Masurel <paul.masurel@gmail.com>"]
 license = "MIT"
--- a/cliff.toml
+++ b/cliff.toml
@@ -1,6 +1,10 @@
 # configuration file for git-cliff{ pattern = "foo", replace = "bar"}
 # see https://github.com/orhun/git-cliff#configuration-file

+[remote.github]
+owner = "quickwit-oss"
+repo = "tantivy"
+
 [changelog]
 # changelog header
 header = """
@@ -8,15 +12,43 @@ header = """
 # template for the changelog body
 # https://tera.netlify.app/docs/#introduction
 body = """
-{% if version %}\
-    {{ version | trim_start_matches(pat="v") }} ({{ timestamp | date(format="%Y-%m-%d") }})
-    ==================
-{% else %}\
-    ## [unreleased]
-{% endif %}\
+## What's Changed
+
+{%- if version %} in {{ version }}{%- endif -%}
 {% for commit in commits %}
-    - {% if commit.breaking %}[**breaking**] {% endif %}{{ commit.message | split(pat="\n") | first | trim | upper_first }}(@{{ commit.author.name }})\
-{% endfor %}
+  {% if commit.github.pr_title -%}
+    {%- set commit_message = commit.github.pr_title -%}
+  {%- else -%}
+    {%- set commit_message = commit.message -%}
+  {%- endif -%}
+  - {{ commit_message | split(pat="\n") | first | trim }}\
+    {% if commit.github.pr_number %} \
+      [#{{ commit.github.pr_number }}]({{ self::remote_url() }}/pull/{{ commit.github.pr_number }}){% if commit.github.username %}(@{{ commit.github.username }}){%- endif -%} \
+    {%- endif %}
+{%- endfor -%}
+
+{% if github.contributors | filter(attribute="is_first_time", value=true) | length != 0 %}
+  {% raw %}\n{% endraw -%}
+  ## New Contributors
+{%- endif %}\
+{% for contributor in github.contributors | filter(attribute="is_first_time", value=true) %}
+  * @{{ contributor.username }} made their first contribution
+    {%- if contributor.pr_number %} in \
+      [#{{ contributor.pr_number }}]({{ self::remote_url() }}/pull/{{ contributor.pr_number }}) \
+    {%- endif %}
+{%- endfor -%}
+
+{% if version %}
+    {% if previous.version %}
+      **Full Changelog**: {{ self::remote_url() }}/compare/{{ previous.version }}...{{ version }}
+    {% endif %}
+{% else -%}
+  {% raw %}\n{% endraw %}
+{% endif %}
+
+{%- macro remote_url() -%}
+  https://github.com/{{ remote.github.owner }}/{{ remote.github.repo }}
+{%- endmacro -%}
 """
 # remove the leading and trailing whitespace from the template
 trim = true
@@ -25,53 +57,24 @@ footer = """
 """

 postprocessors = [
-    { pattern = 'Paul Masurel', replace = "fulmicoton"}, # replace with github user
-    { pattern = 'PSeitz', replace = "PSeitz"}, # replace with github user
-    { pattern = 'Adam Reichold', replace = "adamreichold"}, # replace with github user
-    { pattern = 'trinity-1686a', replace = "trinity-1686a"}, # replace with github user
-    { pattern = 'Michael Kleen', replace = "mkleen"}, # replace with github user
-    { pattern = 'Adrien Guillo', replace = "guilload"}, # replace with github user
-    { pattern = 'François Massot', replace = "fmassot"}, # replace with github user
-    { pattern = 'Naveen Aiathurai', replace = "naveenann"}, # replace with github user
-    { pattern = '', replace = ""}, # replace with github user
 ]

 [git]
 # parse the commits based on https://www.conventionalcommits.org
 # This is required or commit.message contains the whole commit message and not just the title
-conventional_commits = true
+conventional_commits = false
 # filter out the commits that are not conventional
-filter_unconventional = false
+filter_unconventional = true
 # process each line of a commit as an individual commit
 split_commits = false
 # regex for preprocessing the commit messages
 commit_preprocessors = [
-    { pattern = '\((\w+\s)?#([0-9]+)\)', replace = "[#${2}](https://github.com/quickwit-oss/tantivy/issues/${2})"}, # replace issue numbers
+    { pattern = '\((\w+\s)?#([0-9]+)\)', replace = ""},
 ]
 #link_parsers = [
    #{ pattern = "#(\\d+)", href = "https://github.com/quickwit-oss/tantivy/pulls/$1"},
 #]
 # regex for parsing and grouping commits
-commit_parsers = [
-    { message = "^feat", group = "Features"},
-    { message = "^fix", group = "Bug Fixes"},
-    { message = "^doc", group = "Documentation"},
-    { message = "^perf", group = "Performance"},
-    { message = "^refactor", group = "Refactor"},
-    { message = "^style", group = "Styling"},
-    { message = "^test", group = "Testing"},
-    { message = "^chore\\(release\\): prepare for", skip = true},
-    { message = "(?i)clippy", skip = true},
-    { message = "(?i)dependabot", skip = true},
-    { message = "(?i)fmt", skip = true},
-    { message = "(?i)bump", skip = true},
-    { message = "(?i)readme", skip = true},
-    { message = "(?i)comment", skip = true},
-    { message = "(?i)spelling", skip = true},
-    { message = "^chore", group = "Miscellaneous Tasks"},
-    { body = ".*security", group = "Security"},
-    { message = ".*", group = "Other", default_scope = "other"},
-]
 # protect breaking changes from being skipped due to matching a skipping commit_parser
 protect_breaking_commits = false
 # filter out the commits that are not matched by commit parsers
--- a/columnar/Cargo.toml
+++ b/columnar/Cargo.toml
@@ -1,6 +1,6 @@
 [package]
 name = "tantivy-columnar"
-version = "0.2.0"
+version = "0.3.0"
 edition = "2021"
 license = "MIT"
 homepage = "https://github.com/quickwit-oss/tantivy"
@@ -9,13 +9,13 @@ description = "column oriented storage for tantivy"
 categories = ["database-implementations", "data-structures", "compression"]

 [dependencies]
-itertools = "0.12.0"
+itertools = "0.13.0"
 fastdivide = "0.4.0"

-stacker = { version= "0.2", path = "../stacker", package="tantivy-stacker"}
-sstable = { version= "0.2", path = "../sstable", package = "tantivy-sstable" }
-common = { version= "0.6", path = "../common", package = "tantivy-common" }
-tantivy-bitpacker = { version= "0.5", path = "../bitpacker/" }
+stacker = { version= "0.3", path = "../stacker", package="tantivy-stacker"}
+sstable = { version= "0.3", path = "../sstable", package = "tantivy-sstable" }
+common = { version= "0.7", path = "../common", package = "tantivy-common" }
+tantivy-bitpacker = { version= "0.6", path = "../bitpacker/" }
 serde = "1.0.152"
 downcast-rs = "1.2.0"

@@ -23,6 +23,12 @@ downcast-rs = "1.2.0"
 proptest = "1"
 more-asserts = "0.3.1"
 rand = "0.8"
+binggan = "0.8.1"
+
+[[bench]]
+name = "bench_merge"
+harness = false
+

 [features]
 unstable = []
--- a/columnar/benches/bench_merge.rs
+++ b/columnar/benches/bench_merge.rs
@@ -0,0 +1,97 @@
+#![feature(test)]
+extern crate test;
+
+use core::fmt;
+use std::fmt::{Display, Formatter};
+
+use binggan::{black_box, BenchRunner};
+use tantivy_columnar::*;
+
+enum Card {
+    Multi,
+    Sparse,
+    Dense,
+}
+impl Display for Card {
+    fn fmt(&self, f: &mut Formatter) -> fmt::Result {
+        match self {
+            Card::Multi => write!(f, "multi"),
+            Card::Sparse => write!(f, "sparse"),
+            Card::Dense => write!(f, "dense"),
+        }
+    }
+}
+
+const NUM_DOCS: u32 = 1_000_000;
+
+fn generate_columnar(card: Card, num_docs: u32) -> ColumnarReader {
+    use tantivy_columnar::ColumnarWriter;
+
+    let mut columnar_writer = ColumnarWriter::default();
+
+    match card {
+        Card::Multi => {
+            columnar_writer.record_numerical(0, "price", 10u64);
+            columnar_writer.record_numerical(0, "price", 10u64);
+        }
+        _ => {}
+    }
+
+    for i in 0..num_docs {
+        match card {
+            Card::Multi | Card::Sparse => {
+                if i % 8 == 0 {
+                    columnar_writer.record_numerical(i, "price", i as u64);
+                }
+            }
+            Card::Dense => {
+                if i % 6 == 0 {
+                    columnar_writer.record_numerical(i, "price", i as u64);
+                }
+            }
+        }
+    }
+
+    let mut wrt: Vec<u8> = Vec::new();
+    columnar_writer.serialize(num_docs, None, &mut wrt).unwrap();
+
+    ColumnarReader::open(wrt).unwrap()
+}
+fn main() {
+    let mut inputs = Vec::new();
+
+    let mut add_combo = |card1: Card, card2: Card| {
+        inputs.push((
+            format!("merge_{card1}_and_{card2}"),
+            vec![
+                generate_columnar(card1, NUM_DOCS),
+                generate_columnar(card2, NUM_DOCS),
+            ],
+        ));
+    };
+
+    add_combo(Card::Multi, Card::Multi);
+    add_combo(Card::Dense, Card::Dense);
+    add_combo(Card::Sparse, Card::Sparse);
+    add_combo(Card::Sparse, Card::Dense);
+    add_combo(Card::Multi, Card::Dense);
+    add_combo(Card::Multi, Card::Sparse);
+
+    let runner: BenchRunner = BenchRunner::new();
+    let mut group = runner.new_group();
+    for (input_name, columnar_readers) in inputs.iter() {
+        group.register_with_input(
+            input_name,
+            columnar_readers,
+            move |columnar_readers: &Vec<ColumnarReader>| {
+                let mut out = vec![];
+                let columnar_readers = columnar_readers.iter().collect::<Vec<_>>();
+                let merge_row_order = StackMergeOrder::stack(&columnar_readers[..]);
+
+                merge_columnar(&columnar_readers, &[], merge_row_order.into(), &mut out).unwrap();
+                black_box(out);
+            },
+        );
+    }
+    group.run();
+}
--- a/columnar/src/column_index/merge/mod.rs
+++ b/columnar/src/column_index/merge/mod.rs
@@ -73,14 +73,18 @@ fn detect_cardinality(
 pub fn merge_column_index<'a>(
    columns: &'a [ColumnIndex],
    merge_row_order: &'a MergeRowOrder,
+    num_values: u32,
 ) -> SerializableColumnIndex<'a> {
    // For simplification, we do not try to detect whether the cardinality could be
    // downgraded thanks to deletes.
    let cardinality_after_merge = detect_cardinality(columns, merge_row_order);
    match merge_row_order {
-        MergeRowOrder::Stack(stack_merge_order) => {
-            merge_column_index_stacked(columns, cardinality_after_merge, stack_merge_order)
-        }
+        MergeRowOrder::Stack(stack_merge_order) => merge_column_index_stacked(
+            columns,
+            cardinality_after_merge,
+            stack_merge_order,
+            num_values,
+        ),
        MergeRowOrder::Shuffled(complex_merge_order) => {
            merge_column_index_shuffled(columns, cardinality_after_merge, complex_merge_order)
        }
@@ -167,8 +171,12 @@ mod tests {
            ],
        )
        .into();
-        let merged_column_index = merge_column_index(&column_indexes[..], &merge_row_order);
-        let SerializableColumnIndex::Multivalued(start_index_iterable) = merged_column_index else {
+        let merged_column_index = merge_column_index(&column_indexes[..], &merge_row_order, 3);
+        let SerializableColumnIndex::Multivalued {
+            indices: start_index_iterable,
+            ..
+        } = merged_column_index
+        else {
            panic!("Expected a multivalued index")
        };
        let start_indexes: Vec<RowId> = start_index_iterable.boxed_iter().collect();
@@ -200,8 +208,12 @@ mod tests {
            ],
        )
        .into();
-        let merged_column_index = merge_column_index(&column_indexes[..], &merge_row_order);
-        let SerializableColumnIndex::Multivalued(start_index_iterable) = merged_column_index else {
+        let merged_column_index = merge_column_index(&column_indexes[..], &merge_row_order, 6);
+        let SerializableColumnIndex::Multivalued {
+            indices: start_index_iterable,
+            ..
+        } = merged_column_index
+        else {
            panic!("Expected a multivalued index")
        };
        let start_indexes: Vec<RowId> = start_index_iterable.boxed_iter().collect();
--- a/columnar/src/column_index/merge/shuffled.rs
+++ b/columnar/src/column_index/merge/shuffled.rs
@@ -22,7 +22,10 @@ pub fn merge_column_index_shuffled<'a>(
        Cardinality::Multivalued => {
            let multivalue_start_index =
                merge_column_index_shuffled_multivalued(column_indexes, shuffle_merge_order);
-            SerializableColumnIndex::Multivalued(multivalue_start_index)
+            SerializableColumnIndex::Multivalued {
+                indices: multivalue_start_index,
+                stats: None,
+            }
        }
    }
 }
@@ -140,7 +143,7 @@ mod tests {
    #[test]
    fn test_merge_column_index_optional_shuffle() {
        let optional_index: ColumnIndex = OptionalIndex::for_test(2, &[0]).into();
-        let column_indexes = vec![optional_index, ColumnIndex::Full];
+        let column_indexes = [optional_index, ColumnIndex::Full];
        let row_addrs = vec![
            RowAddr {
                segment_ord: 0u32,
--- a/columnar/src/column_index/merge/stacked.rs
+++ b/columnar/src/column_index/merge/stacked.rs
@@ -1,6 +1,8 @@
 use std::iter;
+use std::num::NonZeroU64;

 use crate::column_index::{SerializableColumnIndex, Set};
+use crate::column_values::ColumnStats;
 use crate::iterable::Iterable;
 use crate::{Cardinality, ColumnIndex, RowId, StackMergeOrder};

@@ -12,6 +14,7 @@ pub fn merge_column_index_stacked<'a>(
    columns: &'a [ColumnIndex],
    cardinality_after_merge: Cardinality,
    stack_merge_order: &'a StackMergeOrder,
+    num_values: u32,
 ) -> SerializableColumnIndex<'a> {
    match cardinality_after_merge {
        Cardinality::Full => SerializableColumnIndex::Full,
@@ -27,7 +30,17 @@ pub fn merge_column_index_stacked<'a>(
                columns,
                stack_merge_order,
            };
-            SerializableColumnIndex::Multivalued(Box::new(stacked_multivalued_index))
+            SerializableColumnIndex::Multivalued {
+                indices: Box::new(stacked_multivalued_index),
+                stats: Some(ColumnStats {
+                    gcd: NonZeroU64::new(1).unwrap(),
+                    // Entries are start offsets into the values column: 0..=num_values
+                    min_value: 0,
+                    max_value: num_values as u64,
+                    // One start offset per doc plus a final end offset, hence num docs + 1
+                    num_rows: stack_merge_order.num_rows() + 1,
+                }),
+            }
        }
    }
 }
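The stats above can be derived from counts alone because a stacked multivalue index is a list of start offsets. The following is a minimal sketch of that reasoning, not the crate's implementation: `Stats`, `stack_start_offsets` and `precomputed_stats` are hypothetical names, and the sketch assumes every segment's offset list starts at 0 and ends at that segment's value count.

```rust
/// Hypothetical stand-in for the crate's `ColumnStats` (gcd omitted).
#[derive(Debug, PartialEq)]
struct Stats {
    min_value: u64,
    max_value: u64,
    num_rows: u32,
}

/// Stacks per-segment start-offset lists into one merged list.
/// Assumption: each input has `num_docs + 1` entries, starts at 0,
/// and ends at the number of values stored in that segment.
fn stack_start_offsets(segments: &[Vec<u32>]) -> Vec<u32> {
    let mut merged = vec![0u32];
    let mut base = 0u32;
    for offsets in segments {
        // Skip each segment's leading 0 and shift by the values seen so far.
        merged.extend(offsets.iter().skip(1).map(|&offset| base + offset));
        base += offsets.last().copied().unwrap_or(0);
    }
    merged
}

/// Derives the stats from two counts, without iterating over the offsets.
fn precomputed_stats(num_docs: u32, num_values: u32) -> Stats {
    Stats {
        min_value: 0,
        max_value: num_values as u64,
        num_rows: num_docs + 1,
    }
}
```

Stacking two segments of two docs each, with offsets `[0, 2, 3]` and `[0, 0, 3]`, gives a merged list whose minimum, maximum and length agree with `precomputed_stats(4, 6)`.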
--- a/columnar/src/column_index/multivalued_index.rs
+++ b/columnar/src/column_index/multivalued_index.rs
@@ -6,20 +6,29 @@ use std::sync::Arc;
 use common::OwnedBytes;

 use crate::column_values::{
-    load_u64_based_column_values, serialize_u64_based_column_values, CodecType, ColumnValues,
+    load_u64_based_column_values, serialize_u64_based_column_values,
+    serialize_u64_with_codec_and_stats, CodecType, ColumnStats, ColumnValues,
 };
 use crate::iterable::Iterable;
 use crate::{DocId, RowId};

 pub fn serialize_multivalued_index(
    multivalued_index: &dyn Iterable<RowId>,
+    stats: Option<ColumnStats>,
    output: &mut impl Write,
 ) -> io::Result<()> {
-    serialize_u64_based_column_values(
-        multivalued_index,
-        &[CodecType::Bitpacked, CodecType::Linear],
-        output,
-    )?;
+    if let Some(stats) = stats {
+        // TODO: Add something with higher compression that doesn't require a full scan upfront
+        let estimator = CodecType::Bitpacked.estimator();
+        assert!(!estimator.requires_full_scan());
+        serialize_u64_with_codec_and_stats(multivalued_index, estimator, stats, output)?;
+    } else {
+        serialize_u64_based_column_values(
+            multivalued_index,
+            &[CodecType::Bitpacked, CodecType::Linear],
+            output,
+        )?;
+    }
    Ok(())
 }

@@ -52,7 +61,7 @@ impl From<Arc<dyn ColumnValues<RowId>>> for MultiValueIndex {
 impl MultiValueIndex {
    pub fn for_test(start_offsets: &[RowId]) -> MultiValueIndex {
        let mut buffer = Vec::new();
-        serialize_multivalued_index(&start_offsets, &mut buffer).unwrap();
+        serialize_multivalued_index(&start_offsets, None, &mut buffer).unwrap();
        let bytes = OwnedBytes::new(buffer);
        open_multivalued_index(bytes).unwrap()
    }
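Bitpacking can skip the estimation pass because the only thing it needs to know before writing is the bit width, and that follows from the maximum value in the stats. The sketch below illustrates the idea under that assumption. `num_bits` and `bitpack` are hypothetical helpers, and the byte layout is a simple LSB-first packing chosen for illustration, not necessarily the layout used by `tantivy-bitpacker`.

```rust
/// Number of bits needed to represent any value in `0..=max_value`.
fn num_bits(max_value: u64) -> u32 {
    64 - max_value.leading_zeros()
}

/// Packs each value into `bits` bits, least significant bit first,
/// in a single pass over the iterator.
fn bitpack(values: impl Iterator<Item = u64>, bits: u32) -> Vec<u8> {
    let mut out = Vec::new();
    // Accumulator holding bits that have not been flushed yet.
    let mut acc: u128 = 0;
    let mut filled: u32 = 0;
    for value in values {
        acc |= (value as u128) << filled;
        filled += bits;
        // Flush every complete byte.
        while filled >= 8 {
            out.push(acc as u8);
            acc >>= 8;
            filled -= 8;
        }
    }
    // Flush the trailing partial byte, if any.
    if filled > 0 {
        out.push(acc as u8);
    }
    out
}
```

With a known maximum of 6, `num_bits(6)` is 3, so five offsets fit into 15 bits (two bytes) without first scanning the column to find that maximum.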
--- a/columnar/src/column_index/optional_index/mod.rs
+++ b/columnar/src/column_index/optional_index/mod.rs
@@ -196,6 +196,7 @@ impl Set<RowId> for OptionalIndex {
        } = row_addr_from_row_id(doc_id);
        let block_meta = self.block_metas[block_id as usize];
        let block = self.block(block_meta);
+
        let block_offset_row_id = match block {
            Block::Dense(dense_block) => dense_block.rank(in_block_row_id),
            Block::Sparse(sparse_block) => sparse_block.rank(in_block_row_id),
--- a/columnar/src/column_index/serialize.rs
+++ b/columnar/src/column_index/serialize.rs
@@ -6,6 +6,7 @@ use common::{CountingWriter, OwnedBytes};
 use crate::column_index::multivalued_index::serialize_multivalued_index;
 use crate::column_index::optional_index::serialize_optional_index;
 use crate::column_index::ColumnIndex;
+use crate::column_values::ColumnStats;
 use crate::iterable::Iterable;
 use crate::{Cardinality, RowId};

@@ -15,9 +16,12 @@ pub enum SerializableColumnIndex<'a> {
        non_null_row_ids: Box<dyn Iterable<RowId> + 'a>,
        num_rows: RowId,
    },
-    // TODO remove the Arc<dyn> apart from serialization this is not
-    // dynamic at all.
-    Multivalued(Box<dyn Iterable<RowId> + 'a>),
+    Multivalued {
+        /// Iterable emitting the start offsets of the multivalued index
+        indices: Box<dyn Iterable<RowId> + 'a>,
+        /// In the merge case we can precompute the column stats
+        stats: Option<ColumnStats>,
+    },
 }

 impl<'a> SerializableColumnIndex<'a> {
@@ -25,7 +29,7 @@ impl<'a> SerializableColumnIndex<'a> {
        match self {
            SerializableColumnIndex::Full => Cardinality::Full,
            SerializableColumnIndex::Optional { .. } => Cardinality::Optional,
-            SerializableColumnIndex::Multivalued(_) => Cardinality::Multivalued,
+            SerializableColumnIndex::Multivalued { .. } => Cardinality::Multivalued,
        }
    }
 }
@@ -44,9 +48,10 @@ pub fn serialize_column_index(
            non_null_row_ids,
            num_rows,
        } => serialize_optional_index(non_null_row_ids.as_ref(), num_rows, &mut output)?,
-        SerializableColumnIndex::Multivalued(multivalued_index) => {
-            serialize_multivalued_index(&*multivalued_index, &mut output)?
-        }
+        SerializableColumnIndex::Multivalued {
+            indices: multivalued_index,
+            stats,
+        } => serialize_multivalued_index(&*multivalued_index, stats, &mut output)?,
    }
    let column_index_num_bytes = output.written_bytes() as u32;
    Ok(column_index_num_bytes)
--- a/columnar/src/column_values/mod.rs
+++ b/columnar/src/column_values/mod.rs
@@ -32,7 +32,8 @@ pub use u128_based::{
 };
 pub use u64_based::{
    load_u64_based_column_values, serialize_and_load_u64_based_column_values,
-    serialize_u64_based_column_values, CodecType, ALL_U64_CODEC_TYPES,
+    serialize_u64_based_column_values, serialize_u64_with_codec_and_stats, CodecType,
+    ALL_U64_CODEC_TYPES,
 };
 pub use vec_column::VecColumn;

@@ -75,7 +76,7 @@ pub trait ColumnValues<T: PartialOrd = u64>: Send + Sync + DowncastSync {
        let out_and_idx_chunks = output
            .chunks_exact_mut(4)
            .into_remainder()
-            .into_iter()
+            .iter_mut()
            .zip(indexes.chunks_exact(4).remainder());
        for (out, idx) in out_and_idx_chunks {
            *out = self.get_val(*idx);
@@ -102,7 +103,7 @@ pub trait ColumnValues<T: PartialOrd = u64>: Send + Sync + DowncastSync {
        let out_and_idx_chunks = output
            .chunks_exact_mut(4)
            .into_remainder()
-            .into_iter()
+            .iter_mut()
            .zip(indexes.chunks_exact(4).remainder());
        for (out, idx) in out_and_idx_chunks {
            *out = Some(self.get_val(*idx));
--- a/columnar/src/column_values/u128_based/compact_space/mod.rs
+++ b/columnar/src/column_values/u128_based/compact_space/mod.rs
@@ -148,7 +148,7 @@ impl CompactSpace {
            .binary_search_by_key(&compact, |range_mapping| range_mapping.compact_start)
            // Correctness: Overflow. The first range starts at compact space 0, the error from
            // binary search can never be 0
-            .map_or_else(|e| e - 1, |v| v);
+            .unwrap_or_else(|e| e - 1);

        let range_mapping = &self.ranges_mapping[pos];
        let diff = compact - range_mapping.compact_start;
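The `unwrap_or_else(|e| e - 1)` form relies on both arms of the binary search result being a `usize`: a hit returns the matching index, and a miss returns the insertion point, whose predecessor is the range containing the key. A small standalone sketch of the same lookup, where `range_index` is a hypothetical helper and the slice is assumed to be sorted and to start at 0:

```rust
/// Returns the index of the last range whose start is <= `key`.
/// Assumes `starts` is sorted, non-empty, and `starts[0] == 0`,
/// so the insertion point on a miss can never be 0.
fn range_index(starts: &[u64], key: u64) -> usize {
    starts
        .binary_search(&key)
        .unwrap_or_else(|insert_at| insert_at - 1)
}
```

For `starts = [0, 10, 25]`, the keys 10 and 24 both resolve to index 1.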
--- a/columnar/src/column_values/u64_based/bitpacked.rs
+++ b/columnar/src/column_values/u64_based/bitpacked.rs
@@ -128,6 +128,9 @@ impl ColumnCodecEstimator for BitpackedCodecEstimator {
        bit_packer.close(wrt)?;
        Ok(())
    }
+    fn codec_type(&self) -> super::CodecType {
+        super::CodecType::Bitpacked
+    }
 }

 pub struct BitpackedCodec;
--- a/columnar/src/column_values/u64_based/blockwise_linear.rs
+++ b/columnar/src/column_values/u64_based/blockwise_linear.rs
@@ -163,6 +163,10 @@ impl ColumnCodecEstimator for BlockwiseLinearEstimator {

        Ok(())
    }
+
+    fn codec_type(&self) -> super::CodecType {
+        super::CodecType::BlockwiseLinear
+    }
 }

 pub struct BlockwiseLinearCodec;
--- a/columnar/src/column_values/u64_based/linear.rs
+++ b/columnar/src/column_values/u64_based/linear.rs
@@ -153,6 +153,12 @@ impl ColumnCodecEstimator for LinearCodecEstimator {
            self.collect_before_line_estimation(value);
        }
    }
+    fn requires_full_scan(&self) -> bool {
+        true
+    }
+    fn codec_type(&self) -> super::CodecType {
+        super::CodecType::Linear
+    }
 }

 impl LinearCodecEstimator {
--- a/columnar/src/column_values/u64_based/mod.rs
+++ b/columnar/src/column_values/u64_based/mod.rs
@@ -37,7 +37,11 @@ pub trait ColumnCodecEstimator<T = u64>: 'static {
    /// This method will be called for each element of the column during
    /// `estimation`.
    fn collect(&mut self, value: u64);
-    /// Finalizes the first pass phase.
+    /// Returns true if the estimator needs a full pass over the column before serialization.
+    fn requires_full_scan(&self) -> bool {
+        false
+    }
+    fn codec_type(&self) -> CodecType;
    fn finalize(&mut self) {}
    /// Returns an accurate estimation of the number of bytes that will
    /// be used to represent this column.
@@ -150,34 +154,45 @@ pub fn serialize_u64_based_column_values<T: MonotonicallyMappableToU64>(
    wrt: &mut dyn Write,
 ) -> io::Result<()> {
    let mut stats_collector = StatsCollector::default();
-    let mut estimators: Vec<(CodecType, Box<dyn ColumnCodecEstimator>)> =
-        Vec::with_capacity(codec_types.len());
+    let mut estimators: Vec<Box<dyn ColumnCodecEstimator>> = Vec::with_capacity(codec_types.len());
    for &codec_type in codec_types {
-        estimators.push((codec_type, codec_type.estimator()));
+        estimators.push(codec_type.estimator());
    }
    for val in vals.boxed_iter() {
        let val_u64 = val.to_u64();
        stats_collector.collect(val_u64);
-        for (_, estimator) in &mut estimators {
+        for estimator in &mut estimators {
            estimator.collect(val_u64);
        }
    }
-    for (_, estimator) in &mut estimators {
+    for estimator in &mut estimators {
        estimator.finalize();
    }
    let stats = stats_collector.stats();
-    let (_, best_codec, best_codec_estimator) = estimators
+    let (_, best_codec) = estimators
        .into_iter()
-        .flat_map(|(codec_type, estimator)| {
+        .flat_map(|estimator| {
            let num_bytes = estimator.estimate(&stats)?;
-            Some((num_bytes, codec_type, estimator))
+            Some((num_bytes, estimator))
        })
-        .min_by_key(|(num_bytes, _, _)| *num_bytes)
+        .min_by_key(|(num_bytes, _)| *num_bytes)
        .ok_or_else(|| {
            io::Error::new(io::ErrorKind::InvalidData, "No available applicable codec.")
        })?;
-    best_codec.to_code().serialize(wrt)?;
-    best_codec_estimator.serialize(
+    serialize_u64_with_codec_and_stats(vals, best_codec, stats, wrt)?;
+    Ok(())
+}
+
+/// Serializes a given column of u64-mapped values.
+/// Codecs that require a full scan (e.g. Linear) must have collected all values beforehand.
+pub fn serialize_u64_with_codec_and_stats<T: MonotonicallyMappableToU64>(
+    vals: &dyn Iterable<T>,
+    codec: Box<dyn ColumnCodecEstimator>,
+    stats: ColumnStats,
+    wrt: &mut dyn Write,
+) -> io::Result<()> {
+    codec.codec_type().to_code().serialize(wrt)?;
+    codec.serialize(
        &stats,
        &mut vals.boxed_iter().map(MonotonicallyMappableToU64::to_u64),
        wrt,
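The selection logic in this hunk (collect every value into each estimator, then keep the estimator with the smallest estimate) can be sketched in isolation. Everything below is a toy model under stated assumptions: the `Estimator` trait, `Codec` enum, both estimators and their size formulas are hypothetical simplifications, not the crate's `ColumnCodecEstimator` API.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Codec {
    Bitpacked,
    Linear,
}

trait Estimator {
    fn collect(&mut self, value: u64);
    /// Estimators that only need the stats keep the default.
    fn requires_full_scan(&self) -> bool {
        false
    }
    fn codec(&self) -> Codec;
    /// Estimated size in bytes, or `None` if the codec is not applicable.
    fn estimate(&self, max_value: u64, num_rows: u64) -> Option<u64>;
}

struct BitpackedEst;

impl Estimator for BitpackedEst {
    fn collect(&mut self, _value: u64) {}
    fn codec(&self) -> Codec {
        Codec::Bitpacked
    }
    fn estimate(&self, max_value: u64, num_rows: u64) -> Option<u64> {
        let bits = (64 - max_value.leading_zeros()) as u64;
        Some((bits * num_rows + 7) / 8)
    }
}

/// Toy linear estimator: buffers all values and measures the largest
/// deviation from the line through the first and last value.
#[derive(Default)]
struct LinearEst {
    values: Vec<u64>,
}

impl Estimator for LinearEst {
    fn collect(&mut self, value: u64) {
        self.values.push(value);
    }
    fn requires_full_scan(&self) -> bool {
        true
    }
    fn codec(&self) -> Codec {
        Codec::Linear
    }
    fn estimate(&self, _max_value: u64, num_rows: u64) -> Option<u64> {
        let first = *self.values.first()?;
        let last = *self.values.last()?;
        if last < first {
            return None; // toy restriction: only rising lines
        }
        let denom = (self.values.len() as u64 - 1).max(1) as u128;
        let span = (last - first) as u128;
        let max_dev = self
            .values
            .iter()
            .enumerate()
            .map(|(i, &v)| (v as u128).abs_diff(first as u128 + span * i as u128 / denom))
            .max()?;
        // Deviations can be negative, so reserve one extra bit, plus a 16-byte header.
        let bits = (128 - (2 * max_dev).leading_zeros()) as u64;
        Some((bits * num_rows + 7) / 8 + 16)
    }
}

fn default_estimators() -> Vec<Box<dyn Estimator>> {
    vec![Box::new(BitpackedEst), Box::new(LinearEst::default())]
}

/// One pass to feed the estimators, then pick the smallest estimate.
fn pick_codec(values: &[u64], mut estimators: Vec<Box<dyn Estimator>>) -> Option<Codec> {
    let mut max_value = 0u64;
    for &value in values {
        max_value = max_value.max(value);
        for estimator in &mut estimators {
            estimator.collect(value);
        }
    }
    let num_rows = values.len() as u64;
    estimators
        .iter()
        .filter_map(|est| Some((est.estimate(max_value, num_rows)?, est.codec())))
        .min_by_key(|(num_bytes, _)| *num_bytes)
        .map(|(_, codec)| codec)
}
```

In this toy model a perfectly linear column favors `Codec::Linear`, while a column whose last value is below its first falls back to `Codec::Bitpacked`. Skipping `pick_codec` entirely is only safe for an estimator whose `requires_full_scan()` is false, which is what the `assert!` in `serialize_multivalued_index` checks.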
--- a/columnar/src/columnar/merge/mod.rs
+++ b/columnar/src/columnar/merge/mod.rs
@@ -3,7 +3,7 @@ mod merge_mapping;
 mod term_merger;

 use std::collections::{BTreeMap, HashSet};
-use std::io;
+use std::io::{self};
 use std::net::Ipv6Addr;
 use std::sync::Arc;

@@ -156,8 +156,15 @@ fn merge_column(
                    column_values.push(None);
                }
            }
-            let merged_column_index =
-                crate::column_index::merge_column_index(&column_indexes[..], merge_row_order);
+            let num_values: u32 = column_values
+                .iter()
+                .map(|vals| vals.as_ref().map(|idx| idx.num_vals()).unwrap_or(0))
+                .sum();
+            let merged_column_index = crate::column_index::merge_column_index(
+                &column_indexes[..],
+                merge_row_order,
+                num_values,
+            );
            let merge_column_values = MergedColumnValues {
                column_indexes: &column_indexes[..],
                column_values: &column_values[..],
@@ -183,8 +190,15 @@ fn merge_column(
                }
            }

-            let merged_column_index =
-                crate::column_index::merge_column_index(&column_indexes[..], merge_row_order);
+            let num_values: u32 = column_values
+                .iter()
+                .map(|vals| vals.as_ref().map(|idx| idx.num_vals()).unwrap_or(0))
+                .sum();
+            let merged_column_index = crate::column_index::merge_column_index(
+                &column_indexes[..],
+                merge_row_order,
+                num_values,
+            );
            let merge_column_values = MergedColumnValues {
                column_indexes: &column_indexes[..],
                column_values: &column_values,
@@ -214,8 +228,19 @@ fn merge_column(
                    }
                }
            }
-            let merged_column_index =
-                crate::column_index::merge_column_index(&column_indexes[..], merge_row_order);
+            let num_values: u32 = bytes_columns
+                .iter()
+                .map(|vals| {
+                    vals.as_ref()
+                        .map(|idx| idx.term_ord_column.values.num_vals())
+                        .unwrap_or(0)
+                })
+                .sum();
+            let merged_column_index = crate::column_index::merge_column_index(
+                &column_indexes[..],
+                merge_row_order,
+                num_values,
+            );
            merge_bytes_or_str_column(merged_column_index, &bytes_columns, merge_row_order, wrt)?;
        }
    }
--- a/columnar/src/columnar/writer/mod.rs
+++ b/columnar/src/columnar/writer/mod.rs
@@ -59,22 +59,6 @@ pub struct ColumnarWriter {
    buffers: SpareBuffers,
 }

-#[inline]
-fn mutate_or_create_column<V, TMutator>(
-    arena_hash_map: &mut ArenaHashMap,
-    column_name: &str,
-    updater: TMutator,
-) where
-    V: Copy + 'static,
-    TMutator: FnMut(Option<V>) -> V,
-{
-    assert!(
-        !column_name.as_bytes().contains(&0u8),
-        "key may not contain the 0 byte"
-    );
-    arena_hash_map.mutate_or_create(column_name.as_bytes(), updater);
-}
-
 impl ColumnarWriter {
    pub fn mem_usage(&self) -> usize {
        self.arena.mem_usage()
@@ -175,9 +159,8 @@ impl ColumnarWriter {
                    },
                    &mut self.dictionaries,
                );
-                mutate_or_create_column(
-                    hash_map,
-                    column_name,
+                hash_map.mutate_or_create(
+                    column_name.as_bytes(),
                    |column_opt: Option<StrOrBytesColumnWriter>| {
                        let mut column_writer = if let Some(column_writer) = column_opt {
                            column_writer
@@ -192,24 +175,21 @@ impl ColumnarWriter {
                );
            }
            ColumnType::Bool => {
-                mutate_or_create_column(
-                    &mut self.bool_field_hash_map,
-                    column_name,
+                self.bool_field_hash_map.mutate_or_create(
+                    column_name.as_bytes(),
                    |column_opt: Option<ColumnWriter>| column_opt.unwrap_or_default(),
                );
            }
            ColumnType::DateTime => {
-                mutate_or_create_column(
-                    &mut self.datetime_field_hash_map,
-                    column_name,
+                self.datetime_field_hash_map.mutate_or_create(
+                    column_name.as_bytes(),
                    |column_opt: Option<ColumnWriter>| column_opt.unwrap_or_default(),
                );
            }
            ColumnType::I64 | ColumnType::F64 | ColumnType::U64 => {
                let numerical_type = column_type.numerical_type().unwrap();
-                mutate_or_create_column(
-                    &mut self.numerical_field_hash_map,
-                    column_name,
+                self.numerical_field_hash_map.mutate_or_create(
+                    column_name.as_bytes(),
                    |column_opt: Option<NumericalColumnWriter>| {
                        let mut column: NumericalColumnWriter = column_opt.unwrap_or_default();
                        column.force_numerical_type(numerical_type);
@@ -217,9 +197,8 @@ impl ColumnarWriter {
                    },
                );
            }
-            ColumnType::IpAddr => mutate_or_create_column(
-                &mut self.ip_addr_field_hash_map,
-                column_name,
+            ColumnType::IpAddr => self.ip_addr_field_hash_map.mutate_or_create(
+                column_name.as_bytes(),
                |column_opt: Option<ColumnWriter>| column_opt.unwrap_or_default(),
            ),
        }
@@ -232,9 +211,8 @@ impl ColumnarWriter {
        numerical_value: T,
    ) {
        let (hash_map, arena) = (&mut self.numerical_field_hash_map, &mut self.arena);
-        mutate_or_create_column(
-            hash_map,
-            column_name,
+        hash_map.mutate_or_create(
+            column_name.as_bytes(),
            |column_opt: Option<NumericalColumnWriter>| {
                let mut column: NumericalColumnWriter = column_opt.unwrap_or_default();
                column.record_numerical_value(doc, numerical_value.into(), arena);
@@ -244,10 +222,6 @@ impl ColumnarWriter {
    }

    pub fn record_ip_addr(&mut self, doc: RowId, column_name: &str, ip_addr: Ipv6Addr) {
-        assert!(
-            !column_name.as_bytes().contains(&0u8),
-            "key may not contain the 0 byte"
-        );
        let (hash_map, arena) = (&mut self.ip_addr_field_hash_map, &mut self.arena);
        hash_map.mutate_or_create(
            column_name.as_bytes(),
@@ -261,24 +235,30 @@ impl ColumnarWriter {

    pub fn record_bool(&mut self, doc: RowId, column_name: &str, val: bool) {
        let (hash_map, arena) = (&mut self.bool_field_hash_map, &mut self.arena);
-        mutate_or_create_column(hash_map, column_name, |column_opt: Option<ColumnWriter>| {
-            let mut column: ColumnWriter = column_opt.unwrap_or_default();
-            column.record(doc, val, arena);
-            column
-        });
+        hash_map.mutate_or_create(
+            column_name.as_bytes(),
+            |column_opt: Option<ColumnWriter>| {
+                let mut column: ColumnWriter = column_opt.unwrap_or_default();
+                column.record(doc, val, arena);
+                column
+            },
+        );
    }

    pub fn record_datetime(&mut self, doc: RowId, column_name: &str, datetime: common::DateTime) {
        let (hash_map, arena) = (&mut self.datetime_field_hash_map, &mut self.arena);
-        mutate_or_create_column(hash_map, column_name, |column_opt: Option<ColumnWriter>| {
-            let mut column: ColumnWriter = column_opt.unwrap_or_default();
-            column.record(
-                doc,
-                NumericalValue::I64(datetime.into_timestamp_nanos()),
-                arena,
-            );
-            column
-        });
+        hash_map.mutate_or_create(
+            column_name.as_bytes(),
+            |column_opt: Option<ColumnWriter>| {
+                let mut column: ColumnWriter = column_opt.unwrap_or_default();
+                column.record(
+                    doc,
+                    NumericalValue::I64(datetime.into_timestamp_nanos()),
+                    arena,
+                );
+                column
+            },
+        );
    }

    pub fn record_str(&mut self, doc: RowId, column_name: &str, value: &str) {
@@ -303,10 +283,6 @@ impl ColumnarWriter {
    }

    pub fn record_bytes(&mut self, doc: RowId, column_name: &str, value: &[u8]) {
-        assert!(
-            !column_name.as_bytes().contains(&0u8),
-            "key may not contain the 0 byte"
-        );
        let (hash_map, arena, dictionaries) = (
            &mut self.bytes_field_hash_map,
            &mut self.arena,
@@ -668,7 +644,10 @@ fn send_to_serialize_column_mappable_to_u128<
            let multivalued_index_builder = value_index_builders.borrow_multivalued_index_builder();
            consume_operation_iterator(op_iterator, multivalued_index_builder, values);
            let multivalued_index = multivalued_index_builder.finish(num_rows);
-            SerializableColumnIndex::Multivalued(Box::new(multivalued_index))
+            SerializableColumnIndex::Multivalued {
+                indices: Box::new(multivalued_index),
+                stats: Default::default(), // TODO: implement stats for u128
+            }
        }
    };
    crate::column::serialize_column_mappable_to_u128(
@@ -723,7 +702,10 @@ fn send_to_serialize_column_mappable_to_u64(
            if sort_values_within_row {
                sort_values_within_row_in_place(multivalued_index, values);
            }
-            SerializableColumnIndex::Multivalued(Box::new(multivalued_index))
+            SerializableColumnIndex::Multivalued {
+                indices: Box::new(multivalued_index),
+                stats: None,
+            }
        }
    };
    crate::column::serialize_column_mappable_to_u64(
--- a/columnar/src/columnar/writer/serializer.rs
+++ b/columnar/src/columnar/writer/serializer.rs
@@ -18,7 +18,12 @@ pub struct ColumnarSerializer<W: io::Write> {
 /// code.
 fn prepare_key(key: &[u8], column_type: ColumnType, buffer: &mut Vec<u8>) {
    buffer.clear();
-    buffer.extend_from_slice(key);
+    // Convert 0 bytes to '0' string, as 0 bytes are reserved for the end of the path.
+    if key.contains(&0u8) {
+        buffer.extend(key.iter().map(|&b| if b == 0 { b'0' } else { b }));
+    } else {
+        buffer.extend_from_slice(key);
+    }
    buffer.push(0u8);
    buffer.push(column_type.to_code());
 }
@@ -102,7 +107,7 @@ mod tests {
        let mut buffer: Vec<u8> = b"somegarbage".to_vec();
        prepare_key(b"root\0child", ColumnType::Str, &mut buffer);
        assert_eq!(buffer.len(), 12);
-        assert_eq!(&buffer[..10], b"root\0child");
+        assert_eq!(&buffer[..10], b"root0child");
        assert_eq!(buffer[10], 0u8);
        assert_eq!(buffer[11], ColumnType::Str.to_code());
    }
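
Review note: the `prepare_key` hunk above no longer rejects keys that contain a 0 byte; it rewrites each 0 byte to the ASCII character `'0'`. The sketch below mirrors that logic as a standalone function so the effect can be checked in isolation. `prepare_key_sketch` and its `type_code: u8` parameter are hypothetical stand-ins (the real function takes a `ColumnType` and calls `to_code()`), so treat this as an illustration, not the crate's API.

```rust
/// Standalone sketch of the key preparation shown in the hunk above.
/// `type_code` is a hypothetical stand-in for `ColumnType::to_code()`.
fn prepare_key_sketch(key: &[u8], type_code: u8, buffer: &mut Vec<u8>) {
    buffer.clear();
    // 0 bytes are reserved as the end-of-path marker, so they are
    // rewritten to the ASCII character '0' instead of being rejected.
    if key.contains(&0u8) {
        buffer.extend(key.iter().map(|&b| if b == 0 { b'0' } else { b }));
    } else {
        buffer.extend_from_slice(key);
    }
    // Terminator byte, followed by the column type code.
    buffer.push(0u8);
    buffer.push(type_code);
}
```

One consequence that follows from this logic: a key containing a 0 byte and the same key with a literal `'0'` in that position produce identical output, so the two column names would no longer be distinguishable after serialization.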
--- a/columnar/src/tests.rs
+++ b/columnar/src/tests.rs
@@ -738,35 +738,22 @@ proptest! {
    #![proptest_config(ProptestConfig::with_cases(1000))]
    #[test]
    fn test_columnar_merge_proptest(columnar_docs in proptest::collection::vec(columnar_docs_strategy(), 2..=3)) {
-        let columnar_readers: Vec<ColumnarReader> = columnar_docs.iter()
-            .map(|docs| build_columnar(&docs[..]))
-            .collect::<Vec<_>>();
-        let columnar_readers_arr: Vec<&ColumnarReader> = columnar_readers.iter().collect();
-        let mut output: Vec<u8> = Vec::new();
-        let stack_merge_order = StackMergeOrder::stack(&columnar_readers_arr[..]).into();
-        crate::merge_columnar(&columnar_readers_arr[..], &[], stack_merge_order, &mut output).unwrap();
-        let merged_columnar = ColumnarReader::open(output).unwrap();
-        let concat_rows: Vec<Vec<(&'static str, ColumnValue)>> = columnar_docs.iter().flatten().cloned().collect();
-        let expected_merged_columnar = build_columnar(&concat_rows[..]);
-        assert_columnar_eq_strict(&merged_columnar, &expected_merged_columnar);
+        test_columnar_docs(columnar_docs);
    }
 }

-#[test]
-fn test_columnar_merging_empty_columnar() {
-    let columnar_docs: Vec<Vec<Vec<(&str, ColumnValue)>>> =
-        vec![vec![], vec![vec![("c1", ColumnValue::Str("a"))]]];
+fn test_columnar_docs(columnar_docs: Vec<Vec<Vec<(&'static str, ColumnValue)>>>) {
    let columnar_readers: Vec<ColumnarReader> = columnar_docs
        .iter()
        .map(|docs| build_columnar(&docs[..]))
        .collect::<Vec<_>>();
    let columnar_readers_arr: Vec<&ColumnarReader> = columnar_readers.iter().collect();
    let mut output: Vec<u8> = Vec::new();
-    let stack_merge_order = StackMergeOrder::stack(&columnar_readers_arr[..]);
+    let stack_merge_order = StackMergeOrder::stack(&columnar_readers_arr[..]).into();
    crate::merge_columnar(
        &columnar_readers_arr[..],
        &[],
-        crate::MergeRowOrder::Stack(stack_merge_order),
+        stack_merge_order,
        &mut output,
    )
    .unwrap();
@@ -777,6 +764,24 @@ fn test_columnar_merging_empty_columnar() {
    assert_columnar_eq_strict(&merged_columnar, &expected_merged_columnar);
 }

+#[test]
+fn test_columnar_merging_empty_columnar() {
+    let columnar_docs: Vec<Vec<Vec<(&str, ColumnValue)>>> =
+        vec![vec![], vec![vec![("c1", ColumnValue::Str("a"))]]];
+    test_columnar_docs(columnar_docs);
+}
+#[test]
+fn test_columnar_merging_simple() {
+    let columnar_docs: Vec<Vec<Vec<(&str, ColumnValue)>>> = vec![
+        vec![],
+        vec![vec![
+            ("c1", ColumnValue::Numerical(0u64.into())),
+            ("c1", ColumnValue::Numerical(0u64.into())),
+        ]],
+    ];
+    test_columnar_docs(columnar_docs);
+}
+
 #[test]
 fn test_columnar_merging_number_columns() {
    let columnar_docs: Vec<Vec<Vec<(&str, ColumnValue)>>> = vec![
@@ -793,25 +798,7 @@ fn test_columnar_merging_number_columns() {
            vec![("c2", ColumnValue::Numerical(u64::MAX.into()))],
        ],
    ];
-    let columnar_readers: Vec<ColumnarReader> = columnar_docs
-        .iter()
-        .map(|docs| build_columnar(&docs[..]))
-        .collect::<Vec<_>>();
-    let columnar_readers_arr: Vec<&ColumnarReader> = columnar_readers.iter().collect();
-    let mut output: Vec<u8> = Vec::new();
-    let stack_merge_order = StackMergeOrder::stack(&columnar_readers_arr[..]);
-    crate::merge_columnar(
-        &columnar_readers_arr[..],
-        &[],
-        crate::MergeRowOrder::Stack(stack_merge_order),
-        &mut output,
-    )
-    .unwrap();
-    let merged_columnar = ColumnarReader::open(output).unwrap();
-    let concat_rows: Vec<Vec<(&'static str, ColumnValue)>> =
-        columnar_docs.iter().flatten().cloned().collect();
-    let expected_merged_columnar = build_columnar(&concat_rows[..]);
-    assert_columnar_eq_strict(&merged_columnar, &expected_merged_columnar);
+    test_columnar_docs(columnar_docs);
 }

 // TODO add non trivial remap and merge
--- a/common/Cargo.toml
+++ b/common/Cargo.toml
@@ -1,6 +1,6 @@
 [package]
 name = "tantivy-common"
-version = "0.6.0"
+version = "0.7.0"
 authors = ["Paul Masurel <paul@quickwit.io>", "Pascal Seitz <pascal@quickwit.io>"]
 license = "MIT"
 edition = "2021"
@@ -14,7 +14,7 @@ repository = "https://github.com/quickwit-oss/tantivy"

 [dependencies]
 byteorder = "1.4.3"
-ownedbytes = { version= "0.6", path="../ownedbytes" }
+ownedbytes = { version= "0.7", path="../ownedbytes" }
 async-trait = "0.1"
 time = { version = "0.3.10", features = ["serde-well-known"] }
 serde = { version = "1.0.136", features = ["derive"] }
@@ -22,3 +22,6 @@ serde = { version = "1.0.136", features = ["derive"] }
 [dev-dependencies]
 proptest = "1.0.0"
 rand = "0.8.4"
+
+[features]
+unstable = [] # useful for benches.
--- a/common/src/bitset.rs
+++ b/common/src/bitset.rs
@@ -1,5 +1,5 @@
 use std::io::Write;
-use std::{fmt, io, u64};
+use std::{fmt, io};

 use ownedbytes::OwnedBytes;

--- a/common/src/datetime.rs
+++ b/common/src/datetime.rs
@@ -1,5 +1,3 @@
-#![allow(deprecated)]
-
 use std::fmt;
 use std::io::{Read, Write};

@@ -27,9 +25,6 @@ pub enum DateTimePrecision {
    Nanoseconds,
 }

-#[deprecated(since = "0.20.0", note = "Use `DateTimePrecision` instead")]
-pub type DatePrecision = DateTimePrecision;
-
 /// A date/time value with nanoseconds precision.
 ///
 /// This timestamp does not carry any explicit time zone information.
@@ -40,7 +35,7 @@ pub type DatePrecision = DateTimePrecision;
 /// All constructors and conversions are provided as explicit
 /// functions and not by implementing any `From`/`Into` traits
 /// to prevent unintended usage.
-#[derive(Clone, Default, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
+#[derive(Clone, Default, Copy, PartialEq, Eq, PartialOrd, Ord, Hash, Serialize, Deserialize)]
 pub struct DateTime {
    // Timestamp in nanoseconds.
    pub(crate) timestamp_nanos: i64,
--- a/common/src/json_path_writer.rs
+++ b/common/src/json_path_writer.rs
@@ -5,6 +5,12 @@ pub const JSON_PATH_SEGMENT_SEP: u8 = 1u8;
 pub const JSON_PATH_SEGMENT_SEP_STR: &str =
    unsafe { std::str::from_utf8_unchecked(&[JSON_PATH_SEGMENT_SEP]) };

+/// Separates the json path and the value in
+/// a JSON term binary representation.
+pub const JSON_END_OF_PATH: u8 = 0u8;
+pub const JSON_END_OF_PATH_STR: &str =
+    unsafe { std::str::from_utf8_unchecked(&[JSON_END_OF_PATH]) };
+
 /// Create a new JsonPathWriter, that creates flattened json paths for tantivy.
 #[derive(Clone, Debug, Default)]
 pub struct JsonPathWriter {
@@ -14,6 +20,14 @@ pub struct JsonPathWriter {
 }

 impl JsonPathWriter {
+    pub fn with_expand_dots(expand_dots: bool) -> Self {
+        JsonPathWriter {
+            path: String::new(),
+            indices: Vec::new(),
+            expand_dots,
+        }
+    }
+
    pub fn new() -> Self {
        JsonPathWriter {
            path: String::new(),
@@ -39,8 +53,8 @@ impl JsonPathWriter {
    pub fn push(&mut self, segment: &str) {
        let len_path = self.path.len();
        self.indices.push(len_path);
-        if !self.path.is_empty() {
-            self.path.push_str(JSON_PATH_SEGMENT_SEP_STR);
+        if self.indices.len() > 1 {
+            self.path.push(JSON_PATH_SEGMENT_SEP as char);
        }
        self.path.push_str(segment);
        if self.expand_dots {
@@ -55,6 +69,12 @@ impl JsonPathWriter {
        }
    }

+    /// Set the end of JSON path marker.
+    #[inline]
+    pub fn set_end(&mut self) {
+        self.path.push_str(JSON_END_OF_PATH_STR);
+    }
+
    /// Remove the last segment. Does nothing if the path is empty.
    #[inline]
    pub fn pop(&mut self) {
@@ -91,6 +111,7 @@ mod tests {
    #[test]
    fn json_path_writer_test() {
        let mut writer = JsonPathWriter::new();
+        writer.set_expand_dots(false);

        writer.push("root");
        assert_eq!(writer.as_str(), "root");
@@ -109,4 +130,15 @@ mod tests {
        writer.push("k8s.node.id");
        assert_eq!(writer.as_str(), "root\u{1}k8s\u{1}node\u{1}id");
    }
+
+    #[test]
+    fn test_json_path_expand_dots_enabled_pop_segment() {
+        let mut json_writer = JsonPathWriter::with_expand_dots(true);
+        json_writer.push("hello");
+        assert_eq!(json_writer.as_str(), "hello");
+        json_writer.push("color.hue");
+        assert_eq!(json_writer.as_str(), "hello\x01color\x01hue");
+        json_writer.pop();
+        assert_eq!(json_writer.as_str(), "hello");
+    }
 }
--- a/common/src/lib.rs
+++ b/common/src/lib.rs
@@ -9,14 +9,12 @@ mod byte_count;
 mod datetime;
 pub mod file_slice;
 mod group_by;
-mod json_path_writer;
+pub mod json_path_writer;
 mod serialize;
 mod vint;
 mod writer;
 pub use bitset::*;
 pub use byte_count::ByteCount;
-#[allow(deprecated)]
-pub use datetime::DatePrecision;
 pub use datetime::{DateTime, DateTimePrecision};
 pub use group_by::GroupByIteratorExtended;
 pub use json_path_writer::JsonPathWriter;
--- a/common/src/serialize.rs
+++ b/common/src/serialize.rs
@@ -290,8 +290,7 @@ impl<'a> BinarySerializable for Cow<'a, [u8]> {
 #[cfg(test)]
 pub mod test {

-    use super::{VInt, *};
-    use crate::serialize::BinarySerializable;
+    use super::*;
    pub fn fixed_size_test<O: BinarySerializable + FixedSize + Default>() {
        let mut buffer = Vec::new();
        O::default().serialize(&mut buffer).unwrap();
--- a/common/src/vint.rs
+++ b/common/src/vint.rs
@@ -151,7 +151,7 @@ pub fn read_u32_vint_no_advance(data: &[u8]) -> (u32, usize) {
    (result, vlen)
 }
 /// Write a `u32` as a vint payload.
-pub fn write_u32_vint<W: io::Write>(val: u32, writer: &mut W) -> io::Result<()> {
+pub fn write_u32_vint<W: io::Write + ?Sized>(val: u32, writer: &mut W) -> io::Result<()> {
    let mut buf = [0u8; 8];
    let data = serialize_vint_u32(val, &mut buf);
    writer.write_all(data)
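
Review note: the added `?Sized` bound lets callers pass an unsized writer such as `dyn io::Write` behind a reference. The sketch below shows the effect of the bound on a deliberately simplified function. `write_u32_le` is a hypothetical helper that writes a fixed 4-byte little-endian value; it does not reproduce tantivy's vint encoding.

```rust
use std::io;

/// Hypothetical helper, used only to illustrate the `?Sized` bound.
/// It writes `val` as 4 little-endian bytes, not as a vint.
fn write_u32_le<W: io::Write + ?Sized>(val: u32, writer: &mut W) -> io::Result<()> {
    writer.write_all(&val.to_le_bytes())
}

/// With `?Sized`, `W` can be inferred as `dyn io::Write`.
/// Without the bound, this call would not compile.
fn write_through_trait_object(val: u32, writer: &mut dyn io::Write) -> io::Result<()> {
    write_u32_le(val, writer)
}
```

Sized writers such as `Vec<u8>` keep working unchanged, so relaxing the bound is backward compatible for existing callers.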
--- a/examples/custom_collector.rs
+++ b/examples/custom_collector.rs
@@ -11,9 +11,10 @@ use columnar::Column;
 // ---
 // Importing tantivy...
 use tantivy::collector::{Collector, SegmentCollector};
+use tantivy::index::SegmentReader;
 use tantivy::query::QueryParser;
 use tantivy::schema::{Schema, FAST, INDEXED, TEXT};
-use tantivy::{doc, Index, IndexWriter, Score, SegmentReader};
+use tantivy::{doc, Index, IndexWriter, Score};

 #[derive(Default)]
 struct Stats {
--- a/examples/date_time_field.rs
+++ b/examples/date_time_field.rs
@@ -4,7 +4,7 @@

 use tantivy::collector::TopDocs;
 use tantivy::query::QueryParser;
-use tantivy::schema::{DateOptions, Document, OwnedValue, Schema, INDEXED, STORED, STRING};
+use tantivy::schema::{DateOptions, Document, Schema, Value, INDEXED, STORED, STRING};
 use tantivy::{Index, IndexWriter, TantivyDocument};

 fn main() -> tantivy::Result<()> {
@@ -13,7 +13,7 @@ fn main() -> tantivy::Result<()> {
    let opts = DateOptions::from(INDEXED)
        .set_stored()
        .set_fast()
-        .set_precision(tantivy::DateTimePrecision::Seconds);
+        .set_precision(tantivy::schema::DateTimePrecision::Seconds);
    // Add `occurred_at` date field type
    let occurred_at = schema_builder.add_date_field("occurred_at", opts);
    let event_type = schema_builder.add_text_field("event", STRING | STORED);
@@ -61,10 +61,12 @@ fn main() -> tantivy::Result<()> {
        assert_eq!(count_docs.len(), 1);
        for (_score, doc_address) in count_docs {
            let retrieved_doc = searcher.doc::<TantivyDocument>(doc_address)?;
-            assert!(matches!(
-                retrieved_doc.get_first(occurred_at),
-                Some(OwnedValue::Date(_))
-            ));
+            assert!(retrieved_doc
+                .get_first(occurred_at)
+                .unwrap()
+                .as_value()
+                .as_datetime()
+                .is_some(),);
            assert_eq!(
                retrieved_doc.to_json(&schema),
                r#"{"event":["comment"],"occurred_at":["2022-06-22T13:00:00.22Z"]}"#
--- a/examples/faceted_search_with_tweaked_score.rs
+++ b/examples/faceted_search_with_tweaked_score.rs
@@ -51,7 +51,7 @@ fn main() -> tantivy::Result<()> {
    let reader = index.reader()?;
    let searcher = reader.searcher();
    {
-        let facets = vec![
+        let facets = [
            Facet::from("/ingredient/egg"),
            Facet::from("/ingredient/oil"),
            Facet::from("/ingredient/garlic"),
@@ -94,9 +94,8 @@ fn main() -> tantivy::Result<()> {
                    .doc::<TantivyDocument>(*doc_id)
                    .unwrap()
                    .get_first(title)
-                    .and_then(|v| v.as_str())
+                    .and_then(|v| v.as_str().map(|el| el.to_string()))
                    .unwrap()
-                    .to_owned()
            })
            .collect();
        assert_eq!(titles, vec!["Fried egg", "Egg rolls"]);
--- a/examples/index_from_multiple_threads.rs
+++ b/examples/index_from_multiple_threads.rs
@@ -61,7 +61,7 @@ fn main() -> tantivy::Result<()> {
                        debris of the winter’s flooding; and sycamores with mottled, white, recumbent \
                        limbs and branches that arch over the pool"
                    ))?;
-            println!("add doc {} from thread 1 - opstamp {}", i, opstamp);
+            println!("add doc {i} from thread 1 - opstamp {opstamp}");
            thread::sleep(Duration::from_millis(20));
        }
        Result::<(), TantivyError>::Ok(())
@@ -82,7 +82,7 @@ fn main() -> tantivy::Result<()> {
                    body => "Some great book description..."
                ))?
            };
-            println!("add doc {} from thread 2 - opstamp {}", i, opstamp);
+            println!("add doc {i} from thread 2 - opstamp {opstamp}");
            thread::sleep(Duration::from_millis(10));
        }
        Result::<(), TantivyError>::Ok(())
--- a/examples/iterating_docs_and_positions.rs
+++ b/examples/iterating_docs_and_positions.rs
@@ -7,10 +7,11 @@
 // the list of documents containing a term, getting
 // its term frequency, and accessing its positions.

+use tantivy::postings::Postings;
 // ---
 // Importing tantivy...
 use tantivy::schema::*;
-use tantivy::{doc, DocSet, Index, IndexWriter, Postings, TERMINATED};
+use tantivy::{doc, DocSet, Index, IndexWriter, TERMINATED};

 fn main() -> tantivy::Result<()> {
    // We first create a schema for the sake of the
--- a/examples/warmer.rs
+++ b/examples/warmer.rs
@@ -3,10 +3,11 @@ use std::collections::{HashMap, HashSet};
 use std::sync::{Arc, RwLock, Weak};

 use tantivy::collector::TopDocs;
+use tantivy::index::SegmentId;
 use tantivy::query::QueryParser;
 use tantivy::schema::{Schema, FAST, TEXT};
 use tantivy::{
-    doc, DocAddress, DocId, Index, IndexWriter, Opstamp, Searcher, SearcherGeneration, SegmentId,
+    doc, DocAddress, DocId, Index, IndexWriter, Opstamp, Searcher, SearcherGeneration,
    SegmentReader, Warmer,
 };

--- a/ownedbytes/Cargo.toml
+++ b/ownedbytes/Cargo.toml
@@ -1,7 +1,7 @@
 [package]
 authors = ["Paul Masurel <paul@quickwit.io>", "Pascal Seitz <pascal@quickwit.io>"]
 name = "ownedbytes"
-version = "0.6.0"
+version = "0.7.0"
 edition = "2021"
 description = "Expose data as static slice"
 license = "MIT"
--- a/query-grammar/Cargo.toml
+++ b/query-grammar/Cargo.toml
@@ -1,6 +1,6 @@
 [package]
 name = "tantivy-query-grammar"
-version = "0.21.0"
+version = "0.22.0"
 authors = ["Paul Masurel <paul.masurel@gmail.com>"]
 license = "MIT"
 categories = ["database-implementations", "data-structures"]
--- a/query-grammar/src/query_grammar.rs
+++ b/query-grammar/src/query_grammar.rs
@@ -1,3 +1,4 @@
+use std::borrow::Cow;
 use std::iter::once;

 use nom::branch::alt;
@@ -19,7 +20,7 @@ use crate::Occur;
 // Note: '-' char is only forbidden at the beginning of a field name, would be clearer to add it to
 // special characters.
 const SPECIAL_CHARS: &[char] = &[
-    '+', '^', '`', ':', '{', '}', '"', '[', ']', '(', ')', '!', '\\', '*', ' ',
+    '+', '^', '`', ':', '{', '}', '"', '\'', '[', ']', '(', ')', '!', '\\', '*', ' ',
 ];

 /// consume a field name followed by colon. Return the field name with escape sequence
@@ -41,36 +42,92 @@ fn field_name(inp: &str) -> IResult<&str, String> {
    )(inp)
 }

+const ESCAPE_IN_WORD: &[char] = &['^', '`', ':', '{', '}', '"', '\'', '[', ']', '(', ')', '\\'];
+
+fn interpret_escape(source: &str) -> String {
+    let mut res = String::with_capacity(source.len());
+    let mut in_escape = false;
+    let require_escape = |c: char| c.is_whitespace() || ESCAPE_IN_WORD.contains(&c) || c == '-';
+
+    for c in source.chars() {
+        if in_escape {
+            if !require_escape(c) {
+                // we re-add the escape sequence
+                res.push('\\');
+            }
+            res.push(c);
+            in_escape = false;
+        } else if c == '\\' {
+            in_escape = true;
+        } else {
+            res.push(c);
+        }
+    }
+    res
+}
+
 /// Consume a word outside of any context.
 // TODO should support escape sequences
-fn word(inp: &str) -> IResult<&str, &str> {
+fn word(inp: &str) -> IResult<&str, Cow<str>> {
    map_res(
        recognize(tuple((
-            satisfy(|c| {
-                !c.is_whitespace()
-                    && !['-', '^', '`', ':', '{', '}', '"', '[', ']', '(', ')'].contains(&c)
-            }),
-            many0(satisfy(|c: char| {
-                !c.is_whitespace() && ![':', '^', '{', '}', '"', '[', ']', '(', ')'].contains(&c)
-            })),
+            alt((
+                preceded(char('\\'), anychar),
+                satisfy(|c| !c.is_whitespace() && !ESCAPE_IN_WORD.contains(&c) && c != '-'),
+            )),
+            many0(alt((
+                preceded(char('\\'), anychar),
+                satisfy(|c: char| !c.is_whitespace() && !ESCAPE_IN_WORD.contains(&c)),
+            ))),
        ))),
        |s| match s {
            "OR" | "AND" | "NOT" | "IN" => Err(Error::new(inp, ErrorKind::Tag)),
-            _ => Ok(s),
+            s if s.contains('\\') => Ok(Cow::Owned(interpret_escape(s))),
+            s => Ok(Cow::Borrowed(s)),
        },
    )(inp)
 }

-fn word_infallible(delimiter: &str) -> impl Fn(&str) -> JResult<&str, Option<&str>> + '_ {
-    |inp| {
-        opt_i_err(
-            preceded(
-                multispace0,
-                recognize(many1(satisfy(|c| {
-                    !c.is_whitespace() && !delimiter.contains(c)
-                }))),
+fn word_infallible(
+    delimiter: &str,
+    emit_error: bool,
+) -> impl Fn(&str) -> JResult<&str, Option<Cow<str>>> + '_ {
+    // `emit_error` is set when an unescaped `:` in the word should be reported as an error.
+
+    move |inp| {
+        map(
+            opt_i_err(
+                preceded(
+                    multispace0,
+                    recognize(many1(alt((
+                        preceded(char::<&str, _>('\\'), anychar),
+                        satisfy(|c| !c.is_whitespace() && !delimiter.contains(c)),
+                    )))),
+                ),
+                "expected word",
            ),
-            "expected word",
+            |(opt_s, mut errors)| match opt_s {
+                Some(s) => {
+                    if emit_error
+                        && (s
+                            .as_bytes()
+                            .windows(2)
+                            .any(|window| window[0] != b'\\' && window[1] == b':')
+                            || s.starts_with(':'))
+                    {
+                        errors.push(LenientErrorInternal {
+                            pos: inp.len(),
+                            message: "parsed possible invalid field as term".to_string(),
+                        });
+                    }
+                    if s.contains('\\') {
+                        (Some(Cow::Owned(interpret_escape(s))), errors)
+                    } else {
+                        (Some(Cow::Borrowed(s)), errors)
+                    }
+                }
+                None => (None, errors),
+            },
        )(inp)
    }
 }
@@ -159,7 +216,7 @@ fn simple_term_infallible(
                (value((), char('\'')), simple_quotes),
            ),
            // numbers are parsed with words in this case, as we allow string starting with a -
-            map(word_infallible(delimiter), |(text, errors)| {
+            map(word_infallible(delimiter, true), |(text, errors)| {
                (text.map(|text| (Delimiter::None, text.to_string())), errors)
            }),
        )(inp)
@@ -218,27 +275,14 @@ fn term_or_phrase_infallible(inp: &str) -> JResult<&str, Option<UserInputLeaf>>
 }

 fn term_group(inp: &str) -> IResult<&str, UserInputAst> {
-    let occur_symbol = alt((
-        value(Occur::MustNot, char('-')),
-        value(Occur::Must, char('+')),
-    ));
-
    map(
        tuple((
            terminated(field_name, multispace0),
-            delimited(
-                tuple((char('('), multispace0)),
-                separated_list0(multispace1, tuple((opt(occur_symbol), term_or_phrase))),
-                char(')'),
-            ),
+            delimited(tuple((char('('), multispace0)), ast, char(')')),
        )),
-        |(field_name, terms)| {
-            UserInputAst::Clause(
-                terms
-                    .into_iter()
-                    .map(|(occur, leaf)| (occur, leaf.set_field(Some(field_name.clone())).into()))
-                    .collect(),
-            )
+        |(field_name, mut ast)| {
+            ast.set_default_field(field_name);
+            ast
        },
    )(inp)
 }
@@ -258,46 +302,18 @@ fn term_group_precond(inp: &str) -> IResult<&str, (), ()> {
 }

 fn term_group_infallible(inp: &str) -> JResult<&str, UserInputAst> {
-    let (mut inp, (field_name, _, _, _)) =
+    let (inp, (field_name, _, _, _)) =
        tuple((field_name, multispace0, char('('), multispace0))(inp).expect("precondition failed");

-    let mut terms = Vec::new();
-    let mut errs = Vec::new();
-
-    let mut first_round = true;
-    loop {
-        let mut space_error = if first_round {
-            first_round = false;
-            Vec::new()
-        } else {
-            let (rest, (_, err)) = space1_infallible(inp)?;
-            inp = rest;
-            err
-        };
-        if inp.is_empty() {
-            errs.push(LenientErrorInternal {
-                pos: inp.len(),
-                message: "missing )".to_string(),
-            });
-            break Ok((inp, (UserInputAst::Clause(terms), errs)));
-        }
-        if let Some(inp) = inp.strip_prefix(')') {
-            break Ok((inp, (UserInputAst::Clause(terms), errs)));
-        }
-        // only append missing space error if we did not reach the end of group
-        errs.append(&mut space_error);
-
-        // here we do the assumption term_or_phrase_infallible always consume something if the
-        // first byte is not `)` or ' '. If it did not, we would end up looping.
-
-        let (rest, ((occur, leaf), mut err)) =
-            tuple_infallible((occur_symbol, term_or_phrase_infallible))(inp)?;
-        errs.append(&mut err);
-        if let Some(leaf) = leaf {
-            terms.push((occur, leaf.set_field(Some(field_name.clone())).into()));
-        }
-        inp = rest;
-    }
+    let res = delimited_infallible(
+        nothing,
+        map(ast_infallible, |(mut ast, errors)| {
+            ast.set_default_field(field_name.to_string());
+            (ast, errors)
+        }),
+        opt_i_err(char(')'), "expected ')'"),
+    )(inp);
+    res
 }

 fn exists(inp: &str) -> IResult<&str, UserInputLeaf> {
@@ -363,15 +379,6 @@ fn literal_no_group_infallible(inp: &str) -> JResult<&str, Option<UserInputAst>>
        |((field_name, _, leaf), mut errors)| {
            (
                leaf.map(|leaf| {
-                    if matches!(&leaf, UserInputLeaf::Literal(literal)
-                            if literal.phrase.contains(':') && literal.delimiter == Delimiter::None)
-                        && field_name.is_none()
-                    {
-                        errors.push(LenientErrorInternal {
-                            pos: inp.len(),
-                            message: "parsed possible invalid field as term".to_string(),
-                        });
-                    }
                    if matches!(&leaf, UserInputLeaf::Literal(literal)
                            if literal.phrase == "NOT" && literal.delimiter == Delimiter::None)
                        && field_name.is_none()
@@ -490,20 +497,20 @@ fn range_infallible(inp: &str) -> JResult<&str, UserInputLeaf> {
        tuple_infallible((
            opt_i(anychar),
            space0_infallible,
-            word_infallible("]}"),
+            word_infallible("]}", false),
            space1_infallible,
            opt_i_err(
                terminated(tag("TO"), alt((value((), multispace1), value((), eof)))),
                "missing keyword TO",
            ),
-            word_infallible("]}"),
+            word_infallible("]}", false),
            opt_i_err(one_of("]}"), "missing range delimiter"),
        )),
        |(
            (lower_bound_kind, _multispace0, lower, _multispace1, to, upper, upper_bound_kind),
            errs,
        )| {
-            let lower_bound = match (lower_bound_kind, lower) {
+            let lower_bound = match (lower_bound_kind, lower.as_deref()) {
                (_, Some("*")) => UserInputBound::Unbounded,
                (_, None) => UserInputBound::Unbounded,
                // if it is some, TO was actually the bound (i.e. [TO TO something])
@@ -512,7 +519,7 @@ fn range_infallible(inp: &str) -> JResult<&str, UserInputLeaf> {
                (Some('{'), Some(bound)) => UserInputBound::Exclusive(bound.to_string()),
                _ => unreachable!("precondition failed, range did not start with [ or {{"),
            };
-            let upper_bound = match (upper_bound_kind, upper) {
+            let upper_bound = match (upper_bound_kind, upper.as_deref()) {
                (_, Some("*")) => UserInputBound::Unbounded,
                (_, None) => UserInputBound::Unbounded,
                (Some(']'), Some(bound)) => UserInputBound::Inclusive(bound.to_string()),
@@ -529,7 +536,7 @@ fn range_infallible(inp: &str) -> JResult<&str, UserInputLeaf> {
            (
                (
                    value((), tag(">=")),
-                    map(word_infallible(""), |(bound, err)| {
+                    map(word_infallible("", false), |(bound, err)| {
                        (
                            (
                                bound
@@ -543,7 +550,7 @@ fn range_infallible(inp: &str) -> JResult<&str, UserInputLeaf> {
                ),
                (
                    value((), tag("<=")),
-                    map(word_infallible(""), |(bound, err)| {
+                    map(word_infallible("", false), |(bound, err)| {
                        (
                            (
                                UserInputBound::Unbounded,
@@ -557,7 +564,7 @@ fn range_infallible(inp: &str) -> JResult<&str, UserInputLeaf> {
                ),
                (
                    value((), tag(">")),
-                    map(word_infallible(""), |(bound, err)| {
+                    map(word_infallible("", false), |(bound, err)| {
                        (
                            (
                                bound
@@ -571,7 +578,7 @@ fn range_infallible(inp: &str) -> JResult<&str, UserInputLeaf> {
                ),
                (
                    value((), tag("<")),
-                    map(word_infallible(""), |(bound, err)| {
+                    map(word_infallible("", false), |(bound, err)| {
                        (
                            (
                                UserInputBound::Unbounded,
@@ -1198,6 +1205,12 @@ mod test {
        test_parse_query_to_ast_helper("weight: <= 70", "\"weight\":{\"*\" TO \"70\"]");

        test_parse_query_to_ast_helper("weight: <= 70.5", "\"weight\":{\"*\" TO \"70.5\"]");
+
+        test_parse_query_to_ast_helper(">a", "{\"a\" TO \"*\"}");
+        test_parse_query_to_ast_helper(">=a", "[\"a\" TO \"*\"}");
+        test_parse_query_to_ast_helper("<a", "{\"*\" TO \"a\"}");
+        test_parse_query_to_ast_helper("<=a", "{\"*\" TO \"a\"]");
+        test_parse_query_to_ast_helper("<=bsd", "{\"*\" TO \"bsd\"]");
    }

    #[test]
@@ -1468,8 +1481,18 @@ mod test {

    #[test]
    fn test_parse_query_term_group() {
-        test_parse_query_to_ast_helper(r#"field:(abc)"#, r#"(*"field":abc)"#);
+        test_parse_query_to_ast_helper(r#"field:(abc)"#, r#""field":abc"#);
        test_parse_query_to_ast_helper(r#"field:(+a -"b c")"#, r#"(+"field":a -"field":"b c")"#);
+        test_parse_query_to_ast_helper(r#"field:(a AND "b c")"#, r#"(+"field":a +"field":"b c")"#);
+        test_parse_query_to_ast_helper(r#"field:(a OR "b c")"#, r#"(?"field":a ?"field":"b c")"#);
+        test_parse_query_to_ast_helper(
+            r#"field:(a OR (b AND c))"#,
+            r#"(?"field":a ?(+"field":b +"field":c))"#,
+        );
+        test_parse_query_to_ast_helper(
+            r#"field:(a [b TO c])"#,
+            r#"(*"field":a *"field":["b" TO "c"])"#,
+        );

        test_is_parse_err(r#"field:(+a -"b c""#, r#"(+"field":a -"field":"b c")"#);
    }
@@ -1621,5 +1644,21 @@ mod test {
            r#"myfield:'hello\"happy\'tax'"#,
            r#""myfield":'hello"happy'tax'"#,
        );
+        // we don't process escape sequences for chars which don't require them
+        test_parse_query_to_ast_helper(r#"abc\*"#, r#"abc\*"#);
+    }
+
+    #[test]
+    fn test_queries_with_colons() {
+        test_parse_query_to_ast_helper(r#""abc:def""#, r#""abc:def""#);
+        test_parse_query_to_ast_helper(r#"'abc:def'"#, r#"'abc:def'"#);
+        test_parse_query_to_ast_helper(r#"abc\:def"#, r#"abc:def"#);
+        test_parse_query_to_ast_helper(r#""abc\:def""#, r#""abc:def""#);
+        test_parse_query_to_ast_helper(r#"'abc\:def'"#, r#"'abc:def'"#);
+    }
+
+    #[test]
+    fn test_invalid_field() {
+        test_is_parse_err(r#"!bc:def"#, "!bc:def");
    }
 }
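
Review note: the escape handling added in this file can be exercised on its own. The block below is a standalone copy of `interpret_escape` and `ESCAPE_IN_WORD` as they appear in the hunks above, with comments added, so that the rule "only drop the backslash when the escaped character actually needs escaping" is easy to check by hand.

```rust
/// Characters that must be escaped inside a word (copied from the diff above).
const ESCAPE_IN_WORD: &[char] = &['^', '`', ':', '{', '}', '"', '\'', '[', ']', '(', ')', '\\'];

/// Resolves backslash escapes in `source`.
/// The backslash is dropped only when the following character requires
/// escaping; otherwise the backslash is kept in the output.
fn interpret_escape(source: &str) -> String {
    let mut res = String::with_capacity(source.len());
    let mut in_escape = false;
    let require_escape = |c: char| c.is_whitespace() || ESCAPE_IN_WORD.contains(&c) || c == '-';

    for c in source.chars() {
        if in_escape {
            if !require_escape(c) {
                // Not a character that needs escaping: re-add the backslash.
                res.push('\\');
            }
            res.push(c);
            in_escape = false;
        } else if c == '\\' {
            in_escape = true;
        } else {
            res.push(c);
        }
    }
    res
}
```

For example, `abc\:def` becomes `abc:def`, while `abc\*` is left as `abc\*` because `*` is not in the list of characters that require escaping inside a word. This matches the expectations in `test_queries_with_colons` and the `abc\*` test case above.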
--- a/query-grammar/src/user_input_ast.rs
+++ b/query-grammar/src/user_input_ast.rs
@@ -44,6 +44,26 @@ impl UserInputLeaf {
            },
        }
    }
+
+    pub(crate) fn set_default_field(&mut self, default_field: String) {
+        match self {
+            UserInputLeaf::Literal(ref mut literal) if literal.field_name.is_none() => {
+                literal.field_name = Some(default_field)
+            }
+            UserInputLeaf::All => {
+                *self = UserInputLeaf::Exists {
+                    field: default_field,
+                }
+            }
+            UserInputLeaf::Range { ref mut field, .. } if field.is_none() => {
+                *field = Some(default_field)
+            }
+            UserInputLeaf::Set { ref mut field, .. } if field.is_none() => {
+                *field = Some(default_field)
+            }
+            _ => (), // field was already set, do nothing
+        }
+    }
 }

 impl Debug for UserInputLeaf {
@@ -205,6 +225,16 @@ impl UserInputAst {
    pub fn or(asts: Vec<UserInputAst>) -> UserInputAst {
        UserInputAst::compose(Occur::Should, asts)
    }
+
+    pub(crate) fn set_default_field(&mut self, field: String) {
+        match self {
+            UserInputAst::Clause(clauses) => clauses
+                .iter_mut()
+                .for_each(|(_, ast)| ast.set_default_field(field.clone())),
+            UserInputAst::Leaf(leaf) => leaf.set_default_field(field),
+            UserInputAst::Boost(ref mut ast, _) => ast.set_default_field(field),
+        }
+    }
 }

 impl From<UserInputLiteral> for UserInputLeaf {
--- a/src/aggregation/agg_bench.rs
+++ b/src/aggregation/agg_bench.rs
@@ -1,585 +0,0 @@
-#[cfg(all(test, feature = "unstable"))]
-mod bench {
-
-    use rand::prelude::SliceRandom;
-    use rand::rngs::StdRng;
-    use rand::{Rng, SeedableRng};
-    use rand_distr::Distribution;
-    use serde_json::json;
-    use test::{self, Bencher};
-
-    use crate::aggregation::agg_req::Aggregations;
-    use crate::aggregation::AggregationCollector;
-    use crate::query::{AllQuery, TermQuery};
-    use crate::schema::{IndexRecordOption, Schema, TextFieldIndexing, FAST, STRING};
-    use crate::{Index, Term};
-
-    #[derive(Clone, Copy, Hash, Default, Debug, PartialEq, Eq, PartialOrd, Ord)]
-    enum Cardinality {
-        /// All documents contain exactly one value.
-        /// `Full` is the default for auto-detecting the Cardinality, since it is the most strict.
-        #[default]
-        Full = 0,
-        /// All documents contain at most one value.
-        Optional = 1,
-        /// All documents may contain any number of values.
-        Multivalued = 2,
-        /// 1 / 20 documents has a value
-        Sparse = 3,
-    }
-
-    fn get_collector(agg_req: Aggregations) -> AggregationCollector {
-        AggregationCollector::from_aggs(agg_req, Default::default())
-    }
-
-    fn get_test_index_bench(cardinality: Cardinality) -> crate::Result<Index> {
-        let mut schema_builder = Schema::builder();
-        let text_fieldtype = crate::schema::TextOptions::default()
-            .set_indexing_options(
-                TextFieldIndexing::default().set_index_option(IndexRecordOption::WithFreqs),
-            )
-            .set_stored();
-        let text_field = schema_builder.add_text_field("text", text_fieldtype);
-        let json_field = schema_builder.add_json_field("json", FAST);
-        let text_field_many_terms = schema_builder.add_text_field("text_many_terms", STRING | FAST);
-        let text_field_few_terms = schema_builder.add_text_field("text_few_terms", STRING | FAST);
-        let score_fieldtype = crate::schema::NumericOptions::default().set_fast();
-        let score_field = schema_builder.add_u64_field("score", score_fieldtype.clone());
-        let score_field_f64 = schema_builder.add_f64_field("score_f64", score_fieldtype.clone());
-        let score_field_i64 = schema_builder.add_i64_field("score_i64", score_fieldtype);
-        let index = Index::create_from_tempdir(schema_builder.build())?;
-        let few_terms_data = ["INFO", "ERROR", "WARN", "DEBUG"];
-
-        let lg_norm = rand_distr::LogNormal::new(2.996f64, 0.979f64).unwrap();
-
-        let many_terms_data = (0..150_000)
-            .map(|num| format!("author{}", num))
-            .collect::<Vec<_>>();
-        {
-            let mut rng = StdRng::from_seed([1u8; 32]);
-            let mut index_writer = index.writer_with_num_threads(1, 200_000_000)?;
-            // To make the different test cases comparable we just change one doc to force the
-            // cardinality
-            if cardinality == Cardinality::Optional {
-                index_writer.add_document(doc!())?;
-            }
-            if cardinality == Cardinality::Multivalued {
-                index_writer.add_document(doc!(
-                    json_field => json!({"mixed_type": 10.0}),
-                    json_field => json!({"mixed_type": 10.0}),
-                    text_field => "cool",
-                    text_field => "cool",
-                    text_field_many_terms => "cool",
-                    text_field_many_terms => "cool",
-                    text_field_few_terms => "cool",
-                    text_field_few_terms => "cool",
-                    score_field => 1u64,
-                    score_field => 1u64,
-                    score_field_f64 => lg_norm.sample(&mut rng),
-                    score_field_f64 => lg_norm.sample(&mut rng),
-                    score_field_i64 => 1i64,
-                    score_field_i64 => 1i64,
-                ))?;
-            }
-            let mut doc_with_value = 1_000_000;
-            if cardinality == Cardinality::Sparse {
-                doc_with_value /= 20;
-            }
-            let _val_max = 1_000_000.0;
-            for _ in 0..doc_with_value {
-                let val: f64 = rng.gen_range(0.0..1_000_000.0);
-                let json = if rng.gen_bool(0.1) {
-                    // 10% are numeric values
-                    json!({ "mixed_type": val })
-                } else {
-                    json!({"mixed_type": many_terms_data.choose(&mut rng).unwrap().to_string()})
-                };
-                index_writer.add_document(doc!(
-                    text_field => "cool",
-                    json_field => json,
-                    text_field_many_terms => many_terms_data.choose(&mut rng).unwrap().to_string(),
-                    text_field_few_terms => few_terms_data.choose(&mut rng).unwrap().to_string(),
-                    score_field => val as u64,
-                    score_field_f64 => lg_norm.sample(&mut rng),
-                    score_field_i64 => val as i64,
-                ))?;
-                if cardinality == Cardinality::Sparse {
-                    for _ in 0..20 {
-                        index_writer.add_document(doc!(text_field => "cool"))?;
-                    }
-                }
-            }
-            // writing the segment
-            index_writer.commit()?;
-        }
-
-        Ok(index)
-    }
-
-    use paste::paste;
-    #[macro_export]
-    macro_rules! bench_all_cardinalities {
-        (  $x:ident ) => {
-            paste! {
-                #[bench]
-                fn $x(b: &mut Bencher) {
-                    [<$x _card>](b, Cardinality::Full)
-                }
-
-                #[bench]
-                fn [<$x _opt>](b: &mut Bencher) {
-                    [<$x _card>](b, Cardinality::Optional)
-                }
-
-                #[bench]
-                fn [<$x _multi>](b: &mut Bencher) {
-                    [<$x _card>](b, Cardinality::Multivalued)
-                }
-
-                #[bench]
-                fn [<$x _sparse>](b: &mut Bencher) {
-                    [<$x _card>](b, Cardinality::Sparse)
-                }
-
-            }
-        };
-    }
-
-    bench_all_cardinalities!(bench_aggregation_average_u64);
-
-    fn bench_aggregation_average_u64_card(b: &mut Bencher, cardinality: Cardinality) {
-        let index = get_test_index_bench(cardinality).unwrap();
-        let reader = index.reader().unwrap();
-        let text_field = reader.searcher().schema().get_field("text").unwrap();
-
-        b.iter(|| {
-            let term_query = TermQuery::new(
-                Term::from_field_text(text_field, "cool"),
-                IndexRecordOption::Basic,
-            );
-
-            let agg_req_1: Aggregations = serde_json::from_value(json!({
-                "average": { "avg": { "field": "score", } }
-            }))
-            .unwrap();
-
-            let collector = get_collector(agg_req_1);
-
-            let searcher = reader.searcher();
-            searcher.search(&term_query, &collector).unwrap()
-        });
-    }
-
-    bench_all_cardinalities!(bench_aggregation_stats_f64);
-
-    fn bench_aggregation_stats_f64_card(b: &mut Bencher, cardinality: Cardinality) {
-        let index = get_test_index_bench(cardinality).unwrap();
-        let reader = index.reader().unwrap();
-        let text_field = reader.searcher().schema().get_field("text").unwrap();
-
-        b.iter(|| {
-            let term_query = TermQuery::new(
-                Term::from_field_text(text_field, "cool"),
-                IndexRecordOption::Basic,
-            );
-
-            let agg_req_1: Aggregations = serde_json::from_value(json!({
-                "average_f64": { "stats": { "field": "score_f64", } }
-            }))
-            .unwrap();
-
-            let collector = get_collector(agg_req_1);
-
-            let searcher = reader.searcher();
-            searcher.search(&term_query, &collector).unwrap()
-        });
-    }
-
-    bench_all_cardinalities!(bench_aggregation_average_f64);
-
-    fn bench_aggregation_average_f64_card(b: &mut Bencher, cardinality: Cardinality) {
-        let index = get_test_index_bench(cardinality).unwrap();
-        let reader = index.reader().unwrap();
-        let text_field = reader.searcher().schema().get_field("text").unwrap();
-
-        b.iter(|| {
-            let term_query = TermQuery::new(
-                Term::from_field_text(text_field, "cool"),
-                IndexRecordOption::Basic,
-            );
-
-            let agg_req_1: Aggregations = serde_json::from_value(json!({
-                "average_f64": { "avg": { "field": "score_f64", } }
-            }))
-            .unwrap();
-
-            let collector = get_collector(agg_req_1);
-
-            let searcher = reader.searcher();
-            searcher.search(&term_query, &collector).unwrap()
-        });
-    }
-
-    bench_all_cardinalities!(bench_aggregation_percentiles_f64);
-
-    fn bench_aggregation_percentiles_f64_card(b: &mut Bencher, cardinality: Cardinality) {
-        let index = get_test_index_bench(cardinality).unwrap();
-        let reader = index.reader().unwrap();
-
-        b.iter(|| {
-            let agg_req_str = r#"
-            {
-              "mypercentiles": {
-                "percentiles": {
-                  "field": "score_f64",
-                  "percents": [ 95, 99, 99.9 ]
-                }
-              }
-            } "#;
-            let agg_req_1: Aggregations = serde_json::from_str(agg_req_str).unwrap();
-
-            let collector = get_collector(agg_req_1);
-
-            let searcher = reader.searcher();
-            searcher.search(&AllQuery, &collector).unwrap()
-        });
-    }
-
-    bench_all_cardinalities!(bench_aggregation_average_u64_and_f64);
-
-    fn bench_aggregation_average_u64_and_f64_card(b: &mut Bencher, cardinality: Cardinality) {
-        let index = get_test_index_bench(cardinality).unwrap();
-        let reader = index.reader().unwrap();
-        let text_field = reader.searcher().schema().get_field("text").unwrap();
-
-        b.iter(|| {
-            let term_query = TermQuery::new(
-                Term::from_field_text(text_field, "cool"),
-                IndexRecordOption::Basic,
-            );
-
-            let agg_req_1: Aggregations = serde_json::from_value(json!({
-                "average_f64": { "avg": { "field": "score_f64" } },
-                "average": { "avg": { "field": "score" } },
-            }))
-            .unwrap();
-
-            let collector = get_collector(agg_req_1);
-
-            let searcher = reader.searcher();
-            searcher.search(&term_query, &collector).unwrap()
-        });
-    }
-
-    bench_all_cardinalities!(bench_aggregation_terms_few);
-
-    fn bench_aggregation_terms_few_card(b: &mut Bencher, cardinality: Cardinality) {
-        let index = get_test_index_bench(cardinality).unwrap();
-        let reader = index.reader().unwrap();
-
-        b.iter(|| {
-            let agg_req: Aggregations = serde_json::from_value(json!({
-                "my_texts": { "terms": { "field": "text_few_terms" } },
-            }))
-            .unwrap();
-
-            let collector = get_collector(agg_req);
-
-            let searcher = reader.searcher();
-            searcher.search(&AllQuery, &collector).unwrap()
-        });
-    }
-
-    bench_all_cardinalities!(bench_aggregation_terms_many_with_top_hits_agg);
-
-    fn bench_aggregation_terms_many_with_top_hits_agg_card(
-        b: &mut Bencher,
-        cardinality: Cardinality,
-    ) {
-        let index = get_test_index_bench(cardinality).unwrap();
-        let reader = index.reader().unwrap();
-
-        b.iter(|| {
-            let agg_req: Aggregations = serde_json::from_value(json!({
-                "my_texts": {
-                    "terms": { "field": "text_many_terms" },
-                    "aggs": {
-                        "top_hits": { "top_hits":
-                            {
-                                "sort": [
-                                    { "score": "desc" }
-                                ],
-                                "size": 2,
-                                "doc_value_fields": ["score_f64"]
-                            }
-                        }
-                    }
-                },
-            }))
-            .unwrap();
-
-            let collector = get_collector(agg_req);
-
-            let searcher = reader.searcher();
-            searcher.search(&AllQuery, &collector).unwrap()
-        });
-    }
-
-    bench_all_cardinalities!(bench_aggregation_terms_many_with_sub_agg);
-
-    fn bench_aggregation_terms_many_with_sub_agg_card(b: &mut Bencher, cardinality: Cardinality) {
-        let index = get_test_index_bench(cardinality).unwrap();
-        let reader = index.reader().unwrap();
-
-        b.iter(|| {
-            let agg_req: Aggregations = serde_json::from_value(json!({
-                "my_texts": {
-                    "terms": { "field": "text_many_terms" },
-                    "aggs": {
-                        "average_f64": { "avg": { "field": "score_f64" } }
-                    }
-                },
-            }))
-            .unwrap();
-
-            let collector = get_collector(agg_req);
-
-            let searcher = reader.searcher();
-            searcher.search(&AllQuery, &collector).unwrap()
-        });
-    }
-
-    bench_all_cardinalities!(bench_aggregation_terms_many_json_mixed_type_with_sub_agg);
-
-    fn bench_aggregation_terms_many_json_mixed_type_with_sub_agg_card(
-        b: &mut Bencher,
-        cardinality: Cardinality,
-    ) {
-        let index = get_test_index_bench(cardinality).unwrap();
-        let reader = index.reader().unwrap();
-
-        b.iter(|| {
-            let agg_req: Aggregations = serde_json::from_value(json!({
-                "my_texts": {
-                    "terms": { "field": "json.mixed_type" },
-                    "aggs": {
-                        "average_f64": { "avg": { "field": "score_f64" } }
-                    }
-                },
-            }))
-            .unwrap();
-
-            let collector = get_collector(agg_req);
-
-            let searcher = reader.searcher();
-            searcher.search(&AllQuery, &collector).unwrap()
-        });
-    }
-
-    bench_all_cardinalities!(bench_aggregation_terms_many2);
-
-    fn bench_aggregation_terms_many2_card(b: &mut Bencher, cardinality: Cardinality) {
-        let index = get_test_index_bench(cardinality).unwrap();
-        let reader = index.reader().unwrap();
-
-        b.iter(|| {
-            let agg_req: Aggregations = serde_json::from_value(json!({
-                "my_texts": { "terms": { "field": "text_many_terms" } },
-            }))
-            .unwrap();
-
-            let collector = get_collector(agg_req);
-
-            let searcher = reader.searcher();
-            searcher.search(&AllQuery, &collector).unwrap()
-        });
-    }
-
-    bench_all_cardinalities!(bench_aggregation_terms_many_order_by_term);
-
-    fn bench_aggregation_terms_many_order_by_term_card(b: &mut Bencher, cardinality: Cardinality) {
-        let index = get_test_index_bench(cardinality).unwrap();
-        let reader = index.reader().unwrap();
-
-        b.iter(|| {
-            let agg_req: Aggregations = serde_json::from_value(json!({
-                "my_texts": { "terms": { "field": "text_many_terms", "order": { "_key": "desc" } } },
-            }))
-            .unwrap();
-
-            let collector = get_collector(agg_req);
-
-            let searcher = reader.searcher();
-            searcher.search(&AllQuery, &collector).unwrap()
-        });
-    }
-
-    bench_all_cardinalities!(bench_aggregation_range_only);
-
-    fn bench_aggregation_range_only_card(b: &mut Bencher, cardinality: Cardinality) {
-        let index = get_test_index_bench(cardinality).unwrap();
-        let reader = index.reader().unwrap();
-
-        b.iter(|| {
-            let agg_req_1: Aggregations = serde_json::from_value(json!({
-                "range_f64": { "range": { "field": "score_f64", "ranges": [
-                    { "from": 3, "to": 7000 },
-                    { "from": 7000, "to": 20000 },
-                    { "from": 20000, "to": 30000 },
-                    { "from": 30000, "to": 40000 },
-                    { "from": 40000, "to": 50000 },
-                    { "from": 50000, "to": 60000 }
-                ] } },
-            }))
-            .unwrap();
-
-            let collector = get_collector(agg_req_1);
-
-            let searcher = reader.searcher();
-            searcher.search(&AllQuery, &collector).unwrap()
-        });
-    }
-
-    bench_all_cardinalities!(bench_aggregation_range_with_avg);
-
-    fn bench_aggregation_range_with_avg_card(b: &mut Bencher, cardinality: Cardinality) {
-        let index = get_test_index_bench(cardinality).unwrap();
-        let reader = index.reader().unwrap();
-
-        b.iter(|| {
-            let agg_req_1: Aggregations = serde_json::from_value(json!({
-                "rangef64": {
-                    "range": {
-                        "field": "score_f64",
-                        "ranges": [
-                            { "from": 3, "to": 7000 },
-                            { "from": 7000, "to": 20000 },
-                            { "from": 20000, "to": 30000 },
-                            { "from": 30000, "to": 40000 },
-                            { "from": 40000, "to": 50000 },
-                            { "from": 50000, "to": 60000 }
-                        ]
-                    },
-                    "aggs": {
-                        "average_f64": { "avg": { "field": "score_f64" } }
-                    }
-                },
-            }))
-            .unwrap();
-
-            let collector = get_collector(agg_req_1);
-
-            let searcher = reader.searcher();
-            searcher.search(&AllQuery, &collector).unwrap()
-        });
-    }
-
-    // hard bounds has a different algorithm, because it actually limits collection range
-    //
-    bench_all_cardinalities!(bench_aggregation_histogram_only_hard_bounds);
-
-    fn bench_aggregation_histogram_only_hard_bounds_card(
-        b: &mut Bencher,
-        cardinality: Cardinality,
-    ) {
-        let index = get_test_index_bench(cardinality).unwrap();
-        let reader = index.reader().unwrap();
-
-        b.iter(|| {
-            let agg_req_1: Aggregations = serde_json::from_value(json!({
-                "rangef64": { "histogram": { "field": "score_f64", "interval": 100, "hard_bounds": { "min": 1000, "max": 300000 } } },
-            }))
-            .unwrap();
-
-            let collector = get_collector(agg_req_1);
-            let searcher = reader.searcher();
-            searcher.search(&AllQuery, &collector).unwrap()
-        });
-    }
-
-    bench_all_cardinalities!(bench_aggregation_histogram_with_avg);
-
-    fn bench_aggregation_histogram_with_avg_card(b: &mut Bencher, cardinality: Cardinality) {
-        let index = get_test_index_bench(cardinality).unwrap();
-        let reader = index.reader().unwrap();
-
-        b.iter(|| {
-            let agg_req_1: Aggregations = serde_json::from_value(json!({
-                "rangef64": {
-                    "histogram": { "field": "score_f64", "interval": 100 },
-                    "aggs": {
-                        "average_f64": { "avg": { "field": "score_f64" } }
-                    }
-                }
-            }))
-            .unwrap();
-
-            let collector = get_collector(agg_req_1);
-
-            let searcher = reader.searcher();
-            searcher.search(&AllQuery, &collector).unwrap()
-        });
-    }
-
-    bench_all_cardinalities!(bench_aggregation_histogram_only);
-
-    fn bench_aggregation_histogram_only_card(b: &mut Bencher, cardinality: Cardinality) {
-        let index = get_test_index_bench(cardinality).unwrap();
-        let reader = index.reader().unwrap();
-
-        b.iter(|| {
-            let agg_req_1: Aggregations = serde_json::from_value(json!({
-                "rangef64": {
-                    "histogram": {
-                        "field": "score_f64",
-                        "interval": 100 // 1000 buckets
-                    },
-                }
-            }))
-            .unwrap();
-
-            let collector = get_collector(agg_req_1);
-
-            let searcher = reader.searcher();
-            searcher.search(&AllQuery, &collector).unwrap()
-        });
-    }
-
-    bench_all_cardinalities!(bench_aggregation_avg_and_range_with_avg);
-
-    fn bench_aggregation_avg_and_range_with_avg_card(b: &mut Bencher, cardinality: Cardinality) {
-        let index = get_test_index_bench(cardinality).unwrap();
-        let reader = index.reader().unwrap();
-        let text_field = reader.searcher().schema().get_field("text").unwrap();
-
-        b.iter(|| {
-            let term_query = TermQuery::new(
-                Term::from_field_text(text_field, "cool"),
-                IndexRecordOption::Basic,
-            );
-
-            let agg_req_1: Aggregations = serde_json::from_value(json!({
-                "rangef64": {
-                    "range": {
-                        "field": "score_f64",
-                        "ranges": [
-                            { "from": 3, "to": 7000 },
-                            { "from": 7000, "to": 20000 },
-                            { "from": 20000, "to": 60000 }
-                        ]
-                    },
-                    "aggs": {
-                        "average_in_range": { "avg": { "field": "score" } }
-                    }
-                },
-                "average": { "avg": { "field": "score" } }
-            }))
-            .unwrap();
-
-            let collector = get_collector(agg_req_1);
-
-            let searcher = reader.searcher();
-            searcher.search(&term_query, &collector).unwrap()
-        });
-    }
-}
--- a/src/aggregation/agg_limits.rs
+++ b/src/aggregation/agg_limits.rs
@@ -81,10 +81,11 @@ impl AggregationLimits {
        }
    }

-    pub(crate) fn add_memory_consumed(&self, num_bytes: u64) -> crate::Result<()> {
-        self.memory_consumption
-            .fetch_add(num_bytes, Ordering::Relaxed);
-        validate_memory_consumption(&self.memory_consumption, self.memory_limit)?;
+    pub(crate) fn add_memory_consumed(&self, add_num_bytes: u64) -> crate::Result<()> {
+        let prev_value = self
+            .memory_consumption
+            .fetch_add(add_num_bytes, Ordering::Relaxed);
+        validate_memory_consumption(prev_value + add_num_bytes, self.memory_limit)?;
        Ok(())
    }

@@ -94,11 +95,11 @@ impl AggregationLimits {
 }

 fn validate_memory_consumption(
-    memory_consumption: &AtomicU64,
+    memory_consumption: u64,
    memory_limit: ByteCount,
 ) -> Result<(), AggregationError> {
    // Load the estimated memory consumed by the aggregations
-    let memory_consumed: ByteCount = memory_consumption.load(Ordering::Relaxed).into();
+    let memory_consumed: ByteCount = memory_consumption.into();
    if memory_consumed > memory_limit {
        return Err(AggregationError::MemoryExceeded {
            limit: memory_limit,
@@ -118,10 +119,11 @@ pub struct ResourceLimitGuard {
 }

 impl ResourceLimitGuard {
-    pub(crate) fn add_memory_consumed(&self, num_bytes: u64) -> crate::Result<()> {
-        self.memory_consumption
-            .fetch_add(num_bytes, Ordering::Relaxed);
-        validate_memory_consumption(&self.memory_consumption, self.memory_limit)?;
+    pub(crate) fn add_memory_consumed(&self, add_num_bytes: u64) -> crate::Result<()> {
+        let prev_value = self
+            .memory_consumption
+            .fetch_add(add_num_bytes, Ordering::Relaxed);
+        validate_memory_consumption(prev_value + add_num_bytes, self.memory_limit)?;
        Ok(())
    }
 }
--- a/src/aggregation/agg_req.rs
+++ b/src/aggregation/agg_req.rs
@@ -34,7 +34,7 @@ use super::bucket::{
    DateHistogramAggregationReq, HistogramAggregation, RangeAggregation, TermsAggregation,
 };
 use super::metric::{
-    AverageAggregation, CountAggregation, MaxAggregation, MinAggregation,
+    AverageAggregation, CountAggregation, ExtendedStatsAggregation, MaxAggregation, MinAggregation,
    PercentilesAggregationReq, StatsAggregation, SumAggregation, TopHitsAggregation,
 };

@@ -146,6 +146,11 @@ pub enum AggregationVariants {
    /// extracted values.
    #[serde(rename = "stats")]
    Stats(StatsAggregation),
+    /// Computes a collection of extended statistics (`min`, `max`, `sum`, `count`, `avg`,
+    /// `sum_of_squares`, `variance`, `variance_sampling`, `std_deviation`,
+    /// `std_deviation_sampling`) over the extracted values.
+    #[serde(rename = "extended_stats")]
+    ExtendedStats(ExtendedStatsAggregation),
    /// Computes the sum of the extracted values.
    #[serde(rename = "sum")]
    Sum(SumAggregation),
@@ -170,6 +175,7 @@ impl AggregationVariants {
            AggregationVariants::Max(max) => vec![max.field_name()],
            AggregationVariants::Min(min) => vec![min.field_name()],
            AggregationVariants::Stats(stats) => vec![stats.field_name()],
+            AggregationVariants::ExtendedStats(extended_stats) => vec![extended_stats.field_name()],
            AggregationVariants::Sum(sum) => vec![sum.field_name()],
            AggregationVariants::Percentiles(per) => vec![per.field_name()],
            AggregationVariants::TopHits(top_hits) => top_hits.field_names(),
@@ -197,6 +203,12 @@ impl AggregationVariants {
            _ => None,
        }
    }
+    pub(crate) fn as_top_hits(&self) -> Option<&TopHitsAggregation> {
+        match &self {
+            AggregationVariants::TopHits(top_hits) => Some(top_hits),
+            _ => None,
+        }
+    }

    pub(crate) fn as_percentile(&self) -> Option<&PercentilesAggregationReq> {
        match &self {
--- a/src/aggregation/agg_req_with_accessor.rs
+++ b/src/aggregation/agg_req_with_accessor.rs
@@ -11,13 +11,14 @@ use super::bucket::{
    DateHistogramAggregationReq, HistogramAggregation, RangeAggregation, TermsAggregation,
 };
 use super::metric::{
-    AverageAggregation, CountAggregation, MaxAggregation, MinAggregation, StatsAggregation,
-    SumAggregation,
+    AverageAggregation, CountAggregation, ExtendedStatsAggregation, MaxAggregation, MinAggregation,
+    StatsAggregation, SumAggregation,
 };
 use super::segment_agg_result::AggregationLimits;
 use super::VecWithNames;
 use crate::aggregation::{f64_to_fastfield_u64, Key};
-use crate::{SegmentOrdinal, SegmentReader};
+use crate::index::SegmentReader;
+use crate::SegmentOrdinal;

 #[derive(Default)]
 pub(crate) struct AggregationsWithAccessor {
@@ -275,6 +276,10 @@ impl AggregationWithAccessor {
                field: ref field_name,
                ..
            })
+            | ExtendedStats(ExtendedStatsAggregation {
+                field: ref field_name,
+                ..
+            })
            | Sum(SumAggregation {
                field: ref field_name,
                ..
@@ -292,7 +297,7 @@ impl AggregationWithAccessor {
                add_agg_with_accessor(&agg, accessor, column_type, &mut res)?;
            }
            TopHits(ref mut top_hits) => {
-                top_hits.validate_and_resolve(reader.fast_fields().columnar())?;
+                top_hits.validate_and_resolve_field_names(reader.fast_fields().columnar())?;
                let accessors: Vec<(Column<u64>, ColumnType)> = top_hits
                    .field_names()
                    .iter()
@@ -334,8 +339,8 @@ fn get_missing_val(
        }
        _ => {
            return Err(crate::TantivyError::InvalidArgument(format!(
-                "Missing value {:?} for field {} is not supported for column type {:?}",
-                missing, field_name, column_type
+                "Missing value {missing:?} for field {field_name} is not supported for column \
+                 type {column_type:?}"
            )));
        }
    };
@@ -402,7 +407,7 @@ fn get_dynamic_columns(
        .iter()
        .map(|h| h.open())
        .collect::<io::Result<_>>()?;
-    assert!(!ff_fields.is_empty(), "field {} not found", field_name);
+    assert!(!ff_fields.is_empty(), "field {field_name} not found");
    Ok(cols)
 }

--- a/src/aggregation/agg_result.rs
+++ b/src/aggregation/agg_result.rs
@@ -8,7 +8,9 @@ use rustc_hash::FxHashMap;
 use serde::{Deserialize, Serialize};

 use super::bucket::GetDocCount;
-use super::metric::{PercentilesMetricResult, SingleMetricResult, Stats, TopHitsMetricResult};
+use super::metric::{
+    ExtendedStats, PercentilesMetricResult, SingleMetricResult, Stats, TopHitsMetricResult,
+};
 use super::{AggregationError, Key};
 use crate::TantivyError;

@@ -88,6 +90,8 @@ pub enum MetricResult {
    Min(SingleMetricResult),
    /// Stats metric result.
    Stats(Stats),
+    /// ExtendedStats metric result.
+    ExtendedStats(Box<ExtendedStats>),
    /// Sum metric result.
    Sum(SingleMetricResult),
    /// Percentiles metric result.
@@ -104,6 +108,7 @@ impl MetricResult {
            MetricResult::Max(max) => Ok(max.value),
            MetricResult::Min(min) => Ok(min.value),
            MetricResult::Stats(stats) => stats.get_value(agg_property),
+            MetricResult::ExtendedStats(extended_stats) => extended_stats.get_value(agg_property),
            MetricResult::Sum(sum) => Ok(sum.value),
            MetricResult::Percentiles(_) => Err(TantivyError::AggregationError(
                AggregationError::InvalidRequest("percentiles can't be used to order".to_string()),
--- a/src/aggregation/agg_tests.rs
+++ b/src/aggregation/agg_tests.rs
@@ -4,6 +4,7 @@ use crate::aggregation::agg_req::{Aggregation, Aggregations};
 use crate::aggregation::agg_result::AggregationResults;
 use crate::aggregation::buf_collector::DOC_BLOCK_SIZE;
 use crate::aggregation::collector::AggregationCollector;
+use crate::aggregation::intermediate_agg_result::IntermediateAggregationResults;
 use crate::aggregation::segment_agg_result::AggregationLimits;
 use crate::aggregation::tests::{get_test_index_2_segments, get_test_index_from_values_and_terms};
 use crate::aggregation::DistributedAggregationCollector;
@@ -66,6 +67,22 @@ fn test_aggregation_flushing(
            }
        }
    },
+    "top_hits_test":{
+        "terms": {
+            "field": "string_id"
+        },
+        "aggs": {
+            "bucketsL2": {
+                "top_hits": {
+                    "size": 2,
+                    "sort": [
+                        { "score": "asc" }
+                    ],
+                    "docvalue_fields": ["score"]
+                }
+            }
+        }
+    },
    "histogram_test":{
        "histogram": {
            "field": "score",
@@ -108,6 +125,16 @@ fn test_aggregation_flushing(

        let searcher = reader.searcher();
        let intermediate_agg_result = searcher.search(&AllQuery, &collector).unwrap();
+
+        // Test postcard roundtrip serialization
+        let intermediate_agg_result_bytes = postcard::to_allocvec(&intermediate_agg_result).expect(
+            "Postcard Serialization failed, flatten etc. is not supported in the intermediate \
+             result",
+        );
+        let intermediate_agg_result: IntermediateAggregationResults =
+            postcard::from_bytes(&intermediate_agg_result_bytes)
+                .expect("Postcard deserialization failed");
+
        intermediate_agg_result
            .into_final_result(agg_req, &Default::default())
            .unwrap()
--- a/src/aggregation/bucket/histogram/histogram.rs
+++ b/src/aggregation/bucket/histogram/histogram.rs
@@ -1,7 +1,5 @@
 use std::cmp::Ordering;

-use columnar::ColumnType;
-use itertools::Itertools;
 use rustc_hash::FxHashMap;
 use serde::{Deserialize, Serialize};
 use tantivy_bitpacker::minmax;
@@ -17,7 +15,7 @@ use crate::aggregation::intermediate_agg_result::{
    IntermediateHistogramBucketEntry,
 };
 use crate::aggregation::segment_agg_result::{
-    build_segment_agg_collector, AggregationLimits, SegmentAggregationCollector,
+    build_segment_agg_collector, SegmentAggregationCollector,
 };
 use crate::aggregation::*;
 use crate::TantivyError;
@@ -333,9 +331,11 @@ impl SegmentAggregationCollector for SegmentHistogramCollector {
        }

        let mem_delta = self.get_memory_consumption() - mem_pre;
-        bucket_agg_accessor
-            .limits
-            .add_memory_consumed(mem_delta as u64)?;
+        if mem_delta > 0 {
+            bucket_agg_accessor
+                .limits
+                .add_memory_consumed(mem_delta as u64)?;
+        }

        Ok(())
    }
--- a/src/aggregation/bucket/mod.rs
+++ b/src/aggregation/bucket/mod.rs
@@ -28,6 +28,7 @@ mod term_agg;
 mod term_missing_agg;

 use std::collections::HashMap;
+use std::fmt;

 pub use histogram::*;
 pub use range::*;
@@ -72,12 +73,12 @@ impl From<&str> for OrderTarget {
    }
 }

-impl ToString for OrderTarget {
-    fn to_string(&self) -> String {
+impl fmt::Display for OrderTarget {
+    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
-            OrderTarget::Key => "_key".to_string(),
-            OrderTarget::Count => "_count".to_string(),
-            OrderTarget::SubAggregation(agg) => agg.to_string(),
+            OrderTarget::Key => f.write_str("_key"),
+            OrderTarget::Count => f.write_str("_count"),
+            OrderTarget::SubAggregation(agg) => agg.fmt(f),
        }
    }
 }
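Note on the `ToString` to `fmt::Display` change above: implementing `Display` still provides `to_string()` through the standard library's blanket `ToString` impl, and it also lets the type be used directly in `format!`/`write!`. A minimal standalone sketch follows. The enum here is a simplified stand-in for illustration, not the crate's actual definition, and it writes the sub-aggregation name with `write_str` instead of delegating to `agg.fmt(f)`.

```rust
use std::fmt;

// Simplified stand-in for the crate's `OrderTarget` (illustrative only).
enum OrderTarget {
    Key,
    Count,
    SubAggregation(String),
}

impl fmt::Display for OrderTarget {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            OrderTarget::Key => f.write_str("_key"),
            OrderTarget::Count => f.write_str("_count"),
            // Write the sub-aggregation name as-is.
            OrderTarget::SubAggregation(agg) => f.write_str(agg),
        }
    }
}

// `to_string()` comes from the blanket `impl<T: Display> ToString for T`.
fn render(target: &OrderTarget) -> String {
    target.to_string()
}
```

With this in place, callers that previously relied on `to_string()` should keep working unchanged, and the type can additionally be interpolated with `{}`.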
--- a/src/aggregation/bucket/range.rs
+++ b/src/aggregation/bucket/range.rs
@@ -1,7 +1,6 @@
 use std::fmt::Debug;
 use std::ops::Range;

-use columnar::{ColumnType, MonotonicallyMappableToU64};
 use rustc_hash::FxHashMap;
 use serde::{Deserialize, Serialize};

@@ -450,7 +449,6 @@ pub(crate) fn range_to_key(range: &Range<u64>, field_type: &ColumnType) -> crate
 #[cfg(test)]
 mod tests {

-    use columnar::MonotonicallyMappableToU64;
    use serde_json::Value;

    use super::*;
@@ -459,7 +457,6 @@ mod tests {
        exec_request, exec_request_with_query, get_test_index_2_segments,
        get_test_index_with_num_docs,
    };
-    use crate::aggregation::AggregationLimits;

    pub fn get_collector_from_ranges(
        ranges: Vec<RangeAggregationRange>,
--- a/src/aggregation/bucket/term_agg.rs
+++ b/src/aggregation/bucket/term_agg.rs
@@ -324,9 +324,11 @@ impl SegmentAggregationCollector for SegmentTermCollector {
        }

        let mem_delta = self.get_memory_consumption() - mem_pre;
-        bucket_agg_accessor
-            .limits
-            .add_memory_consumed(mem_delta as u64)?;
+        if mem_delta > 0 {
+            bucket_agg_accessor
+                .limits
+                .add_memory_consumed(mem_delta as u64)?;
+        }

        Ok(())
    }
@@ -355,8 +357,7 @@ impl SegmentTermCollector {
    ) -> crate::Result<Self> {
        if field_type == ColumnType::Bytes {
            return Err(TantivyError::InvalidArgument(format!(
-                "terms aggregation is not supported for column type {:?}",
-                field_type
+                "terms aggregation is not supported for column type {field_type:?}"
            )));
        }
        let term_buckets = TermBuckets::default();
--- a/src/aggregation/collector.rs
+++ b/src/aggregation/collector.rs
@@ -8,7 +8,8 @@ use super::segment_agg_result::{
 };
 use crate::aggregation::agg_req_with_accessor::get_aggs_with_segment_accessor_and_validate;
 use crate::collector::{Collector, SegmentCollector};
-use crate::{DocId, SegmentOrdinal, SegmentReader, TantivyError};
+use crate::index::SegmentReader;
+use crate::{DocId, SegmentOrdinal, TantivyError};

 /// The default max bucket count, before the aggregation fails.
 pub const DEFAULT_BUCKET_LIMIT: u32 = 65000;
--- a/src/aggregation/intermediate_agg_result.rs
+++ b/src/aggregation/intermediate_agg_result.rs
@@ -19,8 +19,8 @@ use super::bucket::{
    GetDocCount, Order, OrderTarget, RangeAggregation, TermsAggregation,
 };
 use super::metric::{
-    IntermediateAverage, IntermediateCount, IntermediateMax, IntermediateMin, IntermediateStats,
-    IntermediateSum, PercentilesCollector, TopHitsCollector,
+    IntermediateAverage, IntermediateCount, IntermediateExtendedStats, IntermediateMax,
+    IntermediateMin, IntermediateStats, IntermediateSum, PercentilesCollector, TopHitsTopNComputer,
 };
 use super::segment_agg_result::AggregationLimits;
 use super::{format_date, AggregationError, Key, SerializedKey};
@@ -215,15 +215,18 @@ pub(crate) fn empty_from_req(req: &Aggregation) -> IntermediateAggregationResult
        Stats(_) => IntermediateAggregationResult::Metric(IntermediateMetricResult::Stats(
            IntermediateStats::default(),
        )),
+        ExtendedStats(_) => IntermediateAggregationResult::Metric(
+            IntermediateMetricResult::ExtendedStats(IntermediateExtendedStats::default()),
+        ),
        Sum(_) => IntermediateAggregationResult::Metric(IntermediateMetricResult::Sum(
            IntermediateSum::default(),
        )),
        Percentiles(_) => IntermediateAggregationResult::Metric(
            IntermediateMetricResult::Percentiles(PercentilesCollector::default()),
        ),
-        TopHits(_) => IntermediateAggregationResult::Metric(IntermediateMetricResult::TopHits(
-            TopHitsCollector::default(),
-        )),
+        TopHits(ref req) => IntermediateAggregationResult::Metric(
+            IntermediateMetricResult::TopHits(TopHitsTopNComputer::new(req)),
+        ),
    }
 }

@@ -282,10 +285,12 @@ pub enum IntermediateMetricResult {
    Min(IntermediateMin),
    /// Intermediate stats result.
    Stats(IntermediateStats),
+    /// Intermediate extended stats result.
+    ExtendedStats(IntermediateExtendedStats),
    /// Intermediate sum result.
    Sum(IntermediateSum),
    /// Intermediate top_hits result
-    TopHits(TopHitsCollector),
+    TopHits(TopHitsTopNComputer),
 }

 impl IntermediateMetricResult {
@@ -306,6 +311,9 @@ impl IntermediateMetricResult {
            IntermediateMetricResult::Stats(intermediate_stats) => {
                MetricResult::Stats(intermediate_stats.finalize())
            }
+            IntermediateMetricResult::ExtendedStats(intermediate_stats) => {
+                MetricResult::ExtendedStats(intermediate_stats.finalize())
+            }
            IntermediateMetricResult::Sum(intermediate_sum) => {
                MetricResult::Sum(intermediate_sum.finalize().into())
            }
@@ -314,7 +322,7 @@ impl IntermediateMetricResult {
                    .into_final_result(req.agg.as_percentile().expect("unexpected metric type")),
            ),
            IntermediateMetricResult::TopHits(top_hits) => {
-                MetricResult::TopHits(top_hits.finalize())
+                MetricResult::TopHits(top_hits.into_final_result())
            }
        }
    }
@@ -346,6 +354,12 @@ impl IntermediateMetricResult {
            ) => {
                stats_left.merge_fruits(stats_right);
            }
+            (
+                IntermediateMetricResult::ExtendedStats(extended_stats_left),
+                IntermediateMetricResult::ExtendedStats(extended_stats_right),
+            ) => {
+                extended_stats_left.merge_fruits(extended_stats_right);
+            }
            (IntermediateMetricResult::Sum(sum_left), IntermediateMetricResult::Sum(sum_right)) => {
                sum_left.merge_fruits(sum_right);
            }
--- a/src/aggregation/metric/extended_stats.rs
+++ b/src/aggregation/metric/extended_stats.rs
--- a/src/aggregation/metric/mod.rs
+++ b/src/aggregation/metric/mod.rs
@@ -18,6 +18,7 @@

 mod average;
 mod count;
+mod extended_stats;
 mod max;
 mod min;
 mod percentiles;
@@ -25,8 +26,11 @@ mod stats;
 mod sum;
 mod top_hits;

+use std::collections::HashMap;
+
 pub use average::*;
 pub use count::*;
+pub use extended_stats::*;
 pub use max::*;
 pub use min::*;
 pub use percentiles::*;
@@ -36,6 +40,8 @@ pub use stats::*;
 pub use sum::*;
 pub use top_hits::*;

+use crate::schema::OwnedValue;
+
 /// Single-metric aggregations use this common result structure.
 ///
 /// Main reason to wrap it in value is to match elasticsearch output structure.
@@ -92,8 +98,9 @@ pub struct TopHitsVecEntry {

    /// Search results, for queries that include field retrieval requests
    /// (`docvalue_fields`).
-    #[serde(flatten)]
-    pub search_results: FieldRetrivalResult,
+    #[serde(rename = "docvalue_fields")]
+    #[serde(skip_serializing_if = "HashMap::is_empty")]
+    pub doc_value_fields: HashMap<String, OwnedValue>,
 }

 /// The top_hits metric aggregation results a list of top hits by sort criteria.
--- a/src/aggregation/metric/percentiles.rs
+++ b/src/aggregation/metric/percentiles.rs
@@ -1,6 +1,5 @@
 use std::fmt::Debug;

-use columnar::ColumnType;
 use serde::{Deserialize, Serialize};

 use super::*;
--- a/src/aggregation/metric/stats.rs
+++ b/src/aggregation/metric/stats.rs
@@ -1,4 +1,5 @@
-use columnar::ColumnType;
+use std::fmt::Debug;
+
 use serde::{Deserialize, Serialize};

 use super::*;
@@ -86,13 +87,15 @@ impl Stats {
 #[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
 pub struct IntermediateStats {
    /// The number of extracted values.
-    count: u64,
+    pub(crate) count: u64,
    /// The sum of the extracted values.
-    sum: f64,
+    pub(crate) sum: f64,
+    /// delta for sum needed for [Kahan algorithm for summation](https://en.wikipedia.org/wiki/Kahan_summation_algorithm)
+    pub(crate) delta: f64,
    /// The min value.
-    min: f64,
+    pub(crate) min: f64,
    /// The max value.
-    max: f64,
+    pub(crate) max: f64,
 }

 impl Default for IntermediateStats {
@@ -100,6 +103,7 @@ impl Default for IntermediateStats {
        Self {
            count: 0,
            sum: 0.0,
+            delta: 0.0,
            min: f64::MAX,
            max: f64::MIN,
        }
@@ -110,7 +114,13 @@ impl IntermediateStats {
    /// Merges the other stats intermediate result into self.
    pub fn merge_fruits(&mut self, other: IntermediateStats) {
        self.count += other.count;
-        self.sum += other.sum;
+
+        // kahan algorithm for sum
+        let y = other.sum - (self.delta + other.delta);
+        let t = self.sum + y;
+        self.delta = (t - self.sum) - y;
+        self.sum = t;
+
        self.min = self.min.min(other.min);
        self.max = self.max.max(other.max);
    }
@@ -142,9 +152,15 @@ impl IntermediateStats {
    }

    #[inline]
-    fn collect(&mut self, value: f64) {
+    pub(in crate::aggregation::metric) fn collect(&mut self, value: f64) {
        self.count += 1;
-        self.sum += value;
+
+        // kahan algorithm for sum
+        let y = value - self.delta;
+        let t = self.sum + y;
+        self.delta = (t - self.sum) - y;
+        self.sum = t;
+
        self.min = self.min.min(value);
        self.max = self.max.max(value);
    }
@@ -289,7 +305,6 @@ impl SegmentAggregationCollector for SegmentStatsCollector {

 #[cfg(test)]
 mod tests {
-
    use serde_json::Value;

    use crate::aggregation::agg_req::{Aggregation, Aggregations};
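Note on the Kahan summation introduced in `collect` above: the `delta` field carries the rounding error lost by each addition and feeds it back into the next one. The sketch below isolates that update rule in a standalone type. `KahanSum`, `naive_sum`, and `kahan_sum` are hypothetical names for illustration and are not part of the crate. It covers only the per-value path, not `merge_fruits`.

```rust
/// Running sum with Kahan compensation. The field names mirror the diff,
/// but the type itself is an illustrative assumption.
#[derive(Debug, Default, Clone, Copy)]
struct KahanSum {
    sum: f64,
    delta: f64,
}

impl KahanSum {
    /// Same update rule as the per-value `collect` path in the diff.
    fn add(&mut self, value: f64) {
        let y = value - self.delta; // re-apply the error lost so far
        let t = self.sum + y; // low-order bits of `y` may be lost here
        self.delta = (t - self.sum) - y; // recover what was lost
        self.sum = t;
    }
}

/// Plain left-to-right summation, for comparison.
fn naive_sum(values: &[f64]) -> f64 {
    values.iter().sum()
}

/// Compensated summation over a slice.
fn kahan_sum(values: &[f64]) -> f64 {
    let mut acc = KahanSum::default();
    for &v in values {
        acc.add(v);
    }
    acc.sum
}
```

For an input such as `1.0` followed by ten values of `1e-16`, the naive sum stays at exactly `1.0` because each small addend is below half an ulp of `1.0`, while the compensated sum ends up close to `1.0 + 1e-15`.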
--- a/src/aggregation/metric/top_hits.rs
+++ b/src/aggregation/metric/top_hits.rs
@@ -1,7 +1,9 @@
 use std::collections::HashMap;
-use std::fmt::Formatter;
+use std::net::Ipv6Addr;

-use columnar::{ColumnarReader, DynamicColumn};
+use columnar::{Column, ColumnType, ColumnarReader, DynamicColumn};
+use common::json_path_writer::JSON_PATH_SEGMENT_SEP_STR;
+use common::DateTime;
 use regex::Regex;
 use serde::ser::SerializeMap;
 use serde::{Deserialize, Deserializer, Serialize, Serializer};
@@ -12,8 +14,8 @@ use crate::aggregation::intermediate_agg_result::{
    IntermediateAggregationResult, IntermediateMetricResult,
 };
 use crate::aggregation::segment_agg_result::SegmentAggregationCollector;
+use crate::aggregation::AggregationError;
 use crate::collector::TopNComputer;
-use crate::schema::term::JSON_PATH_SEGMENT_SEP_STR;
 use crate::schema::OwnedValue;
 use crate::{DocAddress, DocId, SegmentOrdinal};

@@ -92,53 +94,101 @@ pub struct TopHitsAggregation {
    size: usize,
    from: Option<usize>,

-    #[serde(flatten)]
-    retrieval: RetrievalFields,
-}
-
-const fn default_doc_value_fields() -> Vec<String> {
-    Vec::new()
-}
-
-/// Search query spec for each matched document
-/// TODO: move this to a common module
-#[derive(Debug, Clone, PartialEq, Serialize, Deserialize, Default)]
-pub struct RetrievalFields {
-    /// The fast fields to return for each hit.
-    /// This is the only variant supported for now.
-    /// TODO: support the {field, format} variant for custom formatting.
    #[serde(rename = "docvalue_fields")]
-    #[serde(default = "default_doc_value_fields")]
-    pub doc_value_fields: Vec<String>,
+    #[serde(default)]
+    doc_value_fields: Vec<String>,
+
+    // Not supported
+    _source: Option<serde_json::Value>,
+    fields: Option<serde_json::Value>,
+    script_fields: Option<serde_json::Value>,
+    highlight: Option<serde_json::Value>,
+    explain: Option<serde_json::Value>,
+    version: Option<serde_json::Value>,
 }

-/// Search query result for each matched document
-/// TODO: move this to a common module
-#[derive(Debug, Clone, PartialEq, Serialize, Deserialize, Default)]
-pub struct FieldRetrivalResult {
-    /// The fast fields returned for each hit.
-    #[serde(rename = "docvalue_fields")]
-    #[serde(skip_serializing_if = "HashMap::is_empty")]
-    pub doc_value_fields: HashMap<String, OwnedValue>,
+#[derive(Debug, Clone, PartialEq, Default)]
+struct KeyOrder {
+    field: String,
+    order: Order,
 }

-impl RetrievalFields {
-    fn get_field_names(&self) -> Vec<&str> {
-        self.doc_value_fields.iter().map(|s| s.as_str()).collect()
+impl Serialize for KeyOrder {
+    fn serialize<S: Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {
+        let KeyOrder { field, order } = self;
+        let mut map = serializer.serialize_map(Some(1))?;
+        map.serialize_entry(field, order)?;
+        map.end()
    }
+}
+
+impl<'de> Deserialize<'de> for KeyOrder {
+    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
+    where D: Deserializer<'de> {
+        let mut key_order = <HashMap<String, Order>>::deserialize(deserializer)?.into_iter();
+        let (field, order) = key_order.next().ok_or(serde::de::Error::custom(
+            "Expected exactly one key-value pair in sort parameter of top_hits, found none",
+        ))?;
+        if key_order.next().is_some() {
+            return Err(serde::de::Error::custom(format!(
+                "Expected exactly one key-value pair in sort parameter of top_hits, found \
+                 {key_order:?}"
+            )));
+        }
+        Ok(Self { field, order })
+    }
+}
+
+// Transform a glob (`pattern*`, for example) into a regex::Regex (`^pattern.*$`)
+fn globbed_string_to_regex(glob: &str) -> Result<Regex, crate::TantivyError> {
+    // Replace `*` glob with `.*` regex
+    let sanitized = format!("^{}$", regex::escape(glob).replace(r"\*", ".*"));
+    Regex::new(&sanitized).map_err(|e| {
+        crate::TantivyError::SchemaError(format!("Invalid regex '{glob}' in docvalue_fields: {e}"))
+    })
+}
+
+fn use_doc_value_fields_err(parameter: &str) -> crate::Result<()> {
+    Err(crate::TantivyError::AggregationError(
+        AggregationError::InvalidRequest(format!(
+            "The `{parameter}` parameter is not supported, only `docvalue_fields` is supported in \
+             `top_hits` aggregation"
+        )),
+    ))
+}
+fn unsupported_err(parameter: &str) -> crate::Result<()> {
+    Err(crate::TantivyError::AggregationError(
+        AggregationError::InvalidRequest(format!(
+            "The `{parameter}` parameter is not supported in the `top_hits` aggregation"
+        )),
+    ))
+}
+
+impl TopHitsAggregation {
+    /// Validate and resolve field retrieval parameters
+    pub fn validate_and_resolve_field_names(
+        &mut self,
+        reader: &ColumnarReader,
+    ) -> crate::Result<()> {
+        if self._source.is_some() {
+            use_doc_value_fields_err("_source")?;
+        }
+        if self.fields.is_some() {
+            use_doc_value_fields_err("fields")?;
+        }
+        if self.script_fields.is_some() {
+            use_doc_value_fields_err("script_fields")?;
+        }
+        if self.explain.is_some() {
+            unsupported_err("explain")?;
+        }
+        if self.highlight.is_some() {
+            unsupported_err("highlight")?;
+        }
+        if self.version.is_some() {
+            unsupported_err("version")?;
+        }

-    fn resolve_field_names(&mut self, reader: &ColumnarReader) -> crate::Result<()> {
-        // Tranform a glob (`pattern*`, for example) into a regex::Regex (`^pattern.*$`)
-        let globbed_string_to_regex = |glob: &str| {
-            // Replace `*` glob with `.*` regex
-            let sanitized = format!("^{}$", regex::escape(glob).replace(r"\*", ".*"));
-            Regex::new(&sanitized.replace('*', ".*")).map_err(|e| {
-                crate::TantivyError::SchemaError(format!(
-                    "Invalid regex '{}' in docvalue_fields: {}",
-                    glob, e
-                ))
-            })
-        };
        self.doc_value_fields = self
            .doc_value_fields
            .iter()
@@ -162,8 +212,7 @@ impl RetrievalFields {
                    .collect::<Vec<_>>();
                assert!(
                    !fields.is_empty(),
-                    "No fields matched the glob '{}' in docvalue_fields",
-                    field
+                    "No fields matched the glob '{field}' in docvalue_fields"
                );
                Ok(fields)
            })
@@ -175,121 +224,6 @@ impl RetrievalFields {
        Ok(())
    }

-    fn get_document_field_data(
-        &self,
-        accessors: &HashMap<String, Vec<DynamicColumn>>,
-        doc_id: DocId,
-    ) -> FieldRetrivalResult {
-        let dvf = self
-            .doc_value_fields
-            .iter()
-            .map(|field| {
-                let accessors = accessors
-                    .get(field)
-                    .unwrap_or_else(|| panic!("field '{}' not found in accessors", field));
-
-                let values: Vec<OwnedValue> = accessors
-                    .iter()
-                    .flat_map(|accessor| match accessor {
-                        DynamicColumn::U64(accessor) => accessor
-                            .values_for_doc(doc_id)
-                            .map(OwnedValue::U64)
-                            .collect::<Vec<_>>(),
-                        DynamicColumn::I64(accessor) => accessor
-                            .values_for_doc(doc_id)
-                            .map(OwnedValue::I64)
-                            .collect::<Vec<_>>(),
-                        DynamicColumn::F64(accessor) => accessor
-                            .values_for_doc(doc_id)
-                            .map(OwnedValue::F64)
-                            .collect::<Vec<_>>(),
-                        DynamicColumn::Bytes(accessor) => accessor
-                            .term_ords(doc_id)
-                            .map(|term_ord| {
-                                let mut buffer = vec![];
-                                assert!(
-                                    accessor
-                                        .ord_to_bytes(term_ord, &mut buffer)
-                                        .expect("could not read term dictionary"),
-                                    "term corresponding to term_ord does not exist"
-                                );
-                                OwnedValue::Bytes(buffer)
-                            })
-                            .collect::<Vec<_>>(),
-                        DynamicColumn::Str(accessor) => accessor
-                            .term_ords(doc_id)
-                            .map(|term_ord| {
-                                let mut buffer = vec![];
-                                assert!(
-                                    accessor
-                                        .ord_to_bytes(term_ord, &mut buffer)
-                                        .expect("could not read term dictionary"),
-                                    "term corresponding to term_ord does not exist"
-                                );
-                                OwnedValue::Str(String::from_utf8(buffer).unwrap())
-                            })
-                            .collect::<Vec<_>>(),
-                        DynamicColumn::Bool(accessor) => accessor
-                            .values_for_doc(doc_id)
-                            .map(OwnedValue::Bool)
-                            .collect::<Vec<_>>(),
-                        DynamicColumn::IpAddr(accessor) => accessor
-                            .values_for_doc(doc_id)
-                            .map(OwnedValue::IpAddr)
-                            .collect::<Vec<_>>(),
-                        DynamicColumn::DateTime(accessor) => accessor
-                            .values_for_doc(doc_id)
-                            .map(OwnedValue::Date)
-                            .collect::<Vec<_>>(),
-                    })
-                    .collect();
-
-                (field.to_owned(), OwnedValue::Array(values))
-            })
-            .collect();
-        FieldRetrivalResult {
-            doc_value_fields: dvf,
-        }
-    }
-}
-
-#[derive(Debug, Clone, PartialEq, Default)]
-struct KeyOrder {
-    field: String,
-    order: Order,
-}
-
-impl Serialize for KeyOrder {
-    fn serialize<S: Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {
-        let KeyOrder { field, order } = self;
-        let mut map = serializer.serialize_map(Some(1))?;
-        map.serialize_entry(field, order)?;
-        map.end()
-    }
-}
-
-impl<'de> Deserialize<'de> for KeyOrder {
-    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
-    where D: Deserializer<'de> {
-        let mut k_o = <HashMap<String, Order>>::deserialize(deserializer)?.into_iter();
-        let (k, v) = k_o.next().ok_or(serde::de::Error::custom(
-            "Expected exactly one key-value pair in KeyOrder, found none",
-        ))?;
-        if k_o.next().is_some() {
-            return Err(serde::de::Error::custom(
-                "Expected exactly one key-value pair in KeyOrder, found more",
-            ));
-        }
-        Ok(Self { field: k, order: v })
-    }
-}
-
-impl TopHitsAggregation {
-    /// Validate and resolve field retrieval parameters
-    pub fn validate_and_resolve(&mut self, reader: &ColumnarReader) -> crate::Result<()> {
-        self.retrieval.resolve_field_names(reader)
-    }
-
    /// Return fields accessed by the aggregator, in order.
    pub fn field_names(&self) -> Vec<&str> {
        self.sort
@@ -300,20 +234,136 @@ impl TopHitsAggregation {

    /// Return fields accessed by the aggregator's value retrieval.
    pub fn value_field_names(&self) -> Vec<&str> {
-        self.retrieval.get_field_names()
+        self.doc_value_fields.iter().map(|s| s.as_str()).collect()
+    }
+
+    fn get_document_field_data(
+        &self,
+        accessors: &HashMap<String, Vec<DynamicColumn>>,
+        doc_id: DocId,
+    ) -> HashMap<String, FastFieldValue> {
+        let doc_value_fields = self
+            .doc_value_fields
+            .iter()
+            .map(|field| {
+                let accessors = accessors
+                    .get(field)
+                    .unwrap_or_else(|| panic!("field '{field}' not found in accessors"));
+
+                let values: Vec<FastFieldValue> = accessors
+                    .iter()
+                    .flat_map(|accessor| match accessor {
+                        DynamicColumn::U64(accessor) => accessor
+                            .values_for_doc(doc_id)
+                            .map(FastFieldValue::U64)
+                            .collect::<Vec<_>>(),
+                        DynamicColumn::I64(accessor) => accessor
+                            .values_for_doc(doc_id)
+                            .map(FastFieldValue::I64)
+                            .collect::<Vec<_>>(),
+                        DynamicColumn::F64(accessor) => accessor
+                            .values_for_doc(doc_id)
+                            .map(FastFieldValue::F64)
+                            .collect::<Vec<_>>(),
+                        DynamicColumn::Bytes(accessor) => accessor
+                            .term_ords(doc_id)
+                            .map(|term_ord| {
+                                let mut buffer = vec![];
+                                assert!(
+                                    accessor
+                                        .ord_to_bytes(term_ord, &mut buffer)
+                                        .expect("could not read term dictionary"),
+                                    "term corresponding to term_ord does not exist"
+                                );
+                                FastFieldValue::Bytes(buffer)
+                            })
+                            .collect::<Vec<_>>(),
+                        DynamicColumn::Str(accessor) => accessor
+                            .term_ords(doc_id)
+                            .map(|term_ord| {
+                                let mut buffer = vec![];
+                                assert!(
+                                    accessor
+                                        .ord_to_bytes(term_ord, &mut buffer)
+                                        .expect("could not read term dictionary"),
+                                    "term corresponding to term_ord does not exist"
+                                );
+                                FastFieldValue::Str(String::from_utf8(buffer).unwrap())
+                            })
+                            .collect::<Vec<_>>(),
+                        DynamicColumn::Bool(accessor) => accessor
+                            .values_for_doc(doc_id)
+                            .map(FastFieldValue::Bool)
+                            .collect::<Vec<_>>(),
+                        DynamicColumn::IpAddr(accessor) => accessor
+                            .values_for_doc(doc_id)
+                            .map(FastFieldValue::IpAddr)
+                            .collect::<Vec<_>>(),
+                        DynamicColumn::DateTime(accessor) => accessor
+                            .values_for_doc(doc_id)
+                            .map(FastFieldValue::Date)
+                            .collect::<Vec<_>>(),
+                    })
+                    .collect();
+
+                (field.to_owned(), FastFieldValue::Array(values))
+            })
+            .collect();
+        doc_value_fields
    }
 }

-/// Holds a single comparable doc feature, and the order in which it should be sorted.
+/// A retrieved value from a fast field.
+#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
+pub enum FastFieldValue {
+    /// The str type is used for any text information.
+    Str(String),
+    /// Unsigned 64-bits Integer `u64`
+    U64(u64),
+    /// Signed 64-bits Integer `i64`
+    I64(i64),
+    /// 64-bits Float `f64`
+    F64(f64),
+    /// Bool value
+    Bool(bool),
+    /// Date/time with nanoseconds precision
+    Date(DateTime),
+    /// Arbitrarily sized byte array
+    Bytes(Vec<u8>),
+    /// IpV6 Address. Internally there is no IpV4, it needs to be converted to `Ipv6Addr`.
+    IpAddr(Ipv6Addr),
+    /// A list of values.
+    Array(Vec<Self>),
+}
+
+impl From<FastFieldValue> for OwnedValue {
+    fn from(value: FastFieldValue) -> Self {
+        match value {
+            FastFieldValue::Str(s) => OwnedValue::Str(s),
+            FastFieldValue::U64(u) => OwnedValue::U64(u),
+            FastFieldValue::I64(i) => OwnedValue::I64(i),
+            FastFieldValue::F64(f) => OwnedValue::F64(f),
+            FastFieldValue::Bool(b) => OwnedValue::Bool(b),
+            FastFieldValue::Date(d) => OwnedValue::Date(d),
+            FastFieldValue::Bytes(b) => OwnedValue::Bytes(b),
+            FastFieldValue::IpAddr(ip) => OwnedValue::IpAddr(ip),
+            FastFieldValue::Array(a) => {
+                OwnedValue::Array(a.into_iter().map(OwnedValue::from).collect())
+            }
+        }
+    }
+}
+
+/// Holds a fast field value in its u64 representation, and the order in which it should be sorted.
 #[derive(Clone, Serialize, Deserialize, Debug)]
-struct ComparableDocFeature {
-    /// Stores any u64-mappable feature.
+struct DocValueAndOrder {
+    /// A fast field value in its u64 representation.
    value: Option<u64>,
-    /// Sort order for the doc feature
+    /// Sort order for the value
    order: Order,
 }

-impl Ord for ComparableDocFeature {
+impl Ord for DocValueAndOrder {
    fn cmp(&self, other: &Self) -> std::cmp::Ordering {
        let invert = |cmp: std::cmp::Ordering| match self.order {
            Order::Asc => cmp,
@@ -329,26 +379,32 @@ impl Ord for ComparableDocFeature {
    }
 }

-impl PartialOrd for ComparableDocFeature {
+impl PartialOrd for DocValueAndOrder {
    fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
        Some(self.cmp(other))
    }
 }

-impl PartialEq for ComparableDocFeature {
+impl PartialEq for DocValueAndOrder {
    fn eq(&self, other: &Self) -> bool {
        self.value.cmp(&other.value) == std::cmp::Ordering::Equal
    }
 }

-impl Eq for ComparableDocFeature {}
+impl Eq for DocValueAndOrder {}

 #[derive(Clone, Serialize, Deserialize, Debug)]
-struct ComparableDocFeatures(Vec<ComparableDocFeature>, FieldRetrivalResult);
+struct DocSortValuesAndFields {
+    sorts: Vec<DocValueAndOrder>,

-impl Ord for ComparableDocFeatures {
+    #[serde(rename = "docvalue_fields")]
+    #[serde(skip_serializing_if = "HashMap::is_empty")]
+    doc_value_fields: HashMap<String, FastFieldValue>,
+}
+
+impl Ord for DocSortValuesAndFields {
    fn cmp(&self, other: &Self) -> std::cmp::Ordering {
-        for (self_feature, other_feature) in self.0.iter().zip(other.0.iter()) {
+        for (self_feature, other_feature) in self.sorts.iter().zip(other.sorts.iter()) {
            let cmp = self_feature.cmp(other_feature);
            if cmp != std::cmp::Ordering::Equal {
                return cmp;
@@ -358,53 +414,43 @@ impl Ord for ComparableDocFeatures {
    }
 }

-impl PartialOrd for ComparableDocFeatures {
+impl PartialOrd for DocSortValuesAndFields {
    fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
        Some(self.cmp(other))
    }
 }

-impl PartialEq for ComparableDocFeatures {
+impl PartialEq for DocSortValuesAndFields {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == std::cmp::Ordering::Equal
    }
 }

-impl Eq for ComparableDocFeatures {}
+impl Eq for DocSortValuesAndFields {}

 /// The TopHitsCollector used for collecting over segments and merging results.
-#[derive(Clone, Serialize, Deserialize)]
-pub struct TopHitsCollector {
+#[derive(Clone, Serialize, Deserialize, Debug)]
+pub struct TopHitsTopNComputer {
    req: TopHitsAggregation,
-    top_n: TopNComputer<ComparableDocFeatures, DocAddress, false>,
+    top_n: TopNComputer<DocSortValuesAndFields, DocAddress, false>,
 }

-impl Default for TopHitsCollector {
-    fn default() -> Self {
-        Self {
-            req: TopHitsAggregation::default(),
-            top_n: TopNComputer::new(1),
-        }
-    }
-}
-
-impl std::fmt::Debug for TopHitsCollector {
-    fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
-        f.debug_struct("TopHitsCollector")
-            .field("req", &self.req)
-            .field("top_n_threshold", &self.top_n.threshold)
-            .finish()
-    }
-}
-
-impl std::cmp::PartialEq for TopHitsCollector {
+impl std::cmp::PartialEq for TopHitsTopNComputer {
    fn eq(&self, _other: &Self) -> bool {
        false
    }
 }

-impl TopHitsCollector {
-    fn collect(&mut self, features: ComparableDocFeatures, doc: DocAddress) {
+impl TopHitsTopNComputer {
+    /// Create a new TopHitsTopNComputer
+    pub fn new(req: &TopHitsAggregation) -> Self {
+        Self {
+            top_n: TopNComputer::new(req.size + req.from.unwrap_or(0)),
+            req: req.clone(),
+        }
+    }
+
+    fn collect(&mut self, features: DocSortValuesAndFields, doc: DocAddress) {
        self.top_n.push(features, doc);
    }

@@ -416,14 +462,19 @@ impl TopHitsCollector {
    }

    /// Finalize by converting self into the final result form
-    pub fn finalize(self) -> TopHitsMetricResult {
+    pub fn into_final_result(self) -> TopHitsMetricResult {
        let mut hits: Vec<TopHitsVecEntry> = self
            .top_n
            .into_sorted_vec()
            .into_iter()
            .map(|doc| TopHitsVecEntry {
-                sort: doc.feature.0.iter().map(|f| f.value).collect(),
-                search_results: doc.feature.1,
+                sort: doc.feature.sorts.iter().map(|f| f.value).collect(),
+                doc_value_fields: doc
+                    .feature
+                    .doc_value_fields
+                    .into_iter()
+                    .map(|(k, v)| (k, v.into()))
+                    .collect(),
            })
            .collect();

@@ -436,64 +487,55 @@ impl TopHitsCollector {
    }
 }

-#[derive(Clone)]
-pub(crate) struct SegmentTopHitsCollector {
+#[derive(Clone, Debug)]
+pub(crate) struct TopHitsSegmentCollector {
    segment_ordinal: SegmentOrdinal,
    accessor_idx: usize,
-    inner_collector: TopHitsCollector,
+    top_n: TopNComputer<Vec<DocValueAndOrder>, DocAddress, false>,
 }

-impl SegmentTopHitsCollector {
+impl TopHitsSegmentCollector {
    pub fn from_req(
        req: &TopHitsAggregation,
        accessor_idx: usize,
        segment_ordinal: SegmentOrdinal,
    ) -> Self {
        Self {
-            inner_collector: TopHitsCollector {
-                req: req.clone(),
-                top_n: TopNComputer::new(req.size + req.from.unwrap_or(0)),
-            },
+            top_n: TopNComputer::new(req.size + req.from.unwrap_or(0)),
            segment_ordinal,
            accessor_idx,
        }
    }
-}
+    fn into_top_hits_collector(
+        self,
+        value_accessors: &HashMap<String, Vec<DynamicColumn>>,
+        req: &TopHitsAggregation,
+    ) -> TopHitsTopNComputer {
+        let mut top_hits_computer = TopHitsTopNComputer::new(req);
+        let top_results = self.top_n.into_vec();

-impl std::fmt::Debug for SegmentTopHitsCollector {
-    fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
-        f.debug_struct("SegmentTopHitsCollector")
-            .field("segment_id", &self.segment_ordinal)
-            .field("accessor_idx", &self.accessor_idx)
-            .field("inner_collector", &self.inner_collector)
-            .finish()
-    }
-}
+        for res in top_results {
+            let doc_value_fields = req.get_document_field_data(value_accessors, res.doc.doc_id);
+            top_hits_computer.collect(
+                DocSortValuesAndFields {
+                    sorts: res.feature,
+                    doc_value_fields,
+                },
+                res.doc,
+            );
+        }

-impl SegmentAggregationCollector for SegmentTopHitsCollector {
-    fn add_intermediate_aggregation_result(
-        self: Box<Self>,
-        agg_with_accessor: &crate::aggregation::agg_req_with_accessor::AggregationsWithAccessor,
-        results: &mut crate::aggregation::intermediate_agg_result::IntermediateAggregationResults,
-    ) -> crate::Result<()> {
-        let name = agg_with_accessor.aggs.keys[self.accessor_idx].to_string();
-        let intermediate_result = IntermediateMetricResult::TopHits(self.inner_collector);
-        results.push(
-            name,
-            IntermediateAggregationResult::Metric(intermediate_result),
-        )
+        top_hits_computer
    }

-    fn collect(
+    /// TODO add a specialized variant for a single sort field
+    fn collect_with(
        &mut self,
        doc_id: crate::DocId,
-        agg_with_accessor: &mut crate::aggregation::agg_req_with_accessor::AggregationsWithAccessor,
+        req: &TopHitsAggregation,
+        accessors: &[(Column<u64>, ColumnType)],
    ) -> crate::Result<()> {
-        let accessors = &agg_with_accessor.aggs.values[self.accessor_idx].accessors;
-        let value_accessors = &agg_with_accessor.aggs.values[self.accessor_idx].value_accessors;
-        let features: Vec<ComparableDocFeature> = self
-            .inner_collector
-            .req
+        let sorts: Vec<DocValueAndOrder> = req
            .sort
            .iter()
            .enumerate()
@@ -505,18 +547,12 @@ impl SegmentAggregationCollector for SegmentTopHitsCollector {
                    .0
                    .values_for_doc(doc_id)
                    .next();
-                ComparableDocFeature { value, order }
+                DocValueAndOrder { value, order }
            })
            .collect();

-        let retrieval_result = self
-            .inner_collector
-            .req
-            .retrieval
-            .get_document_field_data(value_accessors, doc_id);
-
-        self.inner_collector.collect(
-            ComparableDocFeatures(features, retrieval_result),
+        self.top_n.push(
+            sorts,
            DocAddress {
                segment_ord: self.segment_ordinal,
                doc_id,
@@ -524,19 +560,62 @@ impl SegmentAggregationCollector for SegmentTopHitsCollector {
        );
        Ok(())
    }
+}
+
+impl SegmentAggregationCollector for TopHitsSegmentCollector {
+    fn add_intermediate_aggregation_result(
+        self: Box<Self>,
+        agg_with_accessor: &crate::aggregation::agg_req_with_accessor::AggregationsWithAccessor,
+        results: &mut crate::aggregation::intermediate_agg_result::IntermediateAggregationResults,
+    ) -> crate::Result<()> {
+        let name = agg_with_accessor.aggs.keys[self.accessor_idx].to_string();
+
+        let value_accessors = &agg_with_accessor.aggs.values[self.accessor_idx].value_accessors;
+        let tophits_req = &agg_with_accessor.aggs.values[self.accessor_idx]
+            .agg
+            .agg
+            .as_top_hits()
+            .expect("aggregation request must be of type top hits");
+
+        let intermediate_result = IntermediateMetricResult::TopHits(
+            self.into_top_hits_collector(value_accessors, tophits_req),
+        );
+        results.push(
+            name,
+            IntermediateAggregationResult::Metric(intermediate_result),
+        )
+    }
+
+    /// TODO: Consider a caching layer to reduce the call overhead
+    fn collect(
+        &mut self,
+        doc_id: crate::DocId,
+        agg_with_accessor: &mut crate::aggregation::agg_req_with_accessor::AggregationsWithAccessor,
+    ) -> crate::Result<()> {
+        let tophits_req = &agg_with_accessor.aggs.values[self.accessor_idx]
+            .agg
+            .agg
+            .as_top_hits()
+            .expect("aggregation request must be of type top hits");
+        let accessors = &agg_with_accessor.aggs.values[self.accessor_idx].accessors;
+        self.collect_with(doc_id, tophits_req, accessors)?;
+        Ok(())
+    }

    fn collect_block(
        &mut self,
        docs: &[crate::DocId],
        agg_with_accessor: &mut crate::aggregation::agg_req_with_accessor::AggregationsWithAccessor,
    ) -> crate::Result<()> {
-        // TODO: Consider getting fields with the column block accessor and refactor this.
-        // ---
-        // Would the additional complexity of getting fields with the column_block_accessor
-        // make sense here? Probably yes, but I want to get a first-pass review first
-        // before proceeding.
+        let tophits_req = &agg_with_accessor.aggs.values[self.accessor_idx]
+            .agg
+            .agg
+            .as_top_hits()
+            .expect("aggregation request must be of type top hits");
+        let accessors = &agg_with_accessor.aggs.values[self.accessor_idx].accessors;
+        // TODO: Consider getting fields with the column block accessor.
        for doc in docs {
-            self.collect(*doc, agg_with_accessor)?;
+            self.collect_with(*doc, tophits_req, accessors)?;
        }
        Ok(())
    }
@@ -549,7 +628,7 @@ mod tests {
    use serde_json::Value;
    use time::macros::datetime;

-    use super::{ComparableDocFeature, ComparableDocFeatures, Order};
+    use super::{DocSortValuesAndFields, DocValueAndOrder, Order};
    use crate::aggregation::agg_req::Aggregations;
    use crate::aggregation::agg_result::AggregationResults;
    use crate::aggregation::bucket::tests::get_test_index_from_docs;
@@ -557,44 +636,44 @@ mod tests {
    use crate::aggregation::AggregationCollector;
    use crate::collector::ComparableDoc;
    use crate::query::AllQuery;
-    use crate::schema::OwnedValue as SchemaValue;
+    use crate::schema::OwnedValue;

-    fn invert_order(cmp_feature: ComparableDocFeature) -> ComparableDocFeature {
-        let ComparableDocFeature { value, order } = cmp_feature;
+    fn invert_order(cmp_feature: DocValueAndOrder) -> DocValueAndOrder {
+        let DocValueAndOrder { value, order } = cmp_feature;
        let order = match order {
            Order::Asc => Order::Desc,
            Order::Desc => Order::Asc,
        };
-        ComparableDocFeature { value, order }
+        DocValueAndOrder { value, order }
    }

-    fn collector_with_capacity(capacity: usize) -> super::TopHitsCollector {
-        super::TopHitsCollector {
+    fn collector_with_capacity(capacity: usize) -> super::TopHitsTopNComputer {
+        super::TopHitsTopNComputer {
            top_n: super::TopNComputer::new(capacity),
-            ..Default::default()
+            req: Default::default(),
        }
    }

-    fn invert_order_features(cmp_features: ComparableDocFeatures) -> ComparableDocFeatures {
-        let ComparableDocFeatures(cmp_features, search_results) = cmp_features;
-        let cmp_features = cmp_features
+    fn invert_order_features(mut cmp_features: DocSortValuesAndFields) -> DocSortValuesAndFields {
+        cmp_features.sorts = cmp_features
+            .sorts
            .into_iter()
            .map(invert_order)
            .collect::<Vec<_>>();
-        ComparableDocFeatures(cmp_features, search_results)
+        cmp_features
    }

    #[test]
    fn test_comparable_doc_feature() -> crate::Result<()> {
-        let small = ComparableDocFeature {
+        let small = DocValueAndOrder {
            value: Some(1),
            order: Order::Asc,
        };
-        let big = ComparableDocFeature {
+        let big = DocValueAndOrder {
            value: Some(2),
            order: Order::Asc,
        };
-        let none = ComparableDocFeature {
+        let none = DocValueAndOrder {
            value: None,
            order: Order::Asc,
        };
@@ -616,21 +695,21 @@ mod tests {

    #[test]
    fn test_comparable_doc_features() -> crate::Result<()> {
-        let features_1 = ComparableDocFeatures(
-            vec![ComparableDocFeature {
+        let features_1 = DocSortValuesAndFields {
+            sorts: vec![DocValueAndOrder {
                value: Some(1),
                order: Order::Asc,
            }],
-            Default::default(),
-        );
+            doc_value_fields: Default::default(),
+        };

-        let features_2 = ComparableDocFeatures(
-            vec![ComparableDocFeature {
+        let features_2 = DocSortValuesAndFields {
+            sorts: vec![DocValueAndOrder {
                value: Some(2),
                order: Order::Asc,
            }],
-            Default::default(),
-        );
+            doc_value_fields: Default::default(),
+        };

        assert!(features_1 < features_2);

@@ -689,39 +768,39 @@ mod tests {
                    segment_ord: 0,
                    doc_id: 0,
                },
-                feature: ComparableDocFeatures(
-                    vec![ComparableDocFeature {
+                feature: DocSortValuesAndFields {
+                    sorts: vec![DocValueAndOrder {
                        value: Some(1),
                        order: Order::Asc,
                    }],
-                    Default::default(),
-                ),
+                    doc_value_fields: Default::default(),
+                },
            },
            ComparableDoc {
                doc: crate::DocAddress {
                    segment_ord: 0,
                    doc_id: 2,
                },
-                feature: ComparableDocFeatures(
-                    vec![ComparableDocFeature {
+                feature: DocSortValuesAndFields {
+                    sorts: vec![DocValueAndOrder {
                        value: Some(3),
                        order: Order::Asc,
                    }],
-                    Default::default(),
-                ),
+                    doc_value_fields: Default::default(),
+                },
            },
            ComparableDoc {
                doc: crate::DocAddress {
                    segment_ord: 0,
                    doc_id: 1,
                },
-                feature: ComparableDocFeatures(
-                    vec![ComparableDocFeature {
+                feature: DocSortValuesAndFields {
+                    sorts: vec![DocValueAndOrder {
                        value: Some(5),
                        order: Order::Asc,
                    }],
-                    Default::default(),
-                ),
+                    doc_value_fields: Default::default(),
+                },
            },
        ];

@@ -730,23 +809,23 @@ mod tests {
            collector.collect(doc.feature, doc.doc);
        }

-        let res = collector.finalize();
+        let res = collector.into_final_result();

        assert_eq!(
            res,
            super::TopHitsMetricResult {
                hits: vec![
                    super::TopHitsVecEntry {
-                        sort: vec![docs[0].feature.0[0].value],
-                        search_results: Default::default(),
+                        sort: vec![docs[0].feature.sorts[0].value],
+                        doc_value_fields: Default::default(),
                    },
                    super::TopHitsVecEntry {
-                        sort: vec![docs[1].feature.0[0].value],
-                        search_results: Default::default(),
+                        sort: vec![docs[1].feature.sorts[0].value],
+                        doc_value_fields: Default::default(),
                    },
                    super::TopHitsVecEntry {
-                        sort: vec![docs[2].feature.0[0].value],
-                        search_results: Default::default(),
+                        sort: vec![docs[2].feature.sorts[0].value],
+                        doc_value_fields: Default::default(),
                    },
                ]
            }
@@ -803,7 +882,7 @@ mod tests {
                    {
                        "sort": [common::i64_to_u64(date_2017.unix_timestamp_nanos() as i64)],
                        "docvalue_fields": {
-                            "date": [ SchemaValue::Date(DateTime::from_utc(date_2017)) ],
+                            "date": [ OwnedValue::Date(DateTime::from_utc(date_2017)) ],
                            "text": [ "ccc" ],
                            "text2": [ "ddd" ],
                            "mixed.dyn_arr": [ 3, "4" ],
@@ -812,7 +891,7 @@ mod tests {
                    {
                        "sort": [common::i64_to_u64(date_2016.unix_timestamp_nanos() as i64)],
                        "docvalue_fields": {
-                            "date": [ SchemaValue::Date(DateTime::from_utc(date_2016)) ],
+                            "date": [ OwnedValue::Date(DateTime::from_utc(date_2016)) ],
                            "text": [ "aaa" ],
                            "text2": [ "bbb" ],
                            "mixed.dyn_arr": [ 6, "7" ],
--- a/src/aggregation/mod.rs
+++ b/src/aggregation/mod.rs
@@ -143,8 +143,6 @@ use std::fmt::Display;
 #[cfg(test)]
 mod agg_tests;

-mod agg_bench;
-
 use core::fmt;

 pub use agg_limits::AggregationLimits;
@@ -160,15 +158,14 @@ use serde::de::{self, Visitor};
 use serde::{Deserialize, Deserializer, Serialize};

 fn parse_str_into_f64<E: de::Error>(value: &str) -> Result<f64, E> {
-    let parsed = value.parse::<f64>().map_err(|_err| {
-        de::Error::custom(format!("Failed to parse f64 from string: {:?}", value))
-    })?;
+    let parsed = value
+        .parse::<f64>()
+        .map_err(|_err| de::Error::custom(format!("Failed to parse f64 from string: {value:?}")))?;

    // Check if the parsed value is NaN or infinity
    if parsed.is_nan() || parsed.is_infinite() {
        Err(de::Error::custom(format!(
-            "Value is not a valid f64 (NaN or Infinity): {:?}",
-            value
+            "Value is not a valid f64 (NaN or Infinity): {value:?}"
        )))
    } else {
        Ok(parsed)
@@ -417,7 +414,6 @@ mod tests {
    use time::OffsetDateTime;

    use super::agg_req::Aggregations;
-    use super::segment_agg_result::AggregationLimits;
    use super::*;
    use crate::indexer::NoMergePolicy;
    use crate::query::{AllQuery, TermQuery};
--- a/src/aggregation/segment_agg_result.rs
+++ b/src/aggregation/segment_agg_result.rs
@@ -11,12 +11,12 @@ use super::agg_req_with_accessor::{AggregationWithAccessor, AggregationsWithAcce
 use super::bucket::{SegmentHistogramCollector, SegmentRangeCollector, SegmentTermCollector};
 use super::intermediate_agg_result::IntermediateAggregationResults;
 use super::metric::{
-    AverageAggregation, CountAggregation, MaxAggregation, MinAggregation,
+    AverageAggregation, CountAggregation, ExtendedStatsAggregation, MaxAggregation, MinAggregation,
    SegmentPercentilesCollector, SegmentStatsCollector, SegmentStatsType, StatsAggregation,
    SumAggregation,
 };
 use crate::aggregation::bucket::TermMissingAgg;
-use crate::aggregation::metric::SegmentTopHitsCollector;
+use crate::aggregation::metric::{SegmentExtendedStatsCollector, TopHitsSegmentCollector};

 pub(crate) trait SegmentAggregationCollector: CollectorClone + Debug {
    fn add_intermediate_aggregation_result(
@@ -148,6 +148,9 @@ pub(crate) fn build_single_agg_segment_collector(
            accessor_idx,
            *missing,
        ))),
+        ExtendedStats(ExtendedStatsAggregation { missing, sigma, .. }) => Ok(Box::new(
+            SegmentExtendedStatsCollector::from_req(req.field_type, *sigma, accessor_idx, *missing),
+        )),
        Sum(SumAggregation { missing, .. }) => Ok(Box::new(SegmentStatsCollector::from_req(
            req.field_type,
            SegmentStatsType::Sum,
@@ -161,7 +164,7 @@ pub(crate) fn build_single_agg_segment_collector(
                accessor_idx,
            )?,
        )),
-        TopHits(top_hits_req) => Ok(Box::new(SegmentTopHitsCollector::from_req(
+        TopHits(top_hits_req) => Ok(Box::new(TopHitsSegmentCollector::from_req(
            top_hits_req,
            accessor_idx,
            req.segment_ordinal,
--- a/src/collector/facet_collector.rs
+++ b/src/collector/facet_collector.rs
@@ -1,7 +1,7 @@
 use std::cmp::Ordering;
 use std::collections::{btree_map, BTreeMap, BTreeSet, BinaryHeap};
+use std::io;
 use std::ops::Bound;
-use std::{io, u64, usize};

 use crate::collector::{Collector, SegmentCollector};
 use crate::fastfield::FacetReader;
@@ -598,7 +598,7 @@ mod tests {
                let mid = n % 4;
                n /= 4;
                let leaf = n % 5;
-                Facet::from(&format!("/top{}/mid{}/leaf{}", top, mid, leaf))
+                Facet::from(&format!("/top{top}/mid{mid}/leaf{leaf}"))
            })
            .collect();
        for i in 0..num_facets * 10 {
@@ -737,7 +737,7 @@ mod tests {
            vec![("a", 10), ("b", 100), ("c", 7), ("d", 12), ("e", 21)]
                .into_iter()
                .flat_map(|(c, count)| {
-                    let facet = Facet::from(&format!("/facet/{}", c));
+                    let facet = Facet::from(&format!("/facet/{c}"));
                    let doc = doc!(facet_field => facet);
                    iter::repeat(doc).take(count)
                })
@@ -785,7 +785,7 @@ mod tests {
        let docs: Vec<TantivyDocument> = vec![("b", 2), ("a", 2), ("c", 4)]
            .into_iter()
            .flat_map(|(c, count)| {
-                let facet = Facet::from(&format!("/facet/{}", c));
+                let facet = Facet::from(&format!("/facet/{c}"));
                let doc = doc!(facet_field => facet);
                iter::repeat(doc).take(count)
            })
--- a/src/collector/histogram_collector.rs
+++ b/src/collector/histogram_collector.rs
@@ -160,7 +160,7 @@ mod tests {
    use super::{add_vecs, HistogramCollector, HistogramComputer};
    use crate::schema::{Schema, FAST};
    use crate::time::{Date, Month};
-    use crate::{doc, query, DateTime, Index};
+    use crate::{query, DateTime, Index};

    #[test]
    fn test_add_histograms_simple() {
--- a/src/collector/tests.rs
+++ b/src/collector/tests.rs
@@ -1,15 +1,11 @@
 use columnar::{BytesColumn, Column};

 use super::*;
-use crate::collector::{Count, FilterCollector, TopDocs};
-use crate::index::SegmentReader;
 use crate::query::{AllQuery, QueryParser};
 use crate::schema::{Schema, FAST, TEXT};
 use crate::time::format_description::well_known::Rfc3339;
 use crate::time::OffsetDateTime;
-use crate::{
-    doc, DateTime, DocAddress, DocId, Index, Score, Searcher, SegmentOrdinal, TantivyDocument,
-};
+use crate::{DateTime, DocAddress, Index, Searcher, TantivyDocument};

 pub const TEST_COLLECTOR_WITH_SCORE: TestCollector = TestCollector {
    compute_score: true,
--- a/src/collector/top_collector.rs
+++ b/src/collector/top_collector.rs
@@ -4,7 +4,8 @@ use std::marker::PhantomData;
 use serde::{Deserialize, Serialize};

 use super::top_score_collector::TopNComputer;
-use crate::{DocAddress, DocId, SegmentOrdinal, SegmentReader};
+use crate::index::SegmentReader;
+use crate::{DocAddress, DocId, SegmentOrdinal};

 /// Contains a feature (field, score, etc.) of a document along with the document address.
 ///
--- a/src/collector/top_score_collector.rs
+++ b/src/collector/top_score_collector.rs
@@ -732,6 +732,19 @@ pub struct TopNComputer<Score, D, const REVERSE_ORDER: bool = true> {
    top_n: usize,
    pub(crate) threshold: Option<Score>,
 }
+
+impl<Score: std::fmt::Debug, D, const REVERSE_ORDER: bool> std::fmt::Debug
+    for TopNComputer<Score, D, REVERSE_ORDER>
+{
+    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> std::fmt::Result {
+        f.debug_struct("TopNComputer")
+            .field("buffer_len", &self.buffer.len())
+            .field("top_n", &self.top_n)
+            .field("current_threshold", &self.threshold)
+            .finish()
+    }
+}
+
 // Intermediate struct for TopNComputer for deserialization, to keep vec capacity
 #[derive(Deserialize)]
 struct TopNComputerDeser<Score, D, const REVERSE_ORDER: bool> {
@@ -858,7 +871,10 @@ mod tests {
    use crate::schema::{Field, Schema, FAST, STORED, TEXT};
    use crate::time::format_description::well_known::Rfc3339;
    use crate::time::OffsetDateTime;
-    use crate::{DateTime, DocAddress, DocId, Index, IndexWriter, Order, Score, SegmentReader};
+    use crate::{
+        assert_nearly_equals, DateTime, DocAddress, DocId, Index, IndexWriter, Order, Score,
+        SegmentReader,
+    };

    fn make_index() -> crate::Result<Index> {
        let mut schema_builder = Schema::builder();
--- a/src/core/executor.rs
+++ b/src/core/executor.rs
@@ -1,19 +1,25 @@
-use rayon::{ThreadPool, ThreadPoolBuilder};
+use std::sync::Arc;
+
+#[cfg(feature = "quickwit")]
+use futures_util::{future::Either, FutureExt};

 use crate::TantivyError;

-/// Search executor whether search request are single thread or multithread.
-///
-/// We don't expose Rayon thread pool directly here for several reasons.
-///
-/// First dependency hell. It is not a good idea to expose the
-/// API of a dependency, knowing it might conflict with a different version
-/// used by the client. Second, we may stop using rayon in the future.
+/// Executor makes it possible to run tasks in a single thread or
+/// in a thread pool.
+#[derive(Clone)]
 pub enum Executor {
    /// Single thread variant of an Executor
    SingleThread,
    /// Thread pool variant of an Executor
-    ThreadPool(ThreadPool),
+    ThreadPool(Arc<rayon::ThreadPool>),
+}
+
+#[cfg(feature = "quickwit")]
+impl From<Arc<rayon::ThreadPool>> for Executor {
+    fn from(thread_pool: Arc<rayon::ThreadPool>) -> Self {
+        Executor::ThreadPool(thread_pool)
+    }
 }

 impl Executor {
@@ -24,11 +30,11 @@ impl Executor {

    /// Creates an Executor that dispatches the tasks in a thread pool.
    pub fn multi_thread(num_threads: usize, prefix: &'static str) -> crate::Result<Executor> {
-        let pool = ThreadPoolBuilder::new()
+        let pool = rayon::ThreadPoolBuilder::new()
            .num_threads(num_threads)
            .thread_name(move |num| format!("{prefix}{num}"))
            .build()?;
-        Ok(Executor::ThreadPool(pool))
+        Ok(Executor::ThreadPool(Arc::new(pool)))
    }

    /// Perform a map in the thread pool.
@@ -91,11 +97,36 @@ impl Executor {
            }
        }
    }
+
+    /// Spawn a task on the pool, returning a future completing on task success.
+    ///
+    /// If the task panics, returns `Err(())`.
+    #[cfg(feature = "quickwit")]
+    pub fn spawn_blocking<T: Send + 'static>(
+        &self,
+        cpu_intensive_task: impl FnOnce() -> T + Send + 'static,
+    ) -> impl std::future::Future<Output = Result<T, ()>> {
+        match self {
+            Executor::SingleThread => Either::Left(std::future::ready(Ok(cpu_intensive_task()))),
+            Executor::ThreadPool(pool) => {
+                let (sender, receiver) = oneshot::channel();
+                pool.spawn(|| {
+                    if sender.is_closed() {
+                        return;
+                    }
+                    let task_result = cpu_intensive_task();
+                    let _ = sender.send(task_result);
+                });
+
+                let res = receiver.map(|res| res.map_err(|_| ()));
+                Either::Right(res)
+            }
+        }
+    }
 }

 #[cfg(test)]
 mod tests {
-
    use super::Executor;

    #[test]
@@ -147,4 +178,62 @@ mod tests {
            assert_eq!(result[i], i * 2);
        }
    }
+
+    #[cfg(feature = "quickwit")]
+    #[test]
+    fn test_cancel_cpu_intensive_tasks() {
+        use std::sync::atomic::{AtomicU64, Ordering};
+        use std::sync::Arc;
+
+        let counter: Arc<AtomicU64> = Default::default();
+
+        let other_counter: Arc<AtomicU64> = Default::default();
+
+        let mut futures = Vec::new();
+        let mut other_futures = Vec::new();
+
+        let (tx, rx) = crossbeam_channel::bounded::<()>(0);
+        let rx = Arc::new(rx);
+        let executor = Executor::multi_thread(3, "search-test").unwrap();
+        for _ in 0..1000 {
+            let counter_clone: Arc<AtomicU64> = counter.clone();
+            let other_counter_clone: Arc<AtomicU64> = other_counter.clone();
+
+            let rx_clone = rx.clone();
+            let rx_clone2 = rx.clone();
+            let fut = executor.spawn_blocking(move || {
+                counter_clone.fetch_add(1, Ordering::SeqCst);
+                let _ = rx_clone.recv();
+            });
+            futures.push(fut);
+            let other_fut = executor.spawn_blocking(move || {
+                other_counter_clone.fetch_add(1, Ordering::SeqCst);
+                let _ = rx_clone2.recv();
+            });
+            other_futures.push(other_fut);
+        }
+
+        // We execute 100 futures.
+        for _ in 0..100 {
+            tx.send(()).unwrap();
+        }
+
+        let counter_val = counter.load(Ordering::SeqCst);
+        let other_counter_val = other_counter.load(Ordering::SeqCst);
+        assert!(counter_val >= 30);
+        assert!(other_counter_val >= 30);
+
+        drop(other_futures);
+
+        // We execute 100 futures.
+        for _ in 0..100 {
+            tx.send(()).unwrap();
+        }
+
+        let counter_val2 = counter.load(Ordering::SeqCst);
+        assert!(counter_val2 >= counter_val + 100 - 6);
+
+        let other_counter_val2 = other_counter.load(Ordering::SeqCst);
+        assert!(other_counter_val2 <= other_counter_val + 6);
+    }
 }
--- a/src/core/json_utils.rs
+++ b/src/core/json_utils.rs
@@ -1,12 +1,10 @@
-use columnar::MonotonicallyMappableToU64;
+use common::json_path_writer::JSON_PATH_SEGMENT_SEP;
 use common::{replace_in_place, JsonPathWriter};
 use rustc_hash::FxHashMap;

-use crate::fastfield::FastValue;
 use crate::postings::{IndexingContext, IndexingPosition, PostingsWriter};
 use crate::schema::document::{ReferenceValue, ReferenceValueLeaf, Value};
-use crate::schema::term::JSON_PATH_SEGMENT_SEP;
-use crate::schema::{Field, Type, DATE_TIME_PRECISION_INDEXED};
+use crate::schema::Type;
 use crate::time::format_description::well_known::Rfc3339;
 use crate::time::{OffsetDateTime, UtcOffset};
 use crate::tokenizer::TextAnalyzer;
@@ -33,7 +31,7 @@ use crate::{DateTime, DocId, Term};
 /// position 1.
 /// As a result, with lemmatization, "The Smiths" will match our object.
 ///
-/// Worse, if a same term is appears in the second object, a non increasing value would be pushed
+/// Worse, if the same term appears in the second object, a non-increasing value would be pushed
 /// to the position recorder probably provoking a panic.
 ///
 /// This problem is solved for regular multivalued object by offsetting the position
@@ -52,7 +50,7 @@ use crate::{DateTime, DocId, Term};
 /// We can therefore afford working with a map that is not imperfect. It is fine if several
 /// path map to the same index position as long as the probability is relatively low.
 #[derive(Default)]
-struct IndexingPositionsPerPath {
+pub(crate) struct IndexingPositionsPerPath {
    positions_per_path: FxHashMap<u32, IndexingPosition>,
 }

@@ -60,6 +58,9 @@ impl IndexingPositionsPerPath {
    fn get_position_from_id(&mut self, id: u32) -> &mut IndexingPosition {
        self.positions_per_path.entry(id).or_default()
    }
+    pub fn clear(&mut self) {
+        self.positions_per_path.clear();
+    }
 }

 /// Convert JSON_PATH_SEGMENT_SEP to a dot.
@@ -70,36 +71,6 @@ pub fn json_path_sep_to_dot(path: &mut str) {
    }
 }

-#[allow(clippy::too_many_arguments)]
-pub(crate) fn index_json_values<'a, V: Value<'a>>(
-    doc: DocId,
-    json_visitors: impl Iterator<Item = crate::Result<V::ObjectIter>>,
-    text_analyzer: &mut TextAnalyzer,
-    expand_dots_enabled: bool,
-    term_buffer: &mut Term,
-    postings_writer: &mut dyn PostingsWriter,
-    json_path_writer: &mut JsonPathWriter,
-    ctx: &mut IndexingContext,
-) -> crate::Result<()> {
-    json_path_writer.clear();
-    json_path_writer.set_expand_dots(expand_dots_enabled);
-    let mut positions_per_path: IndexingPositionsPerPath = Default::default();
-    for json_visitor_res in json_visitors {
-        let json_visitor = json_visitor_res?;
-        index_json_object::<V>(
-            doc,
-            json_visitor,
-            text_analyzer,
-            term_buffer,
-            json_path_writer,
-            postings_writer,
-            ctx,
-            &mut positions_per_path,
-        );
-    }
-    Ok(())
-}
-
 #[allow(clippy::too_many_arguments)]
 fn index_json_object<'a, V: Value<'a>>(
    doc: DocId,
@@ -128,7 +99,7 @@ fn index_json_object<'a, V: Value<'a>>(
 }

 #[allow(clippy::too_many_arguments)]
-fn index_json_value<'a, V: Value<'a>>(
+pub(crate) fn index_json_value<'a, V: Value<'a>>(
    doc: DocId,
    json_value: V,
    text_analyzer: &mut TextAnalyzer,
@@ -168,12 +139,18 @@ fn index_json_value<'a, V: Value<'a>>(
                );
            }
            ReferenceValueLeaf::U64(val) => {
+                // try to parse to i64, since when querying we will apply the same logic and prefer
+                // i64 values
                set_path_id(
                    term_buffer,
                    ctx.path_to_unordered_id
                        .get_or_allocate_unordered_id(json_path_writer.as_str()),
                );
-                term_buffer.append_type_and_fast_value(val);
+                if let Ok(i64_val) = val.try_into() {
+                    term_buffer.append_type_and_fast_value::<i64>(i64_val);
+                } else {
+                    term_buffer.append_type_and_fast_value(val);
+                }
                postings_writer.subscribe(doc, 0u32, term_buffer, ctx);
            }
            ReferenceValueLeaf::I64(val) => {
@@ -256,71 +233,42 @@ fn index_json_value<'a, V: Value<'a>>(
    }
 }

-// Tries to infer a JSON type from a string.
-pub fn convert_to_fast_value_and_get_term(
-    json_term_writer: &mut JsonTermWriter,
-    phrase: &str,
-) -> Option<Term> {
+/// Tries to infer a JSON type from a string and append it to the term.
+///
+/// The term must be json + JSON path.
+pub fn convert_to_fast_value_and_append_to_json_term(mut term: Term, phrase: &str) -> Option<Term> {
+    assert_eq!(
+        term.value()
+            .as_json_value_bytes()
+            .expect("expecting a Term with a json type and json path")
+            .as_serialized()
+            .len(),
+        0,
+        "JSON value bytes should be empty"
+    );
    if let Ok(dt) = OffsetDateTime::parse(phrase, &Rfc3339) {
        let dt_utc = dt.to_offset(UtcOffset::UTC);
-        return Some(set_fastvalue_and_get_term(
-            json_term_writer,
-            DateTime::from_utc(dt_utc),
-        ));
+        term.append_type_and_fast_value(DateTime::from_utc(dt_utc));
+        return Some(term);
    }
    if let Ok(i64_val) = str::parse::<i64>(phrase) {
-        return Some(set_fastvalue_and_get_term(json_term_writer, i64_val));
+        term.append_type_and_fast_value(i64_val);
+        return Some(term);
    }
    if let Ok(u64_val) = str::parse::<u64>(phrase) {
-        return Some(set_fastvalue_and_get_term(json_term_writer, u64_val));
+        term.append_type_and_fast_value(u64_val);
+        return Some(term);
    }
    if let Ok(f64_val) = str::parse::<f64>(phrase) {
-        return Some(set_fastvalue_and_get_term(json_term_writer, f64_val));
+        term.append_type_and_fast_value(f64_val);
+        return Some(term);
    }
    if let Ok(bool_val) = str::parse::<bool>(phrase) {
-        return Some(set_fastvalue_and_get_term(json_term_writer, bool_val));
+        term.append_type_and_fast_value(bool_val);
+        return Some(term);
    }
    None
 }
-// helper function to generate a Term from a json fastvalue
-pub(crate) fn set_fastvalue_and_get_term<T: FastValue>(
-    json_term_writer: &mut JsonTermWriter,
-    value: T,
-) -> Term {
-    json_term_writer.set_fast_value(value);
-    json_term_writer.term().clone()
-}
-
-// helper function to generate a list of terms with their positions from a textual json value
-pub(crate) fn set_string_and_get_terms(
-    json_term_writer: &mut JsonTermWriter,
-    value: &str,
-    text_analyzer: &mut TextAnalyzer,
-) -> Vec<(usize, Term)> {
-    let mut positions_and_terms = Vec::<(usize, Term)>::new();
-    json_term_writer.close_path_and_set_type(Type::Str);
-    let term_num_bytes = json_term_writer.term_buffer.len_bytes();
-    let mut token_stream = text_analyzer.token_stream(value);
-    token_stream.process(&mut |token| {
-        json_term_writer
-            .term_buffer
-            .truncate_value_bytes(term_num_bytes);
-        json_term_writer
-            .term_buffer
-            .append_bytes(token.text.as_bytes());
-        positions_and_terms.push((token.position, json_term_writer.term().clone()));
-    });
-    positions_and_terms
-}
-
-/// Writes a value of a JSON field to a `Term`.
-/// The Term format is as follows:
-/// `[JSON_TYPE][JSON_PATH][JSON_END_OF_PATH][VALUE_BYTES]`
-pub struct JsonTermWriter<'a> {
-    term_buffer: &'a mut Term,
-    path_stack: Vec<usize>,
-    expand_dots_enabled: bool,
-}

 /// Splits a json path supplied to the query parser in such a way that
 /// `.` can be escaped.
@@ -377,158 +325,48 @@ pub(crate) fn encode_column_name(
    path.into()
 }

-impl<'a> JsonTermWriter<'a> {
-    pub fn from_field_and_json_path(
-        field: Field,
-        json_path: &str,
-        expand_dots_enabled: bool,
-        term_buffer: &'a mut Term,
-    ) -> Self {
-        term_buffer.set_field_and_type(field, Type::Json);
-        let mut json_term_writer = Self::wrap(term_buffer, expand_dots_enabled);
-        for segment in split_json_path(json_path) {
-            json_term_writer.push_path_segment(&segment);
-        }
-        json_term_writer
-    }
-
-    pub fn wrap(term_buffer: &'a mut Term, expand_dots_enabled: bool) -> Self {
-        term_buffer.clear_with_type(Type::Json);
-        let mut path_stack = Vec::with_capacity(10);
-        path_stack.push(0);
-        Self {
-            term_buffer,
-            path_stack,
-            expand_dots_enabled,
-        }
-    }
-
-    fn trim_to_end_of_path(&mut self) {
-        let end_of_path = *self.path_stack.last().unwrap();
-        self.term_buffer.truncate_value_bytes(end_of_path);
-    }
-
-    pub fn close_path_and_set_type(&mut self, typ: Type) {
-        self.trim_to_end_of_path();
-        self.term_buffer.set_json_path_end();
-        self.term_buffer.append_bytes(&[typ.to_code()]);
-    }
-
-    // TODO: Remove this function and use JsonPathWriter instead.
-    pub fn push_path_segment(&mut self, segment: &str) {
-        // the path stack should never be empty.
-        self.trim_to_end_of_path();
-
-        if self.path_stack.len() > 1 {
-            self.term_buffer.set_json_path_separator();
-        }
-        let appended_segment = self.term_buffer.append_bytes(segment.as_bytes());
-        if self.expand_dots_enabled {
-            // We need to replace `.` by JSON_PATH_SEGMENT_SEP.
-            replace_in_place(b'.', JSON_PATH_SEGMENT_SEP, appended_segment);
-        }
-        self.term_buffer.add_json_path_separator();
-        self.path_stack.push(self.term_buffer.len_bytes());
-    }
-
-    pub fn pop_path_segment(&mut self) {
-        self.path_stack.pop();
-        assert!(!self.path_stack.is_empty());
-        self.trim_to_end_of_path();
-    }
-
-    /// Returns the json path of the term being currently built.
-    #[cfg(test)]
-    pub(crate) fn path(&self) -> &[u8] {
-        let end_of_path = self.path_stack.last().cloned().unwrap_or(1);
-        &self.term().serialized_value_bytes()[..end_of_path - 1]
-    }
-
-    pub(crate) fn set_fast_value<T: FastValue>(&mut self, val: T) {
-        self.close_path_and_set_type(T::to_type());
-        let value = if T::to_type() == Type::Date {
-            DateTime::from_u64(val.to_u64())
-                .truncate(DATE_TIME_PRECISION_INDEXED)
-                .to_u64()
-        } else {
-            val.to_u64()
-        };
-        self.term_buffer
-            .append_bytes(value.to_be_bytes().as_slice());
-    }
-
-    pub fn set_str(&mut self, text: &str) {
-        self.close_path_and_set_type(Type::Str);
-        self.term_buffer.append_bytes(text.as_bytes());
-    }
-
-    pub fn term(&self) -> &Term {
-        self.term_buffer
-    }
-}
-
 #[cfg(test)]
 mod tests {
-    use super::{split_json_path, JsonTermWriter};
-    use crate::schema::{Field, Type};
+    use super::split_json_path;
+    use crate::schema::Field;
    use crate::Term;

    #[test]
    fn test_json_writer() {
        let field = Field::from_field_id(1);
-        let mut term = Term::with_type_and_field(Type::Json, field);
-        let mut json_writer = JsonTermWriter::wrap(&mut term, false);
-        json_writer.push_path_segment("attributes");
-        json_writer.push_path_segment("color");
-        json_writer.set_str("red");
+
+        let mut term = Term::from_field_json_path(field, "attributes.color", false);
+        term.append_type_and_str("red");
        assert_eq!(
-            format!("{:?}", json_writer.term()),
+            format!("{term:?}"),
            "Term(field=1, type=Json, path=attributes.color, type=Str, \"red\")"
        );
-        json_writer.set_str("blue");
+
+        let mut term = Term::from_field_json_path(field, "attributes.dimensions.width", false);
+        term.append_type_and_fast_value(400i64);
        assert_eq!(
-            format!("{:?}", json_writer.term()),
-            "Term(field=1, type=Json, path=attributes.color, type=Str, \"blue\")"
-        );
-        json_writer.pop_path_segment();
-        json_writer.push_path_segment("dimensions");
-        json_writer.push_path_segment("width");
-        json_writer.set_fast_value(400i64);
-        assert_eq!(
-            format!("{:?}", json_writer.term()),
+            format!("{term:?}"),
            "Term(field=1, type=Json, path=attributes.dimensions.width, type=I64, 400)"
        );
-        json_writer.pop_path_segment();
-        json_writer.push_path_segment("height");
-        json_writer.set_fast_value(300i64);
-        assert_eq!(
-            format!("{:?}", json_writer.term()),
-            "Term(field=1, type=Json, path=attributes.dimensions.height, type=I64, 300)"
-        );
    }

    #[test]
    fn test_string_term() {
        let field = Field::from_field_id(1);
-        let mut term = Term::with_type_and_field(Type::Json, field);
-        let mut json_writer = JsonTermWriter::wrap(&mut term, false);
-        json_writer.push_path_segment("color");
-        json_writer.set_str("red");
-        assert_eq!(
-            json_writer.term().serialized_term(),
-            b"\x00\x00\x00\x01jcolor\x00sred"
-        )
+        let mut term = Term::from_field_json_path(field, "color", false);
+        term.append_type_and_str("red");
+
+        assert_eq!(term.serialized_term(), b"\x00\x00\x00\x01jcolor\x00sred")
    }

    #[test]
    fn test_i64_term() {
        let field = Field::from_field_id(1);
-        let mut term = Term::with_type_and_field(Type::Json, field);
-        let mut json_writer = JsonTermWriter::wrap(&mut term, false);
-        json_writer.push_path_segment("color");
-        json_writer.set_fast_value(-4i64);
+        let mut term = Term::from_field_json_path(field, "color", false);
+        term.append_type_and_fast_value(-4i64);
+
        assert_eq!(
-            json_writer.term().serialized_term(),
+            term.serialized_term(),
            b"\x00\x00\x00\x01jcolor\x00i\x7f\xff\xff\xff\xff\xff\xff\xfc"
        )
    }
@@ -536,12 +374,11 @@ mod tests {
    #[test]
    fn test_u64_term() {
        let field = Field::from_field_id(1);
-        let mut term = Term::with_type_and_field(Type::Json, field);
-        let mut json_writer = JsonTermWriter::wrap(&mut term, false);
-        json_writer.push_path_segment("color");
-        json_writer.set_fast_value(4u64);
+        let mut term = Term::from_field_json_path(field, "color", false);
+        term.append_type_and_fast_value(4u64);
+
        assert_eq!(
-            json_writer.term().serialized_term(),
+            term.serialized_term(),
            b"\x00\x00\x00\x01jcolor\x00u\x00\x00\x00\x00\x00\x00\x00\x04"
        )
    }
@@ -549,12 +386,10 @@ mod tests {
    #[test]
    fn test_f64_term() {
        let field = Field::from_field_id(1);
-        let mut term = Term::with_type_and_field(Type::Json, field);
-        let mut json_writer = JsonTermWriter::wrap(&mut term, false);
-        json_writer.push_path_segment("color");
-        json_writer.set_fast_value(4.0f64);
+        let mut term = Term::from_field_json_path(field, "color", false);
+        term.append_type_and_fast_value(4.0f64);
        assert_eq!(
-            json_writer.term().serialized_term(),
+            term.serialized_term(),
            b"\x00\x00\x00\x01jcolor\x00f\xc0\x10\x00\x00\x00\x00\x00\x00"
        )
    }
@@ -562,90 +397,14 @@ mod tests {
    #[test]
    fn test_bool_term() {
        let field = Field::from_field_id(1);
-        let mut term = Term::with_type_and_field(Type::Json, field);
-        let mut json_writer = JsonTermWriter::wrap(&mut term, false);
-        json_writer.push_path_segment("color");
-        json_writer.set_fast_value(true);
+        let mut term = Term::from_field_json_path(field, "color", false);
+        term.append_type_and_fast_value(true);
        assert_eq!(
-            json_writer.term().serialized_term(),
+            term.serialized_term(),
            b"\x00\x00\x00\x01jcolor\x00o\x00\x00\x00\x00\x00\x00\x00\x01"
        )
    }

-    #[test]
-    fn test_push_after_set_path_segment() {
-        let field = Field::from_field_id(1);
-        let mut term = Term::with_type_and_field(Type::Json, field);
-        let mut json_writer = JsonTermWriter::wrap(&mut term, false);
-        json_writer.push_path_segment("attribute");
-        json_writer.set_str("something");
-        json_writer.push_path_segment("color");
-        json_writer.set_str("red");
-        assert_eq!(
-            json_writer.term().serialized_term(),
-            b"\x00\x00\x00\x01jattribute\x01color\x00sred"
-        )
-    }
-
-    #[test]
-    fn test_pop_segment() {
-        let field = Field::from_field_id(1);
-        let mut term = Term::with_type_and_field(Type::Json, field);
-        let mut json_writer = JsonTermWriter::wrap(&mut term, false);
-        json_writer.push_path_segment("color");
-        json_writer.push_path_segment("hue");
-        json_writer.pop_path_segment();
-        json_writer.set_str("red");
-        assert_eq!(
-            json_writer.term().serialized_term(),
-            b"\x00\x00\x00\x01jcolor\x00sred"
-        )
-    }
-
-    #[test]
-    fn test_json_writer_path() {
-        let field = Field::from_field_id(1);
-        let mut term = Term::with_type_and_field(Type::Json, field);
-        let mut json_writer = JsonTermWriter::wrap(&mut term, false);
-        json_writer.push_path_segment("color");
-        assert_eq!(json_writer.path(), b"color");
-        json_writer.push_path_segment("hue");
-        assert_eq!(json_writer.path(), b"color\x01hue");
-        json_writer.set_str("pink");
-        assert_eq!(json_writer.path(), b"color\x01hue");
-    }
-
-    #[test]
-    fn test_json_path_expand_dots_disabled() {
-        let field = Field::from_field_id(1);
-        let mut term = Term::with_type_and_field(Type::Json, field);
-        let mut json_writer = JsonTermWriter::wrap(&mut term, false);
-        json_writer.push_path_segment("color.hue");
-        assert_eq!(json_writer.path(), b"color.hue");
-    }
-
-    #[test]
-    fn test_json_path_expand_dots_enabled() {
-        let field = Field::from_field_id(1);
-        let mut term = Term::with_type_and_field(Type::Json, field);
-        let mut json_writer = JsonTermWriter::wrap(&mut term, true);
-        json_writer.push_path_segment("color.hue");
-        assert_eq!(json_writer.path(), b"color\x01hue");
-    }
-
-    #[test]
-    fn test_json_path_expand_dots_enabled_pop_segment() {
-        let field = Field::from_field_id(1);
-        let mut term = Term::with_type_and_field(Type::Json, field);
-        let mut json_writer = JsonTermWriter::wrap(&mut term, true);
-        json_writer.push_path_segment("hello");
-        assert_eq!(json_writer.path(), b"hello");
-        json_writer.push_path_segment("color.hue");
-        assert_eq!(json_writer.path(), b"hello\x01color\x01hue");
-        json_writer.pop_path_segment();
-        assert_eq!(json_writer.path(), b"hello");
-    }
-
    #[test]
    fn test_split_json_path_simple() {
        let json_path = split_json_path("titi.toto");
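Review note on the expected byte strings in `test_i64_term` and `test_f64_term` above: they are consistent with mapping each fast value to a `u64` in an order-preserving way and then writing it big-endian. The sketch below illustrates that technique in isolation. The function names `i64_to_u64` and `f64_to_u64` are hypothetical here, and this is not claimed to be the crate's own code.

```rust
/// Maps an i64 to a u64 so that integer order is preserved.
/// Flipping the sign bit places all negative values below all non-negative ones.
fn i64_to_u64(val: i64) -> u64 {
    (val as u64) ^ (1u64 << 63)
}

/// Maps an f64 to a u64 so that numeric order is preserved (NaN aside).
/// Non-negative floats get their sign bit set; negative floats have all
/// bits inverted, which reverses their otherwise descending bit order.
fn f64_to_u64(val: f64) -> u64 {
    let bits = val.to_bits();
    if bits >> 63 == 0 {
        bits ^ (1u64 << 63)
    } else {
        !bits
    }
}
```

With this mapping, `i64_to_u64(-4).to_be_bytes()` yields the `\x7f\xff\xff\xff\xff\xff\xff\xfc` suffix seen in `test_i64_term`, and `f64_to_u64(4.0).to_be_bytes()` yields the `\xc0\x10\x00...` suffix seen in `test_f64_term`. Big-endian output keeps byte-wise term comparison aligned with numeric order.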
--- a/src/core/searcher.rs
+++ b/src/core/searcher.rs
@@ -4,13 +4,13 @@ use std::{fmt, io};

 use crate::collector::Collector;
 use crate::core::Executor;
-use crate::index::SegmentReader;
+use crate::index::{SegmentId, SegmentReader};
 use crate::query::{Bm25StatisticsProvider, EnableScoring, Query};
 use crate::schema::document::DocumentDeserialize;
 use crate::schema::{Schema, Term};
 use crate::space_usage::SearcherSpaceUsage;
 use crate::store::{CacheStats, StoreReader};
-use crate::{DocAddress, Index, Opstamp, SegmentId, TrackedObject};
+use crate::{DocAddress, Index, Opstamp, TrackedObject};

 /// Identifies the searcher generation accessed by a [`Searcher`].
 ///
@@ -109,8 +109,9 @@ impl Searcher {
        &self,
        doc_address: DocAddress,
    ) -> crate::Result<D> {
+        let executor = self.inner.index.search_executor();
        let store_reader = &self.inner.store_readers[doc_address.segment_ord as usize];
-        store_reader.get_async(doc_address.doc_id).await
+        store_reader.get_async(doc_address.doc_id, executor).await
    }

    /// Access the schema associated with the index of this searcher.
--- a/src/core/tests.rs
+++ b/src/core/tests.rs
@@ -1,13 +1,14 @@
 use crate::collector::Count;
 use crate::directory::{RamDirectory, WatchCallback};
+use crate::index::SegmentId;
 use crate::indexer::{LogMergePolicy, NoMergePolicy};
-use crate::json_utils::JsonTermWriter;
+use crate::postings::Postings;
 use crate::query::TermQuery;
-use crate::schema::{Field, IndexRecordOption, Schema, Type, INDEXED, STRING, TEXT};
+use crate::schema::{Field, IndexRecordOption, Schema, INDEXED, STRING, TEXT};
 use crate::tokenizer::TokenizerManager;
 use crate::{
-    Directory, DocSet, Index, IndexBuilder, IndexReader, IndexSettings, IndexWriter, Postings,
-    ReloadPolicy, SegmentId, TantivyDocument, Term,
+    Directory, DocSet, Index, IndexBuilder, IndexReader, IndexSettings, IndexWriter, ReloadPolicy,
+    TantivyDocument, Term,
 };

 #[test]
@@ -416,16 +417,12 @@ fn test_non_text_json_term_freq() {
    let searcher = reader.searcher();
    let segment_reader = searcher.segment_reader(0u32);
    let inv_idx = segment_reader.inverted_index(field).unwrap();
-    let mut term = Term::with_type_and_field(Type::Json, field);
-    let mut json_term_writer = JsonTermWriter::wrap(&mut term, false);
-    json_term_writer.push_path_segment("tenant_id");
-    json_term_writer.close_path_and_set_type(Type::U64);
-    json_term_writer.set_fast_value(75u64);
+
+    let mut term = Term::from_field_json_path(field, "tenant_id", false);
+    term.append_type_and_fast_value(75i64);
+
    let postings = inv_idx
-        .read_postings(
-            json_term_writer.term(),
-            IndexRecordOption::WithFreqsAndPositions,
-        )
+        .read_postings(&term, IndexRecordOption::WithFreqsAndPositions)
        .unwrap()
        .unwrap();
    assert_eq!(postings.doc(), 0);
@@ -454,16 +451,12 @@ fn test_non_text_json_term_freq_bitpacked() {
    let searcher = reader.searcher();
    let segment_reader = searcher.segment_reader(0u32);
    let inv_idx = segment_reader.inverted_index(field).unwrap();
-    let mut term = Term::with_type_and_field(Type::Json, field);
-    let mut json_term_writer = JsonTermWriter::wrap(&mut term, false);
-    json_term_writer.push_path_segment("tenant_id");
-    json_term_writer.close_path_and_set_type(Type::U64);
-    json_term_writer.set_fast_value(75u64);
+
+    let mut term = Term::from_field_json_path(field, "tenant_id", false);
+    term.append_type_and_fast_value(75i64);
+
    let mut postings = inv_idx
-        .read_postings(
-            json_term_writer.term(),
-            IndexRecordOption::WithFreqsAndPositions,
-        )
+        .read_postings(&term, IndexRecordOption::WithFreqsAndPositions)
        .unwrap()
        .unwrap();
    assert_eq!(postings.doc(), 0);
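Review note on why both tests above now look up `75i64` instead of `75u64`: the `ReferenceValueLeaf::U64` branch in `index_json_value` stores a `u64` as `i64` whenever it fits, and `convert_to_fast_value_and_append_to_json_term` tries `i64` before `u64` on the query side. The sketch below models only the integer cases. `NumTerm`, `normalize_u64` and `infer_from_phrase` are hypothetical names for illustration. The real query-side function also tries RFC 3339 dates first and `f64` and `bool` afterwards.

```rust
/// Integer term variants relevant to this sketch (hypothetical type, not tantivy's).
#[derive(Debug, PartialEq)]
enum NumTerm {
    I64(i64),
    U64(u64),
}

/// Indexing side: a u64 that fits into an i64 is recorded as i64,
/// otherwise it stays a u64.
fn normalize_u64(val: u64) -> NumTerm {
    match i64::try_from(val) {
        Ok(i64_val) => NumTerm::I64(i64_val),
        Err(_) => NumTerm::U64(val),
    }
}

/// Query side: try i64 first, then u64, so that the inferred type
/// matches what the indexing side recorded for the same number.
fn infer_from_phrase(phrase: &str) -> Option<NumTerm> {
    if let Ok(i64_val) = phrase.parse::<i64>() {
        return Some(NumTerm::I64(i64_val));
    }
    if let Ok(u64_val) = phrase.parse::<u64>() {
        return Some(NumTerm::U64(u64_val));
    }
    None
}
```

Under these assumptions, `normalize_u64(75)` and `infer_from_phrase("75")` both produce the `I64` variant, while a value above `i64::MAX` produces `U64` on both sides, so index-time and query-time terms agree.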
--- a/src/directory/composite_file.rs
+++ b/src/directory/composite_file.rs
@@ -1,6 +1,5 @@
 use std::collections::HashMap;
 use std::io::{self, Read, Write};
-use std::iter::ExactSizeIterator;
 use std::ops::Range;

 use common::{BinarySerializable, CountingWriter, HasLen, VInt};
--- a/src/directory/directory.rs
+++ b/src/directory/directory.rs
@@ -1,5 +1,4 @@
 use std::io::Write;
-use std::marker::{Send, Sync};
 use std::path::{Path, PathBuf};
 use std::sync::Arc;
 use std::time::Duration;
@@ -40,6 +39,7 @@ impl RetryPolicy {
 /// The `DirectoryLock` is an object that represents a file lock.
 ///
 /// It is associated with a lock file, that gets deleted on `Drop.`
+#[allow(dead_code)]
 pub struct DirectoryLock(Box<dyn Send + Sync + 'static>);

 struct DirectoryLockGuard {
--- a/src/directory/mmap_directory.rs
+++ b/src/directory/mmap_directory.rs
@@ -566,7 +566,7 @@ mod tests {
        let mmap_directory = MmapDirectory::create_from_tempdir().unwrap();
        let num_paths = 10;
        let paths: Vec<PathBuf> = (0..num_paths)
-            .map(|i| PathBuf::from(&*format!("file_{}", i)))
+            .map(|i| PathBuf::from(&*format!("file_{i}")))
            .collect();
        {
            for path in &paths {
--- a/src/directory/tests.rs
+++ b/src/directory/tests.rs
@@ -1,6 +1,6 @@
 use std::io::Write;
 use std::mem;
-use std::path::{Path, PathBuf};
+use std::path::Path;
 use std::sync::atomic::Ordering::SeqCst;
 use std::sync::atomic::{AtomicBool, AtomicUsize};
 use std::sync::Arc;
--- a/src/directory/watch_event_router.rs
+++ b/src/directory/watch_event_router.rs
@@ -32,6 +32,7 @@ pub struct WatchCallbackList {
 /// file change is detected.
 #[must_use = "This `WatchHandle` controls the lifetime of the watch and should therefore be used."]
 #[derive(Clone)]
+#[allow(dead_code)]
 pub struct WatchHandle(Arc<WatchCallback>);

 impl WatchHandle {
--- a/src/fastfield/facet_reader.rs
+++ b/src/fastfield/facet_reader.rs
@@ -62,8 +62,7 @@ impl FacetReader {

 #[cfg(test)]
 mod tests {
-    use crate::schema::document::Value;
-    use crate::schema::{Facet, FacetOptions, SchemaBuilder, STORED};
+    use crate::schema::{Facet, FacetOptions, SchemaBuilder, Value, STORED};
    use crate::{DocAddress, Index, IndexWriter, TantivyDocument};

    #[test]
@@ -89,7 +88,9 @@ mod tests {
        let doc = searcher
            .doc::<TantivyDocument>(DocAddress::new(0u32, 0u32))
            .unwrap();
-        let value = doc.get_first(facet_field).and_then(|v| v.as_facet());
+        let value = doc
+            .get_first(facet_field)
+            .and_then(|v| v.as_value().as_facet());
        assert_eq!(value, None);
    }

@@ -146,8 +147,11 @@ mod tests {
        facet_ords.extend(facet_reader.facet_ords(0u32));
        assert_eq!(&facet_ords, &[0u64]);
        let doc = searcher.doc::<TantivyDocument>(DocAddress::new(0u32, 0u32))?;
-        let value: Option<&Facet> = doc.get_first(facet_field).and_then(|v| v.as_facet());
-        assert_eq!(value, Facet::from_text("/a/b").ok().as_ref());
+        let value: Option<Facet> = doc
+            .get_first(facet_field)
+            .and_then(|v| v.as_facet())
+            .map(|facet| Facet::from_encoded_string(facet.to_string()));
+        assert_eq!(value, Facet::from_text("/a/b").ok());
        Ok(())
    }

--- a/src/fastfield/mod.rs
+++ b/src/fastfield/mod.rs
@@ -79,8 +79,8 @@ mod tests {
    use std::ops::{Range, RangeInclusive};
    use std::path::Path;

-    use columnar::{Column, MonotonicallyMappableToU64, StrColumn};
-    use common::{ByteCount, HasLen, TerminatingWrite};
+    use columnar::StrColumn;
+    use common::{ByteCount, DateTimePrecision, HasLen, TerminatingWrite};
    use once_cell::sync::Lazy;
    use rand::prelude::SliceRandom;
    use rand::rngs::StdRng;
@@ -88,14 +88,15 @@ mod tests {

    use super::*;
    use crate::directory::{Directory, RamDirectory, WritePtr};
+    use crate::index::SegmentId;
    use crate::merge_policy::NoMergePolicy;
    use crate::schema::{
-        Facet, FacetOptions, Field, JsonObjectOptions, Schema, SchemaBuilder, TantivyDocument,
-        TextOptions, FAST, INDEXED, STORED, STRING, TEXT,
+        DateOptions, Facet, FacetOptions, Field, JsonObjectOptions, Schema, SchemaBuilder,
+        TantivyDocument, TextOptions, FAST, INDEXED, STORED, STRING, TEXT,
    };
    use crate::time::OffsetDateTime;
    use crate::tokenizer::{LowerCaser, RawTokenizer, TextAnalyzer, TokenizerManager};
-    use crate::{DateOptions, DateTimePrecision, Index, IndexWriter, SegmentId, SegmentReader};
+    use crate::{Index, IndexWriter, SegmentReader};

    pub static SCHEMA: Lazy<Schema> = Lazy::new(|| {
        let mut schema_builder = Schema::builder();
--- a/src/fastfield/writer.rs
+++ b/src/fastfield/writer.rs
@@ -1,14 +1,14 @@
 use std::io;

 use columnar::{ColumnarWriter, NumericalValue};
-use common::JsonPathWriter;
+use common::{DateTimePrecision, JsonPathWriter};
 use tokenizer_api::Token;

 use crate::indexer::doc_id_mapping::DocIdMapping;
 use crate::schema::document::{Document, ReferenceValue, ReferenceValueLeaf, Value};
 use crate::schema::{value_type_to_column_type, Field, FieldType, Schema, Type};
 use crate::tokenizer::{TextAnalyzer, TokenizerManager};
-use crate::{DateTimePrecision, DocId, TantivyError};
+use crate::{DocId, TantivyError};

 /// Only index JSON down to a depth of 20.
 /// This is mostly to guard us from a stack overflow triggered by malicious input.
@@ -183,8 +183,7 @@ impl FastFieldsWriter {
                        .record_datetime(doc_id, field_name, truncated_datetime);
                }
                ReferenceValueLeaf::Facet(val) => {
-                    self.columnar_writer
-                        .record_str(doc_id, field_name, val.encoded_str());
+                    self.columnar_writer.record_str(doc_id, field_name, val);
                }
                ReferenceValueLeaf::Bytes(val) => {
                    self.columnar_writer.record_bytes(doc_id, field_name, val);
--- a/src/functional_test.rs
+++ b/src/functional_test.rs
@@ -1,9 +1,12 @@
+#![allow(deprecated)] // Remove with index sorting
+
 use std::collections::HashSet;

 use rand::{thread_rng, Rng};

 use crate::indexer::index_writer::MEMORY_BUDGET_NUM_BYTES_MIN;
 use crate::schema::*;
+#[allow(deprecated)]
 use crate::{doc, schema, Index, IndexSettings, IndexSortByField, IndexWriter, Order, Searcher};

 fn check_index_content(searcher: &Searcher, vals: &[u64]) -> crate::Result<()> {
--- a/src/index/index.rs
+++ b/src/index/index.rs
@@ -3,7 +3,7 @@ use std::fmt;
 #[cfg(feature = "mmap")]
 use std::path::Path;
 use std::path::PathBuf;
-use std::sync::Arc;
+use std::thread::available_parallelism;

 use super::segment::Segment;
 use super::segment_reader::merge_field_meta_data;
@@ -20,7 +20,7 @@ use crate::indexer::segment_updater::save_metas;
 use crate::indexer::{IndexWriter, SingleSegmentIndexWriter};
 use crate::reader::{IndexReader, IndexReaderBuilder};
 use crate::schema::document::Document;
-use crate::schema::{Field, FieldType, Schema};
+use crate::schema::{Field, FieldType, Schema, Type};
 use crate::tokenizer::{TextAnalyzer, TokenizerManager};
 use crate::SegmentReader;

@@ -83,7 +83,7 @@ fn save_new_metas(
 ///
 /// ```
 /// use tantivy::schema::*;
-/// use tantivy::{Index, IndexSettings, IndexSortByField, Order};
+/// use tantivy::{Index, IndexSettings};
 ///
 /// let mut schema_builder = Schema::builder();
 /// let id_field = schema_builder.add_text_field("id", STRING);
@@ -96,10 +96,7 @@ fn save_new_metas(
 ///
 /// let schema = schema_builder.build();
 /// let settings = IndexSettings{
-///     sort_by_field: Some(IndexSortByField{
-///         field: "number".to_string(),
-///         order: Order::Asc
-///     }),
+///     docstore_blocksize: 100_000,
 ///     ..Default::default()
 /// };
 /// let index = Index::builder().schema(schema).settings(settings).create_in_ram();
@@ -251,6 +248,14 @@ impl IndexBuilder {
                        sort_by_field.field
                    )));
                }
+                let supported_field_types = [Type::I64, Type::U64, Type::F64, Type::Date];
+                let field_type = entry.field_type().value_type();
+                if !supported_field_types.contains(&field_type) {
+                    return Err(TantivyError::InvalidArgument(format!(
+                        "Unsupported field type in sort_by_field: {field_type:?}. Supported field \
+                         types: {supported_field_types:?} ",
+                    )));
+                }
            }
            Ok(())
        } else {
@@ -287,7 +292,7 @@ pub struct Index {
    directory: ManagedDirectory,
    schema: Schema,
    settings: IndexSettings,
-    executor: Arc<Executor>,
+    executor: Executor,
    tokenizers: TokenizerManager,
    fast_field_tokenizers: TokenizerManager,
    inventory: SegmentMetaInventory,
@@ -312,29 +317,25 @@ impl Index {
    ///
    /// By default the executor is single thread, and simply runs in the calling thread.
    pub fn search_executor(&self) -> &Executor {
-        self.executor.as_ref()
+        &self.executor
    }

    /// Replace the default single thread search executor pool
    /// by a thread pool with a given number of threads.
    pub fn set_multithread_executor(&mut self, num_threads: usize) -> crate::Result<()> {
-        self.executor = Arc::new(Executor::multi_thread(num_threads, "tantivy-search-")?);
+        self.executor = Executor::multi_thread(num_threads, "tantivy-search-")?;
        Ok(())
    }

    /// Custom thread pool by a outer thread pool.
-    pub fn set_shared_multithread_executor(
-        &mut self,
-        shared_thread_pool: Arc<Executor>,
-    ) -> crate::Result<()> {
-        self.executor = shared_thread_pool.clone();
-        Ok(())
+    pub fn set_executor(&mut self, executor: Executor) {
+        self.executor = executor;
    }

    /// Replace the default single thread search executor pool
    /// by a thread pool with as many threads as there are CPUs on the system.
    pub fn set_default_multithread_executor(&mut self) -> crate::Result<()> {
-        let default_num_threads = num_cpus::get();
+        let default_num_threads = available_parallelism()?.get();
        self.set_multithread_executor(default_num_threads)
    }

@@ -412,7 +413,7 @@ impl Index {
            schema,
            tokenizers: TokenizerManager::default(),
            fast_field_tokenizers: TokenizerManager::default(),
-            executor: Arc::new(Executor::single_thread()),
+            executor: Executor::single_thread(),
            inventory,
        }
    }
@@ -615,7 +616,7 @@ impl Index {
        &self,
        memory_budget_in_bytes: usize,
    ) -> crate::Result<IndexWriter<D>> {
-        let mut num_threads = std::cmp::min(num_cpus::get(), MAX_NUM_THREAD);
+        let mut num_threads = std::cmp::min(available_parallelism()?.get(), MAX_NUM_THREAD);
        let memory_budget_num_bytes_per_thread = memory_budget_in_bytes / num_threads;
        if memory_budget_num_bytes_per_thread < MEMORY_BUDGET_NUM_BYTES_MIN {
            num_threads = (memory_budget_in_bytes / MEMORY_BUDGET_NUM_BYTES_MIN).max(1);
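
Note on the hunks above: `num_cpus::get()` is replaced by the standard library's `std::thread::available_parallelism()`, which returns `io::Result<NonZeroUsize>`. The added `?` therefore assumes the caller's error type converts from `std::io::Error`. A minimal standalone sketch of the same clamping logic follows; `MAX_NUM_THREAD` here is a hypothetical value standing in for the crate's constant, and `pick_num_threads` is an illustrative name that does not appear in the diff.

```rust
use std::num::NonZeroUsize;
use std::thread::available_parallelism;

/// Hypothetical cap, standing in for the crate's `MAX_NUM_THREAD`.
const MAX_NUM_THREAD: usize = 8;

/// Picks a worker count: the parallelism the OS reports, capped at `MAX_NUM_THREAD`.
fn pick_num_threads() -> std::io::Result<usize> {
    // Unlike `num_cpus::get()`, this call can fail, so the error is propagated.
    let detected: NonZeroUsize = available_parallelism()?;
    Ok(detected.get().min(MAX_NUM_THREAD))
}

fn main() {
    match pick_num_threads() {
        Ok(n) => println!("using {n} indexing threads"),
        Err(err) => eprintln!("could not detect parallelism: {err}"),
    }
}
```

Because the result is a `NonZeroUsize`, the later division by the thread count cannot divide by zero.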
--- a/src/index/index_meta.rs
+++ b/src/index/index_meta.rs
@@ -288,6 +288,10 @@ impl Default for IndexSettings {
 /// Presorting documents can greatly improve performance
 /// in some scenarios, by applying top n
 /// optimizations.
+#[deprecated(
+    since = "0.22.0",
+    note = "We plan to remove index sorting in `0.23`. If you need index sorting, please comment on the related issue https://github.com/quickwit-oss/tantivy/issues/2352 and explain your use case."
+)]
 #[derive(Clone, Debug, Serialize, Deserialize, Eq, PartialEq)]
 pub struct IndexSortByField {
    /// The field to sort the documents by
--- a/src/index/inverted_index_reader.rs
+++ b/src/index/inverted_index_reader.rs
@@ -1,12 +1,13 @@
 use std::io;

+use common::json_path_writer::JSON_END_OF_PATH;
 use common::BinarySerializable;
 use fnv::FnvHashSet;

 use crate::directory::FileSlice;
 use crate::positions::PositionReader;
 use crate::postings::{BlockSegmentPostings, SegmentPostings, TermInfo};
-use crate::schema::{IndexRecordOption, Term, Type, JSON_END_OF_PATH};
+use crate::schema::{IndexRecordOption, Term, Type};
 use crate::termdict::TermDictionary;

 /// The inverted index reader is in charge of accessing
--- a/src/index/mod.rs
+++ b/src/index/mod.rs
@@ -1,5 +1,3 @@
-//! # Index Module
-//!
 //! The `index` module in Tantivy contains core components to read and write indexes.
 //!
 //! It contains `Index` and `Segment`, where a `Index` consists of one or more `Segment`s.
--- a/src/index/segment_id.rs
+++ b/src/index/segment_id.rs
@@ -1,4 +1,4 @@
-use std::cmp::{Ord, Ordering};
+use std::cmp::Ordering;
 use std::error::Error;
 use std::fmt;
 use std::str::FromStr;
--- a/src/index/segment_reader.rs
+++ b/src/index/segment_reader.rs
@@ -318,14 +318,14 @@ impl SegmentReader {
                        if create_canonical {
                            // Without expand dots enabled dots need to be escaped.
                            let escaped_json_path = json_path.replace('.', "\\.");
-                            let full_path = format!("{}.{}", field_name, escaped_json_path);
+                            let full_path = format!("{field_name}.{escaped_json_path}");
                            let full_path_unescaped = format!("{}.{}", field_name, &json_path);
                            map_to_canonical.insert(full_path_unescaped, full_path.to_string());
                            full_path
                        } else {
                            // With expand dots enabled, we can use '.' instead of '\u{1}'.
                            json_path_sep_to_dot(&mut json_path);
-                            format!("{}.{}", field_name, json_path)
+                            format!("{field_name}.{json_path}")
                        }
                    };
                    indexed_fields.extend(
@@ -406,7 +406,7 @@ impl SegmentReader {
    }

    /// Returns an iterator that will iterate over the alive document ids
-    pub fn doc_ids_alive(&self) -> Box<dyn Iterator<Item = DocId> + '_> {
+    pub fn doc_ids_alive(&self) -> Box<dyn Iterator<Item = DocId> + Send + '_> {
        if let Some(alive_bitset) = &self.alive_bitset_opt {
            Box::new(alive_bitset.iter_alive())
        } else {
@@ -516,8 +516,8 @@ impl fmt::Debug for SegmentReader {
 mod test {
    use super::*;
    use crate::index::Index;
-    use crate::schema::{Schema, SchemaBuilder, Term, STORED, TEXT};
-    use crate::{DocId, IndexWriter};
+    use crate::schema::{SchemaBuilder, Term, STORED, TEXT};
+    use crate::IndexWriter;

    #[test]
    fn test_merge_field_meta_data_same() {
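
Note on the `doc_ids_alive` hunk above: adding `+ Send` to the boxed trait object lets callers move the iterator to another thread. The sketch below illustrates this with a toy `Segment` type (hypothetical, not the crate's `SegmentReader`) and scoped threads from the standard library.

```rust
use std::thread;

/// Toy stand-in for a segment: `None` means every doc is alive.
struct Segment {
    max_doc: u32,
    alive: Option<Vec<u32>>,
}

impl Segment {
    /// Mirrors the shape of `doc_ids_alive`: the boxed iterator is `Send`.
    fn doc_ids_alive(&self) -> Box<dyn Iterator<Item = u32> + Send + '_> {
        if let Some(alive) = &self.alive {
            Box::new(alive.iter().copied())
        } else {
            Box::new(0..self.max_doc)
        }
    }
}

/// Counts alive docs on a worker thread.
fn count_alive_on_worker(segment: &Segment) -> usize {
    let docs = segment.doc_ids_alive();
    // Moving `docs` into the spawned closure requires the trait object to be `Send`.
    thread::scope(|scope| scope.spawn(move || docs.count()).join().unwrap())
}

fn main() {
    let full = Segment { max_doc: 5, alive: None };
    let partial = Segment { max_doc: 5, alive: Some(vec![0, 2, 4]) };
    assert_eq!(count_alive_on_worker(&full), 5);
    assert_eq!(count_alive_on_worker(&partial), 3);
}
```

Without the `Send` bound in the return type, the `scope.spawn` call would be rejected at compile time, even though both concrete iterators are themselves `Send`.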
--- a/src/indexer/delete_queue.rs
+++ b/src/indexer/delete_queue.rs
@@ -246,8 +246,9 @@ impl DeleteCursor {
 mod tests {

    use super::{DeleteOperation, DeleteQueue};
+    use crate::index::SegmentReader;
    use crate::query::{Explanation, Scorer, Weight};
-    use crate::{DocId, Score, SegmentReader};
+    use crate::{DocId, Score};

    struct DummyWeight;
    impl Weight for DummyWeight {
--- a/src/indexer/doc_id_mapping.rs
+++ b/src/indexer/doc_id_mapping.rs
@@ -159,7 +159,7 @@ mod tests_indexsorting {
    use crate::indexer::NoMergePolicy;
    use crate::query::QueryParser;
    use crate::schema::*;
-    use crate::{DocAddress, Index, IndexSettings, IndexSortByField, Order};
+    use crate::{DocAddress, Index, IndexBuilder, IndexSettings, IndexSortByField, Order};

    fn create_test_index(
        index_settings: Option<IndexSettings>,
@@ -306,12 +306,10 @@ mod tests_indexsorting {
        let my_string_field = index.schema().get_field("string_field").unwrap();
        let searcher = index.reader()?.searcher();
        {
-            assert_eq!(
-                searcher
-                    .doc::<TantivyDocument>(DocAddress::new(0, 0))?
-                    .get_first(my_string_field),
-                None
-            );
+            assert!(searcher
+                .doc::<TantivyDocument>(DocAddress::new(0, 0))?
+                .get_first(my_string_field)
+                .is_none());
            assert_eq!(
                searcher
                    .doc::<TantivyDocument>(DocAddress::new(0, 3))?
@@ -344,7 +342,7 @@ mod tests_indexsorting {
                Some("blublub")
            );
            let doc = searcher.doc::<TantivyDocument>(DocAddress::new(0, 4))?;
-            assert_eq!(doc.get_first(my_string_field), None);
+            assert!(doc.get_first(my_string_field).is_none());
        }
        // sort by field desc
        let index = create_test_index(
@@ -557,4 +555,28 @@ mod tests_indexsorting {
            &[2000, 8000, 3000]
        );
    }
+
+    #[test]
+    fn test_text_sort() -> crate::Result<()> {
+        let mut schema_builder = SchemaBuilder::new();
+        schema_builder.add_text_field("id", STRING | FAST | STORED);
+        schema_builder.add_text_field("name", TEXT | STORED);
+
+        let resp = IndexBuilder::new()
+            .schema(schema_builder.build())
+            .settings(IndexSettings {
+                sort_by_field: Some(IndexSortByField {
+                    field: "id".to_string(),
+                    order: Order::Asc,
+                }),
+                ..Default::default()
+            })
+            .create_in_ram();
+        assert!(resp
+            .unwrap_err()
+            .to_string()
+            .contains("Unsupported field type"));
+
+        Ok(())
+    }
 }
--- a/src/indexer/flat_map_with_buffer.rs
+++ b/src/indexer/flat_map_with_buffer.rs
@@ -22,6 +22,7 @@ where
    }
 }

+#[allow(dead_code)]
 pub trait FlatMapWithBufferIter: Iterator {
    /// Function similar to `flat_map`, but allows reusing a shared `Vec`.
    fn flat_map_with_buffer<F, T>(self, fill_buffer: F) -> FlatMapWithBuffer<T, F, Self>
--- a/src/indexer/index_writer.rs
+++ b/src/indexer/index_writer.rs
@@ -808,16 +808,15 @@ mod tests {
    use proptest::prop_oneof;

    use super::super::operation::UserOperation;
-    use crate::collector::TopDocs;
+    use crate::collector::{Count, TopDocs};
    use crate::directory::error::LockError;
    use crate::error::*;
    use crate::indexer::index_writer::MEMORY_BUDGET_NUM_BYTES_MIN;
    use crate::indexer::NoMergePolicy;
    use crate::query::{BooleanQuery, Occur, Query, QueryParser, TermQuery};
-    use crate::schema::document::Value;
    use crate::schema::{
        self, Facet, FacetOptions, IndexRecordOption, IpAddrOptions, NumericOptions, Schema,
-        TextFieldIndexing, TextOptions, FAST, INDEXED, STORED, STRING, TEXT,
+        TextFieldIndexing, TextOptions, Value, FAST, INDEXED, STORED, STRING, TEXT,
    };
    use crate::store::DOCSTORE_CACHE_CAPACITY;
    use crate::{
@@ -1573,20 +1572,74 @@ mod tests {
        Ok(())
    }

-    #[derive(Debug, Clone, Copy)]
+    #[derive(Debug, Clone)]
    enum IndexingOp {
-        AddDoc { id: u64 },
-        DeleteDoc { id: u64 },
-        DeleteDocQuery { id: u64 },
+        AddMultipleDoc {
+            id: u64,
+            num_docs: u64,
+            value: IndexValue,
+        },
+        AddDoc {
+            id: u64,
+            value: IndexValue,
+        },
+        DeleteDoc {
+            id: u64,
+        },
+        DeleteDocQuery {
+            id: u64,
+        },
        Commit,
        Merge,
    }
+    impl IndexingOp {
+        fn add(id: u64) -> Self {
+            IndexingOp::AddDoc {
+                id,
+                value: IndexValue::F64(id as f64),
+            }
+        }
+    }
+
+    use serde::Serialize;
+    #[derive(Debug, Clone, Serialize)]
+    #[serde(untagged)]
+    enum IndexValue {
+        Str(String),
+        F64(f64),
+        U64(u64),
+        I64(i64),
+    }
+    impl Default for IndexValue {
+        fn default() -> Self {
+            IndexValue::F64(0.0)
+        }
+    }
+
+    fn value_strategy() -> impl Strategy<Value = IndexValue> {
+        prop_oneof![
+            any::<f64>().prop_map(IndexValue::F64),
+            any::<u64>().prop_map(IndexValue::U64),
+            any::<i64>().prop_map(IndexValue::I64),
+            any::<String>().prop_map(IndexValue::Str),
+        ]
+    }

    fn balanced_operation_strategy() -> impl Strategy<Value = IndexingOp> {
        prop_oneof![
            (0u64..20u64).prop_map(|id| IndexingOp::DeleteDoc { id }),
            (0u64..20u64).prop_map(|id| IndexingOp::DeleteDocQuery { id }),
-            (0u64..20u64).prop_map(|id| IndexingOp::AddDoc { id }),
+            (0u64..20u64, value_strategy())
+                .prop_map(move |(id, value)| IndexingOp::AddDoc { id, value }),
+            ((0u64..20u64), (1u64..100), value_strategy()).prop_map(
+                move |(id, num_docs, value)| {
+                    IndexingOp::AddMultipleDoc {
+                        id,
+                        num_docs,
+                        value,
+                    }
+                }
+            ),
            (0u64..1u64).prop_map(|_| IndexingOp::Commit),
            (0u64..1u64).prop_map(|_| IndexingOp::Merge),
        ]
@@ -1596,7 +1649,17 @@ mod tests {
        prop_oneof![
            5 => (0u64..100u64).prop_map(|id| IndexingOp::DeleteDoc { id }),
            5 => (0u64..100u64).prop_map(|id| IndexingOp::DeleteDocQuery { id }),
-            50 => (0u64..100u64).prop_map(|id| IndexingOp::AddDoc { id }),
+            50 => (0u64..100u64, value_strategy())
+                .prop_map(move |(id, value)| IndexingOp::AddDoc { id, value }),
+            50 => (0u64..100u64, (1u64..100), value_strategy()).prop_map(
+                move |(id, num_docs, value)| {
+                    IndexingOp::AddMultipleDoc {
+                        id,
+                        num_docs,
+                        value,
+                    }
+                }
+            ),
            2 => (0u64..1u64).prop_map(|_| IndexingOp::Commit),
            1 => (0u64..1u64).prop_map(|_| IndexingOp::Merge),
        ]
@@ -1605,19 +1668,27 @@ mod tests {
    fn expected_ids(ops: &[IndexingOp]) -> (HashMap<u64, u64>, HashSet<u64>) {
        let mut existing_ids = HashMap::new();
        let mut deleted_ids = HashSet::new();
-        for &op in ops {
+        for op in ops {
            match op {
-                IndexingOp::AddDoc { id } => {
-                    *existing_ids.entry(id).or_insert(0) += 1;
-                    deleted_ids.remove(&id);
+                IndexingOp::AddDoc { id, value: _ } => {
+                    *existing_ids.entry(*id).or_insert(0) += 1;
+                    deleted_ids.remove(id);
+                }
+                IndexingOp::AddMultipleDoc {
+                    id,
+                    num_docs,
+                    value: _,
+                } => {
+                    *existing_ids.entry(*id).or_insert(0) += num_docs;
+                    deleted_ids.remove(id);
                }
                IndexingOp::DeleteDoc { id } => {
                    existing_ids.remove(&id);
-                    deleted_ids.insert(id);
+                    deleted_ids.insert(*id);
                }
                IndexingOp::DeleteDocQuery { id } => {
                    existing_ids.remove(&id);
-                    deleted_ids.insert(id);
+                    deleted_ids.insert(*id);
                }
                _ => {}
            }
@@ -1627,16 +1698,19 @@ mod tests {

    fn get_id_list(ops: &[IndexingOp]) -> Vec<u64> {
        let mut id_list = Vec::new();
-        for &op in ops {
+        for op in ops {
            match op {
-                IndexingOp::AddDoc { id } => {
-                    id_list.push(id);
+                IndexingOp::AddDoc { id, value: _ } => {
+                    id_list.push(*id);
+                }
+                IndexingOp::AddMultipleDoc { id, .. } => {
+                    id_list.push(*id);
                }
                IndexingOp::DeleteDoc { id } => {
-                    id_list.retain(|el| *el != id);
+                    id_list.retain(|el| el != id);
                }
                IndexingOp::DeleteDocQuery { id } => {
-                    id_list.retain(|el| *el != id);
+                    id_list.retain(|el| el != id);
                }
                _ => {}
            }
@@ -1717,42 +1791,59 @@ mod tests {

        let ip_from_id = |id| Ipv6Addr::from_u128(id as u128);

-        for &op in ops {
-            match op {
-                IndexingOp::AddDoc { id } => {
-                    let facet = Facet::from(&("/cola/".to_string() + &id.to_string()));
-                    let ip = ip_from_id(id);
-
-                    if !id_is_full_doc(id) {
-                        // every 3rd doc has no ip field
-                        index_writer.add_document(doc!(
-                            id_field=>id,
-                        ))?;
-                    } else {
-                        let json = json!({"date1": format!("2022-{id}-01T00:00:01Z"), "date2": format!("{id}-05-01T00:00:01Z"), "id": id, "ip": ip.to_string()});
-                        index_writer.add_document(doc!(id_field=>id,
-                                json_field=>json,
-                                bytes_field => id.to_le_bytes().as_slice(),
-                                id_opt_field => id,
-                                ip_field => ip,
-                                ips_field => ip,
-                                ips_field => ip,
-                                multi_numbers=> id,
-                                multi_numbers => id,
-                                bool_field => (id % 2u64) != 0,
-                                i64_field => id as i64,
-                                f64_field => id as f64,
-                                date_field => DateTime::from_timestamp_secs(id as i64),
-                                multi_bools => (id % 2u64) != 0,
-                                multi_bools => (id % 2u64) == 0,
-                                text_field => id.to_string(),
-                                facet_field => facet,
-                                large_text_field => LOREM,
-                                multi_text_fields => multi_text_field_text1,
-                                multi_text_fields => multi_text_field_text2,
-                                multi_text_fields => multi_text_field_text3,
-                        ))?;
-                    }
+        let add_docs = |index_writer: &mut IndexWriter,
+                        id: u64,
+                        value: IndexValue,
+                        num: u64|
+         -> crate::Result<()> {
+            let facet = Facet::from(&("/cola/".to_string() + &id.to_string()));
+            let ip = ip_from_id(id);
+            let doc = if !id_is_full_doc(id) {
+                // every 3rd doc has no ip field
+                doc!(
+                    id_field=>id,
+                )
+            } else {
+                let json = json!({"date1": format!("2022-{id}-01T00:00:01Z"), "date2": format!("{id}-05-01T00:00:01Z"), "id": id, "ip": ip.to_string(), "val": value});
+                doc!(id_field=>id,
+                        json_field=>json,
+                        bytes_field => id.to_le_bytes().as_slice(),
+                        id_opt_field => id,
+                        ip_field => ip,
+                        ips_field => ip,
+                        ips_field => ip,
+                        multi_numbers=> id,
+                        multi_numbers => id,
+                        bool_field => (id % 2u64) != 0,
+                        i64_field => id as i64,
+                        f64_field => id as f64,
+                        date_field => DateTime::from_timestamp_secs(id as i64),
+                        multi_bools => (id % 2u64) != 0,
+                        multi_bools => (id % 2u64) == 0,
+                        text_field => id.to_string(),
+                        facet_field => facet,
+                        large_text_field => LOREM,
+                        multi_text_fields => multi_text_field_text1,
+                        multi_text_fields => multi_text_field_text2,
+                        multi_text_fields => multi_text_field_text3,
+                )
+            };
+            for _ in 0..num {
+                index_writer.add_document(doc.clone())?;
+            }
+            Ok(())
+        };
+        for op in ops {
+            match op.clone() {
+                IndexingOp::AddMultipleDoc {
+                    id,
+                    num_docs,
+                    value,
+                } => {
+                    add_docs(&mut index_writer, id, value, num_docs)?;
+                }
+                IndexingOp::AddDoc { id, value } => {
+                    add_docs(&mut index_writer, id, value, 1)?;
                }
                IndexingOp::DeleteDoc { id } => {
                    index_writer.delete_term(Term::from_field_u64(id_field, id));
@@ -1980,7 +2071,13 @@ mod tests {
                .unwrap();
            // test store iterator
            for doc in store_reader.iter::<TantivyDocument>(segment_reader.alive_bitset()) {
-                let id = doc.unwrap().get_first(id_field).unwrap().as_u64().unwrap();
+                let id = doc
+                    .unwrap()
+                    .get_first(id_field)
+                    .unwrap()
+                    .as_value()
+                    .as_u64()
+                    .unwrap();
                assert!(expected_ids_and_num_occurrences.contains_key(&id));
            }
            // test store random access
@@ -2013,7 +2110,7 @@ mod tests {
                    let mut bool2 = doc.get_all(multi_bools);
                    assert_eq!(bool, bool2.next().unwrap().as_bool().unwrap());
                    assert_ne!(bool, bool2.next().unwrap().as_bool().unwrap());
-                    assert_eq!(None, bool2.next())
+                    assert!(bool2.next().is_none())
                }
            }
        }
@@ -2027,18 +2124,22 @@ mod tests {

            top_docs.iter().map(|el| el.1).collect::<Vec<_>>()
        };
+        let count_search = |term: &str, field| {
+            let query = QueryParser::for_index(&index, vec![field])
+                .parse_query(term)
+                .unwrap();
+            searcher.search(&query, &Count).unwrap()
+        };

-        let do_search2 = |term: Term| {
+        let count_search2 = |term: Term| {
            let query = TermQuery::new(term, IndexRecordOption::Basic);
-            let top_docs: Vec<(f32, DocAddress)> =
-                searcher.search(&query, &TopDocs::with_limit(1000)).unwrap();
-
-            top_docs.iter().map(|el| el.1).collect::<Vec<_>>()
+            searcher.search(&query, &Count).unwrap()
        };

        for (id, count) in &expected_ids_and_num_occurrences {
+            // skip expensive queries
            let (existing_id, count) = (*id, *count);
-            let get_num_hits = |field| do_search(&existing_id.to_string(), field).len() as u64;
+            let get_num_hits = |field| count_search(&existing_id.to_string(), field) as u64;
            assert_eq!(get_num_hits(id_field), count);
            if !id_is_full_doc(existing_id) {
                continue;
@@ -2048,29 +2149,31 @@ mod tests {
            assert_eq!(get_num_hits(f64_field), count);

            // Test multi text
-            assert_eq!(
-                do_search("\"test1 test2\"", multi_text_fields).len(),
-                num_docs_with_values
-            );
-            assert_eq!(
-                do_search("\"test2 test3\"", multi_text_fields).len(),
-                num_docs_with_values
-            );
+            if num_docs_with_values < 1000 {
+                assert_eq!(
+                    do_search("\"test1 test2\"", multi_text_fields).len(),
+                    num_docs_with_values
+                );
+                assert_eq!(
+                    do_search("\"test2 test3\"", multi_text_fields).len(),
+                    num_docs_with_values
+                );
+            }

            // Test bytes
            let term = Term::from_field_bytes(bytes_field, existing_id.to_le_bytes().as_slice());
-            assert_eq!(do_search2(term).len() as u64, count);
+            assert_eq!(count_search2(term) as u64, count);

            // Test date
            let term = Term::from_field_date(
                date_field,
                DateTime::from_timestamp_secs(existing_id as i64),
            );
-            assert_eq!(do_search2(term).len() as u64, count);
+            assert_eq!(count_search2(term) as u64, count);
        }
        for deleted_id in deleted_ids {
            let assert_field = |field| {
-                assert_eq!(do_search(&deleted_id.to_string(), field).len() as u64, 0);
+                assert_eq!(count_search(&deleted_id.to_string(), field) as u64, 0);
            };
            assert_field(text_field);
            assert_field(f64_field);
@@ -2079,12 +2182,12 @@ mod tests {

            // Test bytes
            let term = Term::from_field_bytes(bytes_field, deleted_id.to_le_bytes().as_slice());
-            assert_eq!(do_search2(term).len() as u64, 0);
+            assert_eq!(count_search2(term), 0);

            // Test date
            let term =
                Term::from_field_date(date_field, DateTime::from_timestamp_secs(deleted_id as i64));
-            assert_eq!(do_search2(term).len() as u64, 0);
+            assert_eq!(count_search2(term), 0);
        }
        // search ip address
        //
@@ -2093,13 +2196,13 @@ mod tests {
            if !id_is_full_doc(existing_id) {
                continue;
            }
-            let do_search_ip_field = |term: &str| do_search(term, ip_field).len() as u64;
+            let do_search_ip_field = |term: &str| count_search(term, ip_field) as u64;
            let ip_addr = Ipv6Addr::from_u128(existing_id as u128);
            // Test incoming ip as ipv6
            assert_eq!(do_search_ip_field(&format!("\"{ip_addr}\"")), count);

            let term = Term::from_field_ip_addr(ip_field, ip_addr);
-            assert_eq!(do_search2(term).len() as u64, count);
+            assert_eq!(count_search2(term) as u64, count);

            // Test incoming ip as ipv4
            if let Some(ip_addr) = ip_addr.to_ipv4_mapped() {
@@ -2116,7 +2219,7 @@ mod tests {
        if !sample.is_empty() {
            let (left_sample, right_sample) = sample.split_at(sample.len() / 2);

-            let expected_count = |sample: &[(&u64, &u64)]| {
+            let calc_expected_count = |sample: &[(&u64, &u64)]| {
                sample
                    .iter()
                    .filter(|(id, _)| id_is_full_doc(**id))
@@ -2132,18 +2235,17 @@ mod tests {
            }

            // Query first half
-            if !left_sample.is_empty() {
-                let expected_count = expected_count(left_sample);
-
+            let expected_count = calc_expected_count(left_sample);
+            if !left_sample.is_empty() && expected_count < 1000 {
                let start_range = *left_sample[0].0;
                let end_range = *left_sample.last().unwrap().0;
                let query = gen_query_inclusive("id_opt", start_range, end_range);
-                assert_eq!(do_search(&query, id_opt_field).len() as u64, expected_count);
+                assert_eq!(count_search(&query, id_opt_field) as u64, expected_count);

                // Range query on ip field
                let ip1 = ip_from_id(start_range);
                let ip2 = ip_from_id(end_range);
-                let do_search_ip_field = |term: &str| do_search(term, ip_field).len() as u64;
+                let do_search_ip_field = |term: &str| count_search(term, ip_field) as u64;
                let query = gen_query_inclusive("ip", ip1, ip2);
                assert_eq!(do_search_ip_field(&query), expected_count);
                let query = gen_query_inclusive("ip", "*", ip2);
@@ -2155,19 +2257,19 @@ mod tests {
                assert_eq!(do_search_ip_field(&query), expected_count);
            }
            // Query second half
-            if !right_sample.is_empty() {
-                let expected_count = expected_count(right_sample);
+            let expected_count = calc_expected_count(right_sample);
+            if !right_sample.is_empty() && expected_count < 1000 {
                let start_range = *right_sample[0].0;
                let end_range = *right_sample.last().unwrap().0;
                // Range query on id opt field
                let query =
                    gen_query_inclusive("id_opt", start_range.to_string(), end_range.to_string());
-                assert_eq!(do_search(&query, id_opt_field).len() as u64, expected_count);
+                assert_eq!(count_search(&query, id_opt_field) as u64, expected_count);

                // Range query on ip field
                let ip1 = ip_from_id(start_range);
                let ip2 = ip_from_id(end_range);
-                let do_search_ip_field = |term: &str| do_search(term, ip_field).len() as u64;
+                let do_search_ip_field = |term: &str| count_search(term, ip_field) as u64;
                let query = gen_query_inclusive("ip", ip1, ip2);
                assert_eq!(do_search_ip_field(&query), expected_count);
                let query = gen_query_inclusive("ip", ip1, "*");
@@ -2192,7 +2294,7 @@ mod tests {
            };
            let ip = ip_from_id(existing_id);

-            let do_search_ip_field = |term: &str| do_search(term, ip_field).len() as u64;
+            let do_search_ip_field = |term: &str| count_search(term, ip_field) as u64;
            // Range query on single value field
            let query = gen_query_inclusive("ip", ip, ip);
            assert_eq!(do_search_ip_field(&query), count);
@@ -2252,7 +2354,7 @@ mod tests {

    #[test]
    fn test_fast_field_range() {
-        let ops: Vec<_> = (0..1000).map(|id| IndexingOp::AddDoc { id }).collect();
+        let ops: Vec<_> = (0..1000).map(IndexingOp::add).collect();
        assert!(test_operation_strategy(&ops, false, true).is_ok());
    }

@@ -2260,8 +2362,8 @@ mod tests {
    fn test_sort_index_on_opt_field_regression() {
        assert!(test_operation_strategy(
            &[
-                IndexingOp::AddDoc { id: 81 },
-                IndexingOp::AddDoc { id: 70 },
+                IndexingOp::add(81),
+                IndexingOp::add(70),
                IndexingOp::DeleteDoc { id: 70 }
            ],
            true,
@@ -2270,14 +2372,45 @@ mod tests {
        .is_ok());
    }

+    #[test]
+    fn test_simple_multiple_doc() {
+        assert!(test_operation_strategy(
+            &[
+                IndexingOp::AddMultipleDoc {
+                    id: 7,
+                    num_docs: 800,
+                    value: IndexValue::U64(0),
+                },
+                IndexingOp::AddMultipleDoc {
+                    id: 92,
+                    num_docs: 800,
+                    value: IndexValue::U64(0),
+                },
+                IndexingOp::AddMultipleDoc {
+                    id: 30,
+                    num_docs: 800,
+                    value: IndexValue::U64(0),
+                },
+                IndexingOp::AddMultipleDoc {
+                    id: 33,
+                    num_docs: 800,
+                    value: IndexValue::U64(0),
+                },
+            ],
+            true,
+            false
+        )
+        .is_ok());
+    }
+
    #[test]
    fn test_ip_range_query_multivalue_bug() {
        assert!(test_operation_strategy(
            &[
-                IndexingOp::AddDoc { id: 2 },
+                IndexingOp::add(2),
                IndexingOp::Commit,
-                IndexingOp::AddDoc { id: 1 },
-                IndexingOp::AddDoc { id: 1 },
+                IndexingOp::add(1),
+                IndexingOp::add(1),
                IndexingOp::Commit,
                IndexingOp::Merge
            ],
@@ -2291,11 +2424,11 @@ mod tests {
    fn test_ff_num_ips_regression() {
        assert!(test_operation_strategy(
            &[
-                IndexingOp::AddDoc { id: 13 },
-                IndexingOp::AddDoc { id: 1 },
+                IndexingOp::add(13),
+                IndexingOp::add(1),
                IndexingOp::Commit,
                IndexingOp::DeleteDocQuery { id: 13 },
-                IndexingOp::AddDoc { id: 1 },
+                IndexingOp::add(1),
                IndexingOp::Commit,
            ],
            false,
@@ -2307,7 +2440,7 @@ mod tests {
    #[test]
    fn test_minimal_sort_force_end_merge() {
        assert!(test_operation_strategy(
-            &[IndexingOp::AddDoc { id: 23 }, IndexingOp::AddDoc { id: 13 },],
+            &[IndexingOp::add(23), IndexingOp::add(13),],
            false,
            false
        )
@@ -2368,8 +2501,8 @@ mod tests {
    fn test_minimal_sort_force_end_merge_with_delete() {
        assert!(test_operation_strategy(
            &[
-                IndexingOp::AddDoc { id: 23 },
-                IndexingOp::AddDoc { id: 13 },
+                IndexingOp::add(23),
+                IndexingOp::add(13),
                IndexingOp::DeleteDoc { id: 13 }
            ],
            true,
@@ -2382,8 +2515,8 @@ mod tests {
    fn test_minimal_no_sort_no_force_end_merge() {
        assert!(test_operation_strategy(
            &[
-                IndexingOp::AddDoc { id: 23 },
-                IndexingOp::AddDoc { id: 13 },
+                IndexingOp::add(23),
+                IndexingOp::add(13),
                IndexingOp::DeleteDoc { id: 13 }
            ],
            false,
@@ -2394,7 +2527,7 @@ mod tests {

    #[test]
    fn test_minimal_sort_merge() {
-        assert!(test_operation_strategy(&[IndexingOp::AddDoc { id: 3 },], true, true).is_ok());
+        assert!(test_operation_strategy(&[IndexingOp::add(3),], true, true).is_ok());
    }

    use proptest::prelude::*;
@@ -2490,14 +2623,14 @@ mod tests {
    fn test_delete_bug_reproduction_ip_addr() {
        use IndexingOp::*;
        let ops = &[
-            AddDoc { id: 1 },
-            AddDoc { id: 2 },
+            IndexingOp::add(1),
+            IndexingOp::add(2),
            Commit,
-            AddDoc { id: 3 },
+            IndexingOp::add(3),
            DeleteDoc { id: 1 },
            Commit,
            Merge,
-            AddDoc { id: 4 },
+            IndexingOp::add(4),
            Commit,
        ];
        test_operation_strategy(&ops[..], false, true).unwrap();
@@ -2506,7 +2639,13 @@ mod tests {
    #[test]
    fn test_merge_regression_1() {
        use IndexingOp::*;
-        let ops = &[AddDoc { id: 15 }, Commit, AddDoc { id: 9 }, Commit, Merge];
+        let ops = &[
+            IndexingOp::add(15),
+            Commit,
+            IndexingOp::add(9),
+            Commit,
+            Merge,
+        ];
        test_operation_strategy(&ops[..], false, true).unwrap();
    }

@@ -2514,9 +2653,9 @@ mod tests {
    fn test_range_query_bug_1() {
        use IndexingOp::*;
        let ops = &[
-            AddDoc { id: 9 },
-            AddDoc { id: 0 },
-            AddDoc { id: 13 },
+            IndexingOp::add(9),
+            IndexingOp::add(0),
+            IndexingOp::add(13),
            Commit,
        ];
        test_operation_strategy(&ops[..], false, true).unwrap();
@@ -2524,12 +2663,11 @@ mod tests {

    #[test]
    fn test_range_query_bug_2() {
-        use IndexingOp::*;
        let ops = &[
-            AddDoc { id: 3 },
-            AddDoc { id: 6 },
-            AddDoc { id: 9 },
-            AddDoc { id: 10 },
+            IndexingOp::add(3),
+            IndexingOp::add(6),
+            IndexingOp::add(9),
+            IndexingOp::add(10),
        ];
        test_operation_strategy(&ops[..], false, false).unwrap();
    }
@@ -2551,7 +2689,7 @@ mod tests {
        assert!(test_operation_strategy(
            &[
                IndexingOp::DeleteDoc { id: 0 },
-                IndexingOp::AddDoc { id: 6 },
+                IndexingOp::add(6),
                IndexingOp::DeleteDocQuery { id: 11 },
                IndexingOp::Commit,
                IndexingOp::Merge,
@@ -2568,10 +2706,13 @@ mod tests {
    fn test_bug_1617_2() {
        assert!(test_operation_strategy(
            &[
-                IndexingOp::AddDoc { id: 13 },
+                IndexingOp::AddDoc {
+                    id: 13,
+                    value: Default::default()
+                },
                IndexingOp::DeleteDoc { id: 13 },
                IndexingOp::Commit,
-                IndexingOp::AddDoc { id: 30 },
+                IndexingOp::add(30),
                IndexingOp::Commit,
                IndexingOp::Merge,
            ],
--- a/src/indexer/log_merge_policy.rs
+++ b/src/indexer/log_merge_policy.rs
@@ -144,10 +144,9 @@ mod tests {
    use once_cell::sync::Lazy;

    use super::*;
-    use crate::index::SegmentMetaInventory;
-    use crate::indexer::merge_policy::MergePolicy;
+    use crate::index::{SegmentId, SegmentMetaInventory};
+    use crate::schema;
    use crate::schema::INDEXED;
-    use crate::{schema, SegmentId};

    static INVENTORY: Lazy<SegmentMetaInventory> = Lazy::new(SegmentMetaInventory::default);

--- a/src/indexer/merge_operation.rs
+++ b/src/indexer/merge_operation.rs
@@ -1,7 +1,8 @@
 use std::collections::HashSet;
 use std::ops::Deref;

-use crate::{Inventory, Opstamp, SegmentId, TrackedObject};
+use crate::index::SegmentId;
+use crate::{Inventory, Opstamp, TrackedObject};

 #[derive(Default)]
 pub(crate) struct MergeOperationInventory(Inventory<InnerMergeOperation>);
--- a/src/indexer/merge_policy.rs
+++ b/src/indexer/merge_policy.rs
@@ -39,7 +39,6 @@ impl MergePolicy for NoMergePolicy {
 pub mod tests {

    use super::*;
-    use crate::index::{SegmentId, SegmentMeta};

    /// `MergePolicy` useful for test purposes.
    ///
--- a/src/indexer/merger.rs
+++ b/src/indexer/merger.rs
@@ -13,7 +13,7 @@ use crate::docset::{DocSet, TERMINATED};
 use crate::error::DataCorruption;
 use crate::fastfield::{AliveBitSet, FastFieldNotAvailableError};
 use crate::fieldnorm::{FieldNormReader, FieldNormReaders, FieldNormsSerializer, FieldNormsWriter};
-use crate::index::{Segment, SegmentReader};
+use crate::index::{Segment, SegmentComponent, SegmentReader};
 use crate::indexer::doc_id_mapping::{MappingType, SegmentDocIdMapping};
 use crate::indexer::SegmentSerializer;
 use crate::postings::{InvertedIndexSerializer, Postings, SegmentPostings};
@@ -21,8 +21,7 @@ use crate::schema::{value_type_to_column_type, Field, FieldType, Schema};
 use crate::store::StoreWriter;
 use crate::termdict::{TermMerger, TermOrdinal};
 use crate::{
-    DocAddress, DocId, IndexSettings, IndexSortByField, InvertedIndexReader, Order,
-    SegmentComponent, SegmentOrdinal,
+    DocAddress, DocId, IndexSettings, IndexSortByField, InvertedIndexReader, Order, SegmentOrdinal,
 };

 /// Segment's max doc must be `< MAX_DOC_LIMIT`.
@@ -576,7 +575,7 @@ impl IndexMerger {
                    //
                    // Overall the reliable way to know if we have actual frequencies loaded or not
                    // is to check whether the actual decoded array is empty or not.
-                    if has_term_freq != !postings.block_cursor.freqs().is_empty() {
+                    if has_term_freq == postings.block_cursor.freqs().is_empty() {
                        return Err(DataCorruption::comment_only(
                            "Term freqs are inconsistent across segments",
                        )
@@ -788,23 +787,25 @@ impl IndexMerger {
 mod tests {

    use columnar::Column;
+    use proptest::prop_oneof;
+    use proptest::strategy::Strategy;
    use schema::FAST;

    use crate::collector::tests::{
        BytesFastFieldTestCollector, FastFieldTestCollector, TEST_COLLECTOR_WITH_SCORE,
    };
    use crate::collector::{Count, FacetCollector};
-    use crate::index::Index;
+    use crate::index::{Index, SegmentId};
+    use crate::indexer::NoMergePolicy;
    use crate::query::{AllQuery, BooleanQuery, EnableScoring, Scorer, TermQuery};
-    use crate::schema::document::Value;
    use crate::schema::{
        Facet, FacetOptions, IndexRecordOption, NumericOptions, TantivyDocument, Term,
-        TextFieldIndexing, INDEXED, TEXT,
+        TextFieldIndexing, Value, INDEXED, TEXT,
    };
    use crate::time::OffsetDateTime;
    use crate::{
        assert_nearly_equals, schema, DateTime, DocAddress, DocId, DocSet, IndexSettings,
-        IndexSortByField, IndexWriter, Order, Searcher, SegmentId,
+        IndexSortByField, IndexWriter, Order, Searcher,
    };

    #[test]
@@ -911,15 +912,24 @@ mod tests {
            }
            {
                let doc = searcher.doc::<TantivyDocument>(DocAddress::new(0, 0))?;
-                assert_eq!(doc.get_first(text_field).unwrap().as_str(), Some("af b"));
+                assert_eq!(
+                    doc.get_first(text_field).unwrap().as_value().as_str(),
+                    Some("af b")
+                );
            }
            {
                let doc = searcher.doc::<TantivyDocument>(DocAddress::new(0, 1))?;
-                assert_eq!(doc.get_first(text_field).unwrap().as_str(), Some("a b c"));
+                assert_eq!(
+                    doc.get_first(text_field).unwrap().as_value().as_str(),
+                    Some("a b c")
+                );
            }
            {
                let doc = searcher.doc::<TantivyDocument>(DocAddress::new(0, 2))?;
-                assert_eq!(doc.get_first(text_field).unwrap().as_str(), Some("a b c d"));
+                assert_eq!(
+                    doc.get_first(text_field).unwrap().as_value().as_str(),
+                    Some("a b c d")
+                );
            }
            {
                let doc = searcher.doc::<TantivyDocument>(DocAddress::new(0, 3))?;
@@ -1524,6 +1534,112 @@ mod tests {
        Ok(())
    }

+    #[derive(Debug, Clone, Copy, Eq, PartialEq)]
+    enum IndexingOp {
+        ZeroVal,
+        OneVal { val: u64 },
+        TwoVal { val: u64 },
+        Commit,
+    }
+
+    fn balanced_operation_strategy() -> impl Strategy<Value = IndexingOp> {
+        prop_oneof![
+            (0u64..1u64).prop_map(|_| IndexingOp::ZeroVal),
+            (0u64..1u64).prop_map(|val| IndexingOp::OneVal { val }),
+            (0u64..1u64).prop_map(|val| IndexingOp::TwoVal { val }),
+            (0u64..1u64).prop_map(|_| IndexingOp::Commit),
+        ]
+    }
+
+    use proptest::prelude::*;
+    proptest! {
+        #[test]
+        fn test_merge_columnar_int_proptest(ops in proptest::collection::vec(balanced_operation_strategy(), 1..20)) {
+            assert!(test_merge_int_fields(&ops[..]).is_ok());
+        }
+    }
+    fn test_merge_int_fields(ops: &[IndexingOp]) -> crate::Result<()> {
+        if ops.iter().all(|op| *op == IndexingOp::Commit) {
+            return Ok(());
+        }
+        let expected_doc_and_vals: Vec<(u32, Vec<u64>)> = ops
+            .iter()
+            .filter(|op| *op != &IndexingOp::Commit)
+            .map(|op| match op {
+                IndexingOp::ZeroVal => vec![],
+                IndexingOp::OneVal { val } => vec![*val],
+                IndexingOp::TwoVal { val } => vec![*val, *val],
+                IndexingOp::Commit => unreachable!(),
+            })
+            .enumerate()
+            .map(|(id, val)| (id as u32, val))
+            .collect();
+
+        let mut schema_builder = schema::Schema::builder();
+        let int_options = NumericOptions::default().set_fast().set_indexed();
+        let int_field = schema_builder.add_u64_field("intvals", int_options);
+        let index = Index::create_in_ram(schema_builder.build());
+        {
+            let mut index_writer = index.writer_for_tests()?;
+            index_writer.set_merge_policy(Box::new(NoMergePolicy));
+            let index_doc = |index_writer: &mut IndexWriter, int_vals: &[u64]| {
+                let mut doc = TantivyDocument::default();
+                for &val in int_vals {
+                    doc.add_u64(int_field, val);
+                }
+                index_writer.add_document(doc).unwrap();
+            };
+
+            for op in ops {
+                match op {
+                    IndexingOp::ZeroVal => index_doc(&mut index_writer, &[]),
+                    IndexingOp::OneVal { val } => index_doc(&mut index_writer, &[*val]),
+                    IndexingOp::TwoVal { val } => index_doc(&mut index_writer, &[*val, *val]),
+                    IndexingOp::Commit => {
+                        index_writer.commit().expect("commit failed");
+                    }
+                }
+            }
+            index_writer.commit().expect("commit failed");
+        }
+        {
+            let mut segment_ids = index.searchable_segment_ids()?;
+            segment_ids.sort();
+            let mut index_writer: IndexWriter = index.writer_for_tests()?;
+            index_writer.merge(&segment_ids).wait()?;
+            index_writer.wait_merging_threads()?;
+        }
+        let reader = index.reader()?;
+        reader.reload()?;
+
+        let mut vals: Vec<u64> = Vec::new();
+        let mut test_vals = move |col: &Column<u64>, doc: DocId, expected: &[u64]| {
+            vals.clear();
+            vals.extend(col.values_for_doc(doc));
+            assert_eq!(&vals[..], expected);
+        };
+
+        let mut test_col = move |col: &Column<u64>, column_expected: &[(u32, Vec<u64>)]| {
+            for (doc_id, vals) in column_expected.iter() {
+                test_vals(col, *doc_id, vals);
+            }
+        };
+
+        {
+            let searcher = reader.searcher();
+            let segment = searcher.segment_reader(0u32);
+            let col = segment
+                .fast_fields()
+                .column_opt::<u64>("intvals")
+                .unwrap()
+                .unwrap();
+
+            test_col(&col, &expected_doc_and_vals);
+        }
+
+        Ok(())
+    }
+
    #[test]
    fn test_merge_multivalued_int_fields_simple() -> crate::Result<()> {
        let mut schema_builder = schema::Schema::builder();
--- a/src/indexer/merger_sorted_index_test.rs
+++ b/src/indexer/merger_sorted_index_test.rs
@@ -3,15 +3,15 @@ mod tests {
    use crate::collector::TopDocs;
    use crate::fastfield::AliveBitSet;
    use crate::index::Index;
+    use crate::postings::Postings;
    use crate::query::QueryParser;
-    use crate::schema::document::Value;
    use crate::schema::{
        self, BytesOptions, Facet, FacetOptions, IndexRecordOption, NumericOptions,
-        TextFieldIndexing, TextOptions,
+        TextFieldIndexing, TextOptions, Value,
    };
    use crate::{
-        DocAddress, DocSet, IndexSettings, IndexSortByField, IndexWriter, Order, Postings,
-        TantivyDocument, Term,
+        DocAddress, DocSet, IndexSettings, IndexSortByField, IndexWriter, Order, TantivyDocument,
+        Term,
    };

    fn create_test_index_posting_list_issue(index_settings: Option<IndexSettings>) -> Index {
@@ -280,13 +280,16 @@ mod tests {
                .doc::<TantivyDocument>(DocAddress::new(0, blubber_pos))
                .unwrap();
            assert_eq!(
-                doc.get_first(my_text_field).unwrap().as_str(),
+                doc.get_first(my_text_field).unwrap().as_value().as_str(),
                Some("blubber")
            );
            let doc = searcher
                .doc::<TantivyDocument>(DocAddress::new(0, 0))
                .unwrap();
-            assert_eq!(doc.get_first(int_field).unwrap().as_u64(), Some(1000));
+            assert_eq!(
+                doc.get_first(int_field).unwrap().as_value().as_u64(),
+                Some(1000)
+            );
        }
    }

--- a/src/indexer/mod.rs
+++ b/src/indexer/mod.rs
@@ -144,6 +144,181 @@ mod tests_mmap {
            assert_eq!(num_docs, 256);
        }
    }
+    #[test]
+    fn test_json_field_null_byte() {
+        // Test when field name contains a zero byte, which has special meaning in tantivy.
+        // As a workaround, we convert the zero byte to the ASCII character '0'.
+        // https://github.com/quickwit-oss/tantivy/issues/2340
+        // https://github.com/quickwit-oss/tantivy/issues/2193
+        let field_name_in = "\u{0000}";
+        let field_name_out = "0";
+        test_json_field_name(field_name_in, field_name_out);
+    }
+    #[test]
+    fn test_json_field_1byte() {
+        // Test when field name contains a `\u{0001}` byte, which has special meaning in tantivy.
+        // The `\u{0001}` byte can be addressed either as `\u{0001}` or as '.'.
+        let field_name_in = "\u{0001}";
+        let field_name_out = "\u{0001}";
+        test_json_field_name(field_name_in, field_name_out);
+
+        // The same `\u{0001}` byte in the field name can also be queried as '.'.
+        let field_name_in = "\u{0001}";
+        let field_name_out = ".";
+        test_json_field_name(field_name_in, field_name_out);
+    }
+    #[test]
+    fn test_json_field_dot() {
+        // Test when field name contains a '.'
+        let field_name_in = ".";
+        let field_name_out = ".";
+        test_json_field_name(field_name_in, field_name_out);
+    }
+    fn test_json_field_name(field_name_in: &str, field_name_out: &str) {
+        let mut schema_builder = Schema::builder();
+
+        let options = JsonObjectOptions::from(TEXT | FAST).set_expand_dots_enabled();
+        let field = schema_builder.add_json_field("json", options);
+        let index = Index::create_in_ram(schema_builder.build());
+        let mut index_writer = index.writer_for_tests().unwrap();
+        index_writer
+            .add_document(doc!(field=>json!({format!("{field_name_in}"): "test1", format!("num{field_name_in}"): 10})))
+            .unwrap();
+        index_writer
+            .add_document(doc!(field=>json!({format!("a{field_name_in}"): "test2"})))
+            .unwrap();
+        index_writer
+            .add_document(doc!(field=>json!({format!("a{field_name_in}a"): "test3"})))
+            .unwrap();
+        index_writer
+            .add_document(
+                doc!(field=>json!({format!("a{field_name_in}a{field_name_in}"): "test4"})),
+            )
+            .unwrap();
+        index_writer
+            .add_document(
+                doc!(field=>json!({format!("a{field_name_in}.ab{field_name_in}"): "test5"})),
+            )
+            .unwrap();
+        index_writer
+            .add_document(
+                doc!(field=>json!({format!("a{field_name_in}"): json!({format!("a{field_name_in}"): "test6"}) })),
+            )
+            .unwrap();
+        index_writer
+            .add_document(doc!(field=>json!({format!("{field_name_in}a" ): "test7"})))
+            .unwrap();
+
+        index_writer.commit().unwrap();
+        let reader = index.reader().unwrap();
+        let searcher = reader.searcher();
+        let parse_query = QueryParser::for_index(&index, Vec::new());
+        let test_query = |query_str: &str| {
+            let query = parse_query.parse_query(query_str).unwrap();
+            let num_docs = searcher.search(&query, &Count).unwrap();
+            assert_eq!(num_docs, 1, "{query_str}");
+        };
+        test_query(format!("json.{field_name_out}:test1").as_str());
+        test_query(format!("json.a{field_name_out}:test2").as_str());
+        test_query(format!("json.a{field_name_out}a:test3").as_str());
+        test_query(format!("json.a{field_name_out}a{field_name_out}:test4").as_str());
+        test_query(format!("json.a{field_name_out}.ab{field_name_out}:test5").as_str());
+        test_query(format!("json.a{field_name_out}.a{field_name_out}:test6").as_str());
+        test_query(format!("json.{field_name_out}a:test7").as_str());
+
+        let test_agg = |field_name: &str, expected: &str| {
+            let agg_req_str = json!(
+            {
+              "termagg": {
+                "terms": {
+                  "field": field_name,
+                }
+              }
+            });
+
+            let agg_req: Aggregations = serde_json::from_value(agg_req_str).unwrap();
+            let collector = AggregationCollector::from_aggs(agg_req, Default::default());
+            let agg_res: AggregationResults = searcher.search(&AllQuery, &collector).unwrap();
+            let res = serde_json::to_value(agg_res).unwrap();
+            assert_eq!(res["termagg"]["buckets"][0]["doc_count"], 1);
+            assert_eq!(res["termagg"]["buckets"][0]["key"], expected);
+        };
+
+        test_agg(format!("json.{field_name_out}").as_str(), "test1");
+        test_agg(format!("json.a{field_name_out}").as_str(), "test2");
+        test_agg(format!("json.a{field_name_out}a").as_str(), "test3");
+        test_agg(
+            format!("json.a{field_name_out}a{field_name_out}").as_str(),
+            "test4",
+        );
+        test_agg(
+            format!("json.a{field_name_out}.ab{field_name_out}").as_str(),
+            "test5",
+        );
+        test_agg(
+            format!("json.a{field_name_out}.a{field_name_out}").as_str(),
+            "test6",
+        );
+        test_agg(format!("json.{field_name_out}a").as_str(), "test7");
+
+        // `.` is stored as `\u{0001}` internally in tantivy
+        let field_name_out_internal = if field_name_out == "." {
+            "\u{0001}"
+        } else {
+            field_name_out
+        };
+
+        let mut fields = reader.searcher().segment_readers()[0]
+            .inverted_index(field)
+            .unwrap()
+            .list_encoded_fields()
+            .unwrap();
+        assert_eq!(fields.len(), 8);
+        fields.sort();
+        let mut expected_fields = vec![
+            (format!("a{field_name_out_internal}"), Type::Str),
+            (format!("a{field_name_out_internal}a"), Type::Str),
+            (
+                format!("a{field_name_out_internal}a{field_name_out_internal}"),
+                Type::Str,
+            ),
+            (
+                format!("a{field_name_out_internal}\u{1}ab{field_name_out_internal}"),
+                Type::Str,
+            ),
+            (
+                format!("a{field_name_out_internal}\u{1}a{field_name_out_internal}"),
+                Type::Str,
+            ),
+            (format!("{field_name_out_internal}a"), Type::Str),
+            (format!("{field_name_out_internal}"), Type::Str),
+            (format!("num{field_name_out_internal}"), Type::I64),
+        ];
+        expected_fields.sort();
+        assert_eq!(fields, expected_fields);
+        // Check columnar reader
+        let mut columns = reader.searcher().segment_readers()[0]
+            .fast_fields()
+            .columnar()
+            .list_columns()
+            .unwrap()
+            .into_iter()
+            .map(|(name, _)| name)
+            .collect::<Vec<_>>();
+        let mut expected_columns = vec![
+            format!("json\u{1}{field_name_out_internal}"),
+            format!("json\u{1}{field_name_out_internal}a"),
+            format!("json\u{1}a{field_name_out_internal}"),
+            format!("json\u{1}a{field_name_out_internal}a"),
+            format!("json\u{1}a{field_name_out_internal}a{field_name_out_internal}"),
+            format!("json\u{1}a{field_name_out_internal}\u{1}ab{field_name_out_internal}"),
+            format!("json\u{1}a{field_name_out_internal}\u{1}a{field_name_out_internal}"),
+            format!("json\u{1}num{field_name_out_internal}"),
+        ];
+        columns.sort();
+        expected_columns.sort();
+        assert_eq!(columns, expected_columns);
+    }

    #[test]
    fn test_json_field_expand_dots_enabled_dot_escape_not_required() {
@@ -415,10 +590,10 @@ mod tests_mmap {
        let query_parser = QueryParser::for_index(&index, vec![]);
        // Test if field name can be queried
        for (indexed_field, val) in fields_and_vals.iter() {
-            let query_str = &format!("{}:{}", indexed_field, val);
+            let query_str = &format!("{indexed_field}:{val}");
            let query = query_parser.parse_query(query_str).unwrap();
            let count_docs = searcher.search(&*query, &TopDocs::with_limit(2)).unwrap();
-            assert!(!count_docs.is_empty(), "{}:{}", indexed_field, val);
+            assert!(!count_docs.is_empty(), "{indexed_field}:{val}");
        }
        // Test if field name can be used for aggregation
        for (field_name, val) in fields_and_vals.iter() {
--- a/src/indexer/segment_writer.rs
+++ b/src/indexer/segment_writer.rs
@@ -5,20 +5,20 @@ use tokenizer_api::BoxTokenStream;

 use super::doc_id_mapping::{get_doc_id_mapping_from_field, DocIdMapping};
 use super::operation::AddOperation;
-use crate::core::json_utils::index_json_values;
 use crate::fastfield::FastFieldsWriter;
 use crate::fieldnorm::{FieldNormReaders, FieldNormsWriter};
-use crate::index::Segment;
+use crate::index::{Segment, SegmentComponent};
 use crate::indexer::segment_serializer::SegmentSerializer;
+use crate::json_utils::{index_json_value, IndexingPositionsPerPath};
 use crate::postings::{
    compute_table_memory_size, serialize_postings, IndexingContext, IndexingPosition,
    PerFieldPostingsWriter, PostingsWriter,
 };
-use crate::schema::document::{Document, ReferenceValue, Value};
+use crate::schema::document::{Document, Value};
 use crate::schema::{FieldEntry, FieldType, Schema, Term, DATE_TIME_PRECISION_INDEXED};
 use crate::store::{StoreReader, StoreWriter};
 use crate::tokenizer::{FacetTokenizer, PreTokenizedStream, TextAnalyzer, Tokenizer};
-use crate::{DocId, Opstamp, SegmentComponent, TantivyError};
+use crate::{DocId, Opstamp, TantivyError};

 /// Computes the initial size of the hash table.
 ///
@@ -68,6 +68,7 @@ pub struct SegmentWriter {
    pub(crate) fast_field_writers: FastFieldsWriter,
    pub(crate) fieldnorms_writer: FieldNormsWriter,
    pub(crate) json_path_writer: JsonPathWriter,
+    pub(crate) json_positions_per_path: IndexingPositionsPerPath,
    pub(crate) doc_opstamps: Vec<Opstamp>,
    per_field_text_analyzers: Vec<TextAnalyzer>,
    term_buffer: Term,
@@ -119,6 +120,7 @@ impl SegmentWriter {
            per_field_postings_writers,
            fieldnorms_writer: FieldNormsWriter::for_schema(&schema),
            json_path_writer: JsonPathWriter::default(),
+            json_positions_per_path: IndexingPositionsPerPath::default(),
            segment_serializer,
            fast_field_writers: FastFieldsWriter::from_schema_and_tokenizer_manager(
                &schema,
@@ -200,12 +202,10 @@ impl SegmentWriter {
            match field_entry.field_type() {
                FieldType::Facet(_) => {
                    let mut facet_tokenizer = FacetTokenizer::default(); // this can be global
-                    for value_access in values {
-                        // Used to help with linting and type checking.
-                        let value = value_access as D::Value<'_>;
+                    for value in values {
+                        let value = value.as_value();

-                        let facet = value.as_facet().ok_or_else(make_schema_error)?;
-                        let facet_str = facet.encoded_str();
+                        let facet_str = value.as_facet().ok_or_else(make_schema_error)?;
                        let mut facet_tokenizer = facet_tokenizer.token_stream(facet_str);
                        let mut indexing_position = IndexingPosition::default();
                        postings_writer.index_text(
@@ -219,16 +219,15 @@ impl SegmentWriter {
                }
                FieldType::Str(_) => {
                    let mut indexing_position = IndexingPosition::default();
-                    for value_access in values {
-                        // Used to help with linting and type checking.
-                        let value = value_access as D::Value<'_>;
+                    for value in values {
+                        let value = value.as_value();

                        let mut token_stream = if let Some(text) = value.as_str() {
                            let text_analyzer =
                                &mut self.per_field_text_analyzers[field.field_id() as usize];
                            text_analyzer.token_stream(text)
-                        } else if let Some(tok_str) = value.as_pre_tokenized_text() {
-                            BoxTokenStream::new(PreTokenizedStream::from(tok_str.clone()))
+                        } else if let Some(tok_str) = value.into_pre_tokenized_text() {
+                            BoxTokenStream::new(PreTokenizedStream::from(*tok_str.clone()))
                        } else {
                            continue;
                        };
@@ -249,9 +248,8 @@ impl SegmentWriter {
                }
                FieldType::U64(_) => {
                    let mut num_vals = 0;
-                    for value_access in values {
-                        // Used to help with linting and type checking.
-                        let value = value_access as D::Value<'_>;
+                    for value in values {
+                        let value = value.as_value();

                        num_vals += 1;
                        let u64_val = value.as_u64().ok_or_else(make_schema_error)?;
@@ -264,10 +262,8 @@ impl SegmentWriter {
                }
                FieldType::Date(_) => {
                    let mut num_vals = 0;
-                    for value_access in values {
-                        // Used to help with linting and type checking.
-                        let value_access = value_access as D::Value<'_>;
-                        let value = value_access.as_value();
+                    for value in values {
+                        let value = value.as_value();

                        num_vals += 1;
                        let date_val = value.as_datetime().ok_or_else(make_schema_error)?;
@@ -281,9 +277,8 @@ impl SegmentWriter {
                }
                FieldType::I64(_) => {
                    let mut num_vals = 0;
-                    for value_access in values {
-                        // Used to help with linting and type checking.
-                        let value = value_access as D::Value<'_>;
+                    for value in values {
+                        let value = value.as_value();

                        num_vals += 1;
                        let i64_val = value.as_i64().ok_or_else(make_schema_error)?;
@@ -296,10 +291,8 @@ impl SegmentWriter {
                }
                FieldType::F64(_) => {
                    let mut num_vals = 0;
-                    for value_access in values {
-                        // Used to help with linting and type checking.
-                        let value = value_access as D::Value<'_>;
-
+                    for value in values {
+                        let value = value.as_value();
                        num_vals += 1;
                        let f64_val = value.as_f64().ok_or_else(make_schema_error)?;
                        term_buffer.set_f64(f64_val);
@@ -311,10 +304,8 @@ impl SegmentWriter {
                }
                FieldType::Bool(_) => {
                    let mut num_vals = 0;
-                    for value_access in values {
-                        // Used to help with linting and type checking.
-                        let value = value_access as D::Value<'_>;
-
+                    for value in values {
+                        let value = value.as_value();
                        num_vals += 1;
                        let bool_val = value.as_bool().ok_or_else(make_schema_error)?;
                        term_buffer.set_bool(bool_val);
@@ -326,10 +317,8 @@ impl SegmentWriter {
                }
                FieldType::Bytes(_) => {
                    let mut num_vals = 0;
-                    for value_access in values {
-                        // Used to help with linting and type checking.
-                        let value = value_access as D::Value<'_>;
-
+                    for value in values {
+                        let value = value.as_value();
                        num_vals += 1;
                        let bytes = value.as_bytes().ok_or_else(make_schema_error)?;
                        term_buffer.set_bytes(bytes);
@@ -342,32 +331,29 @@ impl SegmentWriter {
                FieldType::JsonObject(json_options) => {
                    let text_analyzer =
                        &mut self.per_field_text_analyzers[field.field_id() as usize];
-                    let json_values_it = values.map(|value_access| {
-                        // Used to help with linting and type checking.
-                        let value_access = value_access as D::Value<'_>;
-                        let value = value_access.as_value();

-                        match value {
-                            ReferenceValue::Object(object_iter) => Ok(object_iter),
-                            _ => Err(make_schema_error()),
-                        }
-                    });
-                    index_json_values::<D::Value<'_>>(
-                        doc_id,
-                        json_values_it,
-                        text_analyzer,
-                        json_options.is_expand_dots_enabled(),
-                        term_buffer,
-                        postings_writer,
-                        &mut self.json_path_writer,
-                        ctx,
-                    )?;
+                    self.json_positions_per_path.clear();
+                    self.json_path_writer
+                        .set_expand_dots(json_options.is_expand_dots_enabled());
+                    for json_value in values {
+                        self.json_path_writer.clear();
+
+                        index_json_value(
+                            doc_id,
+                            json_value,
+                            text_analyzer,
+                            term_buffer,
+                            &mut self.json_path_writer,
+                            postings_writer,
+                            ctx,
+                            &mut self.json_positions_per_path,
+                        );
+                    }
                }
                FieldType::IpAddr(_) => {
                    let mut num_vals = 0;
-                    for value_access in values {
-                        // Used to help with linting and type checking.
-                        let value = value_access as D::Value<'_>;
+                    for value in values {
+                        let value = value.as_value();

                        num_vals += 1;
                        let ip_addr = value.as_ip_addr().ok_or_else(make_schema_error)?;
@@ -496,22 +482,21 @@ mod tests {
    use tempfile::TempDir;

    use crate::collector::{Count, TopDocs};
-    use crate::core::json_utils::JsonTermWriter;
    use crate::directory::RamDirectory;
-    use crate::postings::TermInfo;
+    use crate::fastfield::FastValue;
+    use crate::postings::{Postings, TermInfo};
    use crate::query::{PhraseQuery, QueryParser};
-    use crate::schema::document::Value;
    use crate::schema::{
-        Document, IndexRecordOption, Schema, TextFieldIndexing, TextOptions, Type, STORED, STRING,
-        TEXT,
+        Document, IndexRecordOption, OwnedValue, Schema, TextFieldIndexing, TextOptions, Value,
+        STORED, STRING, TEXT,
    };
    use crate::store::{Compressor, StoreReader, StoreWriter};
    use crate::time::format_description::well_known::Rfc3339;
    use crate::time::OffsetDateTime;
    use crate::tokenizer::{PreTokenizedString, Token};
    use crate::{
-        DateTime, Directory, DocAddress, DocSet, Index, IndexWriter, Postings, TantivyDocument,
-        Term, TERMINATED,
+        DateTime, Directory, DocAddress, DocSet, Index, IndexWriter, TantivyDocument, Term,
+        TERMINATED,
    };

    #[test]
@@ -556,9 +541,15 @@ mod tests {
        let reader = StoreReader::open(directory.open_read(path).unwrap(), 0).unwrap();
        let doc = reader.get::<TantivyDocument>(0).unwrap();

-        assert_eq!(doc.field_values().len(), 2);
-        assert_eq!(doc.field_values()[0].value().as_str(), Some("A"));
-        assert_eq!(doc.field_values()[1].value().as_str(), Some("title"));
+        assert_eq!(doc.field_values().count(), 2);
+        assert_eq!(
+            doc.get_all(text_field).next().unwrap().as_value().as_str(),
+            Some("A")
+        );
+        assert_eq!(
+            doc.get_all(text_field).nth(1).unwrap().as_value().as_str(),
+            Some("title")
+        );
    }
    #[test]
    fn test_simple_json_indexing() {
@@ -598,12 +589,51 @@ mod tests {
        assert_eq!(score_docs.len(), 2);
    }

+    #[test]
+    fn test_flat_json_indexing() {
+        // A JSON Object that contains mixed values on the first level
+        let mut schema_builder = Schema::builder();
+        let json_field = schema_builder.add_json_field("json", STORED | STRING);
+        let schema = schema_builder.build();
+        let index = Index::create_in_ram(schema.clone());
+        let mut writer = index.writer_for_tests().unwrap();
+        // Text, i64, u64
+        writer.add_document(doc!(json_field=>"b")).unwrap();
+        writer
+            .add_document(doc!(json_field=>OwnedValue::I64(10i64)))
+            .unwrap();
+        writer
+            .add_document(doc!(json_field=>OwnedValue::U64(55u64)))
+            .unwrap();
+        writer
+            .add_document(doc!(json_field=>json!({"my_field": "a"})))
+            .unwrap();
+        writer.commit().unwrap();
+
+        let search_and_expect = |query| {
+            let query_parser = QueryParser::for_index(&index, vec![json_field]);
+            let text_query = query_parser.parse_query(query).unwrap();
+            let score_docs: Vec<(_, DocAddress)> = index
+                .reader()
+                .unwrap()
+                .searcher()
+                .search(&text_query, &TopDocs::with_limit(4))
+                .unwrap();
+            assert_eq!(score_docs.len(), 1);
+        };
+
+        search_and_expect("my_field:a");
+        search_and_expect("b");
+        search_and_expect("10");
+        search_and_expect("55");
+    }
+
    #[test]
    fn test_json_indexing() {
        let mut schema_builder = Schema::builder();
        let json_field = schema_builder.add_json_field("json", STORED | TEXT);
        let schema = schema_builder.build();
-        let json_val: serde_json::Map<String, serde_json::Value> = serde_json::from_str(
+        let json_val: serde_json::Value = serde_json::from_str(
            r#"{
            "toto": "titi",
            "float": -0.2,
@@ -631,129 +661,125 @@ mod tests {
                doc_id: 0u32,
            })
            .unwrap();
-        let serdeser_json_val = serde_json::from_str::<serde_json::Map<String, serde_json::Value>>(
-            &doc.to_json(&schema),
-        )
-        .unwrap()
-        .get("json")
-        .unwrap()[0]
-            .as_object()
+        let serdeser_json_val = serde_json::from_str::<serde_json::Value>(&doc.to_json(&schema))
            .unwrap()
+            .get("json")
+            .unwrap()[0]
            .clone();
        assert_eq!(json_val, serdeser_json_val);
        let segment_reader = searcher.segment_reader(0u32);
        let inv_idx = segment_reader.inverted_index(json_field).unwrap();
        let term_dict = inv_idx.terms();

-        let mut term = Term::with_type_and_field(Type::Json, json_field);
        let mut term_stream = term_dict.stream().unwrap();

-        let mut json_term_writer = JsonTermWriter::wrap(&mut term, false);
+        let term_from_path =
+            |path: &str| -> Term { Term::from_field_json_path(json_field, path, false) };

-        json_term_writer.push_path_segment("bool");
-        json_term_writer.set_fast_value(true);
+        fn set_fast_val<T: FastValue>(val: T, mut term: Term) -> Term {
+            term.append_type_and_fast_value(val);
+            term
+        }
+        fn set_str(val: &str, mut term: Term) -> Term {
+            term.append_type_and_str(val);
+            term
+        }
+
+        let term = term_from_path("bool");
        assert!(term_stream.advance());
        assert_eq!(
            term_stream.key(),
-            json_term_writer.term().serialized_value_bytes()
+            set_fast_val(true, term).serialized_value_bytes()
        );

-        json_term_writer.pop_path_segment();
-        json_term_writer.push_path_segment("complexobject");
-        json_term_writer.push_path_segment("field.with.dot");
-        json_term_writer.set_fast_value(1i64);
+        let term = term_from_path("complexobject.field\\.with\\.dot");
        assert!(term_stream.advance());
        assert_eq!(
            term_stream.key(),
-            json_term_writer.term().serialized_value_bytes()
+            set_fast_val(1i64, term).serialized_value_bytes()
        );

-        json_term_writer.pop_path_segment();
-        json_term_writer.pop_path_segment();
-        json_term_writer.push_path_segment("date");
-        json_term_writer.set_fast_value(DateTime::from_utc(
-            OffsetDateTime::parse("1985-04-12T23:20:50.52Z", &Rfc3339).unwrap(),
-        ));
+        // Date
+        let term = term_from_path("date");
+
        assert!(term_stream.advance());
        assert_eq!(
            term_stream.key(),
-            json_term_writer.term().serialized_value_bytes()
+            set_fast_val(
+                DateTime::from_utc(
+                    OffsetDateTime::parse("1985-04-12T23:20:50.52Z", &Rfc3339).unwrap(),
+                ),
+                term
+            )
+            .serialized_value_bytes()
        );

-        json_term_writer.pop_path_segment();
-        json_term_writer.push_path_segment("float");
-        json_term_writer.set_fast_value(-0.2f64);
+        // Float
+        let term = term_from_path("float");
        assert!(term_stream.advance());
        assert_eq!(
            term_stream.key(),
-            json_term_writer.term().serialized_value_bytes()
+            set_fast_val(-0.2f64, term).serialized_value_bytes()
        );

-        json_term_writer.pop_path_segment();
-        json_term_writer.push_path_segment("my_arr");
-        json_term_writer.set_fast_value(2i64);
+        // Number In Array
+        let term = term_from_path("my_arr");
        assert!(term_stream.advance());
        assert_eq!(
            term_stream.key(),
-            json_term_writer.term().serialized_value_bytes()
+            set_fast_val(2i64, term).serialized_value_bytes()
        );

-        json_term_writer.set_fast_value(3i64);
+        let term = term_from_path("my_arr");
        assert!(term_stream.advance());
        assert_eq!(
            term_stream.key(),
-            json_term_writer.term().serialized_value_bytes()
+            set_fast_val(3i64, term).serialized_value_bytes()
        );

-        json_term_writer.set_fast_value(4i64);
+        let term = term_from_path("my_arr");
        assert!(term_stream.advance());
        assert_eq!(
            term_stream.key(),
-            json_term_writer.term().serialized_value_bytes()
+            set_fast_val(4i64, term).serialized_value_bytes()
        );

-        json_term_writer.push_path_segment("my_key");
-        json_term_writer.set_str("tokens");
+        // El in Array
+        let term = term_from_path("my_arr.my_key");
        assert!(term_stream.advance());
        assert_eq!(
            term_stream.key(),
-            json_term_writer.term().serialized_value_bytes()
+            set_str("tokens", term).serialized_value_bytes()
        );
-
-        json_term_writer.set_str("two");
+        let term = term_from_path("my_arr.my_key");
        assert!(term_stream.advance());
        assert_eq!(
            term_stream.key(),
-            json_term_writer.term().serialized_value_bytes()
+            set_str("two", term).serialized_value_bytes()
        );

-        json_term_writer.pop_path_segment();
-        json_term_writer.pop_path_segment();
-        json_term_writer.push_path_segment("signed");
-        json_term_writer.set_fast_value(-2i64);
+        // Signed
+        let term = term_from_path("signed");
        assert!(term_stream.advance());
        assert_eq!(
            term_stream.key(),
-            json_term_writer.term().serialized_value_bytes()
+            set_fast_val(-2i64, term).serialized_value_bytes()
        );

-        json_term_writer.pop_path_segment();
-        json_term_writer.push_path_segment("toto");
-        json_term_writer.set_str("titi");
+        let term = term_from_path("toto");
        assert!(term_stream.advance());
        assert_eq!(
            term_stream.key(),
-            json_term_writer.term().serialized_value_bytes()
+            set_str("titi", term).serialized_value_bytes()
        );
-
-        json_term_writer.pop_path_segment();
-        json_term_writer.push_path_segment("unsigned");
-        json_term_writer.set_fast_value(1i64);
+        // Unsigned
+        let term = term_from_path("unsigned");
        assert!(term_stream.advance());
        assert_eq!(
            term_stream.key(),
-            json_term_writer.term().serialized_value_bytes()
+            set_fast_val(1i64, term).serialized_value_bytes()
        );
+
        assert!(!term_stream.advance());
    }

@@ -774,14 +800,9 @@ mod tests {
        let searcher = reader.searcher();
        let segment_reader = searcher.segment_reader(0u32);
        let inv_index = segment_reader.inverted_index(json_field).unwrap();
-        let mut term = Term::with_type_and_field(Type::Json, json_field);
-        let mut json_term_writer = JsonTermWriter::wrap(&mut term, false);
-        json_term_writer.push_path_segment("mykey");
-        json_term_writer.set_str("token");
-        let term_info = inv_index
-            .get_term_info(json_term_writer.term())
-            .unwrap()
-            .unwrap();
+        let mut term = Term::from_field_json_path(json_field, "mykey", false);
+        term.append_type_and_str("token");
+        let term_info = inv_index.get_term_info(&term).unwrap().unwrap();
        assert_eq!(
            term_info,
            TermInfo {
@@ -807,7 +828,7 @@ mod tests {
        let mut schema_builder = Schema::builder();
        let json_field = schema_builder.add_json_field("json", STRING);
        let schema = schema_builder.build();
-        let json_val: serde_json::Map<String, serde_json::Value> =
+        let json_val: serde_json::Value =
            serde_json::from_str(r#"{"mykey": "two tokens"}"#).unwrap();
        let doc = doc!(json_field=>json_val);
        let index = Index::create_in_ram(schema);
@@ -818,14 +839,9 @@ mod tests {
        let searcher = reader.searcher();
        let segment_reader = searcher.segment_reader(0u32);
        let inv_index = segment_reader.inverted_index(json_field).unwrap();
-        let mut term = Term::with_type_and_field(Type::Json, json_field);
-        let mut json_term_writer = JsonTermWriter::wrap(&mut term, false);
-        json_term_writer.push_path_segment("mykey");
-        json_term_writer.set_str("two tokens");
-        let term_info = inv_index
-            .get_term_info(json_term_writer.term())
-            .unwrap()
-            .unwrap();
+        let mut term = Term::from_field_json_path(json_field, "mykey", false);
+        term.append_type_and_str("two tokens");
+        let term_info = inv_index.get_term_info(&term).unwrap().unwrap();
        assert_eq!(
            term_info,
            TermInfo {
@@ -852,7 +868,7 @@ mod tests {
        let mut schema_builder = Schema::builder();
        let json_field = schema_builder.add_json_field("json", TEXT);
        let schema = schema_builder.build();
-        let json_val: serde_json::Map<String, serde_json::Value> = serde_json::from_str(
+        let json_val: serde_json::Value = serde_json::from_str(
            r#"{"mykey": [{"field": "hello happy tax payer"}, {"field": "nothello"}]}"#,
        )
        .unwrap();
@@ -863,16 +879,18 @@ mod tests {
        writer.commit().unwrap();
        let reader = index.reader().unwrap();
        let searcher = reader.searcher();
-        let mut term = Term::with_type_and_field(Type::Json, json_field);
-        let mut json_term_writer = JsonTermWriter::wrap(&mut term, false);
-        json_term_writer.push_path_segment("mykey");
-        json_term_writer.push_path_segment("field");
-        json_term_writer.set_str("hello");
-        let hello_term = json_term_writer.term().clone();
-        json_term_writer.set_str("nothello");
-        let nothello_term = json_term_writer.term().clone();
-        json_term_writer.set_str("happy");
-        let happy_term = json_term_writer.term().clone();
+
+        let term = Term::from_field_json_path(json_field, "mykey.field", false);
+
+        let mut hello_term = term.clone();
+        hello_term.append_type_and_str("hello");
+
+        let mut nothello_term = term.clone();
+        nothello_term.append_type_and_str("nothello");
+
+        let mut happy_term = term.clone();
+        happy_term.append_type_and_str("happy");
+
        let phrase_query = PhraseQuery::new(vec![hello_term, happy_term.clone()]);
        assert_eq!(searcher.search(&phrase_query, &Count).unwrap(), 1);
        let phrase_query = PhraseQuery::new(vec![nothello_term, happy_term]);
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -178,6 +178,7 @@ pub use crate::future_result::FutureResult;
 pub type Result<T> = std::result::Result<T, TantivyError>;

 mod core;
+#[allow(deprecated)] // Remove with index sorting
 pub mod indexer;

 #[allow(unused_doc_comments)]
@@ -189,6 +190,7 @@ pub mod collector;
 pub mod directory;
 pub mod fastfield;
 pub mod fieldnorm;
+#[allow(deprecated)] // Remove with index sorting
 pub mod index;
 pub mod positions;
 pub mod postings;
@@ -214,29 +216,17 @@ use once_cell::sync::Lazy;
 use serde::{Deserialize, Serialize};

 pub use self::docset::{DocSet, COLLECT_BLOCK_BUFFER_LEN, TERMINATED};
-#[deprecated(
-    since = "0.22.0",
-    note = "Will be removed in tantivy 0.23. Use export from snippet module instead"
-)]
-pub use self::snippet::{Snippet, SnippetGenerator};
 #[doc(hidden)]
 pub use crate::core::json_utils;
 pub use crate::core::{Executor, Searcher, SearcherGeneration};
 pub use crate::directory::Directory;
+#[allow(deprecated)] // Remove with index sorting
 pub use crate::index::{
    Index, IndexBuilder, IndexMeta, IndexSettings, IndexSortByField, InvertedIndexReader, Order,
-    Segment, SegmentComponent, SegmentId, SegmentMeta, SegmentReader,
+    Segment, SegmentMeta, SegmentReader,
 };
-#[deprecated(
-    since = "0.22.0",
-    note = "Will be removed in tantivy 0.23. Use export from indexer module instead"
-)]
-pub use crate::indexer::PreparedCommit;
 pub use crate::indexer::{IndexWriter, SingleSegmentIndexWriter};
-pub use crate::postings::Postings;
-#[allow(deprecated)]
-pub use crate::schema::DatePrecision;
-pub use crate::schema::{DateOptions, DateTimePrecision, Document, TantivyDocument, Term};
+pub use crate::schema::{Document, TantivyDocument, Term};

 /// Index format version.
 const INDEX_FORMAT_VERSION: u32 = 6;
@@ -254,7 +244,7 @@ pub struct Version {

 impl fmt::Debug for Version {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
-        write!(f, "{}", self.to_string())
+        fmt::Display::fmt(self, f)
    }
 }

@@ -265,9 +255,10 @@ static VERSION: Lazy<Version> = Lazy::new(|| Version {
    index_format_version: INDEX_FORMAT_VERSION,
 });

-impl ToString for Version {
-    fn to_string(&self) -> String {
-        format!(
+impl fmt::Display for Version {
+    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
+        write!(
+            f,
            "tantivy v{}.{}.{}, index_format v{}",
            self.major, self.minor, self.patch, self.index_format_version
        )
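
The hunks above replace the `ToString` impl on `Version` with a `fmt::Display` impl and make `Debug` delegate to it. A minimal sketch of that pattern follows. The field names come from the diff, but the struct definition and the `u32` field types are assumptions made for illustration.

```rust
use std::fmt;

// Stand-in for the crate's `Version` struct; field types are assumed here.
struct Version {
    major: u32,
    minor: u32,
    patch: u32,
    index_format_version: u32,
}

// Implementing `Display` gives `to_string()` for free through the blanket
// `impl<T: Display> ToString for T` in the standard library.
impl fmt::Display for Version {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "tantivy v{}.{}.{}, index_format v{}",
            self.major, self.minor, self.patch, self.index_format_version
        )
    }
}

// `Debug` forwards to `Display`, so no intermediate `String` is allocated.
impl fmt::Debug for Version {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        fmt::Display::fmt(self, f)
    }
}
```

Existing callers of `version.to_string()` should keep compiling, since `ToString` is still available through the blanket impl.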
@@ -390,9 +381,10 @@ pub mod tests {
    use crate::docset::{DocSet, TERMINATED};
    use crate::index::SegmentReader;
    use crate::merge_policy::NoMergePolicy;
+    use crate::postings::Postings;
    use crate::query::BooleanQuery;
    use crate::schema::*;
-    use crate::{DateTime, DocAddress, Index, IndexWriter, Postings, ReloadPolicy};
+    use crate::{DateTime, DocAddress, Index, IndexWriter, ReloadPolicy};

    pub fn fixed_size_test<O: BinarySerializable + FixedSize + Default>() {
        let mut buffer = Vec::new();
@@ -405,16 +397,20 @@ pub mod tests {
    #[macro_export]
    macro_rules! assert_nearly_equals {
        ($left:expr, $right:expr) => {{
-            match (&$left, &$right) {
-                (left_val, right_val) => {
+            assert_nearly_equals!($left, $right, 0.0005);
+        }};
+        ($left:expr, $right:expr, $epsilon:expr) => {{
+            match (&$left, &$right, &$epsilon) {
+                (left_val, right_val, epsilon_val) => {
                    let diff = (left_val - right_val).abs();
-                    let add = left_val.abs() + right_val.abs();
-                    if diff > 0.0005 * add {
+
+                    if diff > *epsilon_val {
                        panic!(
-                            r#"assertion failed: `(left ~= right)`
-  left: `{:?}`,
- right: `{:?}`"#,
-                            &*left_val, &*right_val
+                            r#"assertion failed: `abs(left-right)>epsilon`
+    left: `{:?}`,
+    right: `{:?}`,
+    epsilon: `{:?}`"#,
+                            &*left_val, &*right_val, &*epsilon_val
                        )
                    }
                }
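
The `assert_nearly_equals!` hunk above switches the check from a relative tolerance (`diff > 0.0005 * (|left| + |right|)`) to an absolute one (`diff > epsilon`, defaulting to `0.0005`). The sketch below restates both checks as plain functions to show where they disagree. The function names are hypothetical and the macro itself is not reproduced.

```rust
// Old behaviour (sketch): tolerance scales with the magnitude of the operands.
fn nearly_equal_relative(left: f64, right: f64) -> bool {
    let diff = (left - right).abs();
    let add = left.abs() + right.abs();
    diff <= 0.0005 * add
}

// New behaviour (sketch): tolerance is a fixed absolute epsilon.
fn nearly_equal_absolute(left: f64, right: f64, epsilon: f64) -> bool {
    (left - right).abs() <= epsilon
}
```

For large values such as `1000.0` and `1000.4`, the relative check accepts the pair while the absolute check with `0.0005` rejects it. For values near zero such as `0.0` and `0.0001`, the outcome is reversed. Tests that relied on the relative behaviour may therefore need an explicit epsilon passed as the third argument.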
@@ -444,7 +440,6 @@ pub mod tests {
    }

    #[test]
-    #[cfg(not(feature = "lz4"))]
    fn test_version_string() {
        use regex::Regex;
        let regex_ptn = Regex::new(
@@ -944,7 +939,7 @@ pub mod tests {
        let mut schema_builder = Schema::builder();
        let json_field = schema_builder.add_json_field("json", STORED | TEXT);
        let schema = schema_builder.build();
-        let json_val: serde_json::Map<String, serde_json::Value> = serde_json::from_str(
+        let json_val: serde_json::Value = serde_json::from_str(
            r#"{
            "signed": 2,
            "float": 2.0,
@@ -1034,13 +1029,16 @@ pub mod tests {
                            text_field => "some other value",
                            other_text_field => "short");
        assert_eq!(document.len(), 3);
-        let values: Vec<&OwnedValue> = document.get_all(text_field).collect();
+        let values: Vec<OwnedValue> = document.get_all(text_field).map(OwnedValue::from).collect();
        assert_eq!(values.len(), 2);
-        assert_eq!(values[0].as_str(), Some("tantivy"));
-        assert_eq!(values[1].as_str(), Some("some other value"));
-        let values: Vec<&OwnedValue> = document.get_all(other_text_field).collect();
+        assert_eq!(values[0].as_ref().as_str(), Some("tantivy"));
+        assert_eq!(values[1].as_ref().as_str(), Some("some other value"));
+        let values: Vec<OwnedValue> = document
+            .get_all(other_text_field)
+            .map(OwnedValue::from)
+            .collect();
        assert_eq!(values.len(), 1);
-        assert_eq!(values[0].as_str(), Some("short"));
+        assert_eq!(values[0].as_ref().as_str(), Some("short"));
    }

    #[test]
@@ -1107,9 +1105,9 @@ pub mod tests {
    #[test]
    fn test_update_via_delete_insert() -> crate::Result<()> {
        use crate::collector::Count;
+        use crate::index::SegmentId;
        use crate::indexer::NoMergePolicy;
        use crate::query::AllQuery;
-        use crate::SegmentId;

        const DOC_COUNT: u64 = 2u64;

--- a/src/macros.rs
+++ b/src/macros.rs
@@ -41,6 +41,7 @@
 /// );
 /// # }
 /// ```
+
 #[macro_export]
 macro_rules! doc(
    () => {
@@ -52,7 +53,7 @@ macro_rules! doc(
        {
            let mut document = $crate::TantivyDocument::default();
            $(
-                document.add_field_value($field, $value);
+                document.add_field_value($field, &$value);
            )*
            document
        }
--- a/src/postings/compression/mod.rs
+++ b/src/postings/compression/mod.rs
@@ -14,7 +14,6 @@ pub fn compressed_block_size(num_bits: u8) -> usize {
 pub struct BlockEncoder {
    bitpacker: BitPacker4x,
    pub output: [u8; COMPRESSED_BLOCK_MAX_SIZE],
-    pub output_len: usize,
 }

 impl Default for BlockEncoder {
@@ -28,7 +27,6 @@ impl BlockEncoder {
        BlockEncoder {
            bitpacker: BitPacker4x::new(),
            output: [0u8; COMPRESSED_BLOCK_MAX_SIZE],
-            output_len: 0,
        }
    }

--- a/src/postings/json_postings_writer.rs
+++ b/src/postings/json_postings_writer.rs
@@ -1,5 +1,6 @@
 use std::io;

+use common::json_path_writer::JSON_END_OF_PATH;
 use stacker::Addr;

 use crate::indexer::doc_id_mapping::DocIdMapping;
@@ -7,7 +8,7 @@ use crate::indexer::path_to_unordered_id::OrderedPathId;
 use crate::postings::postings_writer::SpecializedPostingsWriter;
 use crate::postings::recorder::{BufferLender, DocIdRecorder, Recorder};
 use crate::postings::{FieldSerializer, IndexingContext, IndexingPosition, PostingsWriter};
-use crate::schema::{Field, Type, JSON_END_OF_PATH};
+use crate::schema::{Field, Type};
 use crate::tokenizer::TokenStream;
 use crate::{DocId, Term};

@@ -67,10 +68,18 @@ impl<Rec: Recorder> PostingsWriter for JsonPostingsWriter<Rec> {
    ) -> io::Result<()> {
        let mut term_buffer = Term::with_capacity(48);
        let mut buffer_lender = BufferLender::default();
+        term_buffer.clear_with_field_and_type(Type::Json, Field::from_field_id(0));
+        let mut prev_term_id = u32::MAX;
+        let mut term_path_len = 0; // this will be set in the first iteration
        for (_field, path_id, term, addr) in term_addrs {
-            term_buffer.clear_with_field_and_type(Type::Json, Field::from_field_id(0));
-            term_buffer.append_bytes(ordered_id_to_path[path_id.path_id() as usize].as_bytes());
-            term_buffer.append_bytes(&[JSON_END_OF_PATH]);
+            if prev_term_id != path_id.path_id() {
+                term_buffer.truncate_value_bytes(0);
+                term_buffer.append_path(ordered_id_to_path[path_id.path_id() as usize].as_bytes());
+                term_buffer.append_bytes(&[JSON_END_OF_PATH]);
+                term_path_len = term_buffer.len_bytes();
+                prev_term_id = path_id.path_id();
+            }
+            term_buffer.truncate_value_bytes(term_path_len);
            term_buffer.append_bytes(term);
            if let Some(json_value) = term_buffer.value().as_json_value_bytes() {
                let typ = json_value.typ();
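
The `json_postings_writer.rs` hunk above stops rebuilding the path prefix for every term. The prefix is written only when the path id changes, and the buffer is truncated back to the prefix length before each value is appended. A minimal sketch of that idea on a plain `Vec<u8>` follows. `END_OF_PATH`, `serialize_terms`, and the input shapes are hypothetical, the separator value is an assumption, and an `Option` replaces the `u32::MAX` sentinel used in the diff.

```rust
// Assumed separator byte; the real constant is `JSON_END_OF_PATH`.
const END_OF_PATH: u8 = 0;

// `entries` holds (path id, value) pairs, grouped so that equal path ids
// are adjacent. Returns the serialized bytes for each entry.
fn serialize_terms(paths: &[&str], entries: &[(usize, &str)]) -> Vec<Vec<u8>> {
    let mut buffer: Vec<u8> = Vec::with_capacity(48);
    let mut prev_path_id: Option<usize> = None;
    let mut path_len = 0;
    let mut out = Vec::with_capacity(entries.len());
    for &(path_id, value) in entries {
        if prev_path_id != Some(path_id) {
            // The path changed: rewrite the prefix once and remember its length.
            buffer.clear();
            buffer.extend_from_slice(paths[path_id].as_bytes());
            buffer.push(END_OF_PATH);
            path_len = buffer.len();
            prev_path_id = Some(path_id);
        }
        // Drop the previous value, keep the shared prefix, append the new value.
        buffer.truncate(path_len);
        buffer.extend_from_slice(value.as_bytes());
        out.push(buffer.clone());
    }
    out
}
```

The saving depends on entries with the same path arriving consecutively. If the input alternated between paths, the prefix would be rewritten on every iteration and the approach would cost about the same as the previous code.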
Author	SHA1	Message	Date
Pascal Seitz	2ce485b8cc	skip estimate phase for merge multivalue index precompute stats for merge multivalue index + disable Line encoding for multivalue index. That combination allows to skip the first estimation pass. This gives up to 2x on merge performance on multivalue indices. This change may decrease compression as Line is very good compressible for documents, which have a fixed amount of values in each doc. The line codec should be replaced. ``` merge_multi_and_multi Avg: 22.7880ms (-47.15%) Median: 22.5469ms (-47.38%) [22.3691ms .. 25.8392ms] merge_dense_and_dense Avg: 14.4398ms (+2.18%) Median: 14.2465ms (+0.74%) [14.1620ms .. 16.1270ms] merge_sparse_and_sparse Avg: 10.6559ms (+1.10%) Median: 10.6318ms (+0.91%) [10.5527ms .. 11.2848ms] merge_sparse_and_dense Avg: 12.4886ms (+1.52%) Median: 12.4044ms (+0.84%) [12.3261ms .. 13.9439ms] merge_multi_and_dense Avg: 25.6686ms (-45.56%) Median: 25.4851ms (-45.84%) [25.1618ms .. 27.6226ms] merge_multi_and_sparse Avg: 24.3278ms (-47.00%) Median: 24.1917ms (-47.34%) [23.7159ms .. 27.0513ms] ```	2024-06-11 20:22:00 +08:00
PSeitz	c3b92a5412	fix compiler warning, cleanup (#2393 ) fix compiler warning for missing feature flag remove unused variables cleanup unused methods	2024-06-11 16:03:50 +08:00
PSeitz	2f55511064	extend indexwriter proptests (#2342 ) * index random values in proptest * add proptest with multiple docs	2024-06-11 16:02:57 +08:00
trinity-1686a	08b9fc0b31	fix de-escaping too much in query parser (#2427 ) * fix de-escaping too much in query parser	2024-06-10 11:19:01 +02:00
PSeitz	714f363d43	add bench & test for columnar merging (#2428 ) * add merge columnar proptest * add columnar merge benchmark	2024-06-10 16:26:16 +08:00
PSeitz	93ff7365b0	reduce top hits aggregation memory consumption (#2426 ) move request structure out of top hits aggregation collector and use from the passed structure instead full terms_many_with_top_hits Memory: 58.2 MB (-43.64%) Avg: 425.9680ms (-21.38%) Median: 415.1097ms (-23.56%) [395.5303ms .. 484.6325ms] dense terms_many_with_top_hits Memory: 58.2 MB (-43.64%) Avg: 440.0817ms (-19.68%) Median: 432.2286ms (-21.10%) [403.5632ms .. 497.7541ms] sparse terms_many_with_top_hits Memory: 13.1 MB (-49.31%) Avg: 33.3568ms (-32.19%) Median: 33.0834ms (-31.86%) [32.5126ms .. 35.7397ms] multivalue terms_many_with_top_hits Memory: 58.2 MB (-43.64%) Avg: 414.2340ms (-25.44%) Median: 413.4144ms (-25.64%) [403.9919ms .. 430.3170ms]	2024-06-06 22:32:58 +08:00
Adam Reichold	8151925068	Panicking in spawned Rayon tasks will abort the process by default. (#2409 )	2024-06-04 17:04:30 +09:00
dependabot[bot]	b960e40bc8	Update sketches-ddsketch requirement from 0.2.1 to 0.3.0 (#2423 ) Updates the requirements on [sketches-ddsketch](https://github.com/mheffner/rust-sketches-ddsketch) to permit the latest version. - [Release notes](https://github.com/mheffner/rust-sketches-ddsketch/releases) - [Commits](https://github.com/mheffner/rust-sketches-ddsketch/compare/v0.2.1...v0.3.0) --- updated-dependencies: - dependency-name: sketches-ddsketch dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-06-04 15:50:23 +08:00
giovannicuccu	1095c9b073	Issue 1787 extended stats (#2247 ) * first version of extended stats along with its tests * using IntermediateExtendStats instead of IntermediateStats with all tests passing * Created struct for request and response * first test with extended_stats * kahan summation and tests with approximate equality * version ready for merge * removed approx dependency * refactor for using ExtendedStats only when needed * interim version * refined version with code formatted * refactored a struct * cosmetic refactor * fix after merge * fix format * added extended_stat bench * merge and new benchmark for extended stats * split stat segment collectors * wrapped intermediate extended stat with a box to limit memory usage * Revert "wrapped intermediate extended stat with a box to limit memory usage" This reverts commit `5b4aa9f393`. * some code reformat, commented kahan summation * refactor after review * refactor after code review * fix after incorrectly restoring kahan summation * modifications for code review + bug fix in merge_fruit * refactor assert_nearly_equals macro * update after code review --------- Co-authored-by: Giovanni Cuccu <gcuccu@imolainformatica.it>	2024-06-04 14:25:17 +08:00
PSeitz	c0686515a9	update one_shot (#2420 )	2024-05-31 11:07:35 +08:00
trinity-1686a	455156f51c	improve query parser (#2416 ) * support escape sequence in more place and fix bug with singlequoted strings * add query parser test for range query on default field	2024-05-30 17:29:27 +02:00
Meng Zhang	4143d31865	chore: fix build as the rev is gone (#2417 )	2024-05-29 09:49:16 +08:00
Hamir Mahal	0c634adbe1	style: simplify strings with string interpolation (#2412 ) * style: simplify strings with string interpolation * fix: formatting	2024-05-27 09:16:47 +02:00
PSeitz	2e3641c2ae	return CompactDocValue instead of trait (#2410 ) The CompactDocValue is easier to handle than the trait in some cases like comparison and conversion	2024-05-27 07:33:50 +02:00
Paul Masurel	b806122c81	Fixing flaky test (#2407 )	2024-05-22 10:10:55 +09:00
PSeitz	e1679f3fb9	compact doc (#2402 ) * compact doc * add any value type * pass references when building CompactDoc * remove OwnedValue from API * clippy * clippy * fail on large documents * fmt * cleanup * cleanup * implement Value for different types fix serde_json date Value implementation * fmt * cleanup * fmt * cleanup * store positions instead of pos+len * remove nodes array * remove mediumvec * cleanup * infallible serialize into vec * remove positions indirection * remove 24MB limitation in document use u32 for Addr Remove the 3 byte addressing limitation and use VInt instead * cleanup * extend test * cleanup, add comments * rename, remove pub	2024-05-21 10:16:08 +02:00
dependabot[bot]	5a80420b10	--- (#2406 ) updated-dependencies: - dependency-name: binggan dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-05-21 04:36:32 +02:00
dependabot[bot]	aa26ff5029	Update binggan requirement from 0.6.2 to 0.7.0 (#2401 ) --- updated-dependencies: - dependency-name: binggan dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-05-17 02:53:25 +02:00
dependabot[bot]	e197b59258	Update itertools requirement from 0.12.0 to 0.13.0 (#2400 ) Updates the requirements on [itertools](https://github.com/rust-itertools/itertools) to permit the latest version. - [Changelog](https://github.com/rust-itertools/itertools/blob/master/CHANGELOG.md) - [Commits](https://github.com/rust-itertools/itertools/compare/v0.12.0...v0.13.0) --- updated-dependencies: - dependency-name: itertools dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-05-17 02:53:02 +02:00
PSeitz	5b7cca13e5	lower contention on AggregationLimits (#2394 ) PR https://github.com/quickwit-oss/quickwit/pull/4962 fixes an issue where the AggregationLimits are not passed correctly. Since the AggregationLimits are shared properly we run into contention issues. This PR includes some straightforward improvement to reduce contention, by only calling if the memory changed and avoiding the second read. We probably need some sharding with multiple counters or local caching before updating the global after some threshold.	2024-05-15 12:25:40 +02:00
dependabot[bot]	a79590477e	Update binggan requirement from 0.5.2 to 0.6.2 (#2399 ) --- updated-dependencies: - dependency-name: binggan dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-05-15 05:40:37 +02:00
Paul Masurel	6181c1eb5e	Small changes in the Executor API. (#2391 ) Warning, this change is mildly not backward compatible so I bumped tantivy's version.	2024-05-10 17:19:12 +09:00
Adam Reichold	1ee5f90761	Give allocation control to the caller instead of force a clone (#2389 ) Achieved by moving the boxes out of the temporary reference wrappers which are cloneable themselves, i.e. if required the caller can clone them already or consume them to reuse existing allocations.	2024-05-09 16:01:13 +09:00
PSeitz	71f3b4e4e3	fix ReferenceValue API flaw (#2372 ) * fix ReferenceValue API flaw Remove `Facet` and `TokenizedString` values from the `ReferenceValue` API, as this requires the trait value to have them stored somewhere. Since `TokenizedString` is quite niche, I just copy it into a Box, instead of designing a reference API around it. * fix comment link	2024-05-09 06:14:42 +02:00
trinity-1686a	8cd7ddc535	run block decompression from executor (#2386 ) * run block decompression from executor * add a wrapper with is_closed to oneshot channel * add cancelation test to Executor::spawn_blocking	2024-05-08 12:22:44 +02:00
Paul Masurel	2b76335a95	Removed usage of num_cpus (#2387 ) * Removed usage of num_cpus * handling error	2024-05-08 13:32:52 +09:00
PSeitz	c6b213d8f0	use binggan for agg benchmark (#2378 ) * use binggan for agg benchmark, which includes memory consumption Output: ``` full histogram Memory: 15.8 KB Avg: 10.9322ms (+5.44%) Median: 10.8790ms (+9.28%) Min: 10.7470ms Max: 11.3263ms histogram_hard_bounds Memory: 15.5 KB Avg: 5.1939ms (+6.61%) Median: 5.1722ms (+10.98%) Min: 5.0432ms Max: 5.3910ms histogram_with_avg_sub_agg Memory: 48.7 KB Avg: 23.8165ms (+4.57%) Median: 23.7264ms (+10.06%) Min: 23.4995ms Max: 24.8107ms dense histogram Memory: 17.3 KB Avg: 15.6810ms (-8.54%) Median: 15.6174ms (-8.89%) Min: 15.4953ms Max: 16.0702ms histogram_hard_bounds Memory: 15.4 KB Avg: 10.0720ms (-7.33%) Median: 10.0572ms (-7.06%) Min: 9.8500ms Max: 10.4819ms histogram_with_avg_sub_agg Memory: 50.1 KB Avg: 33.0993ms (-7.04%) Median: 32.9499ms (-6.86%) Min: 32.8284ms Max: 34.0529ms sparse histogram Memory: 16.3 KB Avg: 19.2325ms (-0.44%) Median: 19.1211ms (-1.26%) Min: 19.0348ms Max: 19.7902ms histogram_hard_bounds Memory: 16.1 KB Avg: 18.5179ms (-0.61%) Median: 18.4552ms (-0.90%) Min: 18.3799ms Max: 19.0535ms histogram_with_avg_sub_agg Memory: 34.7 KB Avg: 21.2589ms (-0.69%) Median: 21.1867ms (-1.05%) Min: 21.0342ms Max: 21.9900ms ``` * add more bench with term as sub agg	2024-05-07 11:29:49 +02:00
PSeitz	eea70030bf	cleanup top level exports (#2382 ) remove some top level exports	2024-05-07 09:59:41 +02:00
PSeitz	92b5526310	allow more JSON values, fix i64 special case (#2383 ) This changes three things: - Reuse positions_per_path hashmap instead of allocating one per indexed JSON value - Try to cast u64 values to i64 to streamline with search behaviour - Allow top level json values to be of any type, instead of limiting it to JSON objects. Remove special JSON object handling method. TODO: We probably should also try to check f64 to i64 and u64 when indexing, as values may get converted to f64 by the JSON parser	2024-05-01 12:08:12 +02:00
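The "try to cast u64 values to i64" step from #2383 can be sketched as follows. This is an illustration under assumptions, not tantivy's implementation: `IndexedNumber` and `normalize_u64` are hypothetical names, and keeping the value as `u64` when it does not fit into `i64` is an assumed fallback inferred from the word "try" in the commit message.

```rust
use std::convert::TryFrom;

/// Hypothetical numeric representation chosen at indexing time
/// (illustration only, not a tantivy type).
#[derive(Debug, PartialEq)]
pub enum IndexedNumber {
    I64(i64),
    U64(u64),
}

/// Prefers `i64` whenever the value fits, so that indexing and search
/// agree on one type for the same number. Values above `i64::MAX`
/// stay `u64` (assumed fallback).
pub fn normalize_u64(value: u64) -> IndexedNumber {
    match i64::try_from(value) {
        Ok(as_i64) => IndexedNumber::I64(as_i64),
        Err(_) => IndexedNumber::U64(value),
    }
}
```

With this sketch, `normalize_u64(42)` yields `IndexedNumber::I64(42)`, while `normalize_u64(u64::MAX)` stays `IndexedNumber::U64(u64::MAX)`.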
PSeitz	99a59ad37e	remove zero byte check (#2379 ) Remove zero byte checks in columnar; zero bytes are now converted during serialization. Unify code paths. Extend test for expected column names.	2024-04-26 06:03:28 +02:00
trinity-1686a	6a66a71cbb	modify fastfield range query heuristic (#2375 )	2024-04-25 10:06:11 +02:00
PSeitz	ff40764204	make convert_to_fast_value_and_append_to_json_term pub (#2370 ) * make convert_to_fast_value_and_append_to_json_term pub * clippy	2024-04-23 04:05:41 +02:00
PSeitz	047da20b5b	add json path constructor to term (#2367 )	2024-04-22 12:23:35 +02:00
PSeitz	1417eaf3a7	fix coverage (#2368 )	2024-04-22 12:23:15 +02:00
PSeitz	4f8493d2de	improve document docs (#2359 )	2024-04-22 12:05:16 +02:00
Paul Masurel	8861366137	Owned value relying on Vec instead of BTreeMap (#2364 ) * Owned value relying on Vec instead of BTreeMap * fmt * fix build * fix serialization --------- Co-authored-by: Pascal Seitz <pascal.seitz@gmail.com>	2024-04-22 09:38:05 +02:00
PSeitz	0e9fced336	remove JsonTermWriter (#2238 ) * remove JsonTermWriter, remove path truncation logic, add assertion * fix json_path_writer add sep logic	2024-04-18 16:28:05 +02:00
PSeitz	b257b960b3	validate sort by field type (#2336 ) * validate sort by field type * Update src/index/index.rs Co-authored-by: Adam Reichold <adamreichold@users.noreply.github.com> --------- Co-authored-by: Adam Reichold <adamreichold@users.noreply.github.com>	2024-04-16 04:42:24 +02:00
Adam Reichold	4708171a32	Fix some of the things current Clippy complains about (#2363 )	2024-04-16 04:27:06 +02:00
Adam Reichold	b493743f8d	Fix trait bound of StoreReader::iter (#2360 ) * Fix trait bound of StoreReader::iter Similar to `StoreReader::get`, `StoreReader::iter` should only require `DocumentDeserialize` and not `Document`. * Mark the iterator returned by SegmentReader::doc_ids_alive as Send so it can be used in impls of Stream/AsyncIterator.	2024-04-15 15:50:02 +02:00
trinity-1686a	d2955a3fd2	extend field grouping (#2333 ) * extend field grouping	2024-04-15 10:36:32 +02:00
PSeitz	17d5869ad6	update CHANGELOG, use github API in cliff (#2354 ) * update CHANGELOG, use github API in cliff * reset version to 0.21.1, before release * chore: Release * remove unreleased from CHANGELOG	2024-04-15 10:07:20 +02:00
PSeitz	dfa3aed32d	check unsupported parameters top_hits (#2351 ) * check unsupported parameters top_hits * move to function	2024-04-10 08:20:52 +02:00
PSeitz	398817ce7b	add index sorting deprecation warning (#2353 ) * add index sorting deprecation warning * remove deprecated IntOptions and DatePrecision	2024-04-10 08:09:09 +02:00
PSeitz	74940e9345	clippy (#2349 ) * fix clippy * fix clippy * fix duplicate imports	2024-04-09 07:54:44 +02:00
PSeitz	1e9fc51535	update ahash (#2344 )	2024-04-09 06:35:39 +02:00
PSeitz	92c32979d2	fix postcard compatibility for top_hits, add postcard test (#2346 ) * fix postcard compatibility for top_hits, add postcard test * fix top_hits naming, delay data fetch closes #2347 * fix import	2024-04-09 06:17:25 +02:00
PSeitz	b644d78a32	fix null byte handling in JSON paths (#2345 ) * fix null byte handling in JSON paths closes https://github.com/quickwit-oss/tantivy/issues/2193 closes https://github.com/quickwit-oss/tantivy/issues/2340 * avoid repeated term truncation * fix test * Apply suggestions from code review Co-authored-by: Paul Masurel <paul@quickwit.io> * add comment --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2024-04-05 09:53:35 +02:00