Speed up searches by removing repeated memsets coming from vec.resize()

Also reserve exactly the capacity needed (reserve_exact() rather than reserve()); surprisingly, this is required to get the full ~5% speedup on a good fraction of queries.
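
The idea can be sketched in isolation. The block below is a minimal, self-contained sketch and not the code from this commit: `fill_sequential` is a hypothetical stand-in for a decoder such as `get_batch_u32s()` that writes every output slot and never reads from it, and `refill_without_memset` is an illustrative name. Unlike the diff, which calls `set_len()` before the buffer is filled, this more conservative variant writes through `Vec::spare_capacity_mut()` first and only then sets the length, which assumes the decoder can accept a `MaybeUninit<u32>` slice.

```rust
use std::mem::MaybeUninit;

/// Hypothetical decoder stand-in: writes every slot of `out` exactly once
/// and never reads from it.
fn fill_sequential(start: u32, out: &mut [MaybeUninit<u32>]) {
    for (i, slot) in out.iter_mut().enumerate() {
        slot.write(start + i as u32);
    }
}

/// Refills `buf` with `len` values without zero-filling it first.
fn refill_without_memset(buf: &mut Vec<u32>, start: u32, len: usize) {
    // Drop the old contents but keep the allocation.
    buf.clear();
    // Request exactly `len` slots instead of letting the vector over-allocate.
    buf.reserve_exact(len);
    // Write into the uninitialized tail of the allocation.
    fill_sequential(start, &mut buf.spare_capacity_mut()[..len]);
    // SAFETY: capacity is at least `len` and the first `len` slots were
    // initialized by `fill_sequential` above.
    unsafe { buf.set_len(len) };
}
```

Compared with `buf.resize(len, 0)`, this skips the zero-fill pass over the buffer; whether the difference is measurable depends on the workload.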
allow some mixing of occur and bool in strict query parser (#2323)
2026-06-07 02:50:40 +00:00 · 2024-03-12 17:50:23 +01:00 · 2024-03-07 15:17:48 +01:00 · 2024-03-05 05:49:41 +01:00 · 2024-03-05 04:11:11 +01:00 · 2024-02-27 03:38:04 +01:00
90 changed files with 2094 additions and 478 deletions
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -31,14 +31,14 @@ log = "0.4.16"
 serde = { version = "1.0.136", features = ["derive"] }
 serde_json = "1.0.79"
 num_cpus = "1.13.1"
-fs4 = { version = "0.7.0", optional = true }
+fs4 = { version = "0.8.0", optional = true }
 levenshtein_automata = "0.2.1"
 uuid = { version = "1.0.0", features = ["v4", "serde"] }
 crossbeam-channel = "0.5.4"
 rust-stemmers = "1.2.0"
 downcast-rs = "1.2.0"
 bitpacking = { version = "0.9.2", default-features = false, features = ["bitpacker4x"] }
-census = "0.4.0"
+census = "0.4.2"
 rustc-hash = "1.1.0"
 thiserror = "1.0.30"
 htmlescape = "0.3.1"
@@ -77,6 +77,7 @@ futures = "0.3.21"
 paste = "1.0.11"
 more-asserts = "0.3.1"
 rand_distr = "0.4.3"
+time = { version = "0.3.10", features = ["serde-well-known", "macros"] }

 [target.'cfg(not(windows))'.dev-dependencies]
 criterion = { version = "0.5", default-features = false }
--- a/README.md
+++ b/README.md
@@ -5,19 +5,18 @@
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
 [![Crates.io](https://img.shields.io/crates/v/tantivy.svg)](https://crates.io/crates/tantivy)

-![Tantivy](https://tantivy-search.github.io/logo/tantivy-logo.png)
+<img src="https://tantivy-search.github.io/logo/tantivy-logo.png" alt="Tantivy, the fastest full-text search engine library written in Rust" height="250">

-**Tantivy** is a **full-text search engine library** written in Rust.
+## Fast full-text search engine library written in Rust

-It is closer to [Apache Lucene](https://lucene.apache.org/) than to [Elasticsearch](https://www.elastic.co/products/elasticsearch) or [Apache Solr](https://lucene.apache.org/solr/) in the sense it is not
-an off-the-shelf search engine server, but rather a crate that can be used
-to build such a search engine.
+**If you are looking for an alternative to Elasticsearch or Apache Solr, check out [Quickwit](https://github.com/quickwit-oss/quickwit), our distributed search engine built on top of Tantivy.**
+
+Tantivy is closer to [Apache Lucene](https://lucene.apache.org/) than to [Elasticsearch](https://www.elastic.co/products/elasticsearch) or [Apache Solr](https://lucene.apache.org/solr/) in the sense it is not
+an off-the-shelf search engine server, but rather a crate that can be used to build such a search engine.

 Tantivy is, in fact, strongly inspired by Lucene's design.

-If you are looking for an alternative to Elasticsearch or Apache Solr, check out [Quickwit](https://github.com/quickwit-oss/quickwit), our search engine built on top of Tantivy.
-
-# Benchmark
+## Benchmark

 The following [benchmark](https://tantivy-search.github.io/bench/) breaks down
 performance for different types of queries/collections.
@@ -28,7 +27,7 @@ Your mileage WILL vary depending on the nature of queries and their load.

 Details about the benchmark can be found at this [repository](https://github.com/quickwit-oss/search-benchmark-game).

-# Features
+## Features

 - Full-text search
 - Configurable tokenizer (stemming available for 17 Latin languages) with third party support for Chinese ([tantivy-jieba](https://crates.io/crates/tantivy-jieba) and [cang-jie](https://crates.io/crates/cang-jie)), Japanese ([lindera](https://github.com/lindera-morphology/lindera-tantivy), [Vaporetto](https://crates.io/crates/vaporetto_tantivy), and [tantivy-tokenizer-tiny-segmenter](https://crates.io/crates/tantivy-tokenizer-tiny-segmenter)) and Korean ([lindera](https://github.com/lindera-morphology/lindera-tantivy) + [lindera-ko-dic-builder](https://github.com/lindera-morphology/lindera-ko-dic-builder))
@@ -54,11 +53,11 @@ Details about the benchmark can be found at this [repository](https://github.com
 - Searcher Warmer API
 - Cheesy logo with a horse

-## Non-features
+### Non-features

 Distributed search is out of the scope of Tantivy, but if you are looking for this feature, check out [Quickwit](https://github.com/quickwit-oss/quickwit/).

-# Getting started
+## Getting started

 Tantivy works on stable Rust and supports Linux, macOS, and Windows.

@@ -68,7 +67,7 @@ index documents, and search via the CLI or a small server with a REST API.
 It walks you through getting a Wikipedia search engine up and running in a few minutes.
 - [Reference doc for the last released version](https://docs.rs/tantivy/)

-# How can I support this project?
+## How can I support this project?

 There are many ways to support this project.

@@ -79,16 +78,16 @@ There are many ways to support this project.
 - Contribute code (you can join [our Discord server](https://discord.gg/MT27AG5EVE))
 - Talk about Tantivy around you

-# Contributing code
+## Contributing code

 We use the GitHub Pull Request workflow: reference a GitHub ticket and/or include a comprehensive commit message when opening a PR.
 Feel free to update CHANGELOG.md with your contribution.

-## Tokenizer
+### Tokenizer

 When implementing a tokenizer for tantivy depend on the `tantivy-tokenizer-api` crate.

-## Clone and build locally
+### Clone and build locally

 Tantivy compiles on stable Rust.
 To check out and run tests, you can simply run:
@@ -99,7 +98,7 @@ cd tantivy
 cargo test
 ```

-# Companies Using Tantivy
+## Companies Using Tantivy

 <p align="left">
 <img align="center" src="doc/assets/images/etsy.png" alt="Etsy" height="25" width="auto" />&nbsp;
@@ -111,7 +110,7 @@ cargo test
 <img align="center" src="doc/assets/images/element-dark-theme.png#gh-dark-mode-only" alt="Element.io" height="25" width="auto" />
 </p>

-# FAQ
+## FAQ

 ### Can I use Tantivy in other languages?

--- a/bitpacker/src/bitpacker.rs
+++ b/bitpacker/src/bitpacker.rs
@@ -125,6 +125,8 @@ impl BitUnpacker {

    // Decodes the range of bitpacked `u32` values with idx
    // in [start_idx, start_idx + output.len()).
+    // It is guaranteed to completely fill `output` and not read from it, so passing a vector with
+    // un-initialized values is safe.
    //
    // #Panics
    //
@@ -237,7 +239,19 @@ impl BitUnpacker {
        data: &[u8],
        positions: &mut Vec<u32>,
    ) {
-        positions.resize(id_range.len(), 0u32);
+        // We use the code below instead of positions.resize(id_range.len(), 0u32) for performance
+        // reasons: on some queries, the CPU cost of memsetting the array and of using a bigger
+        // vector than necessary is noticeable (~5%).
+        // In particular, searches are a few percent faster when using reserve_exact() as below
+        // instead of reserve().
+        // The un-initialized values are safe as get_batch_u32s() completely fills `positions`
+        // and does not read from it.
+        positions.clear();
+        positions.reserve_exact(id_range.len());
+        #[allow(clippy::uninit_vec)]
+        unsafe {
+            positions.set_len(id_range.len());
+        }
        self.get_batch_u32s(id_range.start, data, positions);
        crate::filter_vec::filter_vec_in_place(value_range, id_range.start, positions)
    }
--- a/columnar/src/column_index/merge/stacked.rs
+++ b/columnar/src/column_index/merge/stacked.rs
@@ -111,10 +111,7 @@ fn stack_multivalued_indexes<'a>(
    let mut last_row_id = 0;
    let mut current_it = multivalued_indexes.next();
    Box::new(std::iter::from_fn(move || loop {
-        let Some(multivalued_index) = current_it.as_mut() else {
-            return None;
-        };
-        if let Some(row_id) = multivalued_index.next() {
+        if let Some(row_id) = current_it.as_mut()?.next() {
            last_row_id = offset + row_id;
            return Some(last_row_id);
        }
--- a/columnar/src/column_index/mod.rs
+++ b/columnar/src/column_index/mod.rs
@@ -126,18 +126,18 @@ impl ColumnIndex {
        }
    }

-    pub fn docid_range_to_rowids(&self, doc_id: Range<DocId>) -> Range<RowId> {
+    pub fn docid_range_to_rowids(&self, doc_id_range: Range<DocId>) -> Range<RowId> {
        match self {
            ColumnIndex::Empty { .. } => 0..0,
-            ColumnIndex::Full => doc_id,
+            ColumnIndex::Full => doc_id_range,
            ColumnIndex::Optional(optional_index) => {
-                let row_start = optional_index.rank(doc_id.start);
-                let row_end = optional_index.rank(doc_id.end);
+                let row_start = optional_index.rank(doc_id_range.start);
+                let row_end = optional_index.rank(doc_id_range.end);
                row_start..row_end
            }
            ColumnIndex::Multivalued(multivalued_index) => {
-                let end_docid = doc_id.end.min(multivalued_index.num_docs() - 1) + 1;
-                let start_docid = doc_id.start.min(end_docid);
+                let end_docid = doc_id_range.end.min(multivalued_index.num_docs() - 1) + 1;
+                let start_docid = doc_id_range.start.min(end_docid);

                let row_start = multivalued_index.start_index_column.get_val(start_docid);
                let row_end = multivalued_index.start_index_column.get_val(end_docid);
--- a/columnar/src/column_index/optional_index/mod.rs
+++ b/columnar/src/column_index/optional_index/mod.rs
@@ -21,8 +21,6 @@ const DENSE_BLOCK_THRESHOLD: u32 =

 const ELEMENTS_PER_BLOCK: u32 = u16::MAX as u32 + 1;

-const BLOCK_SIZE: RowId = 1 << 16;
-
 #[derive(Copy, Clone, Debug)]
 struct BlockMeta {
    non_null_rows_before_block: u32,
@@ -109,8 +107,8 @@ struct RowAddr {
 #[inline(always)]
 fn row_addr_from_row_id(row_id: RowId) -> RowAddr {
    RowAddr {
-        block_id: (row_id / BLOCK_SIZE) as u16,
-        in_block_row_id: (row_id % BLOCK_SIZE) as u16,
+        block_id: (row_id / ELEMENTS_PER_BLOCK) as u16,
+        in_block_row_id: (row_id % ELEMENTS_PER_BLOCK) as u16,
    }
 }

@@ -185,8 +183,13 @@ impl Set<RowId> for OptionalIndex {
        }
    }

+    /// Any value of `doc_id` is allowed,
+    /// in particular `doc_id == num_rows`.
    #[inline]
    fn rank(&self, doc_id: DocId) -> RowId {
+        if doc_id >= self.num_docs() {
+            return self.num_non_nulls();
+        }
        let RowAddr {
            block_id,
            in_block_row_id,
@@ -200,13 +203,15 @@ impl Set<RowId> for OptionalIndex {
        block_meta.non_null_rows_before_block + block_offset_row_id
    }

+    /// Any value of `doc_id` is allowed,
+    /// in particular `doc_id == num_rows`.
    #[inline]
    fn rank_if_exists(&self, doc_id: DocId) -> Option<RowId> {
        let RowAddr {
            block_id,
            in_block_row_id,
        } = row_addr_from_row_id(doc_id);
-        let block_meta = self.block_metas[block_id as usize];
+        let block_meta = *self.block_metas.get(block_id as usize)?;
        let block = self.block(block_meta);
        let block_offset_row_id = match block {
            Block::Dense(dense_block) => dense_block.rank_if_exists(in_block_row_id),
@@ -491,7 +496,7 @@ fn deserialize_optional_index_block_metadatas(
        non_null_rows_before_block += num_non_null_rows;
    }
    block_metas.resize(
-        ((num_rows + BLOCK_SIZE - 1) / BLOCK_SIZE) as usize,
+        ((num_rows + ELEMENTS_PER_BLOCK - 1) / ELEMENTS_PER_BLOCK) as usize,
        BlockMeta {
            non_null_rows_before_block,
            start_byte_offset,
--- a/columnar/src/column_index/optional_index/set.rs
+++ b/columnar/src/column_index/optional_index/set.rs
@@ -39,7 +39,8 @@ pub trait Set<T> {
    ///
    /// # Panics
    ///
-    /// May panic if rank is greater than the number of elements in the Set.
+    /// May panic if rank is greater than or equal to the number of
+    /// elements in the Set.
    fn select(&self, rank: T) -> T;

    /// Creates a brand new select cursor.
--- a/columnar/src/column_index/optional_index/tests.rs
+++ b/columnar/src/column_index/optional_index/tests.rs
@@ -3,6 +3,30 @@ use proptest::strategy::Strategy;
 use proptest::{prop_oneof, proptest};

 use super::*;
+use crate::{ColumnarReader, ColumnarWriter, DynamicColumnHandle};
+
+#[test]
+fn test_optional_index_bug_2293() {
+    // tests for panic in docid_range_to_rowids for docid == num_docs
+    test_optional_index_with_num_docs(ELEMENTS_PER_BLOCK - 1);
+    test_optional_index_with_num_docs(ELEMENTS_PER_BLOCK);
+    test_optional_index_with_num_docs(ELEMENTS_PER_BLOCK + 1);
+}
+fn test_optional_index_with_num_docs(num_docs: u32) {
+    let mut dataframe_writer = ColumnarWriter::default();
+    dataframe_writer.record_numerical(100, "score", 80i64);
+    let mut buffer: Vec<u8> = Vec::new();
+    dataframe_writer
+        .serialize(num_docs, None, &mut buffer)
+        .unwrap();
+    let columnar = ColumnarReader::open(buffer).unwrap();
+    assert_eq!(columnar.num_columns(), 1);
+    let cols: Vec<DynamicColumnHandle> = columnar.read_columns("score").unwrap();
+    assert_eq!(cols.len(), 1);
+
+    let col = cols[0].open().unwrap();
+    col.column_index().docid_range_to_rowids(0..num_docs);
+}

 #[test]
 fn test_dense_block_threshold() {
@@ -35,7 +59,7 @@ proptest! {

 #[test]
 fn test_with_random_sets_simple() {
-    let vals = 10..BLOCK_SIZE * 2;
+    let vals = 10..ELEMENTS_PER_BLOCK * 2;
    let mut out: Vec<u8> = Vec::new();
    serialize_optional_index(&vals, 100, &mut out).unwrap();
    let null_index = open_optional_index(OwnedBytes::new(out)).unwrap();
@@ -171,7 +195,7 @@ fn test_optional_index_rank() {
    test_optional_index_rank_aux(&[0u32, 1u32]);
    let mut block = Vec::new();
    block.push(3u32);
-    block.extend((0..BLOCK_SIZE).map(|i| i + BLOCK_SIZE + 1));
+    block.extend((0..ELEMENTS_PER_BLOCK).map(|i| i + ELEMENTS_PER_BLOCK + 1));
    test_optional_index_rank_aux(&block);
 }

@@ -185,8 +209,8 @@ fn test_optional_index_iter_empty_one() {
 fn test_optional_index_iter_dense_block() {
    let mut block = Vec::new();
    block.push(3u32);
-    block.extend((0..BLOCK_SIZE).map(|i| i + BLOCK_SIZE + 1));
-    test_optional_index_iter_aux(&block, 3 * BLOCK_SIZE);
+    block.extend((0..ELEMENTS_PER_BLOCK).map(|i| i + ELEMENTS_PER_BLOCK + 1));
+    test_optional_index_iter_aux(&block, 3 * ELEMENTS_PER_BLOCK);
 }

 #[test]
--- a/columnar/src/column_values/mod.rs
+++ b/columnar/src/column_values/mod.rs
@@ -101,7 +101,7 @@ pub trait ColumnValues<T: PartialOrd = u64>: Send + Sync {
        row_id_hits: &mut Vec<RowId>,
    ) {
        let row_id_range = row_id_range.start..row_id_range.end.min(self.num_vals());
-        for idx in row_id_range.start..row_id_range.end {
+        for idx in row_id_range {
            let val = self.get_val(idx);
            if value_range.contains(&val) {
                row_id_hits.push(idx);
--- a/query-grammar/src/infallible.rs
+++ b/query-grammar/src/infallible.rs
@@ -81,8 +81,8 @@ where
    T: InputTakeAtPosition + Clone,
    <T as InputTakeAtPosition>::Item: AsChar + Clone,
 {
-    opt_i(nom::character::complete::space0)(input)
-        .map(|(left, (spaces, errors))| (left, (spaces.expect("space0 can't fail"), errors)))
+    opt_i(nom::character::complete::multispace0)(input)
+        .map(|(left, (spaces, errors))| (left, (spaces.expect("multispace0 can't fail"), errors)))
 }

 pub(crate) fn space1_infallible<T>(input: T) -> JResult<T, Option<T>>
@@ -90,7 +90,7 @@ where
    T: InputTakeAtPosition + Clone + InputLength,
    <T as InputTakeAtPosition>::Item: AsChar + Clone,
 {
-    opt_i(nom::character::complete::space1)(input).map(|(left, (spaces, mut errors))| {
+    opt_i(nom::character::complete::multispace1)(input).map(|(left, (spaces, mut errors))| {
        if spaces.is_none() {
            errors.push(LenientErrorInternal {
                pos: left.input_len(),
--- a/query-grammar/src/query_grammar.rs
+++ b/query-grammar/src/query_grammar.rs
@@ -3,11 +3,11 @@ use std::iter::once;
 use nom::branch::alt;
 use nom::bytes::complete::tag;
 use nom::character::complete::{
-    anychar, char, digit1, none_of, one_of, satisfy, space0, space1, u32,
+    anychar, char, digit1, multispace0, multispace1, none_of, one_of, satisfy, u32,
 };
 use nom::combinator::{eof, map, map_res, opt, peek, recognize, value, verify};
 use nom::error::{Error, ErrorKind};
-use nom::multi::{many0, many1, separated_list0, separated_list1};
+use nom::multi::{many0, many1, separated_list0};
 use nom::sequence::{delimited, preceded, separated_pair, terminated, tuple};
 use nom::IResult;

@@ -65,7 +65,7 @@ fn word_infallible(delimiter: &str) -> impl Fn(&str) -> JResult<&str, Option<&st
    |inp| {
        opt_i_err(
            preceded(
-                space0,
+                multispace0,
                recognize(many1(satisfy(|c| {
                    !c.is_whitespace() && !delimiter.contains(c)
                }))),
@@ -225,10 +225,10 @@ fn term_group(inp: &str) -> IResult<&str, UserInputAst> {

    map(
        tuple((
-            terminated(field_name, space0),
+            terminated(field_name, multispace0),
            delimited(
-                tuple((char('('), space0)),
-                separated_list0(space1, tuple((opt(occur_symbol), term_or_phrase))),
+                tuple((char('('), multispace0)),
+                separated_list0(multispace1, tuple((opt(occur_symbol), term_or_phrase))),
                char(')'),
            ),
        )),
@@ -250,7 +250,7 @@ fn term_group_precond(inp: &str) -> IResult<&str, (), ()> {
        (),
        peek(tuple((
            field_name,
-            space0,
+            multispace0,
            char('('), // when we are here, we know it can't be anything but a term group
        ))),
    )(inp)
@@ -259,7 +259,7 @@ fn term_group_precond(inp: &str) -> IResult<&str, (), ()> {

 fn term_group_infallible(inp: &str) -> JResult<&str, UserInputAst> {
    let (mut inp, (field_name, _, _, _)) =
-        tuple((field_name, space0, char('('), space0))(inp).expect("precondition failed");
+        tuple((field_name, multispace0, char('('), multispace0))(inp).expect("precondition failed");

    let mut terms = Vec::new();
    let mut errs = Vec::new();
@@ -305,7 +305,7 @@ fn exists(inp: &str) -> IResult<&str, UserInputLeaf> {
        UserInputLeaf::Exists {
            field: String::new(),
        },
-        tuple((space0, char('*'))),
+        tuple((multispace0, char('*'))),
    )(inp)
 }

@@ -314,7 +314,7 @@ fn exists_precond(inp: &str) -> IResult<&str, (), ()> {
        (),
        peek(tuple((
            field_name,
-            space0,
+            multispace0,
            char('*'), // when we are here, we know it can't be anything but a exists
        ))),
    )(inp)
@@ -323,7 +323,7 @@ fn exists_precond(inp: &str) -> IResult<&str, (), ()> {

 fn exists_infallible(inp: &str) -> JResult<&str, UserInputAst> {
    let (inp, (field_name, _, _)) =
-        tuple((field_name, space0, char('*')))(inp).expect("precondition failed");
+        tuple((field_name, multispace0, char('*')))(inp).expect("precondition failed");

    let exists = UserInputLeaf::Exists { field: field_name }.into();
    Ok((inp, (exists, Vec::new())))
@@ -349,7 +349,7 @@ fn literal_no_group_infallible(inp: &str) -> JResult<&str, Option<UserInputAst>>
            alt_infallible(
                (
                    (
-                        value((), tuple((tag("IN"), space0, char('[')))),
+                        value((), tuple((tag("IN"), multispace0, char('[')))),
                        map(set_infallible, |(set, errs)| (Some(set), errs)),
                    ),
                    (
@@ -430,8 +430,8 @@ fn range(inp: &str) -> IResult<&str, UserInputLeaf> {
    // check for unbounded range in the form of <5, <=10, >5, >=5
    let elastic_unbounded_range = map(
        tuple((
-            preceded(space0, alt((tag(">="), tag("<="), tag("<"), tag(">")))),
-            preceded(space0, range_term_val()),
+            preceded(multispace0, alt((tag(">="), tag("<="), tag("<"), tag(">")))),
+            preceded(multispace0, range_term_val()),
        )),
        |(comparison_sign, bound)| match comparison_sign {
            ">=" => (UserInputBound::Inclusive(bound), UserInputBound::Unbounded),
@@ -444,7 +444,7 @@ fn range(inp: &str) -> IResult<&str, UserInputLeaf> {
    );

    let lower_bound = map(
-        separated_pair(one_of("{["), space0, range_term_val()),
+        separated_pair(one_of("{["), multispace0, range_term_val()),
        |(boundary_char, lower_bound)| {
            if lower_bound == "*" {
                UserInputBound::Unbounded
@@ -457,7 +457,7 @@ fn range(inp: &str) -> IResult<&str, UserInputLeaf> {
    );

    let upper_bound = map(
-        separated_pair(range_term_val(), space0, one_of("}]")),
+        separated_pair(range_term_val(), multispace0, one_of("}]")),
        |(upper_bound, boundary_char)| {
            if upper_bound == "*" {
                UserInputBound::Unbounded
@@ -469,8 +469,11 @@ fn range(inp: &str) -> IResult<&str, UserInputLeaf> {
        },
    );

-    let lower_to_upper =
-        separated_pair(lower_bound, tuple((space1, tag("TO"), space1)), upper_bound);
+    let lower_to_upper = separated_pair(
+        lower_bound,
+        tuple((multispace1, tag("TO"), multispace1)),
+        upper_bound,
+    );

    map(
        alt((elastic_unbounded_range, lower_to_upper)),
@@ -490,13 +493,16 @@ fn range_infallible(inp: &str) -> JResult<&str, UserInputLeaf> {
            word_infallible("]}"),
            space1_infallible,
            opt_i_err(
-                terminated(tag("TO"), alt((value((), space1), value((), eof)))),
+                terminated(tag("TO"), alt((value((), multispace1), value((), eof)))),
                "missing keyword TO",
            ),
            word_infallible("]}"),
            opt_i_err(one_of("]}"), "missing range delimiter"),
        )),
-        |((lower_bound_kind, _space0, lower, _space1, to, upper, upper_bound_kind), errs)| {
+        |(
+            (lower_bound_kind, _multispace0, lower, _multispace1, to, upper, upper_bound_kind),
+            errs,
+        )| {
            let lower_bound = match (lower_bound_kind, lower) {
                (_, Some("*")) => UserInputBound::Unbounded,
                (_, None) => UserInputBound::Unbounded,
@@ -596,10 +602,10 @@ fn range_infallible(inp: &str) -> JResult<&str, UserInputLeaf> {
 fn set(inp: &str) -> IResult<&str, UserInputLeaf> {
    map(
        preceded(
-            tuple((space0, tag("IN"), space1)),
+            tuple((multispace0, tag("IN"), multispace1)),
            delimited(
-                tuple((char('['), space0)),
-                separated_list0(space1, map(simple_term, |(_, term)| term)),
+                tuple((char('['), multispace0)),
+                separated_list0(multispace1, map(simple_term, |(_, term)| term)),
                char(']'),
            ),
        ),
@@ -667,7 +673,7 @@ fn leaf(inp: &str) -> IResult<&str, UserInputAst> {
    alt((
        delimited(char('('), ast, char(')')),
        map(char('*'), |_| UserInputAst::from(UserInputLeaf::All)),
-        map(preceded(tuple((tag("NOT"), space1)), leaf), negate),
+        map(preceded(tuple((tag("NOT"), multispace1)), leaf), negate),
        literal,
    ))(inp)
 }
@@ -780,27 +786,23 @@ fn binary_operand(inp: &str) -> IResult<&str, BinaryOperand> {
 }

 fn aggregate_binary_expressions(
-    left: UserInputAst,
-    others: Vec<(BinaryOperand, UserInputAst)>,
-) -> UserInputAst {
-    let mut dnf: Vec<Vec<UserInputAst>> = vec![vec![left]];
-    for (operator, operand_ast) in others {
-        match operator {
-            BinaryOperand::And => {
-                if let Some(last) = dnf.last_mut() {
-                    last.push(operand_ast);
-                }
-            }
-            BinaryOperand::Or => {
-                dnf.push(vec![operand_ast]);
-            }
-        }
-    }
-    if dnf.len() == 1 {
-        UserInputAst::and(dnf.into_iter().next().unwrap()) //< safe
+    left: (Option<Occur>, UserInputAst),
+    others: Vec<(Option<BinaryOperand>, Option<Occur>, UserInputAst)>,
+) -> Result<UserInputAst, LenientErrorInternal> {
+    let mut leafs = Vec::with_capacity(others.len() + 1);
+    leafs.push((None, left.0, Some(left.1)));
+    leafs.extend(
+        others
+            .into_iter()
+            .map(|(operand, occur, ast)| (operand, occur, Some(ast))),
+    );
+    // the parameters we pass should statically guarantee we can't get errors
+    // (no prefix BinaryOperand is provided)
+    let (res, mut errors) = aggregate_infallible_expressions(leafs);
+    if errors.is_empty() {
+        Ok(res)
    } else {
-        let conjunctions = dnf.into_iter().map(UserInputAst::and).collect();
-        UserInputAst::or(conjunctions)
+        Err(errors.swap_remove(0))
    }
 }

@@ -816,30 +818,10 @@ fn aggregate_infallible_expressions(
        return (UserInputAst::empty_query(), err);
    }

-    let use_operand = leafs.iter().any(|(operand, _, _)| operand.is_some());
-    let all_operand = leafs
-        .iter()
-        .skip(1)
-        .all(|(operand, _, _)| operand.is_some());
    let early_operand = leafs
        .iter()
        .take(1)
        .all(|(operand, _, _)| operand.is_some());
-    let use_occur = leafs.iter().any(|(_, occur, _)| occur.is_some());
-
-    if use_operand && use_occur {
-        err.push(LenientErrorInternal {
-            pos: 0,
-            message: "Use of mixed occur and boolean operator".to_string(),
-        });
-    }
-
-    if use_operand && !all_operand {
-        err.push(LenientErrorInternal {
-            pos: 0,
-            message: "Missing boolean operator".to_string(),
-        });
-    }

    if early_operand {
        err.push(LenientErrorInternal {
@@ -866,7 +848,15 @@ fn aggregate_infallible_expressions(
                    Some(BinaryOperand::And) => Some(Occur::Must),
                    _ => Some(Occur::Should),
                };
-                clauses.push(vec![(occur.or(default_op), ast.clone())]);
+                if occur == &Some(Occur::MustNot) && default_op == Some(Occur::Should) {
+                    // if occur is MustNot *and* operation is OR, we synthesize a ShouldNot
+                    clauses.push(vec![(
+                        Some(Occur::Should),
+                        ast.clone().unary(Occur::MustNot),
+                    )])
+                } else {
+                    clauses.push(vec![(occur.or(default_op), ast.clone())]);
+                }
            }
            None => {
                let default_op = match next_operator {
@@ -874,7 +864,15 @@ fn aggregate_infallible_expressions(
                    Some(BinaryOperand::Or) => Some(Occur::Should),
                    None => None,
                };
-                clauses.push(vec![(occur.or(default_op), ast.clone())])
+                if occur == &Some(Occur::MustNot) && default_op == Some(Occur::Should) {
+                    // if occur is MustNot *and* operation is OR, we synthesize a ShouldNot
+                    clauses.push(vec![(
+                        Some(Occur::Should),
+                        ast.clone().unary(Occur::MustNot),
+                    )])
+                } else {
+                    clauses.push(vec![(occur.or(default_op), ast.clone())])
+                }
            }
        }
    }
@@ -891,7 +889,12 @@ fn aggregate_infallible_expressions(
            }
        }
        Some(BinaryOperand::Or) => {
-            clauses.push(vec![(last_occur.or(Some(Occur::Should)), last_ast)]);
+            if last_occur == Some(Occur::MustNot) {
+                // if occur is MustNot *and* operation is OR, we synthesize a ShouldNot
+                clauses.push(vec![(Some(Occur::Should), last_ast.unary(Occur::MustNot))]);
+            } else {
+                clauses.push(vec![(last_occur.or(Some(Occur::Should)), last_ast)]);
+            }
        }
        None => clauses.push(vec![(last_occur, last_ast)]),
    }
@@ -917,35 +920,29 @@ fn aggregate_infallible_expressions(
    }
 }

-fn operand_leaf(inp: &str) -> IResult<&str, (BinaryOperand, UserInputAst)> {
-    tuple((
-        terminated(binary_operand, space0),
-        terminated(boosted_leaf, space0),
-    ))(inp)
+fn operand_leaf(inp: &str) -> IResult<&str, (Option<BinaryOperand>, Option<Occur>, UserInputAst)> {
+    map(
+        tuple((
+            terminated(opt(binary_operand), multispace0),
+            terminated(occur_leaf, multispace0),
+        )),
+        |(operand, (occur, ast))| (operand, occur, ast),
+    )(inp)
 }

 fn ast(inp: &str) -> IResult<&str, UserInputAst> {
-    let boolean_expr = map(
-        separated_pair(boosted_leaf, space1, many1(operand_leaf)),
+    let boolean_expr = map_res(
+        separated_pair(occur_leaf, multispace1, many1(operand_leaf)),
        |(left, right)| aggregate_binary_expressions(left, right),
    );
-    let whitespace_separated_leaves = map(separated_list1(space1, occur_leaf), |subqueries| {
-        if subqueries.len() == 1 {
-            let (occur_opt, ast) = subqueries.into_iter().next().unwrap();
-            match occur_opt.unwrap_or(Occur::Should) {
-                Occur::Must | Occur::Should => ast,
-                Occur::MustNot => UserInputAst::Clause(vec![(Some(Occur::MustNot), ast)]),
-            }
+    let single_leaf = map(occur_leaf, |(occur, ast)| {
+        if occur == Some(Occur::MustNot) {
+            ast.unary(Occur::MustNot)
        } else {
-            UserInputAst::Clause(subqueries.into_iter().collect())
+            ast
        }
    });
-
-    delimited(
-        space0,
-        alt((boolean_expr, whitespace_separated_leaves)),
-        space0,
-    )(inp)
+    delimited(multispace0, alt((boolean_expr, single_leaf)), multispace0)(inp)
 }

 fn ast_infallible(inp: &str) -> JResult<&str, UserInputAst> {
@@ -969,7 +966,7 @@ fn ast_infallible(inp: &str) -> JResult<&str, UserInputAst> {
 }

 pub fn parse_to_ast(inp: &str) -> IResult<&str, UserInputAst> {
-    map(delimited(space0, opt(ast), eof), |opt_ast| {
+    map(delimited(multispace0, opt(ast), eof), |opt_ast| {
        rewrite_ast(opt_ast.unwrap_or_else(UserInputAst::empty_query))
    })(inp)
 }
@@ -1145,24 +1142,43 @@ mod test {
    #[test]
    fn test_parse_query_to_ast_binary_op() {
        test_parse_query_to_ast_helper("a AND b", "(+a +b)");
+        test_parse_query_to_ast_helper("a\nAND b", "(+a +b)");
        test_parse_query_to_ast_helper("a OR b", "(?a ?b)");
        test_parse_query_to_ast_helper("a OR b AND c", "(?a ?(+b +c))");
        test_parse_query_to_ast_helper("a AND b         AND c", "(+a +b +c)");
-        test_is_parse_err("a OR b aaa", "(?a ?b *aaa)");
-        test_is_parse_err("a AND b aaa", "(?(+a +b) *aaa)");
-        test_is_parse_err("aaa a OR b ", "(*aaa ?a ?b)");
-        test_is_parse_err("aaa ccc a OR b ", "(*aaa *ccc ?a ?b)");
-        test_is_parse_err("aaa a AND b ", "(*aaa ?(+a +b))");
-        test_is_parse_err("aaa ccc a AND b ", "(*aaa *ccc ?(+a +b))");
+        test_parse_query_to_ast_helper("a OR b aaa", "(?a ?b *aaa)");
+        test_parse_query_to_ast_helper("a AND b aaa", "(?(+a +b) *aaa)");
+        test_parse_query_to_ast_helper("aaa a OR b ", "(*aaa ?a ?b)");
+        test_parse_query_to_ast_helper("aaa ccc a OR b ", "(*aaa *ccc ?a ?b)");
+        test_parse_query_to_ast_helper("aaa a AND b ", "(*aaa ?(+a +b))");
+        test_parse_query_to_ast_helper("aaa ccc a AND b ", "(*aaa *ccc ?(+a +b))");
    }

    #[test]
    fn test_parse_mixed_bool_occur() {
-        test_is_parse_err("a OR b +aaa", "(?a ?b +aaa)");
-        test_is_parse_err("a AND b -aaa", "(?(+a +b) -aaa)");
-        test_is_parse_err("+a OR +b aaa", "(+a +b *aaa)");
-        test_is_parse_err("-a AND -b aaa", "(?(-a -b) *aaa)");
-        test_is_parse_err("-aaa +ccc -a OR b ", "(-aaa +ccc -a ?b)");
+        test_parse_query_to_ast_helper("+a OR +b", "(+a +b)");
+
+        test_parse_query_to_ast_helper("a AND -b", "(+a -b)");
+        test_parse_query_to_ast_helper("-a AND b", "(-a +b)");
+        test_parse_query_to_ast_helper("a AND NOT b", "(+a +(-b))");
+        test_parse_query_to_ast_helper("NOT a AND b", "(+(-a) +b)");
+
+        test_parse_query_to_ast_helper("a AND NOT b AND c", "(+a +(-b) +c)");
+        test_parse_query_to_ast_helper("a AND -b AND c", "(+a -b +c)");
+
+        test_parse_query_to_ast_helper("a OR -b", "(?a ?(-b))");
+        test_parse_query_to_ast_helper("-a OR b", "(?(-a) ?b)");
+        test_parse_query_to_ast_helper("a OR NOT b", "(?a ?(-b))");
+        test_parse_query_to_ast_helper("NOT a OR b", "(?(-a) ?b)");
+
+        test_parse_query_to_ast_helper("a OR NOT b OR c", "(?a ?(-b) ?c)");
+        test_parse_query_to_ast_helper("a OR -b OR c", "(?a ?(-b) ?c)");
+
+        test_parse_query_to_ast_helper("a OR b +aaa", "(?a ?b +aaa)");
+        test_parse_query_to_ast_helper("a AND b -aaa", "(?(+a +b) -aaa)");
+        test_parse_query_to_ast_helper("+a OR +b aaa", "(+a +b *aaa)");
+        test_parse_query_to_ast_helper("-a AND -b aaa", "(?(-a -b) *aaa)");
+        test_parse_query_to_ast_helper("-aaa +ccc -a OR b ", "(-aaa +ccc ?(-a) ?b)");
    }

    #[test]
--- a/src/aggregation/agg_bench.rs
+++ b/src/aggregation/agg_bench.rs
@@ -290,6 +290,41 @@ mod bench {
        });
    }

+    bench_all_cardinalities!(bench_aggregation_terms_many_with_top_hits_agg);
+
+    fn bench_aggregation_terms_many_with_top_hits_agg_card(
+        b: &mut Bencher,
+        cardinality: Cardinality,
+    ) {
+        let index = get_test_index_bench(cardinality).unwrap();
+        let reader = index.reader().unwrap();
+
+        b.iter(|| {
+            let agg_req: Aggregations = serde_json::from_value(json!({
+                "my_texts": {
+                    "terms": { "field": "text_many_terms" },
+                    "aggs": {
+                        "top_hits": { "top_hits":
+                            {
+                                "sort": [
+                                    { "score": "desc" }
+                                ],
+                                "size": 2,
+                                "doc_value_fields": ["score_f64"]
+                            }
+                        }
+                    }
+                },
+            }))
+            .unwrap();
+
+            let collector = get_collector(agg_req);
+
+            let searcher = reader.searcher();
+            searcher.search(&AllQuery, &collector).unwrap()
+        });
+    }
+
    bench_all_cardinalities!(bench_aggregation_terms_many_with_sub_agg);

    fn bench_aggregation_terms_many_with_sub_agg_card(b: &mut Bencher, cardinality: Cardinality) {
--- a/src/aggregation/agg_req.rs
+++ b/src/aggregation/agg_req.rs
@@ -35,7 +35,7 @@ use super::bucket::{
 };
 use super::metric::{
    AverageAggregation, CountAggregation, MaxAggregation, MinAggregation,
-    PercentilesAggregationReq, StatsAggregation, SumAggregation,
+    PercentilesAggregationReq, StatsAggregation, SumAggregation, TopHitsAggregation,
 };

 /// The top-level aggregation request structure, which contains [`Aggregation`] and their user
@@ -93,7 +93,12 @@ impl Aggregation {
    }

    fn get_fast_field_names(&self, fast_field_names: &mut HashSet<String>) {
-        fast_field_names.insert(self.agg.get_fast_field_name().to_string());
+        fast_field_names.extend(
+            self.agg
+                .get_fast_field_names()
+                .iter()
+                .map(|s| s.to_string()),
+        );
        fast_field_names.extend(get_fast_field_names(&self.sub_aggregation));
    }
 }
@@ -147,23 +152,27 @@ pub enum AggregationVariants {
    /// Computes the sum of the extracted values.
    #[serde(rename = "percentiles")]
    Percentiles(PercentilesAggregationReq),
+    /// Finds the top k values matching some order
+    #[serde(rename = "top_hits")]
+    TopHits(TopHitsAggregation),
 }

 impl AggregationVariants {
-    /// Returns the name of the field used by the aggregation.
-    pub fn get_fast_field_name(&self) -> &str {
+    /// Returns the names of the fields used by the aggregation.
+    pub fn get_fast_field_names(&self) -> Vec<&str> {
        match self {
-            AggregationVariants::Terms(terms) => terms.field.as_str(),
-            AggregationVariants::Range(range) => range.field.as_str(),
-            AggregationVariants::Histogram(histogram) => histogram.field.as_str(),
-            AggregationVariants::DateHistogram(histogram) => histogram.field.as_str(),
-            AggregationVariants::Average(avg) => avg.field_name(),
-            AggregationVariants::Count(count) => count.field_name(),
-            AggregationVariants::Max(max) => max.field_name(),
-            AggregationVariants::Min(min) => min.field_name(),
-            AggregationVariants::Stats(stats) => stats.field_name(),
-            AggregationVariants::Sum(sum) => sum.field_name(),
-            AggregationVariants::Percentiles(per) => per.field_name(),
+            AggregationVariants::Terms(terms) => vec![terms.field.as_str()],
+            AggregationVariants::Range(range) => vec![range.field.as_str()],
+            AggregationVariants::Histogram(histogram) => vec![histogram.field.as_str()],
+            AggregationVariants::DateHistogram(histogram) => vec![histogram.field.as_str()],
+            AggregationVariants::Average(avg) => vec![avg.field_name()],
+            AggregationVariants::Count(count) => vec![count.field_name()],
+            AggregationVariants::Max(max) => vec![max.field_name()],
+            AggregationVariants::Min(min) => vec![min.field_name()],
+            AggregationVariants::Stats(stats) => vec![stats.field_name()],
+            AggregationVariants::Sum(sum) => vec![sum.field_name()],
+            AggregationVariants::Percentiles(per) => vec![per.field_name()],
+            AggregationVariants::TopHits(top_hits) => top_hits.field_names(),
        }
    }

--- a/src/aggregation/agg_req_with_accessor.rs
+++ b/src/aggregation/agg_req_with_accessor.rs
@@ -1,6 +1,9 @@
 //! This will enhance the request tree with access to the fastfield and metadata.

-use columnar::{Column, ColumnBlockAccessor, ColumnType, StrColumn};
+use std::collections::HashMap;
+use std::io;
+
+use columnar::{Column, ColumnBlockAccessor, ColumnType, DynamicColumn, StrColumn};

 use super::agg_limits::ResourceLimitGuard;
 use super::agg_req::{Aggregation, AggregationVariants, Aggregations};
@@ -14,7 +17,7 @@ use super::metric::{
 use super::segment_agg_result::AggregationLimits;
 use super::VecWithNames;
 use crate::aggregation::{f64_to_fastfield_u64, Key};
-use crate::SegmentReader;
+use crate::{SegmentOrdinal, SegmentReader};

 #[derive(Default)]
 pub(crate) struct AggregationsWithAccessor {
@@ -32,6 +35,7 @@ impl AggregationsWithAccessor {
 }

 pub struct AggregationWithAccessor {
+    pub(crate) segment_ordinal: SegmentOrdinal,
    /// In general there can be buckets without fast field access, e.g. buckets that are created
    /// based on search terms. That is not that case currently, but eventually this needs to be
    /// Option or moved.
@@ -44,10 +48,16 @@ pub struct AggregationWithAccessor {
    pub(crate) limits: ResourceLimitGuard,
    pub(crate) column_block_accessor: ColumnBlockAccessor<u64>,
    /// Used for missing term aggregation, which checks all columns for existence.
+    /// And also for `top_hits` aggregation, which may sort on multiple fields.
    /// By convention the missing aggregation is chosen, when this property is set
    /// (instead bein set in `agg`).
    /// If this needs to used by other aggregations, we need to refactor this.
-    pub(crate) accessors: Vec<Column<u64>>,
+    // NOTE: we could make all other aggregations use this instead of `accessor` and `field_type`
+    // (making them obsolete), but would that have a performance impact?
+    pub(crate) accessors: Vec<(Column<u64>, ColumnType)>,
+    /// Map field names to all associated column accessors.
+    /// This field is used for `docvalue_fields`, which is currently only supported for `top_hits`.
+    pub(crate) value_accessors: HashMap<String, Vec<DynamicColumn>>,
    pub(crate) agg: Aggregation,
 }

@@ -57,19 +67,55 @@ impl AggregationWithAccessor {
        agg: &Aggregation,
        sub_aggregation: &Aggregations,
        reader: &SegmentReader,
+        segment_ordinal: SegmentOrdinal,
        limits: AggregationLimits,
    ) -> crate::Result<Vec<AggregationWithAccessor>> {
-        let add_agg_with_accessor = |accessor: Column<u64>,
+        let mut agg = agg.clone();
+
+        let add_agg_with_accessor = |agg: &Aggregation,
+                                     accessor: Column<u64>,
                                     column_type: ColumnType,
                                     aggs: &mut Vec<AggregationWithAccessor>|
         -> crate::Result<()> {
            let res = AggregationWithAccessor {
+                segment_ordinal,
                accessor,
-                accessors: Vec::new(),
+                accessors: Default::default(),
+                value_accessors: Default::default(),
                field_type: column_type,
                sub_aggregation: get_aggs_with_segment_accessor_and_validate(
                    sub_aggregation,
                    reader,
+                    segment_ordinal,
+                    &limits,
+                )?,
+                agg: agg.clone(),
+                limits: limits.new_guard(),
+                missing_value_for_accessor: None,
+                str_dict_column: None,
+                column_block_accessor: Default::default(),
+            };
+            aggs.push(res);
+            Ok(())
+        };
+
+        let add_agg_with_accessors = |agg: &Aggregation,
+                                      accessors: Vec<(Column<u64>, ColumnType)>,
+                                      aggs: &mut Vec<AggregationWithAccessor>,
+                                      value_accessors: HashMap<String, Vec<DynamicColumn>>|
+         -> crate::Result<()> {
+            let (accessor, field_type) = accessors.first().expect("at least one accessor");
+            let res = AggregationWithAccessor {
+                segment_ordinal,
+                // TODO: We should do away with the `accessor` field altogether
+                accessor: accessor.clone(),
+                value_accessors,
+                field_type: *field_type,
+                accessors,
+                sub_aggregation: get_aggs_with_segment_accessor_and_validate(
+                    sub_aggregation,
+                    reader,
+                    segment_ordinal,
                    &limits,
                )?,
                agg: agg.clone(),
@@ -84,32 +130,36 @@ impl AggregationWithAccessor {

        let mut res: Vec<AggregationWithAccessor> = Vec::new();
        use AggregationVariants::*;
-        match &agg.agg {
+
+        match agg.agg {
            Range(RangeAggregation {
-                field: field_name, ..
+                field: ref field_name,
+                ..
            }) => {
                let (accessor, column_type) =
                    get_ff_reader(reader, field_name, Some(get_numeric_or_date_column_types()))?;
-                add_agg_with_accessor(accessor, column_type, &mut res)?;
+                add_agg_with_accessor(&agg, accessor, column_type, &mut res)?;
            }
            Histogram(HistogramAggregation {
-                field: field_name, ..
+                field: ref field_name,
+                ..
            }) => {
                let (accessor, column_type) =
                    get_ff_reader(reader, field_name, Some(get_numeric_or_date_column_types()))?;
-                add_agg_with_accessor(accessor, column_type, &mut res)?;
+                add_agg_with_accessor(&agg, accessor, column_type, &mut res)?;
            }
            DateHistogram(DateHistogramAggregationReq {
-                field: field_name, ..
+                field: ref field_name,
+                ..
            }) => {
                let (accessor, column_type) =
                    // Only DateTime is supported for DateHistogram
                    get_ff_reader(reader, field_name, Some(&[ColumnType::DateTime]))?;
-                add_agg_with_accessor(accessor, column_type, &mut res)?;
+                add_agg_with_accessor(&agg, accessor, column_type, &mut res)?;
            }
            Terms(TermsAggregation {
-                field: field_name,
-                missing,
+                field: ref field_name,
+                ref missing,
                ..
            }) => {
                let str_dict_column = reader.fast_fields().str(field_name)?;
@@ -119,8 +169,8 @@ impl AggregationWithAccessor {
                    ColumnType::F64,
                    ColumnType::Str,
                    ColumnType::DateTime,
+                    ColumnType::Bool,
                    // ColumnType::Bytes Unsupported
-                    // ColumnType::Bool Unsupported
                    // ColumnType::IpAddr Unsupported
                ];

@@ -162,24 +212,11 @@ impl AggregationWithAccessor {
                    let column_and_types =
                        get_all_ff_reader_or_empty(reader, field_name, None, fallback_type)?;

-                    let accessors: Vec<Column> =
-                        column_and_types.iter().map(|(a, _)| a.clone()).collect();
-                    let agg_wit_acc = AggregationWithAccessor {
-                        missing_value_for_accessor: None,
-                        accessor: accessors[0].clone(),
-                        accessors,
-                        field_type: ColumnType::U64,
-                        sub_aggregation: get_aggs_with_segment_accessor_and_validate(
-                            sub_aggregation,
-                            reader,
-                            &limits,
-                        )?,
-                        agg: agg.clone(),
-                        str_dict_column: str_dict_column.clone(),
-                        limits: limits.new_guard(),
-                        column_block_accessor: Default::default(),
-                    };
-                    res.push(agg_wit_acc);
+                    let accessors = column_and_types
+                        .iter()
+                        .map(|c_t| (c_t.0.clone(), c_t.1))
+                        .collect();
+                    add_agg_with_accessors(&agg, accessors, &mut res, Default::default())?;
                }

                for (accessor, column_type) in column_and_types {
@@ -189,21 +226,25 @@ impl AggregationWithAccessor {
                        missing.clone()
                    };

-                    let missing_value_for_accessor =
-                        if let Some(missing) = missing_value_term_agg.as_ref() {
-                            get_missing_val(column_type, missing, agg.agg.get_fast_field_name())?
-                        } else {
-                            None
-                        };
+                    let missing_value_for_accessor = if let Some(missing) =
+                        missing_value_term_agg.as_ref()
+                    {
+                        get_missing_val(column_type, missing, agg.agg.get_fast_field_names()[0])?
+                    } else {
+                        None
+                    };

                    let agg = AggregationWithAccessor {
+                        segment_ordinal,
                        missing_value_for_accessor,
                        accessor,
-                        accessors: Vec::new(),
+                        accessors: Default::default(),
+                        value_accessors: Default::default(),
                        field_type: column_type,
                        sub_aggregation: get_aggs_with_segment_accessor_and_validate(
                            sub_aggregation,
                            reader,
+                            segment_ordinal,
                            &limits,
                        )?,
                        agg: agg.clone(),
@@ -215,34 +256,63 @@ impl AggregationWithAccessor {
                }
            }
            Average(AverageAggregation {
-                field: field_name, ..
+                field: ref field_name,
+                ..
            })
            | Count(CountAggregation {
-                field: field_name, ..
+                field: ref field_name,
+                ..
            })
            | Max(MaxAggregation {
-                field: field_name, ..
+                field: ref field_name,
+                ..
            })
            | Min(MinAggregation {
-                field: field_name, ..
+                field: ref field_name,
+                ..
            })
            | Stats(StatsAggregation {
-                field: field_name, ..
+                field: ref field_name,
+                ..
            })
            | Sum(SumAggregation {
-                field: field_name, ..
+                field: ref field_name,
+                ..
            }) => {
                let (accessor, column_type) =
                    get_ff_reader(reader, field_name, Some(get_numeric_or_date_column_types()))?;
-                add_agg_with_accessor(accessor, column_type, &mut res)?;
+                add_agg_with_accessor(&agg, accessor, column_type, &mut res)?;
            }
-            Percentiles(percentiles) => {
+            Percentiles(ref percentiles) => {
                let (accessor, column_type) = get_ff_reader(
                    reader,
                    percentiles.field_name(),
                    Some(get_numeric_or_date_column_types()),
                )?;
-                add_agg_with_accessor(accessor, column_type, &mut res)?;
+                add_agg_with_accessor(&agg, accessor, column_type, &mut res)?;
+            }
+            TopHits(ref mut top_hits) => {
+                top_hits.validate_and_resolve(reader.fast_fields().columnar())?;
+                let accessors: Vec<(Column<u64>, ColumnType)> = top_hits
+                    .field_names()
+                    .iter()
+                    .map(|field| {
+                        get_ff_reader(reader, field, Some(get_numeric_or_date_column_types()))
+                    })
+                    .collect::<crate::Result<_>>()?;
+
+                let value_accessors = top_hits
+                    .value_field_names()
+                    .iter()
+                    .map(|field_name| {
+                        Ok((
+                            field_name.to_string(),
+                            get_dynamic_columns(reader, field_name)?,
+                        ))
+                    })
+                    .collect::<crate::Result<_>>()?;
+
+                add_agg_with_accessors(&agg, accessors, &mut res, value_accessors)?;
            }
        };

@@ -284,6 +354,7 @@ fn get_numeric_or_date_column_types() -> &'static [ColumnType] {
 pub(crate) fn get_aggs_with_segment_accessor_and_validate(
    aggs: &Aggregations,
    reader: &SegmentReader,
+    segment_ordinal: SegmentOrdinal,
    limits: &AggregationLimits,
 ) -> crate::Result<AggregationsWithAccessor> {
    let mut aggss = Vec::new();
@@ -292,6 +363,7 @@ pub(crate) fn get_aggs_with_segment_accessor_and_validate(
            agg,
            agg.sub_aggregation(),
            reader,
+            segment_ordinal,
            limits.clone(),
        )?;
        for agg in aggs {
@@ -321,6 +393,19 @@ fn get_ff_reader(
    Ok(ff_field_with_type)
 }

+fn get_dynamic_columns(
+    reader: &SegmentReader,
+    field_name: &str,
+) -> crate::Result<Vec<columnar::DynamicColumn>> {
+    let ff_fields = reader.fast_fields().dynamic_column_handles(field_name)?;
+    let cols = ff_fields
+        .iter()
+        .map(|h| h.open())
+        .collect::<io::Result<_>>()?;
+    assert!(!ff_fields.is_empty(), "field {} not found", field_name);
+    Ok(cols)
+}
+
 /// Get all fast field reader or empty as default.
 ///
 /// Is guaranteed to return at least one column.
--- a/src/aggregation/agg_result.rs
+++ b/src/aggregation/agg_result.rs
@@ -8,7 +8,7 @@ use rustc_hash::FxHashMap;
 use serde::{Deserialize, Serialize};

 use super::bucket::GetDocCount;
-use super::metric::{PercentilesMetricResult, SingleMetricResult, Stats};
+use super::metric::{PercentilesMetricResult, SingleMetricResult, Stats, TopHitsMetricResult};
 use super::{AggregationError, Key};
 use crate::TantivyError;

@@ -90,8 +90,10 @@ pub enum MetricResult {
    Stats(Stats),
    /// Sum metric result.
    Sum(SingleMetricResult),
-    /// Sum metric result.
+    /// Percentiles metric result.
    Percentiles(PercentilesMetricResult),
+    /// Top hits metric result
+    TopHits(TopHitsMetricResult),
 }

 impl MetricResult {
@@ -106,6 +108,9 @@ impl MetricResult {
            MetricResult::Percentiles(_) => Err(TantivyError::AggregationError(
                AggregationError::InvalidRequest("percentiles can't be used to order".to_string()),
            )),
+            MetricResult::TopHits(_) => Err(TantivyError::AggregationError(
+                AggregationError::InvalidRequest("top_hits can't be used to order".to_string()),
+            )),
        }
    }
 }
--- a/src/aggregation/agg_tests.rs
+++ b/src/aggregation/agg_tests.rs
@@ -587,6 +587,9 @@ fn test_aggregation_on_json_object() {
    let schema = schema_builder.build();
    let index = Index::create_in_ram(schema);
    let mut index_writer: IndexWriter = index.writer_for_tests().unwrap();
+    index_writer
+        .add_document(doc!(json => json!({"color": "red"})))
+        .unwrap();
    index_writer
        .add_document(doc!(json => json!({"color": "red"})))
        .unwrap();
@@ -614,8 +617,8 @@ fn test_aggregation_on_json_object() {
        &serde_json::json!({
            "jsonagg": {
                "buckets": [
+                    {"doc_count": 2, "key": "red"},
                    {"doc_count": 1, "key": "blue"},
-                    {"doc_count": 1, "key": "red"}
                ],
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0
@@ -637,6 +640,9 @@ fn test_aggregation_on_nested_json_object() {
    index_writer
        .add_document(doc!(json => json!({"color.dot": "blue", "color": {"nested":"blue"} })))
        .unwrap();
+    index_writer
+        .add_document(doc!(json => json!({"color.dot": "blue", "color": {"nested":"blue"} })))
+        .unwrap();
    index_writer.commit().unwrap();
    let reader = index.reader().unwrap();
    let searcher = reader.searcher();
@@ -664,7 +670,7 @@ fn test_aggregation_on_nested_json_object() {
        &serde_json::json!({
            "jsonagg1": {
                "buckets": [
-                    {"doc_count": 1, "key": "blue"},
+                    {"doc_count": 2, "key": "blue"},
                    {"doc_count": 1, "key": "red"}
                ],
                "doc_count_error_upper_bound": 0,
@@ -672,7 +678,7 @@ fn test_aggregation_on_nested_json_object() {
            },
            "jsonagg2": {
                "buckets": [
-                    {"doc_count": 1, "key": "blue"},
+                    {"doc_count": 2, "key": "blue"},
                    {"doc_count": 1, "key": "red"}
                ],
                "doc_count_error_upper_bound": 0,
@@ -814,6 +820,12 @@ fn test_aggregation_on_json_object_mixed_types() {
        .unwrap();
    index_writer.commit().unwrap();
    // => Segment with all values text
+    index_writer
+        .add_document(doc!(json => json!({"mixed_type": "blue"})))
+        .unwrap();
+    index_writer
+        .add_document(doc!(json => json!({"mixed_type": "blue"})))
+        .unwrap();
    index_writer
        .add_document(doc!(json => json!({"mixed_type": "blue"})))
        .unwrap();
@@ -825,6 +837,9 @@ fn test_aggregation_on_json_object_mixed_types() {
    index_writer.commit().unwrap();

    // => Segment with mixed values
+    index_writer
+        .add_document(doc!(json => json!({"mixed_type": "red"})))
+        .unwrap();
    index_writer
        .add_document(doc!(json => json!({"mixed_type": "red"})))
        .unwrap();
@@ -870,6 +885,8 @@ fn test_aggregation_on_json_object_mixed_types() {

    let aggregation_results = searcher.search(&AllQuery, &aggregation_collector).unwrap();
    let aggregation_res_json = serde_json::to_value(aggregation_results).unwrap();
+    // pretty print as json
+    use pretty_assertions::assert_eq;
    assert_eq!(
        &aggregation_res_json,
        &serde_json::json!({
@@ -885,9 +902,9 @@ fn test_aggregation_on_json_object_mixed_types() {
            "buckets": [
              { "doc_count": 1, "key": 10.0, "min_price": { "value": 10.0 } },
              { "doc_count": 1, "key": -20.5, "min_price": { "value": -20.5 } },
-              // TODO bool is also not yet handled in aggregation
-              { "doc_count": 1, "key": "blue", "min_price": { "value": null } },
-              { "doc_count": 1, "key": "red", "min_price": { "value": null } },
+              { "doc_count": 2, "key": "red", "min_price": { "value": null } },
+              { "doc_count": 2, "key": 1.0, "key_as_string": "true", "min_price": { "value": null } },
+              { "doc_count": 3, "key": "blue", "min_price": { "value": null } },
            ],
            "sum_other_doc_count": 0
          }
--- a/src/aggregation/bucket/histogram/date_histogram.rs
+++ b/src/aggregation/bucket/histogram/date_histogram.rs
@@ -1,7 +1,7 @@
 use serde::{Deserialize, Serialize};

 use super::{HistogramAggregation, HistogramBounds};
-use crate::aggregation::AggregationError;
+use crate::aggregation::*;

 /// DateHistogramAggregation is similar to `HistogramAggregation`, but it can only be used with date
 /// type.
@@ -307,6 +307,7 @@ pub mod tests {
    ) -> crate::Result<Index> {
        let mut schema_builder = Schema::builder();
        schema_builder.add_date_field("date", FAST);
+        schema_builder.add_json_field("mixed", FAST);
        schema_builder.add_text_field("text", FAST | STRING);
        schema_builder.add_text_field("text2", FAST | STRING);
        let schema = schema_builder.build();
@@ -351,8 +352,10 @@ pub mod tests {
        let docs = vec![
            vec![r#"{ "date": "2015-01-01T12:10:30Z", "text": "aaa" }"#],
            vec![r#"{ "date": "2015-01-01T11:11:30Z", "text": "bbb" }"#],
+            vec![r#"{ "date": "2015-01-01T11:11:30Z", "text": "bbb" }"#],
            vec![r#"{ "date": "2015-01-02T00:00:00Z", "text": "bbb" }"#],
            vec![r#"{ "date": "2015-01-06T00:00:00Z", "text": "ccc" }"#],
+            vec![r#"{ "date": "2015-01-06T00:00:00Z", "text": "ccc" }"#],
        ];
        let index = get_test_index_from_docs(merge_segments, &docs).unwrap();

@@ -381,7 +384,7 @@ pub mod tests {
                        {
                            "key_as_string" : "2015-01-01T00:00:00Z",
                            "key" : 1420070400000.0,
-                            "doc_count" : 4
+                            "doc_count" : 6
                        }
                    ]
                }
@@ -419,15 +422,15 @@ pub mod tests {
                    {
                        "key_as_string" : "2015-01-01T00:00:00Z",
                        "key" : 1420070400000.0,
-                        "doc_count" : 4,
+                        "doc_count" : 6,
                        "texts": {
                            "buckets": [
                                {
-                                "doc_count": 2,
+                                "doc_count": 3,
                                "key": "bbb"
                                },
                                {
-                                "doc_count": 1,
+                                "doc_count": 2,
                                "key": "ccc"
                                },
                                {
@@ -466,7 +469,7 @@ pub mod tests {
                "sales_over_time": {
                    "buckets": [
                        {
-                            "doc_count": 2,
+                            "doc_count": 3,
                            "key": 1420070400000.0,
                            "key_as_string": "2015-01-01T00:00:00Z"
                        },
@@ -491,7 +494,7 @@ pub mod tests {
                            "key_as_string": "2015-01-05T00:00:00Z"
                        },
                        {
-                            "doc_count": 1,
+                            "doc_count": 2,
                            "key": 1420502400000.0,
                            "key_as_string": "2015-01-06T00:00:00Z"
                        }
@@ -532,7 +535,7 @@ pub mod tests {
                            "key_as_string": "2014-12-31T00:00:00Z"
                        },
                        {
-                            "doc_count": 2,
+                            "doc_count": 3,
                            "key": 1420070400000.0,
                            "key_as_string": "2015-01-01T00:00:00Z"
                        },
@@ -557,7 +560,7 @@ pub mod tests {
                            "key_as_string": "2015-01-05T00:00:00Z"
                        },
                        {
-                            "doc_count": 1,
+                            "doc_count": 2,
                            "key": 1420502400000.0,
                            "key_as_string": "2015-01-06T00:00:00Z"
                        },
--- a/src/aggregation/bucket/histogram/histogram.rs
+++ b/src/aggregation/bucket/histogram/histogram.rs
@@ -20,7 +20,7 @@ use crate::aggregation::intermediate_agg_result::{
 use crate::aggregation::segment_agg_result::{
    build_segment_agg_collector, AggregationLimits, SegmentAggregationCollector,
 };
-use crate::aggregation::{f64_from_fastfield_u64, format_date};
+use crate::aggregation::*;
 use crate::TantivyError;

 /// Histogram is a bucket aggregation, where buckets are created dynamically for given `interval`.
@@ -73,6 +73,7 @@ pub struct HistogramAggregation {
    pub field: String,
    /// The interval to chunk your data range. Each bucket spans a value range of [0..interval).
    /// Must be a positive value.
+    #[serde(deserialize_with = "deserialize_f64")]
    pub interval: f64,
    /// Intervals implicitly defines an absolute grid of buckets `[interval * k, interval * (k +
    /// 1))`.
@@ -85,6 +86,7 @@ pub struct HistogramAggregation {
    /// fall into the buckets with the key 0 and 10.
    /// With offset 5 and interval 10, they would both fall into the bucket with they key 5 and the
    /// range [5..15)
+    #[serde(default, deserialize_with = "deserialize_option_f64")]
    pub offset: Option<f64>,
    /// The minimum number of documents in a bucket to be returned. Defaults to 0.
    pub min_doc_count: Option<u64>,
@@ -596,10 +598,13 @@ mod tests {

    use super::*;
    use crate::aggregation::agg_req::Aggregations;
+    use crate::aggregation::agg_result::AggregationResults;
    use crate::aggregation::tests::{
        exec_request, exec_request_with_query, exec_request_with_query_and_memory_limit,
        get_test_index_2_segments, get_test_index_from_values, get_test_index_with_num_docs,
    };
+    use crate::aggregation::AggregationCollector;
+    use crate::query::AllQuery;

    #[test]
    fn histogram_test_crooked_values() -> crate::Result<()> {
@@ -1351,6 +1356,35 @@ mod tests {
            })
        );

+        Ok(())
+    }
+    #[test]
+    fn test_aggregation_histogram_empty_index() -> crate::Result<()> {
+        // test index without segments
+        let values = vec![];
+
+        let index = get_test_index_from_values(false, &values)?;
+
+        let agg_req_1: Aggregations = serde_json::from_value(json!({
+            "myhisto": {
+                "histogram": {
+                    "field": "score",
+                    "interval": 10.0
+                },
+            }
+        }))
+        .unwrap();
+
+        let collector = AggregationCollector::from_aggs(agg_req_1, Default::default());
+
+        let reader = index.reader()?;
+        let searcher = reader.searcher();
+        let agg_res: AggregationResults = searcher.search(&AllQuery, &collector).unwrap();
+
+        let res: Value = serde_json::from_str(&serde_json::to_string(&agg_res)?)?;
+        // Make sure the result structure is correct
+        assert_eq!(res["myhisto"]["buckets"].as_array().unwrap().len(), 0);
+
        Ok(())
    }
 }
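Reviewer note on the `interval` and `offset` doc comments in this file: the bucket grid they describe can be sketched as a single formula. The helper below is a minimal illustration, not tantivy's implementation; `bucket_key` is a hypothetical name and the sample values are chosen only for illustration.

```rust
/// Hypothetical helper (not part of tantivy): returns the lower bound of the
/// bucket that `value` falls into on the grid
/// `[offset + interval * k, offset + interval * (k + 1))`.
fn bucket_key(value: f64, interval: f64, offset: f64) -> f64 {
    // The doc comment requires a positive interval.
    assert!(interval > 0.0, "interval must be positive");
    // Shift by the offset, snap down to the grid, then shift back.
    ((value - offset) / interval).floor() * interval + offset
}
```

With interval 10 and no offset, the values 8 and 12 land in the buckets keyed 0 and 10. With offset 5, both land in the bucket keyed 5, which covers `[5..15)`.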
--- a/src/aggregation/bucket/range.rs
+++ b/src/aggregation/bucket/range.rs
@@ -14,9 +14,7 @@ use crate::aggregation::intermediate_agg_result::{
 use crate::aggregation::segment_agg_result::{
    build_segment_agg_collector, SegmentAggregationCollector,
 };
-use crate::aggregation::{
-    f64_from_fastfield_u64, f64_to_fastfield_u64, format_date, Key, SerializedKey,
-};
+use crate::aggregation::*;
 use crate::TantivyError;

 /// Provide user-defined buckets to aggregate on.
@@ -72,11 +70,19 @@ pub struct RangeAggregationRange {
    pub key: Option<String>,
    /// The from range value, which is inclusive in the range.
    /// `None` equals to an open ended interval.
-    #[serde(skip_serializing_if = "Option::is_none", default)]
+    #[serde(
+        skip_serializing_if = "Option::is_none",
+        default,
+        deserialize_with = "deserialize_option_f64"
+    )]
    pub from: Option<f64>,
    /// The to range value, which is not inclusive in the range.
    /// `None` equals to an open ended interval.
-    #[serde(skip_serializing_if = "Option::is_none", default)]
+    #[serde(
+        skip_serializing_if = "Option::is_none",
+        default,
+        deserialize_with = "deserialize_option_f64"
+    )]
    pub to: Option<f64>,
 }

--- a/src/aggregation/bucket/term_agg.rs
+++ b/src/aggregation/bucket/term_agg.rs
@@ -99,24 +99,15 @@ pub struct TermsAggregation {
    #[serde(skip_serializing_if = "Option::is_none", default)]
    pub size: Option<u32>,

-    /// Unused by tantivy.
-    ///
-    /// Since tantivy doesn't know shards, this parameter is merely there to be used by consumers
-    /// of tantivy. shard_size is the number of terms returned by each shard.
-    /// The default value in elasticsearch is size * 1.5 + 10.
-    ///
-    /// Should never be smaller than size.
-    #[serde(skip_serializing_if = "Option::is_none", default)]
-    #[serde(alias = "shard_size")]
-    pub split_size: Option<u32>,
-
-    /// The get more accurate results, we fetch more than `size` from each segment.
+    /// To get more accurate results, we fetch more than `size` from each segment.
    ///
    /// Increasing this value is will increase the cost for more accuracy.
    ///
    /// Defaults to 10 * size.
    #[serde(skip_serializing_if = "Option::is_none", default)]
-    pub segment_size: Option<u32>,
+    #[serde(alias = "segment_size")]
+    #[serde(alias = "split_size")]
+    pub shard_size: Option<u32>,

    /// If you set the `show_term_doc_count_error` parameter to true, the terms aggregation will
    /// include doc_count_error_upper_bound, which is an upper bound to the error on the
@@ -205,7 +196,7 @@ impl TermsAggregationInternal {
    pub(crate) fn from_req(req: &TermsAggregation) -> Self {
        let size = req.size.unwrap_or(10);

-        let mut segment_size = req.segment_size.unwrap_or(size * 10);
+        let mut segment_size = req.shard_size.unwrap_or(size * 10);

        let order = req.order.clone().unwrap_or_default();
        segment_size = segment_size.max(size);
@@ -256,7 +247,7 @@ pub struct SegmentTermCollector {
    term_buckets: TermBuckets,
    req: TermsAggregationInternal,
    blueprint: Option<Box<dyn SegmentAggregationCollector>>,
-    field_type: ColumnType,
+    column_type: ColumnType,
    accessor_idx: usize,
 }

@@ -355,7 +346,7 @@ impl SegmentTermCollector {
        field_type: ColumnType,
        accessor_idx: usize,
    ) -> crate::Result<Self> {
-        if field_type == ColumnType::Bytes || field_type == ColumnType::Bool {
+        if field_type == ColumnType::Bytes {
            return Err(TantivyError::InvalidArgument(format!(
                "terms aggregation is not supported for column type {:?}",
                field_type
@@ -389,7 +380,7 @@ impl SegmentTermCollector {
            req: TermsAggregationInternal::from_req(req),
            term_buckets,
            blueprint,
-            field_type,
+            column_type: field_type,
            accessor_idx,
        })
    }
@@ -466,7 +457,7 @@ impl SegmentTermCollector {
                Ok(intermediate_entry)
            };

-        if self.field_type == ColumnType::Str {
+        if self.column_type == ColumnType::Str {
            let term_dict = agg_with_accessor
                .str_dict_column
                .as_ref()
@@ -531,28 +522,34 @@ impl SegmentTermCollector {
                        });
                }
            }
-        } else if self.field_type == ColumnType::DateTime {
+        } else if self.column_type == ColumnType::DateTime {
            for (val, doc_count) in entries {
                let intermediate_entry = into_intermediate_bucket_entry(val, doc_count)?;
                let val = i64::from_u64(val);
                let date = format_date(val)?;
                dict.insert(IntermediateKey::Str(date), intermediate_entry);
            }
+        } else if self.column_type == ColumnType::Bool {
+            for (val, doc_count) in entries {
+                let intermediate_entry = into_intermediate_bucket_entry(val, doc_count)?;
+                let val = bool::from_u64(val);
+                dict.insert(IntermediateKey::Bool(val), intermediate_entry);
+            }
        } else {
            for (val, doc_count) in entries {
                let intermediate_entry = into_intermediate_bucket_entry(val, doc_count)?;
-                let val = f64_from_fastfield_u64(val, &self.field_type);
+                let val = f64_from_fastfield_u64(val, &self.column_type);
                dict.insert(IntermediateKey::F64(val), intermediate_entry);
            }
        };

-        Ok(IntermediateBucketResult::Terms(
-            IntermediateTermBucketResult {
+        Ok(IntermediateBucketResult::Terms {
+            buckets: IntermediateTermBucketResult {
                entries: dict,
                sum_other_doc_count,
                doc_count_error_upper_bound: term_doc_count_before_cutoff,
            },
-        ))
+        })
    }
 }

@@ -1365,7 +1362,7 @@ mod tests {

    #[test]
    fn terms_aggregation_different_tokenizer_on_ff_test() -> crate::Result<()> {
-        let terms = vec!["Hello Hello", "Hallo Hallo"];
+        let terms = vec!["Hello Hello", "Hallo Hallo", "Hallo Hallo"];

        let index = get_test_index_from_terms(true, &[terms])?;

@@ -1383,7 +1380,7 @@ mod tests {
        println!("{}", serde_json::to_string_pretty(&res).unwrap());

        assert_eq!(res["my_texts"]["buckets"][0]["key"], "Hallo Hallo");
-        assert_eq!(res["my_texts"]["buckets"][0]["doc_count"], 1);
+        assert_eq!(res["my_texts"]["buckets"][0]["doc_count"], 2);

        assert_eq!(res["my_texts"]["buckets"][1]["key"], "Hello Hello");
        assert_eq!(res["my_texts"]["buckets"][1]["doc_count"], 1);
@@ -1894,4 +1891,40 @@ mod tests {

        Ok(())
    }
+
+    #[test]
+    fn terms_aggregation_bool() -> crate::Result<()> {
+        let mut schema_builder = Schema::builder();
+        let field = schema_builder.add_bool_field("bool_field", FAST);
+        let schema = schema_builder.build();
+        let index = Index::create_in_ram(schema);
+        {
+            let mut writer = index.writer_with_num_threads(1, 15_000_000)?;
+            writer.add_document(doc!(field=>true))?;
+            writer.add_document(doc!(field=>false))?;
+            writer.add_document(doc!(field=>true))?;
+            writer.commit()?;
+        }
+
+        let agg_req: Aggregations = serde_json::from_value(json!({
+            "my_bool": {
+                "terms": {
+                    "field": "bool_field"
+                },
+            }
+        }))
+        .unwrap();
+
+        let res = exec_request_with_query(agg_req, &index, None)?;
+
+        assert_eq!(res["my_bool"]["buckets"][0]["key"], 1.0);
+        assert_eq!(res["my_bool"]["buckets"][0]["key_as_string"], "true");
+        assert_eq!(res["my_bool"]["buckets"][0]["doc_count"], 2);
+        assert_eq!(res["my_bool"]["buckets"][1]["key"], 0.0);
+        assert_eq!(res["my_bool"]["buckets"][1]["key_as_string"], "false");
+        assert_eq!(res["my_bool"]["buckets"][1]["doc_count"], 1);
+        assert_eq!(res["my_bool"]["buckets"][2]["key"], serde_json::Value::Null);
+
+        Ok(())
+    }
 }
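Reviewer note on the `shard_size` rename: the per-segment fetch size is still resolved the same way in `from_req`, only the request field name changed. A minimal sketch of that resolution, assuming the defaults stated in the doc comments (`size` defaults to 10, `shard_size` to `10 * size`); `resolve_segment_size` is a hypothetical stand-alone function, not a tantivy API.

```rust
/// Hypothetical stand-in for the size resolution done in
/// `TermsAggregationInternal::from_req`.
fn resolve_segment_size(size: Option<u32>, shard_size: Option<u32>) -> u32 {
    // `size` defaults to 10 when the request omits it.
    let size = size.unwrap_or(10);
    // `shard_size` defaults to 10 * size and is never allowed below `size`.
    shard_size.unwrap_or(size * 10).max(size)
}
```

A request that sets `shard_size` lower than `size` is therefore silently raised to `size`.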
--- a/src/aggregation/bucket/term_missing_agg.rs
+++ b/src/aggregation/bucket/term_missing_agg.rs
@@ -73,11 +73,13 @@ impl SegmentAggregationCollector for TermMissingAgg {

        entries.insert(missing.into(), missing_entry);

-        let bucket = IntermediateBucketResult::Terms(IntermediateTermBucketResult {
-            entries,
-            sum_other_doc_count: 0,
-            doc_count_error_upper_bound: 0,
-        });
+        let bucket = IntermediateBucketResult::Terms {
+            buckets: IntermediateTermBucketResult {
+                entries,
+                sum_other_doc_count: 0,
+                doc_count_error_upper_bound: 0,
+            },
+        };

        results.push(name, IntermediateAggregationResult::Bucket(bucket))?;

@@ -90,7 +92,10 @@ impl SegmentAggregationCollector for TermMissingAgg {
        agg_with_accessor: &mut AggregationsWithAccessor,
    ) -> crate::Result<()> {
        let agg = &mut agg_with_accessor.aggs.values[self.accessor_idx];
-        let has_value = agg.accessors.iter().any(|acc| acc.index.has_value(doc));
+        let has_value = agg
+            .accessors
+            .iter()
+            .any(|(acc, _)| acc.index.has_value(doc));
        if !has_value {
            self.missing_count += 1;
            if let Some(sub_agg) = self.sub_agg.as_mut() {
--- a/src/aggregation/collector.rs
+++ b/src/aggregation/collector.rs
@@ -8,7 +8,7 @@ use super::segment_agg_result::{
 };
 use crate::aggregation::agg_req_with_accessor::get_aggs_with_segment_accessor_and_validate;
 use crate::collector::{Collector, SegmentCollector};
-use crate::{DocId, SegmentReader, TantivyError};
+use crate::{DocId, SegmentOrdinal, SegmentReader, TantivyError};

 /// The default max bucket count, before the aggregation fails.
 pub const DEFAULT_BUCKET_LIMIT: u32 = 65000;
@@ -64,10 +64,15 @@ impl Collector for DistributedAggregationCollector {

    fn for_segment(
        &self,
-        _segment_local_id: crate::SegmentOrdinal,
+        segment_local_id: crate::SegmentOrdinal,
        reader: &crate::SegmentReader,
    ) -> crate::Result<Self::Child> {
-        AggregationSegmentCollector::from_agg_req_and_reader(&self.agg, reader, &self.limits)
+        AggregationSegmentCollector::from_agg_req_and_reader(
+            &self.agg,
+            reader,
+            segment_local_id,
+            &self.limits,
+        )
    }

    fn requires_scoring(&self) -> bool {
@@ -89,10 +94,15 @@ impl Collector for AggregationCollector {

    fn for_segment(
        &self,
-        _segment_local_id: crate::SegmentOrdinal,
+        segment_local_id: crate::SegmentOrdinal,
        reader: &crate::SegmentReader,
    ) -> crate::Result<Self::Child> {
-        AggregationSegmentCollector::from_agg_req_and_reader(&self.agg, reader, &self.limits)
+        AggregationSegmentCollector::from_agg_req_and_reader(
+            &self.agg,
+            reader,
+            segment_local_id,
+            &self.limits,
+        )
    }

    fn requires_scoring(&self) -> bool {
@@ -135,10 +145,11 @@ impl AggregationSegmentCollector {
    pub fn from_agg_req_and_reader(
        agg: &Aggregations,
        reader: &SegmentReader,
+        segment_ordinal: SegmentOrdinal,
        limits: &AggregationLimits,
    ) -> crate::Result<Self> {
        let mut aggs_with_accessor =
-            get_aggs_with_segment_accessor_and_validate(agg, reader, limits)?;
+            get_aggs_with_segment_accessor_and_validate(agg, reader, segment_ordinal, limits)?;
        let result =
            BufAggregationCollector::new(build_segment_agg_collector(&mut aggs_with_accessor)?);
        Ok(AggregationSegmentCollector {
--- a/src/aggregation/intermediate_agg_result.rs
+++ b/src/aggregation/intermediate_agg_result.rs
@@ -19,7 +19,7 @@ use super::bucket::{
 };
 use super::metric::{
    IntermediateAverage, IntermediateCount, IntermediateMax, IntermediateMin, IntermediateStats,
-    IntermediateSum, PercentilesCollector,
+    IntermediateSum, PercentilesCollector, TopHitsCollector,
 };
 use super::segment_agg_result::AggregationLimits;
 use super::{format_date, AggregationError, Key, SerializedKey};
@@ -41,6 +41,8 @@ pub struct IntermediateAggregationResults {
 /// This might seem redundant with `Key`, but the point is to have a different
 /// Serialize implementation.
 pub enum IntermediateKey {
+    /// Bool key
+    Bool(bool),
    /// String key
    Str(String),
    /// `f64` key
@@ -59,6 +61,7 @@ impl From<IntermediateKey> for Key {
        match value {
            IntermediateKey::Str(s) => Self::Str(s),
            IntermediateKey::F64(f) => Self::F64(f),
+            IntermediateKey::Bool(f) => Self::F64(f as u64 as f64),
        }
    }
 }
@@ -71,6 +74,7 @@ impl std::hash::Hash for IntermediateKey {
        match self {
            IntermediateKey::Str(text) => text.hash(state),
            IntermediateKey::F64(val) => val.to_bits().hash(state),
+            IntermediateKey::Bool(val) => val.hash(state),
        }
    }
 }
@@ -166,9 +170,9 @@ impl IntermediateAggregationResults {
 pub(crate) fn empty_from_req(req: &Aggregation) -> IntermediateAggregationResult {
    use AggregationVariants::*;
    match req.agg {
-        Terms(_) => IntermediateAggregationResult::Bucket(IntermediateBucketResult::Terms(
-            Default::default(),
-        )),
+        Terms(_) => IntermediateAggregationResult::Bucket(IntermediateBucketResult::Terms {
+            buckets: Default::default(),
+        }),
        Range(_) => IntermediateAggregationResult::Bucket(IntermediateBucketResult::Range(
            Default::default(),
        )),
@@ -205,6 +209,9 @@ pub(crate) fn empty_from_req(req: &Aggregation) -> IntermediateAggregationResult
        Percentiles(_) => IntermediateAggregationResult::Metric(
            IntermediateMetricResult::Percentiles(PercentilesCollector::default()),
        ),
+        TopHits(_) => IntermediateAggregationResult::Metric(IntermediateMetricResult::TopHits(
+            TopHitsCollector::default(),
+        )),
    }
 }

@@ -265,6 +272,8 @@ pub enum IntermediateMetricResult {
    Stats(IntermediateStats),
    /// Intermediate sum result.
    Sum(IntermediateSum),
+    /// Intermediate top_hits result
+    TopHits(TopHitsCollector),
 }

 impl IntermediateMetricResult {
@@ -292,9 +301,13 @@ impl IntermediateMetricResult {
                percentiles
                    .into_final_result(req.agg.as_percentile().expect("unexpected metric type")),
            ),
+            IntermediateMetricResult::TopHits(top_hits) => {
+                MetricResult::TopHits(top_hits.finalize())
+            }
        }
    }

+    // TODO: this is our top-of-the-chain fruit merge mech
    fn merge_fruits(&mut self, other: IntermediateMetricResult) -> crate::Result<()> {
        match (self, other) {
            (
@@ -330,6 +343,9 @@ impl IntermediateMetricResult {
            ) => {
                left.merge_fruits(right)?;
            }
+            (IntermediateMetricResult::TopHits(left), IntermediateMetricResult::TopHits(right)) => {
+                left.merge_fruits(right)?;
+            }
            _ => {
                panic!("incompatible fruit types in tree or missing merge_fruits handler");
            }
@@ -351,11 +367,14 @@ pub enum IntermediateBucketResult {
    Histogram {
        /// The column_type of the underlying `Column` is DateTime
        is_date_agg: bool,
-        /// The buckets
+        /// The histogram buckets
        buckets: Vec<IntermediateHistogramBucketEntry>,
    },
    /// Term aggregation
-    Terms(IntermediateTermBucketResult),
+    Terms {
+        /// The term buckets
+        buckets: IntermediateTermBucketResult,
+    },
 }

 impl IntermediateBucketResult {
@@ -432,7 +451,7 @@ impl IntermediateBucketResult {
                };
                Ok(BucketResult::Histogram { buckets })
            }
-            IntermediateBucketResult::Terms(terms) => terms.into_final_result(
+            IntermediateBucketResult::Terms { buckets: terms } => terms.into_final_result(
                req.agg
                    .as_term()
                    .expect("unexpected aggregation, expected term aggregation"),
@@ -445,8 +464,12 @@ impl IntermediateBucketResult {
    fn merge_fruits(&mut self, other: IntermediateBucketResult) -> crate::Result<()> {
        match (self, other) {
            (
-                IntermediateBucketResult::Terms(term_res_left),
-                IntermediateBucketResult::Terms(term_res_right),
+                IntermediateBucketResult::Terms {
+                    buckets: term_res_left,
+                },
+                IntermediateBucketResult::Terms {
+                    buckets: term_res_right,
+                },
            ) => {
                merge_maps(&mut term_res_left.entries, term_res_right.entries)?;
                term_res_left.sum_other_doc_count += term_res_right.sum_other_doc_count;
@@ -530,8 +553,15 @@ impl IntermediateTermBucketResult {
            .into_iter()
            .filter(|bucket| bucket.1.doc_count as u64 >= req.min_doc_count)
            .map(|(key, entry)| {
+                let key_as_string = match key {
+                    IntermediateKey::Bool(key) => {
+                        let val = if key { "true" } else { "false" };
+                        Some(val.to_string())
+                    }
+                    _ => None,
+                };
                Ok(BucketEntry {
-                    key_as_string: None,
+                    key_as_string,
                    key: key.into(),
                    doc_count: entry.doc_count as u64,
                    sub_aggregation: entry
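Reviewer note on the new `IntermediateKey::Bool` variant: in the final result a bool key becomes a numeric key (`1.0` or `0.0`) plus a `key_as_string` of `"true"` or `"false"`, which is what the `terms_aggregation_bool` test asserts. The sketch below reproduces that mapping with simplified copies of the two enums; `finalize_key` is a hypothetical helper that merges the `From` impl and the `key_as_string` match into one function.

```rust
/// Simplified copy of the intermediate key enum from this diff.
#[derive(Debug, Clone, PartialEq)]
enum IntermediateKey {
    Bool(bool),
    Str(String),
    F64(f64),
}

/// Simplified copy of the final key enum (it has no bool variant).
#[derive(Debug, Clone, PartialEq)]
enum Key {
    Str(String),
    F64(f64),
}

/// Hypothetical helper: returns the final key and the optional `key_as_string`.
fn finalize_key(key: IntermediateKey) -> (Key, Option<String>) {
    // Only bool keys get a string rendering.
    let key_as_string = match &key {
        IntermediateKey::Bool(b) => Some(b.to_string()),
        _ => None,
    };
    let key = match key {
        IntermediateKey::Str(s) => Key::Str(s),
        IntermediateKey::F64(f) => Key::F64(f),
        // Bool is widened to 0.0 or 1.0, mirroring `f as u64 as f64` in the diff.
        IntermediateKey::Bool(b) => Key::F64(b as u64 as f64),
    };
    (key, key_as_string)
}
```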
--- a/src/aggregation/metric/average.rs
+++ b/src/aggregation/metric/average.rs
@@ -2,7 +2,8 @@ use std::fmt::Debug;

 use serde::{Deserialize, Serialize};

-use super::{IntermediateStats, SegmentStatsCollector};
+use super::*;
+use crate::aggregation::*;

 /// A single-value metric aggregation that computes the average of numeric values that are
 /// extracted from the aggregated documents.
@@ -24,7 +25,7 @@ pub struct AverageAggregation {
    /// By default they will be ignored but it is also possible to treat them as if they had a
    /// value. Examples in JSON format:
    /// { "field": "my_numbers", "missing": "10.0" }
-    #[serde(default)]
+    #[serde(default, deserialize_with = "deserialize_option_f64")]
    pub missing: Option<f64>,
 }

@@ -65,3 +66,71 @@ impl IntermediateAverage {
        self.stats.finalize().avg
    }
 }
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn deserialization_with_missing_test1() {
+        let json = r#"{
+            "field": "score",
+            "missing": "10.0"
+        }"#;
+        let avg: AverageAggregation = serde_json::from_str(json).unwrap();
+        assert_eq!(avg.field, "score");
+        assert_eq!(avg.missing, Some(10.0));
+        // no dot
+        let json = r#"{
+            "field": "score",
+            "missing": "10"
+        }"#;
+        let avg: AverageAggregation = serde_json::from_str(json).unwrap();
+        assert_eq!(avg.field, "score");
+        assert_eq!(avg.missing, Some(10.0));
+
+        // from value
+        let avg: AverageAggregation = serde_json::from_value(json!({
+            "field": "score_f64",
+            "missing": 10u64,
+        }))
+        .unwrap();
+        assert_eq!(avg.missing, Some(10.0));
+        // from value
+        let avg: AverageAggregation = serde_json::from_value(json!({
+            "field": "score_f64",
+            "missing": 10u32,
+        }))
+        .unwrap();
+        assert_eq!(avg.missing, Some(10.0));
+        let avg: AverageAggregation = serde_json::from_value(json!({
+            "field": "score_f64",
+            "missing": 10i8,
+        }))
+        .unwrap();
+        assert_eq!(avg.missing, Some(10.0));
+    }
+
+    #[test]
+    fn deserialization_with_missing_test_fail() {
+        let json = r#"{
+            "field": "score",
+            "missing": "a"
+        }"#;
+        let avg: Result<AverageAggregation, _> = serde_json::from_str(json);
+        assert!(avg.is_err());
+        assert!(avg
+            .unwrap_err()
+            .to_string()
+            .contains("Failed to parse f64 from string: \"a\""));
+
+        // Disallow NaN
+        let json = r#"{
+            "field": "score",
+            "missing": "NaN"
+        }"#;
+        let avg: Result<AverageAggregation, _> = serde_json::from_str(json);
+        assert!(avg.is_err());
+        assert!(avg.unwrap_err().to_string().contains("NaN"));
+    }
+}
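Reviewer note on `deserialize_option_f64`: its definition is not part of this section, but the tests above pin down the string-handling behavior (numeric strings are accepted, unparsable strings and `"NaN"` are rejected). A std-only sketch of that string branch follows; `parse_f64_from_str` is a hypothetical name and the NaN error text is an assumption, since the tests only check that the message contains "NaN".

```rust
/// Hypothetical sketch of the string branch of a lenient f64 deserializer.
fn parse_f64_from_str(raw: &str) -> Result<f64, String> {
    // `{:?}` quotes the input, giving: Failed to parse f64 from string: "a"
    let val = raw
        .parse::<f64>()
        .map_err(|_| format!("Failed to parse f64 from string: {:?}", raw))?;
    // Rust's f64 parser accepts "NaN", so it has to be rejected explicitly.
    if val.is_nan() {
        return Err("NaN is not a valid value".to_string());
    }
    Ok(val)
}
```

The explicit NaN check matters because `"NaN".parse::<f64>()` succeeds in Rust, so a plain `parse` call would let it through.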
--- a/src/aggregation/metric/count.rs
+++ b/src/aggregation/metric/count.rs
@@ -2,7 +2,8 @@ use std::fmt::Debug;

 use serde::{Deserialize, Serialize};

-use super::{IntermediateStats, SegmentStatsCollector};
+use super::*;
+use crate::aggregation::*;

 /// A single-value metric aggregation that counts the number of values that are
 /// extracted from the aggregated documents.
@@ -24,7 +25,7 @@ pub struct CountAggregation {
    /// By default they will be ignored but it is also possible to treat them as if they had a
    /// value. Examples in JSON format:
    /// { "field": "my_numbers", "missing": "10.0" }
-    #[serde(default)]
+    #[serde(default, deserialize_with = "deserialize_option_f64")]
    pub missing: Option<f64>,
 }

--- a/src/aggregation/metric/max.rs
+++ b/src/aggregation/metric/max.rs
@@ -2,7 +2,8 @@ use std::fmt::Debug;

 use serde::{Deserialize, Serialize};

-use super::{IntermediateStats, SegmentStatsCollector};
+use super::*;
+use crate::aggregation::*;

 /// A single-value metric aggregation that computes the maximum of numeric values that are
 /// extracted from the aggregated documents.
@@ -24,7 +25,7 @@ pub struct MaxAggregation {
    /// By default they will be ignored but it is also possible to treat them as if they had a
    /// value. Examples in JSON format:
    /// { "field": "my_numbers", "missing": "10.0" }
-    #[serde(default)]
+    #[serde(default, deserialize_with = "deserialize_option_f64")]
    pub missing: Option<f64>,
 }

--- a/src/aggregation/metric/min.rs
+++ b/src/aggregation/metric/min.rs
@@ -2,7 +2,8 @@ use std::fmt::Debug;

 use serde::{Deserialize, Serialize};

-use super::{IntermediateStats, SegmentStatsCollector};
+use super::*;
+use crate::aggregation::*;

 /// A single-value metric aggregation that computes the minimum of numeric values that are
 /// extracted from the aggregated documents.
@@ -24,7 +25,7 @@ pub struct MinAggregation {
    /// By default they will be ignored but it is also possible to treat them as if they had a
    /// value. Examples in JSON format:
    /// { "field": "my_numbers", "missing": "10.0" }
-    #[serde(default)]
+    #[serde(default, deserialize_with = "deserialize_option_f64")]
    pub missing: Option<f64>,
 }

--- a/src/aggregation/metric/mod.rs
+++ b/src/aggregation/metric/mod.rs
@@ -23,6 +23,8 @@ mod min;
 mod percentiles;
 mod stats;
 mod sum;
+mod top_hits;
+
 pub use average::*;
 pub use count::*;
 pub use max::*;
@@ -32,6 +34,7 @@ use rustc_hash::FxHashMap;
 use serde::{Deserialize, Serialize};
 pub use stats::*;
 pub use sum::*;
+pub use top_hits::*;

 /// Single-metric aggregations use this common result structure.
 ///
@@ -81,6 +84,27 @@ pub struct PercentilesMetricResult {
    pub values: PercentileValues,
 }

+/// The top_hits metric results entry
+#[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
+pub struct TopHitsVecEntry {
+    /// The sort values of the document, depending on the sort criteria in the request.
+    pub sort: Vec<Option<u64>>,
+
+    /// Search results, for queries that include field retrieval requests
+    /// (`docvalue_fields`).
+    #[serde(flatten)]
+    pub search_results: FieldRetrivalResult,
+}
+
+/// The top_hits metric aggregation returns a list of top hits by sort criteria.
+///
+/// The main reason for wrapping it in `hits` is to match elasticsearch output structure.
+#[derive(Clone, Debug, PartialEq, Serialize, Deserialize)]
+pub struct TopHitsMetricResult {
+    /// The result of the top_hits metric.
+    pub hits: Vec<TopHitsVecEntry>,
+}
+
 #[cfg(test)]
 mod tests {
    use crate::aggregation::agg_req::Aggregations;
--- a/src/aggregation/metric/percentiles.rs
+++ b/src/aggregation/metric/percentiles.rs
@@ -11,7 +11,7 @@ use crate::aggregation::intermediate_agg_result::{
    IntermediateAggregationResult, IntermediateAggregationResults, IntermediateMetricResult,
 };
 use crate::aggregation::segment_agg_result::SegmentAggregationCollector;
-use crate::aggregation::{f64_from_fastfield_u64, f64_to_fastfield_u64, AggregationError};
+use crate::aggregation::*;
 use crate::{DocId, TantivyError};

 /// # Percentiles
@@ -84,7 +84,11 @@ pub struct PercentilesAggregationReq {
    /// By default they will be ignored but it is also possible to treat them as if they had a
    /// value. Examples in JSON format:
    /// { "field": "my_numbers", "missing": "10.0" }
-    #[serde(skip_serializing_if = "Option::is_none", default)]
+    #[serde(
+        skip_serializing_if = "Option::is_none",
+        default,
+        deserialize_with = "deserialize_option_f64"
+    )]
    pub missing: Option<f64>,
 }
 fn default_percentiles() -> &'static [f64] {
@@ -133,7 +137,6 @@ pub(crate) struct SegmentPercentilesCollector {
    field_type: ColumnType,
    pub(crate) percentiles: PercentilesCollector,
    pub(crate) accessor_idx: usize,
-    val_cache: Vec<u64>,
    missing: Option<u64>,
 }

@@ -243,7 +246,6 @@ impl SegmentPercentilesCollector {
            field_type,
            percentiles: PercentilesCollector::new(),
            accessor_idx,
-            val_cache: Default::default(),
            missing,
        })
    }
--- a/src/aggregation/metric/stats.rs
+++ b/src/aggregation/metric/stats.rs
@@ -9,7 +9,7 @@ use crate::aggregation::intermediate_agg_result::{
    IntermediateAggregationResult, IntermediateAggregationResults, IntermediateMetricResult,
 };
 use crate::aggregation::segment_agg_result::SegmentAggregationCollector;
-use crate::aggregation::{f64_from_fastfield_u64, f64_to_fastfield_u64};
+use crate::aggregation::*;
 use crate::{DocId, TantivyError};

 /// A multi-value metric aggregation that computes a collection of statistics on numeric values that
@@ -33,7 +33,7 @@ pub struct StatsAggregation {
    /// By default they will be ignored but it is also possible to treat them as if they had a
    /// value. Examples in JSON format:
    /// { "field": "my_numbers", "missing": "10.0" }
-    #[serde(default)]
+    #[serde(default, deserialize_with = "deserialize_option_f64")]
    pub missing: Option<f64>,
 }

@@ -580,6 +580,30 @@ mod tests {
            })
        );

+        // From string
+        let agg_req: Aggregations = serde_json::from_value(json!({
+            "my_stats": {
+                "stats": {
+                    "field": "json.partially_empty",
+                    "missing": "0.0"
+                },
+            }
+        }))
+        .unwrap();
+
+        let res = exec_request_with_query(agg_req, &index, None)?;
+
+        assert_eq!(
+            res["my_stats"],
+            json!({
+                "avg":  2.5,
+                "count": 4,
+                "max": 10.0,
+                "min": 0.0,
+                "sum": 10.0
+            })
+        );
+
        Ok(())
    }

--- a/src/aggregation/metric/sum.rs
+++ b/src/aggregation/metric/sum.rs
@@ -2,7 +2,8 @@ use std::fmt::Debug;

 use serde::{Deserialize, Serialize};

-use super::{IntermediateStats, SegmentStatsCollector};
+use super::*;
+use crate::aggregation::*;

 /// A single-value metric aggregation that sums up numeric values that are
 /// extracted from the aggregated documents.
@@ -24,7 +25,7 @@ pub struct SumAggregation {
    /// By default they will be ignored but it is also possible to treat them as if they had a
    /// value. Examples in JSON format:
    /// { "field": "my_numbers", "missing": "10.0" }
-    #[serde(default)]
+    #[serde(default, deserialize_with = "deserialize_option_f64")]
    pub missing: Option<f64>,
 }

--- a/src/aggregation/metric/top_hits.rs
+++ b/src/aggregation/metric/top_hits.rs
@@ -0,0 +1,837 @@
+use std::collections::HashMap;
+use std::fmt::Formatter;
+
+use columnar::{ColumnarReader, DynamicColumn};
+use regex::Regex;
+use serde::ser::SerializeMap;
+use serde::{Deserialize, Deserializer, Serialize, Serializer};
+
+use super::{TopHitsMetricResult, TopHitsVecEntry};
+use crate::aggregation::bucket::Order;
+use crate::aggregation::intermediate_agg_result::{
+    IntermediateAggregationResult, IntermediateMetricResult,
+};
+use crate::aggregation::segment_agg_result::SegmentAggregationCollector;
+use crate::collector::TopNComputer;
+use crate::schema::term::JSON_PATH_SEGMENT_SEP_STR;
+use crate::schema::OwnedValue;
+use crate::{DocAddress, DocId, SegmentOrdinal};
+
+/// # Top Hits
+///
+/// The top hits aggregation is a useful tool to answer questions like:
+/// - "What are the most recent posts by each author?"
+/// - "What are the most popular items in each category?"
+///
+/// It does so by keeping track of the most relevant documents being aggregated,
+/// in terms of a sort criterion that can consist of multiple fields and their
+/// sort-orders (ascending or descending).
+///
+/// `top_hits` should not be used as a top-level aggregation. It is intended to be
+/// used as a sub-aggregation, inside a `terms` aggregation or a `filters` aggregation,
+/// for example.
+///
+/// Note that this aggregator does not return the actual document addresses, but
+/// rather a list of the values of the fields that were requested to be retrieved.
+/// These values can be specified in the `docvalue_fields` parameter, which can include
+/// a list of fast fields to be retrieved. At the moment, only fast fields are supported;
+/// the `fields` parameter, which would retrieve any stored field, may be supported in the
+/// future.
+///
+/// The following example demonstrates a request for the top_hits aggregation:
+/// ```JSON
+/// {
+///     "aggs": {
+///         "top_authors": {
+///             "terms": {
+///                 "field": "author",
+///                 "size": 5
+///             },
+///             "aggs": {
+///                 "top_hits": {
+///                     "size": 2, "from": 0,
+///                     "sort": [
+///                         { "date": "desc" }
+///                     ],
+///                     "docvalue_fields": ["date", "title", "iden"]
+///                 }
+///             }
+///         }
+///     }
+/// }
+/// ```
+///
+/// This request will return an object containing the top two documents, sorted
+/// by the `date` field in descending order. You can also sort by multiple fields, which
+/// helps to resolve ties. The aggregation object for each bucket will look like:
+/// ```JSON
+/// {
+///     "hits": [
+///         {
+///           "sort": [<time_u64>],
+///           "docvalue_fields": {
+///             "date": "<date_RFC3339>",
+///             "title": "<title>",
+///             "iden": "<iden>"
+///           }
+///         },
+///         {
+///           "sort": [<time_u64>],
+///           "docvalue_fields": {
+///             "date": "<date_RFC3339>",
+///             "title": "<title>",
+///             "iden": "<iden>"
+///           }
+///         }
+///     ]
+/// }
+/// ```
+#[derive(Debug, Clone, PartialEq, Serialize, Deserialize, Default)]
+pub struct TopHitsAggregation {
+    sort: Vec<KeyOrder>,
+    size: usize,
+    from: Option<usize>,
+
+    #[serde(flatten)]
+    retrieval: RetrievalFields,
+}
+
+const fn default_doc_value_fields() -> Vec<String> {
+    Vec::new()
+}
+
+/// Search query spec for each matched document
+/// TODO: move this to a common module
+#[derive(Debug, Clone, PartialEq, Serialize, Deserialize, Default)]
+pub struct RetrievalFields {
+    /// The fast fields to return for each hit.
+    /// This is the only variant supported for now.
+    /// TODO: support the {field, format} variant for custom formatting.
+    #[serde(rename = "docvalue_fields")]
+    #[serde(default = "default_doc_value_fields")]
+    pub doc_value_fields: Vec<String>,
+}
+
+/// Search query result for each matched document
+/// TODO: move this to a common module
+#[derive(Debug, Clone, PartialEq, Serialize, Deserialize, Default)]
+pub struct FieldRetrivalResult {
+    /// The fast fields returned for each hit.
+    #[serde(rename = "docvalue_fields")]
+    #[serde(skip_serializing_if = "HashMap::is_empty")]
+    pub doc_value_fields: HashMap<String, OwnedValue>,
+}
+
+impl RetrievalFields {
+    fn get_field_names(&self) -> Vec<&str> {
+        self.doc_value_fields.iter().map(|s| s.as_str()).collect()
+    }
+
+    fn resolve_field_names(&mut self, reader: &ColumnarReader) -> crate::Result<()> {
+        // Transform a glob (`pattern*`, for example) into a regex::Regex (`^pattern.*$`)
+        let globbed_string_to_regex = |glob: &str| {
+            // Escape regex metacharacters, then turn each escaped `*` glob into `.*`
+            let sanitized = format!("^{}$", regex::escape(glob).replace(r"\*", ".*"));
+            Regex::new(&sanitized).map_err(|e| {
+                crate::TantivyError::SchemaError(format!(
+                    "Invalid regex '{}' in docvalue_fields: {}",
+                    glob, e
+                ))
+            })
+        };
+        self.doc_value_fields = self
+            .doc_value_fields
+            .iter()
+            .map(|field| {
+                if !field.contains('*')
+                    && reader
+                        .iter_columns()?
+                        .any(|(name, _)| name.as_str() == field)
+                {
+                    return Ok(vec![field.to_owned()]);
+                }
+
+                let pattern = globbed_string_to_regex(field)?;
+                let fields = reader
+                    .iter_columns()?
+                    .map(|(name, _)| {
+                        // normalize path from internal fast field repr
+                        name.replace(JSON_PATH_SEGMENT_SEP_STR, ".")
+                    })
+                    .filter(|name| pattern.is_match(name))
+                    .collect::<Vec<_>>();
+                assert!(
+                    !fields.is_empty(),
+                    "No fields matched the glob '{}' in docvalue_fields",
+                    field
+                );
+                Ok(fields)
+            })
+            .collect::<crate::Result<Vec<_>>>()?
+            .into_iter()
+            .flatten()
+            .collect();
+
+        Ok(())
+    }
+
+    fn get_document_field_data(
+        &self,
+        accessors: &HashMap<String, Vec<DynamicColumn>>,
+        doc_id: DocId,
+    ) -> FieldRetrivalResult {
+        let dvf = self
+            .doc_value_fields
+            .iter()
+            .map(|field| {
+                let accessors = accessors
+                    .get(field)
+                    .unwrap_or_else(|| panic!("field '{}' not found in accessors", field));
+
+                let values: Vec<OwnedValue> = accessors
+                    .iter()
+                    .flat_map(|accessor| match accessor {
+                        DynamicColumn::U64(accessor) => accessor
+                            .values_for_doc(doc_id)
+                            .map(OwnedValue::U64)
+                            .collect::<Vec<_>>(),
+                        DynamicColumn::I64(accessor) => accessor
+                            .values_for_doc(doc_id)
+                            .map(OwnedValue::I64)
+                            .collect::<Vec<_>>(),
+                        DynamicColumn::F64(accessor) => accessor
+                            .values_for_doc(doc_id)
+                            .map(OwnedValue::F64)
+                            .collect::<Vec<_>>(),
+                        DynamicColumn::Bytes(accessor) => accessor
+                            .term_ords(doc_id)
+                            .map(|term_ord| {
+                                let mut buffer = vec![];
+                                assert!(
+                                    accessor
+                                        .ord_to_bytes(term_ord, &mut buffer)
+                                        .expect("could not read term dictionary"),
+                                    "term corresponding to term_ord does not exist"
+                                );
+                                OwnedValue::Bytes(buffer)
+                            })
+                            .collect::<Vec<_>>(),
+                        DynamicColumn::Str(accessor) => accessor
+                            .term_ords(doc_id)
+                            .map(|term_ord| {
+                                let mut buffer = vec![];
+                                assert!(
+                                    accessor
+                                        .ord_to_bytes(term_ord, &mut buffer)
+                                        .expect("could not read term dictionary"),
+                                    "term corresponding to term_ord does not exist"
+                                );
+                                OwnedValue::Str(String::from_utf8(buffer).unwrap())
+                            })
+                            .collect::<Vec<_>>(),
+                        DynamicColumn::Bool(accessor) => accessor
+                            .values_for_doc(doc_id)
+                            .map(OwnedValue::Bool)
+                            .collect::<Vec<_>>(),
+                        DynamicColumn::IpAddr(accessor) => accessor
+                            .values_for_doc(doc_id)
+                            .map(OwnedValue::IpAddr)
+                            .collect::<Vec<_>>(),
+                        DynamicColumn::DateTime(accessor) => accessor
+                            .values_for_doc(doc_id)
+                            .map(OwnedValue::Date)
+                            .collect::<Vec<_>>(),
+                    })
+                    .collect();
+
+                (field.to_owned(), OwnedValue::Array(values))
+            })
+            .collect();
+        FieldRetrivalResult {
+            doc_value_fields: dvf,
+        }
+    }
+}
+
+#[derive(Debug, Clone, PartialEq, Default)]
+struct KeyOrder {
+    field: String,
+    order: Order,
+}
+
+impl Serialize for KeyOrder {
+    fn serialize<S: Serializer>(&self, serializer: S) -> Result<S::Ok, S::Error> {
+        let KeyOrder { field, order } = self;
+        let mut map = serializer.serialize_map(Some(1))?;
+        map.serialize_entry(field, order)?;
+        map.end()
+    }
+}
+
+impl<'de> Deserialize<'de> for KeyOrder {
+    fn deserialize<D>(deserializer: D) -> Result<Self, D::Error>
+    where D: Deserializer<'de> {
+        let mut k_o = <HashMap<String, Order>>::deserialize(deserializer)?.into_iter();
+        let (k, v) = k_o.next().ok_or(serde::de::Error::custom(
+            "Expected exactly one key-value pair in KeyOrder, found none",
+        ))?;
+        if k_o.next().is_some() {
+            return Err(serde::de::Error::custom(
+                "Expected exactly one key-value pair in KeyOrder, found more",
+            ));
+        }
+        Ok(Self { field: k, order: v })
+    }
+}
+
+impl TopHitsAggregation {
+    /// Validate and resolve field retrieval parameters
+    pub fn validate_and_resolve(&mut self, reader: &ColumnarReader) -> crate::Result<()> {
+        self.retrieval.resolve_field_names(reader)
+    }
+
+    /// Return fields accessed by the aggregator, in order.
+    pub fn field_names(&self) -> Vec<&str> {
+        self.sort
+            .iter()
+            .map(|KeyOrder { field, .. }| field.as_str())
+            .collect()
+    }
+
+    /// Return fields accessed by the aggregator's value retrieval.
+    pub fn value_field_names(&self) -> Vec<&str> {
+        self.retrieval.get_field_names()
+    }
+}
+
+/// Holds a single comparable doc feature, and the order in which it should be sorted.
+#[derive(Clone, Serialize, Deserialize, Debug)]
+struct ComparableDocFeature {
+    /// Stores any u64-mappable feature.
+    value: Option<u64>,
+    /// Sort order for the doc feature
+    order: Order,
+}
+
+impl Ord for ComparableDocFeature {
+    fn cmp(&self, other: &Self) -> std::cmp::Ordering {
+        let invert = |cmp: std::cmp::Ordering| match self.order {
+            Order::Asc => cmp,
+            Order::Desc => cmp.reverse(),
+        };
+
+        match (self.value, other.value) {
+            (Some(self_value), Some(other_value)) => invert(self_value.cmp(&other_value)),
+            (Some(_), None) => std::cmp::Ordering::Greater,
+            (None, Some(_)) => std::cmp::Ordering::Less,
+            (None, None) => std::cmp::Ordering::Equal,
+        }
+    }
+}
+
+impl PartialOrd for ComparableDocFeature {
+    fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
+        Some(self.cmp(other))
+    }
+}
+
+impl PartialEq for ComparableDocFeature {
+    fn eq(&self, other: &Self) -> bool {
+        self.value.cmp(&other.value) == std::cmp::Ordering::Equal
+    }
+}
+
+impl Eq for ComparableDocFeature {}
+
+#[derive(Clone, Serialize, Deserialize, Debug)]
+struct ComparableDocFeatures(Vec<ComparableDocFeature>, FieldRetrivalResult);
+
+impl Ord for ComparableDocFeatures {
+    fn cmp(&self, other: &Self) -> std::cmp::Ordering {
+        for (self_feature, other_feature) in self.0.iter().zip(other.0.iter()) {
+            let cmp = self_feature.cmp(other_feature);
+            if cmp != std::cmp::Ordering::Equal {
+                return cmp;
+            }
+        }
+        std::cmp::Ordering::Equal
+    }
+}
+
+impl PartialOrd for ComparableDocFeatures {
+    fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
+        Some(self.cmp(other))
+    }
+}
+
+impl PartialEq for ComparableDocFeatures {
+    fn eq(&self, other: &Self) -> bool {
+        self.cmp(other) == std::cmp::Ordering::Equal
+    }
+}
+
+impl Eq for ComparableDocFeatures {}
+
+/// The TopHitsCollector used for collecting over segments and merging results.
+#[derive(Clone, Serialize, Deserialize)]
+pub struct TopHitsCollector {
+    req: TopHitsAggregation,
+    top_n: TopNComputer<ComparableDocFeatures, DocAddress, false>,
+}
+
+impl Default for TopHitsCollector {
+    fn default() -> Self {
+        Self {
+            req: TopHitsAggregation::default(),
+            top_n: TopNComputer::new(1),
+        }
+    }
+}
+
+impl std::fmt::Debug for TopHitsCollector {
+    fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
+        f.debug_struct("TopHitsCollector")
+            .field("req", &self.req)
+            .field("top_n_threshold", &self.top_n.threshold)
+            .finish()
+    }
+}
+
+impl std::cmp::PartialEq for TopHitsCollector {
+    fn eq(&self, _other: &Self) -> bool {
+        false
+    }
+}
+
+impl TopHitsCollector {
+    fn collect(&mut self, features: ComparableDocFeatures, doc: DocAddress) {
+        self.top_n.push(features, doc);
+    }
+
+    pub(crate) fn merge_fruits(&mut self, other_fruit: Self) -> crate::Result<()> {
+        for doc in other_fruit.top_n.into_vec() {
+            self.collect(doc.feature, doc.doc);
+        }
+        Ok(())
+    }
+
+    /// Finalize by converting self into the final result form
+    pub fn finalize(self) -> TopHitsMetricResult {
+        let mut hits: Vec<TopHitsVecEntry> = self
+            .top_n
+            .into_sorted_vec()
+            .into_iter()
+            .map(|doc| TopHitsVecEntry {
+                sort: doc.feature.0.iter().map(|f| f.value).collect(),
+                search_results: doc.feature.1,
+            })
+            .collect();
+
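+        // Remove the first `from` elements.
+        // Truncating from the end would be more efficient, but the elements to skip are at
+        // the front: `into_sorted_vec` returns a descending order with respect to the
+        // inverted `Ord` semantics of the heap elements.
+        // The range is clamped to `hits.len()` so that a `from` larger than the number of
+        // hits yields an empty result instead of a panic.
+        let from = self.req.from.unwrap_or(0).min(hits.len());
+        hits.drain(..from);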
+        TopHitsMetricResult { hits }
+    }
+}
+
+#[derive(Clone)]
+pub(crate) struct SegmentTopHitsCollector {
+    segment_ordinal: SegmentOrdinal,
+    accessor_idx: usize,
+    inner_collector: TopHitsCollector,
+}
+
+impl SegmentTopHitsCollector {
+    pub fn from_req(
+        req: &TopHitsAggregation,
+        accessor_idx: usize,
+        segment_ordinal: SegmentOrdinal,
+    ) -> Self {
+        Self {
+            inner_collector: TopHitsCollector {
+                req: req.clone(),
+                top_n: TopNComputer::new(req.size + req.from.unwrap_or(0)),
+            },
+            segment_ordinal,
+            accessor_idx,
+        }
+    }
+}
+
+impl std::fmt::Debug for SegmentTopHitsCollector {
+    fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
+        f.debug_struct("SegmentTopHitsCollector")
+            .field("segment_id", &self.segment_ordinal)
+            .field("accessor_idx", &self.accessor_idx)
+            .field("inner_collector", &self.inner_collector)
+            .finish()
+    }
+}
+
+impl SegmentAggregationCollector for SegmentTopHitsCollector {
+    fn add_intermediate_aggregation_result(
+        self: Box<Self>,
+        agg_with_accessor: &crate::aggregation::agg_req_with_accessor::AggregationsWithAccessor,
+        results: &mut crate::aggregation::intermediate_agg_result::IntermediateAggregationResults,
+    ) -> crate::Result<()> {
+        let name = agg_with_accessor.aggs.keys[self.accessor_idx].to_string();
+        let intermediate_result = IntermediateMetricResult::TopHits(self.inner_collector);
+        results.push(
+            name,
+            IntermediateAggregationResult::Metric(intermediate_result),
+        )
+    }
+
+    fn collect(
+        &mut self,
+        doc_id: crate::DocId,
+        agg_with_accessor: &mut crate::aggregation::agg_req_with_accessor::AggregationsWithAccessor,
+    ) -> crate::Result<()> {
+        let accessors = &agg_with_accessor.aggs.values[self.accessor_idx].accessors;
+        let value_accessors = &agg_with_accessor.aggs.values[self.accessor_idx].value_accessors;
+        let features: Vec<ComparableDocFeature> = self
+            .inner_collector
+            .req
+            .sort
+            .iter()
+            .enumerate()
+            .map(|(idx, KeyOrder { order, .. })| {
+                let order = *order;
+                let value = accessors
+                    .get(idx)
+                    .expect("could not find field in accessors")
+                    .0
+                    .values_for_doc(doc_id)
+                    .next();
+                ComparableDocFeature { value, order }
+            })
+            .collect();
+
+        let retrieval_result = self
+            .inner_collector
+            .req
+            .retrieval
+            .get_document_field_data(value_accessors, doc_id);
+
+        self.inner_collector.collect(
+            ComparableDocFeatures(features, retrieval_result),
+            DocAddress {
+                segment_ord: self.segment_ordinal,
+                doc_id,
+            },
+        );
+        Ok(())
+    }
+
+    fn collect_block(
+        &mut self,
+        docs: &[crate::DocId],
+        agg_with_accessor: &mut crate::aggregation::agg_req_with_accessor::AggregationsWithAccessor,
+    ) -> crate::Result<()> {
+        // TODO: Consider getting fields with the column block accessor and refactor this.
+        // It would probably be worth the additional complexity, but this is left for a
+        // follow-up after a first-pass review.
+        for doc in docs {
+            self.collect(*doc, agg_with_accessor)?;
+        }
+        Ok(())
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use common::DateTime;
+    use pretty_assertions::assert_eq;
+    use serde_json::Value;
+    use time::macros::datetime;
+
+    use super::{ComparableDocFeature, ComparableDocFeatures, Order};
+    use crate::aggregation::agg_req::Aggregations;
+    use crate::aggregation::agg_result::AggregationResults;
+    use crate::aggregation::bucket::tests::get_test_index_from_docs;
+    use crate::aggregation::tests::get_test_index_from_values;
+    use crate::aggregation::AggregationCollector;
+    use crate::collector::ComparableDoc;
+    use crate::query::AllQuery;
+    use crate::schema::OwnedValue as SchemaValue;
+
+    fn invert_order(cmp_feature: ComparableDocFeature) -> ComparableDocFeature {
+        let ComparableDocFeature { value, order } = cmp_feature;
+        let order = match order {
+            Order::Asc => Order::Desc,
+            Order::Desc => Order::Asc,
+        };
+        ComparableDocFeature { value, order }
+    }
+
+    fn collector_with_capacity(capacity: usize) -> super::TopHitsCollector {
+        super::TopHitsCollector {
+            top_n: super::TopNComputer::new(capacity),
+            ..Default::default()
+        }
+    }
+
+    fn invert_order_features(cmp_features: ComparableDocFeatures) -> ComparableDocFeatures {
+        let ComparableDocFeatures(cmp_features, search_results) = cmp_features;
+        let cmp_features = cmp_features
+            .into_iter()
+            .map(invert_order)
+            .collect::<Vec<_>>();
+        ComparableDocFeatures(cmp_features, search_results)
+    }
+
+    #[test]
+    fn test_comparable_doc_feature() -> crate::Result<()> {
+        let small = ComparableDocFeature {
+            value: Some(1),
+            order: Order::Asc,
+        };
+        let big = ComparableDocFeature {
+            value: Some(2),
+            order: Order::Asc,
+        };
+        let none = ComparableDocFeature {
+            value: None,
+            order: Order::Asc,
+        };
+
+        assert!(small < big);
+        assert!(none < small);
+        assert!(none < big);
+
+        let small = invert_order(small);
+        let big = invert_order(big);
+        let none = invert_order(none);
+
+        assert!(small > big);
+        assert!(none < small);
+        assert!(none < big);
+
+        Ok(())
+    }
+
+    #[test]
+    fn test_comparable_doc_features() -> crate::Result<()> {
+        let features_1 = ComparableDocFeatures(
+            vec![ComparableDocFeature {
+                value: Some(1),
+                order: Order::Asc,
+            }],
+            Default::default(),
+        );
+
+        let features_2 = ComparableDocFeatures(
+            vec![ComparableDocFeature {
+                value: Some(2),
+                order: Order::Asc,
+            }],
+            Default::default(),
+        );
+
+        assert!(features_1 < features_2);
+
+        assert!(invert_order_features(features_1.clone()) > invert_order_features(features_2));
+
+        Ok(())
+    }
+
+    #[test]
+    fn test_aggregation_top_hits_empty_index() -> crate::Result<()> {
+        let values = vec![];
+
+        let index = get_test_index_from_values(false, &values)?;
+
+        let d: Aggregations = serde_json::from_value(json!({
+            "top_hits_req": {
+                "top_hits": {
+                    "size": 2,
+                    "sort": [
+                        { "date": "desc" }
+                    ],
+                    "from": 0,
+                }
+        }
+        }))
+        .unwrap();
+
+        let collector = AggregationCollector::from_aggs(d, Default::default());
+
+        let reader = index.reader()?;
+        let searcher = reader.searcher();
+        let agg_res: AggregationResults = searcher.search(&AllQuery, &collector).unwrap();
+
+        let res: Value = serde_json::from_str(
+            &serde_json::to_string(&agg_res).expect("JSON serialization failed"),
+        )
+        .expect("JSON parsing failed");
+
+        assert_eq!(
+            res,
+            json!({
+                "top_hits_req": {
+                    "hits": []
+                }
+            })
+        );
+
+        Ok(())
+    }
+
+    #[test]
+    fn test_top_hits_collector_single_feature() -> crate::Result<()> {
+        let docs = vec![
+            ComparableDoc::<_, _, false> {
+                doc: crate::DocAddress {
+                    segment_ord: 0,
+                    doc_id: 0,
+                },
+                feature: ComparableDocFeatures(
+                    vec![ComparableDocFeature {
+                        value: Some(1),
+                        order: Order::Asc,
+                    }],
+                    Default::default(),
+                ),
+            },
+            ComparableDoc {
+                doc: crate::DocAddress {
+                    segment_ord: 0,
+                    doc_id: 2,
+                },
+                feature: ComparableDocFeatures(
+                    vec![ComparableDocFeature {
+                        value: Some(3),
+                        order: Order::Asc,
+                    }],
+                    Default::default(),
+                ),
+            },
+            ComparableDoc {
+                doc: crate::DocAddress {
+                    segment_ord: 0,
+                    doc_id: 1,
+                },
+                feature: ComparableDocFeatures(
+                    vec![ComparableDocFeature {
+                        value: Some(5),
+                        order: Order::Asc,
+                    }],
+                    Default::default(),
+                ),
+            },
+        ];
+
+        let mut collector = collector_with_capacity(3);
+        for doc in docs.clone() {
+            collector.collect(doc.feature, doc.doc);
+        }
+
+        let res = collector.finalize();
+
+        assert_eq!(
+            res,
+            super::TopHitsMetricResult {
+                hits: vec![
+                    super::TopHitsVecEntry {
+                        sort: vec![docs[0].feature.0[0].value],
+                        search_results: Default::default(),
+                    },
+                    super::TopHitsVecEntry {
+                        sort: vec![docs[1].feature.0[0].value],
+                        search_results: Default::default(),
+                    },
+                    super::TopHitsVecEntry {
+                        sort: vec![docs[2].feature.0[0].value],
+                        search_results: Default::default(),
+                    },
+                ]
+            }
+        );
+
+        Ok(())
+    }
+
+    fn test_aggregation_top_hits(merge_segments: bool) -> crate::Result<()> {
+        let docs = vec![
+            vec![
+                r#"{ "date": "2015-01-02T00:00:00Z", "text": "bbb", "text2": "bbb", "mixed": { "dyn_arr": [1, "2"] } }"#,
+                r#"{ "date": "2017-06-15T00:00:00Z", "text": "ccc", "text2": "ddd", "mixed": { "dyn_arr": [3, "4"] } }"#,
+            ],
+            vec![
+                r#"{ "text": "aaa", "text2": "bbb", "date": "2018-01-02T00:00:00Z", "mixed": { "dyn_arr": ["9", 8] } }"#,
+                r#"{ "text": "aaa", "text2": "bbb", "date": "2016-01-02T00:00:00Z", "mixed": { "dyn_arr": ["7", 6] } }"#,
+            ],
+        ];
+
+        let index = get_test_index_from_docs(merge_segments, &docs)?;
+
+        let d: Aggregations = serde_json::from_value(json!({
+            "top_hits_req": {
+                "top_hits": {
+                    "size": 2,
+                    "sort": [
+                        { "date": "desc" }
+                    ],
+                    "from": 1,
+                    "docvalue_fields": [
+                        "date",
+                        "tex*",
+                        "mixed.*",
+                    ],
+                }
+        }
+        }))?;
+
+        let collector = AggregationCollector::from_aggs(d, Default::default());
+        let reader = index.reader()?;
+        let searcher = reader.searcher();
+
+        let agg_res =
+            serde_json::to_value(searcher.search(&AllQuery, &collector).unwrap()).unwrap();
+
+        let date_2017 = datetime!(2017-06-15 00:00:00 UTC);
+        let date_2016 = datetime!(2016-01-02 00:00:00 UTC);
+
+        assert_eq!(
+            agg_res["top_hits_req"],
+            json!({
+                "hits": [
+                    {
+                        "sort": [common::i64_to_u64(date_2017.unix_timestamp_nanos() as i64)],
+                        "docvalue_fields": {
+                            "date": [ SchemaValue::Date(DateTime::from_utc(date_2017)) ],
+                            "text": [ "ccc" ],
+                            "text2": [ "ddd" ],
+                            "mixed.dyn_arr": [ 3, "4" ],
+                        }
+                    },
+                    {
+                        "sort": [common::i64_to_u64(date_2016.unix_timestamp_nanos() as i64)],
+                        "docvalue_fields": {
+                            "date": [ SchemaValue::Date(DateTime::from_utc(date_2016)) ],
+                            "text": [ "aaa" ],
+                            "text2": [ "bbb" ],
+                            "mixed.dyn_arr": [ 6, "7" ],
+                        }
+                    }
+                ]
+            }),
+        );
+
+        Ok(())
+    }
+
+    #[test]
+    fn test_aggregation_top_hits_single_segment() -> crate::Result<()> {
+        test_aggregation_top_hits(true)
+    }
+
+    #[test]
+    fn test_aggregation_top_hits_multi_segment() -> crate::Result<()> {
+        test_aggregation_top_hits(false)
+    }
+}
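Review note: the ordering implemented by `ComparableDocFeature` and `ComparableDocFeatures` above can be summarized in a small, std-only sketch. The names `Feature`, `cmp_feature` and `cmp_features` are hypothetical stand-ins and are not part of this patch. It only illustrates the two rules visible in the diff: a missing value sorts below any present value regardless of direction, and the first non-equal key decides.

```rust
use std::cmp::Ordering;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Order {
    Asc,
    Desc,
}

/// One sort key: an optional u64 value plus the requested direction.
/// Hypothetical stand-in for `ComparableDocFeature`.
#[derive(Clone, Copy, Debug)]
struct Feature {
    value: Option<u64>,
    order: Order,
}

fn cmp_feature(a: &Feature, b: &Feature) -> Ordering {
    match (a.value, b.value) {
        // Both present: compare the values and flip the result for `Desc`.
        (Some(x), Some(y)) => match a.order {
            Order::Asc => x.cmp(&y),
            Order::Desc => x.cmp(&y).reverse(),
        },
        // A missing value always compares as smaller, whatever the direction.
        (Some(_), None) => Ordering::Greater,
        (None, Some(_)) => Ordering::Less,
        (None, None) => Ordering::Equal,
    }
}

/// Lexicographic comparison over several keys: the first non-equal key decides.
fn cmp_features(a: &[Feature], b: &[Feature]) -> Ordering {
    a.iter()
        .zip(b.iter())
        .map(|(x, y)| cmp_feature(x, y))
        .find(|ord| *ord != Ordering::Equal)
        .unwrap_or(Ordering::Equal)
}

fn main() {
    let f = |value, order| Feature { value, order };
    // Tie on the first key, so the second (descending) key decides.
    let a = [f(Some(1), Order::Asc), f(Some(10), Order::Desc)];
    let b = [f(Some(1), Order::Asc), f(Some(20), Order::Desc)];
    assert_eq!(cmp_features(&a, &b), Ordering::Greater);
    // `None` stays below `Some` even when the direction is descending.
    assert_eq!(
        cmp_feature(&f(None, Order::Desc), &f(Some(0), Order::Desc)),
        Ordering::Less
    );
}
```

This mirrors what `test_comparable_doc_feature` asserts: inverting the order flips `small` and `big`, while `none` stays below both.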
--- a/src/aggregation/mod.rs
+++ b/src/aggregation/mod.rs
@@ -145,6 +145,8 @@ mod agg_tests;

 mod agg_bench;

+use core::fmt;
+
 pub use agg_limits::AggregationLimits;
 pub use collector::{
    AggregationCollector, AggregationSegmentCollector, DistributedAggregationCollector,
@@ -154,7 +156,106 @@ use columnar::{ColumnType, MonotonicallyMappableToU64};
 pub(crate) use date::format_date;
 pub use error::AggregationError;
 use itertools::Itertools;
-use serde::{Deserialize, Serialize};
+use serde::de::{self, Visitor};
+use serde::{Deserialize, Deserializer, Serialize};
+
+fn parse_str_into_f64<E: de::Error>(value: &str) -> Result<f64, E> {
+    let parsed = value.parse::<f64>().map_err(|_err| {
+        de::Error::custom(format!("Failed to parse f64 from string: {:?}", value))
+    })?;
+
+    // Check if the parsed value is NaN or infinity
+    if parsed.is_nan() || parsed.is_infinite() {
+        Err(de::Error::custom(format!(
+            "Value is not a valid f64 (NaN or Infinity): {:?}",
+            value
+        )))
+    } else {
+        Ok(parsed)
+    }
+}
+
+/// Deserializes an `Option<f64>` from a string or a number.
+pub(crate) fn deserialize_option_f64<'de, D>(deserializer: D) -> Result<Option<f64>, D::Error>
+where D: Deserializer<'de> {
+    struct StringOrFloatVisitor;
+
+    impl<'de> Visitor<'de> for StringOrFloatVisitor {
+        type Value = Option<f64>;
+
+        fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
+            formatter.write_str("a string or a float")
+        }
+
+        fn visit_str<E>(self, value: &str) -> Result<Self::Value, E>
+        where E: de::Error {
+            parse_str_into_f64(value).map(Some)
+        }
+
+        fn visit_f64<E>(self, value: f64) -> Result<Self::Value, E>
+        where E: de::Error {
+            Ok(Some(value))
+        }
+
+        fn visit_i64<E>(self, value: i64) -> Result<Self::Value, E>
+        where E: de::Error {
+            Ok(Some(value as f64))
+        }
+
+        fn visit_u64<E>(self, value: u64) -> Result<Self::Value, E>
+        where E: de::Error {
+            Ok(Some(value as f64))
+        }
+
+        fn visit_none<E>(self) -> Result<Self::Value, E>
+        where E: de::Error {
+            Ok(None)
+        }
+
+        fn visit_unit<E>(self) -> Result<Self::Value, E>
+        where E: de::Error {
+            Ok(None)
+        }
+    }
+
+    deserializer.deserialize_any(StringOrFloatVisitor)
+}
+
+/// Deserializes an `f64` from a string or a number.
+pub(crate) fn deserialize_f64<'de, D>(deserializer: D) -> Result<f64, D::Error>
+where D: Deserializer<'de> {
+    struct StringOrFloatVisitor;
+
+    impl<'de> Visitor<'de> for StringOrFloatVisitor {
+        type Value = f64;
+
+        fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
+            formatter.write_str("a string or a float")
+        }
+
+        fn visit_str<E>(self, value: &str) -> Result<Self::Value, E>
+        where E: de::Error {
+            parse_str_into_f64(value)
+        }
+
+        fn visit_f64<E>(self, value: f64) -> Result<Self::Value, E>
+        where E: de::Error {
+            Ok(value)
+        }
+
+        fn visit_i64<E>(self, value: i64) -> Result<Self::Value, E>
+        where E: de::Error {
+            Ok(value as f64)
+        }
+
+        fn visit_u64<E>(self, value: u64) -> Result<Self::Value, E>
+        where E: de::Error {
+            Ok(value as f64)
+        }
+    }
+
+    deserializer.deserialize_any(StringOrFloatVisitor)
+}

 /// Represents an associative array `(key => values)` in a very efficient manner.
 #[derive(PartialEq, Serialize, Deserialize)]
@@ -281,6 +382,7 @@ pub(crate) fn f64_from_fastfield_u64(val: u64, field_type: &ColumnType) -> f64 {
        ColumnType::U64 => val as f64,
        ColumnType::I64 | ColumnType::DateTime => i64::from_u64(val) as f64,
        ColumnType::F64 => f64::from_u64(val),
+        ColumnType::Bool => val as f64,
        _ => {
            panic!("unexpected type {field_type:?}. This should not happen")
        }
@@ -301,6 +403,7 @@ pub(crate) fn f64_to_fastfield_u64(val: f64, field_type: &ColumnType) -> Option<
        ColumnType::U64 => Some(val as u64),
        ColumnType::I64 | ColumnType::DateTime => Some((val as i64).to_u64()),
        ColumnType::F64 => Some(val.to_u64()),
+        ColumnType::Bool => Some(val as u64),
        _ => None,
    }
 }
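Review note: the string branch of the new deserializers rejects values that parse successfully but are not finite. A minimal std-only sketch of that check follows, assuming a plain `String` error in place of `serde::de::Error`. The function name `parse_finite_f64` is hypothetical and not part of this patch.

```rust
/// Parses a string into an f64, rejecting NaN and infinities.
/// Hypothetical stand-in for `parse_str_into_f64`, with a `String` error.
fn parse_finite_f64(value: &str) -> Result<f64, String> {
    let parsed: f64 = value
        .parse()
        .map_err(|_| format!("Failed to parse f64 from string: {value:?}"))?;
    // `is_finite` is false for NaN as well as for both infinities.
    if parsed.is_finite() {
        Ok(parsed)
    } else {
        Err(format!(
            "Value is not a valid f64 (NaN or Infinity): {value:?}"
        ))
    }
}

fn main() {
    assert_eq!(parse_finite_f64("1.5"), Ok(1.5));
    // These strings are accepted by `str::parse::<f64>` but are not finite.
    assert!(parse_finite_f64("NaN").is_err());
    assert!(parse_finite_f64("inf").is_err());
    // This one fails at the parsing step.
    assert!(parse_finite_f64("abc").is_err());
}
```

The explicit check matters because Rust's `f64` parser accepts spellings such as `NaN` and `inf`, so a parse error alone would not filter them out.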
--- a/src/aggregation/segment_agg_result.rs
+++ b/src/aggregation/segment_agg_result.rs
@@ -16,6 +16,7 @@ use super::metric::{
    SumAggregation,
 };
 use crate::aggregation::bucket::TermMissingAgg;
+use crate::aggregation::metric::SegmentTopHitsCollector;

 pub(crate) trait SegmentAggregationCollector: CollectorClone + Debug {
    fn add_intermediate_aggregation_result(
@@ -160,6 +161,11 @@ pub(crate) fn build_single_agg_segment_collector(
                accessor_idx,
            )?,
        )),
+        TopHits(top_hits_req) => Ok(Box::new(SegmentTopHitsCollector::from_req(
+            top_hits_req,
+            accessor_idx,
+            req.segment_ordinal,
+        ))),
    }
 }

--- a/src/collector/facet_collector.rs
+++ b/src/collector/facet_collector.rs
@@ -410,6 +410,7 @@ impl SegmentCollector for FacetSegmentCollector {

 /// Intermediary result of the `FacetCollector` that stores
 /// the facet counts for all the segments.
+#[derive(Default, Clone)]
 pub struct FacetCounts {
    facet_counts: BTreeMap<Facet, u64>,
 }
@@ -493,7 +494,7 @@ mod tests {
    use super::{FacetCollector, FacetCounts};
    use crate::collector::facet_collector::compress_mapping;
    use crate::collector::Count;
-    use crate::core::Index;
+    use crate::index::Index;
    use crate::query::{AllQuery, QueryParser, TermQuery};
    use crate::schema::{Facet, FacetOptions, IndexRecordOption, Schema, TantivyDocument};
    use crate::{IndexWriter, Term};
--- a/src/collector/mod.rs
+++ b/src/collector/mod.rs
@@ -97,6 +97,7 @@ pub use self::multi_collector::{FruitHandle, MultiCollector, MultiFruit};
 mod top_collector;

 mod top_score_collector;
+pub use self::top_collector::ComparableDoc;
 pub use self::top_score_collector::{TopDocs, TopNComputer};

 mod custom_score_top_collector;
--- a/src/collector/tests.rs
+++ b/src/collector/tests.rs
@@ -2,7 +2,7 @@ use columnar::{BytesColumn, Column};

 use super::*;
 use crate::collector::{Count, FilterCollector, TopDocs};
-use crate::core::SegmentReader;
+use crate::index::SegmentReader;
 use crate::query::{AllQuery, QueryParser};
 use crate::schema::{Schema, FAST, TEXT};
 use crate::time::format_description::well_known::Rfc3339;
--- a/src/collector/top_collector.rs
+++ b/src/collector/top_collector.rs
@@ -1,47 +1,58 @@
 use std::cmp::Ordering;
 use std::marker::PhantomData;

+use serde::{Deserialize, Serialize};
+
 use super::top_score_collector::TopNComputer;
 use crate::{DocAddress, DocId, SegmentOrdinal, SegmentReader};

 /// Contains a feature (field, score, etc.) of a document along with the document address.
 ///
-/// It has a custom implementation of `PartialOrd` that reverses the order. This is because the
-/// default Rust heap is a max heap, whereas a min heap is needed.
-///
-/// Additionally, it guarantees stable sorting: in case of a tie on the feature, the document
+/// It guarantees stable sorting: in case of a tie on the feature, the document
 /// address is used.
 ///
+/// The `REVERSE_ORDER` const generic parameter controls whether the by-feature order
+/// is reversed, which is useful for achieving, for example, largest-first
+/// semantics without having to wrap the feature in a `Reverse`.
+///
 /// WARNING: equality is not what you would expect here.
 /// Two elements are equal if their feature is equal, and regardless of whether `doc`
 /// is equal. This should be perfectly fine for this usage, but let's make sure this
 /// struct is never public.
-pub(crate) struct ComparableDoc<T, D> {
+#[derive(Clone, Default, Serialize, Deserialize)]
+pub struct ComparableDoc<T, D, const REVERSE_ORDER: bool = false> {
+    /// The feature of the document. In practice, this is
+    /// any type that implements `PartialOrd`.
    pub feature: T,
+    /// The document address. In practice, this is any
+    /// type that implements `PartialOrd`, and is guaranteed
+    /// to be unique for each document.
    pub doc: D,
 }
-impl<T: std::fmt::Debug, D: std::fmt::Debug> std::fmt::Debug for ComparableDoc<T, D> {
+impl<T: std::fmt::Debug, D: std::fmt::Debug, const R: bool> std::fmt::Debug
+    for ComparableDoc<T, D, R>
+{
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
-        f.debug_struct("ComparableDoc")
+        f.debug_struct(format!("ComparableDoc<_, _, {R}>").as_str())
            .field("feature", &self.feature)
            .field("doc", &self.doc)
            .finish()
    }
 }

-impl<T: PartialOrd, D: PartialOrd> PartialOrd for ComparableDoc<T, D> {
+impl<T: PartialOrd, D: PartialOrd, const R: bool> PartialOrd for ComparableDoc<T, D, R> {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
 }

-impl<T: PartialOrd, D: PartialOrd> Ord for ComparableDoc<T, D> {
+impl<T: PartialOrd, D: PartialOrd, const R: bool> Ord for ComparableDoc<T, D, R> {
    #[inline]
    fn cmp(&self, other: &Self) -> Ordering {
-        // Reversed to make BinaryHeap work as a min-heap
-        let by_feature = other
+        let by_feature = self
            .feature
-            .partial_cmp(&self.feature)
+            .partial_cmp(&other.feature)
+            .map(|ord| if R { ord.reverse() } else { ord })
            .unwrap_or(Ordering::Equal);

        let lazy_by_doc_address = || self.doc.partial_cmp(&other.doc).unwrap_or(Ordering::Equal);
@@ -53,13 +64,13 @@ impl<T: PartialOrd, D: PartialOrd> Ord for ComparableDoc<T, D> {
    }
 }

-impl<T: PartialOrd, D: PartialOrd> PartialEq for ComparableDoc<T, D> {
+impl<T: PartialOrd, D: PartialOrd, const R: bool> PartialEq for ComparableDoc<T, D, R> {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
 }

-impl<T: PartialOrd, D: PartialOrd> Eq for ComparableDoc<T, D> {}
+impl<T: PartialOrd, D: PartialOrd, const R: bool> Eq for ComparableDoc<T, D, R> {}

 pub(crate) struct TopCollector<T> {
    pub limit: usize,
@@ -99,10 +110,10 @@ where T: PartialOrd + Clone
        if self.limit == 0 {
            return Ok(Vec::new());
        }
-        let mut top_collector = TopNComputer::new(self.limit + self.offset);
+        let mut top_collector: TopNComputer<_, _> = TopNComputer::new(self.limit + self.offset);
        for child_fruit in children {
            for (feature, doc) in child_fruit {
-                top_collector.push(ComparableDoc { feature, doc });
+                top_collector.push(feature, doc);
            }
        }

@@ -143,6 +154,8 @@ where T: PartialOrd + Clone
 /// The theoretical complexity for collecting the top `K` out of `n` documents
 /// is `O(n + K)`.
 pub(crate) struct TopSegmentCollector<T> {
+    /// We reverse the order of the feature in order to
+    /// have top-semantics instead of bottom semantics.
    topn_computer: TopNComputer<T, DocId>,
    segment_ord: u32,
 }
@@ -180,7 +193,7 @@ impl<T: PartialOrd + Clone> TopSegmentCollector<T> {
    /// will compare the lowest scoring item with the given one and keep whichever is greater.
    #[inline]
    pub fn collect(&mut self, doc: DocId, feature: T) {
-        self.topn_computer.push(ComparableDoc { feature, doc });
+        self.topn_computer.push(feature, doc);
    }
 }

--- a/src/collector/top_score_collector.rs
+++ b/src/collector/top_score_collector.rs
@@ -3,6 +3,8 @@ use std::marker::PhantomData;
 use std::sync::Arc;

 use columnar::ColumnValues;
+use serde::de::DeserializeOwned;
+use serde::{Deserialize, Serialize};

 use super::Collector;
 use crate::collector::custom_score_top_collector::CustomScoreTopCollector;
@@ -309,7 +311,7 @@ impl TopDocs {
    ///
    /// To comfortably work with `u64`s, `i64`s, `f64`s, or `date`s, please refer to
    /// the [.order_by_fast_field(...)](TopDocs::order_by_fast_field) method.
-    fn order_by_u64_field(
+    pub fn order_by_u64_field(
        self,
        field: impl ToString,
        order: Order,
@@ -663,7 +665,7 @@ impl Collector for TopDocs {
        reader: &SegmentReader,
    ) -> crate::Result<<Self::Child as SegmentCollector>::Fruit> {
        let heap_len = self.0.limit + self.0.offset;
-        let mut top_n = TopNComputer::new(heap_len);
+        let mut top_n: TopNComputer<_, _> = TopNComputer::new(heap_len);

        if let Some(alive_bitset) = reader.alive_bitset() {
            let mut threshold = Score::MIN;
@@ -672,21 +674,13 @@ impl Collector for TopDocs {
                if alive_bitset.is_deleted(doc) {
                    return threshold;
                }
-                let doc = ComparableDoc {
-                    feature: score,
-                    doc,
-                };
-                top_n.push(doc);
+                top_n.push(score, doc);
                threshold = top_n.threshold.unwrap_or(Score::MIN);
                threshold
            })?;
        } else {
            weight.for_each_pruning(Score::MIN, reader, &mut |doc, score| {
-                let doc = ComparableDoc {
-                    feature: score,
-                    doc,
-                };
-                top_n.push(doc);
+                top_n.push(score, doc);
                top_n.threshold.unwrap_or(Score::MIN)
            })?;
        }
@@ -725,17 +719,65 @@ impl SegmentCollector for TopScoreSegmentCollector {

 /// Fast TopN Computation
 ///
+/// Capacity of the vec is 2 * top_n.
+/// The buffer is truncated to the top_n elements when it reaches the capacity of the Vec.
+/// That means capacity has special meaning and should be carried over when cloning or serializing.
+///
 /// For TopN == 0, it will be relative expensive.
-pub struct TopNComputer<Score, DocId> {
-    buffer: Vec<ComparableDoc<Score, DocId>>,
+#[derive(Serialize, Deserialize)]
+#[serde(from = "TopNComputerDeser<Score, D, REVERSE_ORDER>")]
+pub struct TopNComputer<Score, D, const REVERSE_ORDER: bool = true> {
+    /// The buffer reverses sort order to get top-semantics instead of bottom-semantics
+    buffer: Vec<ComparableDoc<Score, D, REVERSE_ORDER>>,
    top_n: usize,
    pub(crate) threshold: Option<Score>,
 }
+// Intermediate struct for TopNComputer for deserialization, to keep vec capacity
+#[derive(Deserialize)]
+struct TopNComputerDeser<Score, D, const REVERSE_ORDER: bool> {
+    buffer: Vec<ComparableDoc<Score, D, REVERSE_ORDER>>,
+    top_n: usize,
+    threshold: Option<Score>,
+}

-impl<Score, DocId> TopNComputer<Score, DocId>
+// Custom clone to keep capacity
+impl<Score: Clone, D: Clone, const REVERSE_ORDER: bool> Clone
+    for TopNComputer<Score, D, REVERSE_ORDER>
+{
+    fn clone(&self) -> Self {
+        let mut buffer_clone = Vec::with_capacity(self.buffer.capacity());
+        buffer_clone.extend(self.buffer.iter().cloned());
+
+        TopNComputer {
+            buffer: buffer_clone,
+            top_n: self.top_n,
+            threshold: self.threshold.clone(),
+        }
+    }
+}
+
+impl<Score, D, const R: bool> From<TopNComputerDeser<Score, D, R>> for TopNComputer<Score, D, R> {
+    fn from(mut value: TopNComputerDeser<Score, D, R>) -> Self {
+        let expected_cap = value.top_n.max(1) * 2;
+        let current_cap = value.buffer.capacity();
+        if current_cap < expected_cap {
+            value.buffer.reserve_exact(expected_cap - current_cap);
+        } else {
+            value.buffer.shrink_to(expected_cap);
+        }
+
+        TopNComputer {
+            buffer: value.buffer,
+            top_n: value.top_n,
+            threshold: value.threshold,
+        }
+    }
+}
+
+impl<Score, D, const R: bool> TopNComputer<Score, D, R>
 where
    Score: PartialOrd + Clone,
-    DocId: Ord + Clone,
+    D: Serialize + DeserializeOwned + Ord + Clone,
 {
    /// Create a new `TopNComputer`.
    /// Internally it will allocate a buffer of size `2 * top_n`.
@@ -748,10 +790,12 @@ where
        }
    }

+    /// Push a new document to the top n.
+    /// If the document is below the current threshold, it will be ignored.
    #[inline]
-    pub(crate) fn push(&mut self, doc: ComparableDoc<Score, DocId>) {
+    pub fn push(&mut self, feature: Score, doc: D) {
        if let Some(last_median) = self.threshold.clone() {
-            if doc.feature < last_median {
+            if feature < last_median {
                return;
            }
        }
@@ -766,7 +810,7 @@ where
        let uninit = self.buffer.spare_capacity_mut();
        // This cannot panic, because we truncate_median will at least remove one element, since
        // the min capacity is 2.
-        uninit[0].write(doc);
+        uninit[0].write(ComparableDoc { doc, feature });
        // This is safe because it would panic in the line above
        unsafe {
            self.buffer.set_len(self.buffer.len() + 1);
@@ -785,13 +829,24 @@ where
        median_score
    }

-    pub(crate) fn into_sorted_vec(mut self) -> Vec<ComparableDoc<Score, DocId>> {
+    /// Returns the top n elements in sorted order.
+    pub fn into_sorted_vec(mut self) -> Vec<ComparableDoc<Score, D, R>> {
        if self.buffer.len() > self.top_n {
            self.truncate_top_n();
        }
        self.buffer.sort_unstable();
        self.buffer
    }
+
+    /// Returns the top n elements in stored order.
+    /// Useful if you do not need the elements in sorted order,
+    /// for example when merging the results of multiple segments.
+    pub fn into_vec(mut self) -> Vec<ComparableDoc<Score, D, R>> {
+        if self.buffer.len() > self.top_n {
+            self.truncate_top_n();
+        }
+        self.buffer
+    }
 }

 #[cfg(test)]
@@ -825,49 +880,44 @@ mod tests {
            crate::assert_nearly_equals!(result.0, expected.0);
        }
    }
+    #[test]
+    fn test_topn_computer_serde() {
+        let computer: TopNComputer<u32, u32> = TopNComputer::new(1);
+
+        let computer_ser = serde_json::to_string(&computer).unwrap();
+        let mut computer: TopNComputer<u32, u32> = serde_json::from_str(&computer_ser).unwrap();
+
+        computer.push(1u32, 5u32);
+        computer.push(1u32, 0u32);
+        computer.push(1u32, 7u32);
+
+        assert_eq!(
+            computer.into_sorted_vec(),
+            &[ComparableDoc {
+                feature: 1u32,
+                doc: 0u32,
+            },]
+        );
+    }

    #[test]
    fn test_empty_topn_computer() {
        let mut computer: TopNComputer<u32, u32> = TopNComputer::new(0);

-        computer.push(ComparableDoc {
-            feature: 1u32,
-            doc: 1u32,
-        });
-        computer.push(ComparableDoc {
-            feature: 1u32,
-            doc: 2u32,
-        });
-        computer.push(ComparableDoc {
-            feature: 1u32,
-            doc: 3u32,
-        });
+        computer.push(1u32, 1u32);
+        computer.push(1u32, 2u32);
+        computer.push(1u32, 3u32);
        assert!(computer.into_sorted_vec().is_empty());
    }
    #[test]
    fn test_topn_computer() {
        let mut computer: TopNComputer<u32, u32> = TopNComputer::new(2);

-        computer.push(ComparableDoc {
-            feature: 1u32,
-            doc: 1u32,
-        });
-        computer.push(ComparableDoc {
-            feature: 2u32,
-            doc: 2u32,
-        });
-        computer.push(ComparableDoc {
-            feature: 3u32,
-            doc: 3u32,
-        });
-        computer.push(ComparableDoc {
-            feature: 2u32,
-            doc: 4u32,
-        });
-        computer.push(ComparableDoc {
-            feature: 1u32,
-            doc: 5u32,
-        });
+        computer.push(1u32, 1u32);
+        computer.push(2u32, 2u32);
+        computer.push(3u32, 3u32);
+        computer.push(2u32, 4u32);
+        computer.push(1u32, 5u32);
        assert_eq!(
            computer.into_sorted_vec(),
            &[
@@ -889,10 +939,7 @@ mod tests {
            let mut computer: TopNComputer<u32, u32> = TopNComputer::new(top_n);

            for _ in 0..1 + top_n * 2 {
-                computer.push(ComparableDoc {
-                    feature: 1u32,
-                    doc: 1u32,
-                });
+                computer.push(1u32, 1u32);
            }
            let _vals = computer.into_sorted_vec();
        }
--- a/src/core/mod.rs
+++ b/src/core/mod.rs
@@ -1,32 +1,14 @@
 mod executor;
-pub mod index;
-mod index_meta;
-mod inverted_index_reader;
 #[doc(hidden)]
 pub mod json_utils;
 pub mod searcher;
-mod segment;
-mod segment_component;
-mod segment_id;
-mod segment_reader;
-mod single_segment_index_writer;

 use std::path::Path;

 use once_cell::sync::Lazy;

 pub use self::executor::Executor;
-pub use self::index::{Index, IndexBuilder};
-pub use self::index_meta::{
-    IndexMeta, IndexSettings, IndexSortByField, Order, SegmentMeta, SegmentMetaInventory,
-};
-pub use self::inverted_index_reader::InvertedIndexReader;
 pub use self::searcher::{Searcher, SearcherGeneration};
-pub use self::segment::Segment;
-pub use self::segment_component::SegmentComponent;
-pub use self::segment_id::SegmentId;
-pub use self::segment_reader::{merge_field_meta_data, FieldMetadata, SegmentReader};
-pub use self::single_segment_index_writer::SingleSegmentIndexWriter;

 /// The meta file contains all the information about the list of segments and the schema
 /// of the index.
--- a/src/core/searcher.rs
+++ b/src/core/searcher.rs
@@ -3,7 +3,8 @@ use std::sync::Arc;
 use std::{fmt, io};

 use crate::collector::Collector;
-use crate::core::{Executor, SegmentReader};
+use crate::core::Executor;
+use crate::index::SegmentReader;
 use crate::query::{Bm25StatisticsProvider, EnableScoring, Query};
 use crate::schema::document::DocumentDeserialize;
 use crate::schema::{Schema, Term};
--- a/src/core/tests.rs
+++ b/src/core/tests.rs
@@ -424,7 +424,7 @@ fn test_non_text_json_term_freq() {
    json_term_writer.set_fast_value(75u64);
    let postings = inv_idx
        .read_postings(
-            &json_term_writer.term(),
+            json_term_writer.term(),
            IndexRecordOption::WithFreqsAndPositions,
        )
        .unwrap()
@@ -462,7 +462,7 @@ fn test_non_text_json_term_freq_bitpacked() {
    json_term_writer.set_fast_value(75u64);
    let mut postings = inv_idx
        .read_postings(
-            &json_term_writer.term(),
+            json_term_writer.term(),
            IndexRecordOption::WithFreqsAndPositions,
        )
        .unwrap()
--- a/src/directory/mmap_directory.rs
+++ b/src/directory/mmap_directory.rs
@@ -479,6 +479,7 @@ impl Directory for MmapDirectory {
        let file: File = OpenOptions::new()
            .write(true)
            .create(true) //< if the file does not exist yet, create it.
+            .truncate(false)
            .open(full_path)
            .map_err(LockError::wrap_io_error)?;
        if lock.is_blocking {
@@ -673,7 +674,7 @@ mod tests {
            let num_segments = reader.searcher().segment_readers().len();
            assert!(num_segments <= 4);
            let num_components_except_deletes_and_tempstore =
-                crate::core::SegmentComponent::iterator().len() - 2;
+                crate::index::SegmentComponent::iterator().len() - 2;
            let max_num_mmapped = num_components_except_deletes_and_tempstore * num_segments;
            assert_eventually(|| {
                let num_mmapped = mmap_directory.get_cache_info().mmapped.len();
--- a/src/directory/ram_directory.rs
+++ b/src/directory/ram_directory.rs
@@ -85,7 +85,7 @@ impl InnerDirectory {
        self.fs
            .get(path)
            .ok_or_else(|| OpenReadError::FileDoesNotExist(PathBuf::from(path)))
-            .map(Clone::clone)
+            .cloned()
    }

    fn delete(&mut self, path: &Path) -> result::Result<(), DeleteError> {
--- a/src/index/index.rs
+++ b/src/index/index.rs
@@ -6,24 +6,23 @@ use std::path::PathBuf;
 use std::sync::Arc;

 use super::segment::Segment;
-use super::IndexSettings;
-use crate::core::single_segment_index_writer::SingleSegmentIndexWriter;
-use crate::core::{
-    Executor, IndexMeta, SegmentId, SegmentMeta, SegmentMetaInventory, META_FILEPATH,
-};
+use super::segment_reader::merge_field_meta_data;
+use super::{FieldMetadata, IndexSettings};
+use crate::core::{Executor, META_FILEPATH};
 use crate::directory::error::OpenReadError;
 #[cfg(feature = "mmap")]
 use crate::directory::MmapDirectory;
 use crate::directory::{Directory, ManagedDirectory, RamDirectory, INDEX_WRITER_LOCK};
 use crate::error::{DataCorruption, TantivyError};
+use crate::index::{IndexMeta, SegmentId, SegmentMeta, SegmentMetaInventory};
 use crate::indexer::index_writer::{MAX_NUM_THREAD, MEMORY_BUDGET_NUM_BYTES_MIN};
 use crate::indexer::segment_updater::save_metas;
-use crate::indexer::IndexWriter;
+use crate::indexer::{IndexWriter, SingleSegmentIndexWriter};
 use crate::reader::{IndexReader, IndexReaderBuilder};
 use crate::schema::document::Document;
 use crate::schema::{Field, FieldType, Schema};
 use crate::tokenizer::{TextAnalyzer, TokenizerManager};
-use crate::{merge_field_meta_data, FieldMetadata, SegmentReader};
+use crate::SegmentReader;

 fn load_metas(
    directory: &dyn Directory,
@@ -323,6 +322,15 @@ impl Index {
        Ok(())
    }

+    /// Replace the search executor with a shared, externally managed thread pool.
+    pub fn set_shared_multithread_executor(
+        &mut self,
+        shared_thread_pool: Arc<Executor>,
+    ) -> crate::Result<()> {
+        self.executor = shared_thread_pool.clone();
+        Ok(())
+    }
+
    /// Replace the default single thread search executor pool
    /// by a thread pool with as many threads as there are CPUs on the system.
    pub fn set_default_multithread_executor(&mut self) -> crate::Result<()> {
--- a/src/index/index_meta.rs
+++ b/src/index/index_meta.rs
@@ -7,7 +7,7 @@ use std::sync::Arc;
 use serde::{Deserialize, Serialize};

 use super::SegmentComponent;
-use crate::core::SegmentId;
+use crate::index::SegmentId;
 use crate::schema::Schema;
 use crate::store::Compressor;
 use crate::{Inventory, Opstamp, TrackedObject};
@@ -19,7 +19,7 @@ struct DeleteMeta {
 }

 #[derive(Clone, Default)]
-pub struct SegmentMetaInventory {
+pub(crate) struct SegmentMetaInventory {
    inventory: Inventory<InnerSegmentMeta>,
 }

@@ -408,7 +408,7 @@ impl fmt::Debug for IndexMeta {
 mod tests {

    use super::IndexMeta;
-    use crate::core::index_meta::UntrackedIndexMeta;
+    use crate::index::index_meta::UntrackedIndexMeta;
    use crate::schema::{Schema, TEXT};
    use crate::store::Compressor;
    #[cfg(feature = "zstd-compression")]
--- a/src/index/inverted_index_reader.rs
+++ b/src/index/inverted_index_reader.rs
@@ -266,7 +266,9 @@ impl InvertedIndexReader {

    /// Warmup a block postings given a `Term`.
    /// This method is for an advanced usage only.
-    pub async fn warm_postings(&self, term: &Term, with_positions: bool) -> io::Result<()> {
+    ///
+    /// Returns `true` if the term was found in the dictionary, `false` otherwise.
+    pub async fn warm_postings(&self, term: &Term, with_positions: bool) -> io::Result<bool> {
        let term_info_opt: Option<TermInfo> = self.get_term_info_async(term).await?;
        if let Some(term_info) = term_info_opt {
            let postings = self
@@ -280,23 +282,27 @@ impl InvertedIndexReader {
            } else {
                postings.await?;
            }
+            Ok(true)
+        } else {
+            Ok(false)
        }
-        Ok(())
    }

    /// Warmup a block postings given a range of `Term`s.
    /// This method is for an advanced usage only.
+    ///
+    /// Returns `true` if at least one term matching the range was found in the dictionary.
    pub async fn warm_postings_range(
        &self,
        terms: impl std::ops::RangeBounds<Term>,
        limit: Option<u64>,
        with_positions: bool,
-    ) -> io::Result<()> {
+    ) -> io::Result<bool> {
        let mut term_info = self.get_term_range_async(terms, limit).await?;

        let Some(first_terminfo) = term_info.next() else {
            // no key matches, nothing more to load
-            return Ok(());
+            return Ok(false);
        };

        let last_terminfo = term_info.last().unwrap_or_else(|| first_terminfo.clone());
@@ -316,7 +322,7 @@ impl InvertedIndexReader {
        } else {
            postings.await?;
        }
-        Ok(())
+        Ok(true)
    }

    /// Warmup the block postings for all terms.
--- a/src/index/mod.rs
+++ b/src/index/mod.rs
@@ -0,0 +1,22 @@
+//! # Index Module
+//!
+//! The `index` module in Tantivy contains core components to read and write indexes.
+//!
+//! It contains `Index` and `Segment`, where an `Index` consists of one or more `Segment`s.
+
+mod index;
+mod index_meta;
+mod inverted_index_reader;
+mod segment;
+mod segment_component;
+mod segment_id;
+mod segment_reader;
+
+pub use self::index::{Index, IndexBuilder};
+pub(crate) use self::index_meta::SegmentMetaInventory;
+pub use self::index_meta::{IndexMeta, IndexSettings, IndexSortByField, Order, SegmentMeta};
+pub use self::inverted_index_reader::InvertedIndexReader;
+pub use self::segment::Segment;
+pub use self::segment_component::SegmentComponent;
+pub use self::segment_id::SegmentId;
+pub use self::segment_reader::{FieldMetadata, SegmentReader};
--- a/src/index/segment.rs
+++ b/src/index/segment.rs
@@ -2,9 +2,9 @@ use std::fmt;
 use std::path::PathBuf;

 use super::SegmentComponent;
-use crate::core::{Index, SegmentId, SegmentMeta};
 use crate::directory::error::{OpenReadError, OpenWriteError};
 use crate::directory::{Directory, FileSlice, WritePtr};
+use crate::index::{Index, SegmentId, SegmentMeta};
 use crate::schema::Schema;
 use crate::Opstamp;

--- a/src/index/segment_component.rs
+++ b/src/index/segment_component.rs
--- a/src/index/segment_id.rs
+++ b/src/index/segment_id.rs
--- a/src/index/segment_reader.rs
+++ b/src/index/segment_reader.rs
@@ -6,11 +6,11 @@ use std::{fmt, io};
 use fnv::FnvHashMap;
 use itertools::Itertools;

-use crate::core::{InvertedIndexReader, Segment, SegmentComponent, SegmentId};
 use crate::directory::{CompositeFile, FileSlice};
 use crate::error::DataCorruption;
 use crate::fastfield::{intersect_alive_bitsets, AliveBitSet, FacetReader, FastFieldReaders};
 use crate::fieldnorm::{FieldNormReader, FieldNormReaders};
+use crate::index::{InvertedIndexReader, Segment, SegmentComponent, SegmentId};
 use crate::json_utils::json_path_sep_to_dot;
 use crate::schema::{Field, IndexRecordOption, Schema, Type};
 use crate::space_usage::SegmentSpaceUsage;
@@ -515,9 +515,9 @@ impl fmt::Debug for SegmentReader {
 #[cfg(test)]
 mod test {
    use super::*;
-    use crate::core::Index;
+    use crate::index::Index;
    use crate::schema::{Schema, SchemaBuilder, Term, STORED, TEXT};
-    use crate::{DocId, FieldMetadata, IndexWriter};
+    use crate::{DocId, IndexWriter};

    #[test]
    fn test_merge_field_meta_data_same() {
--- a/src/indexer/index_writer.rs
+++ b/src/indexer/index_writer.rs
@@ -9,10 +9,10 @@ use smallvec::smallvec;
 use super::operation::{AddOperation, UserOperation};
 use super::segment_updater::SegmentUpdater;
 use super::{AddBatch, AddBatchReceiver, AddBatchSender, PreparedCommit};
-use crate::core::{Index, Segment, SegmentComponent, SegmentId, SegmentMeta, SegmentReader};
 use crate::directory::{DirectoryLock, GarbageCollectionResult, TerminatingWrite};
 use crate::error::TantivyError;
 use crate::fastfield::write_alive_bitset;
+use crate::index::{Index, Segment, SegmentComponent, SegmentId, SegmentMeta, SegmentReader};
 use crate::indexer::delete_queue::{DeleteCursor, DeleteQueue};
 use crate::indexer::doc_opstamp_mapping::DocToOpstampMapping;
 use crate::indexer::index_writer_status::IndexWriterStatus;
--- a/src/indexer/log_merge_policy.rs
+++ b/src/indexer/log_merge_policy.rs
@@ -3,7 +3,7 @@ use std::cmp;
 use itertools::Itertools;

 use super::merge_policy::{MergeCandidate, MergePolicy};
-use crate::core::SegmentMeta;
+use crate::index::SegmentMeta;

 const DEFAULT_LEVEL_LOG_SIZE: f64 = 0.75;
 const DEFAULT_MIN_LAYER_SIZE: u32 = 10_000;
@@ -144,7 +144,7 @@ mod tests {
    use once_cell::sync::Lazy;

    use super::*;
-    use crate::core::{SegmentId, SegmentMeta, SegmentMetaInventory};
+    use crate::index::{SegmentId, SegmentMeta, SegmentMetaInventory};
    use crate::indexer::merge_policy::MergePolicy;
    use crate::schema;
    use crate::schema::INDEXED;
--- a/src/indexer/merge_policy.rs
+++ b/src/indexer/merge_policy.rs
@@ -1,7 +1,7 @@
 use std::fmt::Debug;
 use std::marker;

-use crate::core::{SegmentId, SegmentMeta};
+use crate::index::{SegmentId, SegmentMeta};

 /// Set of segment suggested for a merge.
 #[derive(Debug, Clone)]
@@ -39,7 +39,7 @@ impl MergePolicy for NoMergePolicy {
 pub mod tests {

    use super::*;
-    use crate::core::{SegmentId, SegmentMeta};
+    use crate::index::{SegmentId, SegmentMeta};

    /// `MergePolicy` useful for test purposes.
    ///
--- a/src/indexer/merger.rs
+++ b/src/indexer/merger.rs
@@ -8,12 +8,12 @@ use common::ReadOnlyBitSet;
 use itertools::Itertools;
 use measure_time::debug_time;

-use crate::core::{Segment, SegmentReader};
 use crate::directory::WritePtr;
 use crate::docset::{DocSet, TERMINATED};
 use crate::error::DataCorruption;
 use crate::fastfield::{AliveBitSet, FastFieldNotAvailableError};
 use crate::fieldnorm::{FieldNormReader, FieldNormReaders, FieldNormsSerializer, FieldNormsWriter};
+use crate::index::{Segment, SegmentReader};
 use crate::indexer::doc_id_mapping::{MappingType, SegmentDocIdMapping};
 use crate::indexer::SegmentSerializer;
 use crate::postings::{InvertedIndexSerializer, Postings, SegmentPostings};
@@ -794,7 +794,7 @@ mod tests {
        BytesFastFieldTestCollector, FastFieldTestCollector, TEST_COLLECTOR_WITH_SCORE,
    };
    use crate::collector::{Count, FacetCollector};
-    use crate::core::Index;
+    use crate::index::Index;
    use crate::query::{AllQuery, BooleanQuery, EnableScoring, Scorer, TermQuery};
    use crate::schema::document::Value;
    use crate::schema::{
--- a/src/indexer/merger_sorted_index_test.rs
+++ b/src/indexer/merger_sorted_index_test.rs
@@ -1,8 +1,8 @@
 #[cfg(test)]
 mod tests {
    use crate::collector::TopDocs;
-    use crate::core::Index;
    use crate::fastfield::AliveBitSet;
+    use crate::index::Index;
    use crate::query::QueryParser;
    use crate::schema::document::Value;
    use crate::schema::{
@@ -485,7 +485,7 @@ mod bench_sorted_index_merge {

    use test::{self, Bencher};

-    use crate::core::Index;
+    use crate::index::Index;
    use crate::indexer::merger::IndexMerger;
    use crate::schema::{NumericOptions, Schema};
    use crate::{IndexSettings, IndexSortByField, IndexWriter, Order};
--- a/src/indexer/mod.rs
+++ b/src/indexer/mod.rs
@@ -25,6 +25,7 @@ mod segment_register;
 pub(crate) mod segment_serializer;
 pub(crate) mod segment_updater;
 pub(crate) mod segment_writer;
+pub(crate) mod single_segment_index_writer;
 mod stamper;

 use crossbeam_channel as channel;
@@ -34,13 +35,14 @@ pub use self::index_writer::IndexWriter;
 pub use self::log_merge_policy::LogMergePolicy;
 pub use self::merge_operation::MergeOperation;
 pub use self::merge_policy::{MergeCandidate, MergePolicy, NoMergePolicy};
+use self::operation::AddOperation;
 pub use self::operation::UserOperation;
 pub use self::prepared_commit::PreparedCommit;
 pub use self::segment_entry::SegmentEntry;
 pub(crate) use self::segment_serializer::SegmentSerializer;
 pub use self::segment_updater::{merge_filtered_segments, merge_indices};
 pub use self::segment_writer::SegmentWriter;
-use crate::indexer::operation::AddOperation;
+pub use self::single_segment_index_writer::SingleSegmentIndexWriter;

 /// Alias for the default merge policy, which is the `LogMergePolicy`.
 pub type DefaultMergePolicy = LogMergePolicy;
@@ -63,9 +65,10 @@ mod tests_mmap {
    use crate::aggregation::agg_result::AggregationResults;
    use crate::aggregation::AggregationCollector;
    use crate::collector::{Count, TopDocs};
+    use crate::index::FieldMetadata;
    use crate::query::{AllQuery, QueryParser};
    use crate::schema::{JsonObjectOptions, Schema, Type, FAST, INDEXED, STORED, TEXT};
-    use crate::{FieldMetadata, Index, IndexWriter, Term};
+    use crate::{Index, IndexWriter, Term};

    #[test]
    fn test_advance_delete_bug() -> crate::Result<()> {
@@ -403,11 +406,10 @@ mod tests_mmap {

        let searcher = reader.searcher();

-        let fields_and_vals = vec![
-            // Only way to address or it gets shadowed by `json.shadow` field
+        let fields_and_vals = [
            ("json.shadow\u{1}val".to_string(), "a"), // Succeeds
            //("json.shadow.val".to_string(), "a"),   // Fails
-            ("json.shadow.val".to_string(), "b"), // Succeeds
+            ("json.shadow.val".to_string(), "b"),
        ];

        let query_parser = QueryParser::for_index(&index, vec![]);
--- a/src/indexer/segment_entry.rs
+++ b/src/indexer/segment_entry.rs
@@ -2,7 +2,7 @@ use std::fmt;

 use common::BitSet;

-use crate::core::{SegmentId, SegmentMeta};
+use crate::index::{SegmentId, SegmentMeta};
 use crate::indexer::delete_queue::DeleteCursor;

 /// A segment entry describes the state of
--- a/src/indexer/segment_manager.rs
+++ b/src/indexer/segment_manager.rs
@@ -3,8 +3,8 @@ use std::fmt::{self, Debug, Formatter};
 use std::sync::{RwLock, RwLockReadGuard, RwLockWriteGuard};

 use super::segment_register::SegmentRegister;
-use crate::core::{SegmentId, SegmentMeta};
 use crate::error::TantivyError;
+use crate::index::{SegmentId, SegmentMeta};
 use crate::indexer::delete_queue::DeleteCursor;
 use crate::indexer::SegmentEntry;

--- a/src/indexer/segment_register.rs
+++ b/src/indexer/segment_register.rs
@@ -1,7 +1,7 @@
 use std::collections::{HashMap, HashSet};
 use std::fmt::{self, Debug, Display, Formatter};

-use crate::core::{SegmentId, SegmentMeta};
+use crate::index::{SegmentId, SegmentMeta};
 use crate::indexer::delete_queue::DeleteCursor;
 use crate::indexer::segment_entry::SegmentEntry;

@@ -103,7 +103,7 @@ impl SegmentRegister {
 #[cfg(test)]
 mod tests {
    use super::*;
-    use crate::core::{SegmentId, SegmentMetaInventory};
+    use crate::index::{SegmentId, SegmentMetaInventory};
    use crate::indexer::delete_queue::*;

    fn segment_ids(segment_register: &SegmentRegister) -> Vec<SegmentId> {
--- a/src/indexer/segment_serializer.rs
+++ b/src/indexer/segment_serializer.rs
@@ -1,8 +1,8 @@
 use common::TerminatingWrite;

-use crate::core::{Segment, SegmentComponent};
 use crate::directory::WritePtr;
 use crate::fieldnorm::FieldNormsSerializer;
+use crate::index::{Segment, SegmentComponent};
 use crate::postings::InvertedIndexSerializer;
 use crate::store::StoreWriter;

--- a/src/indexer/segment_updater.rs
+++ b/src/indexer/segment_updater.rs
@@ -9,11 +9,10 @@ use std::sync::{Arc, RwLock};
 use rayon::{ThreadPool, ThreadPoolBuilder};

 use super::segment_manager::SegmentManager;
-use crate::core::{
-    Index, IndexMeta, IndexSettings, Segment, SegmentId, SegmentMeta, META_FILEPATH,
-};
+use crate::core::META_FILEPATH;
 use crate::directory::{Directory, DirectoryClone, GarbageCollectionResult};
 use crate::fastfield::AliveBitSet;
+use crate::index::{Index, IndexMeta, IndexSettings, Segment, SegmentId, SegmentMeta};
 use crate::indexer::delete_queue::DeleteCursor;
 use crate::indexer::index_writer::advance_deletes;
 use crate::indexer::merge_operation::MergeOperationInventory;
--- a/src/indexer/segment_writer.rs
+++ b/src/indexer/segment_writer.rs
@@ -6,9 +6,9 @@ use tokenizer_api::BoxTokenStream;
 use super::doc_id_mapping::{get_doc_id_mapping_from_field, DocIdMapping};
 use super::operation::AddOperation;
 use crate::core::json_utils::index_json_values;
-use crate::core::Segment;
 use crate::fastfield::FastFieldsWriter;
 use crate::fieldnorm::{FieldNormReaders, FieldNormsWriter};
+use crate::index::Segment;
 use crate::indexer::segment_serializer::SegmentSerializer;
 use crate::postings::{
    compute_table_memory_size, serialize_postings, IndexingContext, IndexingPosition,
--- a/src/indexer/single_segment_index_writer.rs
+++ b/src/indexer/single_segment_index_writer.rs
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -189,6 +189,7 @@ pub mod collector;
 pub mod directory;
 pub mod fastfield;
 pub mod fieldnorm;
+pub mod index;
 pub mod positions;
 pub mod postings;

@@ -220,18 +221,18 @@ pub use self::docset::{DocSet, TERMINATED};
 pub use self::snippet::{Snippet, SnippetGenerator};
 #[doc(hidden)]
 pub use crate::core::json_utils;
-pub use crate::core::{
-    merge_field_meta_data, Executor, FieldMetadata, Index, IndexBuilder, IndexMeta, IndexSettings,
-    IndexSortByField, InvertedIndexReader, Order, Searcher, SearcherGeneration, Segment,
-    SegmentComponent, SegmentId, SegmentMeta, SegmentReader, SingleSegmentIndexWriter,
-};
+pub use crate::core::{Executor, Searcher, SearcherGeneration};
 pub use crate::directory::Directory;
-pub use crate::indexer::IndexWriter;
+pub use crate::index::{
+    Index, IndexBuilder, IndexMeta, IndexSettings, IndexSortByField, InvertedIndexReader, Order,
+    Segment, SegmentComponent, SegmentId, SegmentMeta, SegmentReader,
+};
 #[deprecated(
    since = "0.22.0",
    note = "Will be removed in tantivy 0.23. Use export from indexer module instead"
 )]
-pub use crate::indexer::{merge_filtered_segments, merge_indices, PreparedCommit};
+pub use crate::indexer::PreparedCommit;
+pub use crate::indexer::{IndexWriter, SingleSegmentIndexWriter};
 pub use crate::postings::Postings;
 #[allow(deprecated)]
 pub use crate::schema::DatePrecision;
@@ -338,7 +339,7 @@ impl DocAddress {
 ///
 /// The id used for the segment is actually an ordinal
 /// in the list of `Segment`s held by a `Searcher`.
-#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
+#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash, Serialize, Deserialize)]
 pub struct DocAddress {
    /// The segment ordinal id that identifies the segment
    /// hosting the document in the `Searcher` it is called from.
@@ -386,8 +387,8 @@ pub mod tests {
    use time::OffsetDateTime;

    use crate::collector::tests::TEST_COLLECTOR_WITH_SCORE;
-    use crate::core::SegmentReader;
    use crate::docset::{DocSet, TERMINATED};
+    use crate::index::SegmentReader;
    use crate::merge_policy::NoMergePolicy;
    use crate::query::BooleanQuery;
    use crate::schema::document::Value;
--- a/src/postings/block_segment_postings.rs
+++ b/src/postings/block_segment_postings.rs
@@ -383,8 +383,8 @@ mod tests {
    use common::HasLen;

    use super::BlockSegmentPostings;
-    use crate::core::Index;
    use crate::docset::{DocSet, TERMINATED};
+    use crate::index::Index;
    use crate::postings::compression::COMPRESSION_BLOCK_SIZE;
    use crate::postings::postings::Postings;
    use crate::postings::SegmentPostings;
--- a/src/postings/mod.rs
+++ b/src/postings/mod.rs
@@ -42,9 +42,9 @@ pub mod tests {
    use std::mem;

    use super::{InvertedIndexSerializer, Postings};
-    use crate::core::{Index, SegmentComponent, SegmentReader};
    use crate::docset::{DocSet, TERMINATED};
    use crate::fieldnorm::FieldNormReader;
+    use crate::index::{Index, SegmentComponent, SegmentReader};
    use crate::indexer::operation::AddOperation;
    use crate::indexer::SegmentWriter;
    use crate::query::Scorer;
--- a/src/postings/serializer.rs
+++ b/src/postings/serializer.rs
@@ -4,9 +4,9 @@ use std::io::{self, Write};
 use common::{BinarySerializable, CountingWriter, VInt};

 use super::TermInfo;
-use crate::core::Segment;
 use crate::directory::{CompositeWrite, WritePtr};
 use crate::fieldnorm::FieldNormReader;
+use crate::index::Segment;
 use crate::positions::PositionSerializer;
 use crate::postings::compression::{BlockEncoder, VIntEncoder, COMPRESSION_BLOCK_SIZE};
 use crate::postings::skip::SkipSerializer;
--- a/src/query/all_query.rs
+++ b/src/query/all_query.rs
@@ -1,5 +1,5 @@
-use crate::core::SegmentReader;
 use crate::docset::{DocSet, BUFFER_LEN, TERMINATED};
+use crate::index::SegmentReader;
 use crate::query::boost_query::BoostScorer;
 use crate::query::explanation::does_not_match;
 use crate::query::{EnableScoring, Explanation, Query, Scorer, Weight};
--- a/src/query/automaton_weight.rs
+++ b/src/query/automaton_weight.rs
@@ -5,7 +5,7 @@ use common::BitSet;
 use tantivy_fst::Automaton;

 use super::phrase_prefix_query::prefix_end;
-use crate::core::SegmentReader;
+use crate::index::SegmentReader;
 use crate::query::{BitSetDocSet, ConstScorer, Explanation, Scorer, Weight};
 use crate::schema::{Field, IndexRecordOption};
 use crate::termdict::{TermDictionary, TermStreamer};
--- a/src/query/boolean_query/boolean_weight.rs
+++ b/src/query/boolean_query/boolean_weight.rs
@@ -1,7 +1,7 @@
 use std::collections::HashMap;

-use crate::core::SegmentReader;
 use crate::docset::BUFFER_LEN;
+use crate::index::SegmentReader;
 use crate::postings::FreqReadingOption;
 use crate::query::explanation::does_not_match;
 use crate::query::score_combiner::{DoNothingCombiner, ScoreCombiner};
--- a/src/query/exist_query.rs
+++ b/src/query/exist_query.rs
@@ -3,8 +3,8 @@ use core::fmt::Debug;
 use columnar::{ColumnIndex, DynamicColumn};

 use super::{ConstScorer, EmptyScorer};
-use crate::core::SegmentReader;
 use crate::docset::{DocSet, TERMINATED};
+use crate::index::SegmentReader;
 use crate::query::explanation::does_not_match;
 use crate::query::{EnableScoring, Explanation, Query, Scorer, Weight};
 use crate::{DocId, Score, TantivyError};
--- a/src/query/phrase_prefix_query/phrase_prefix_weight.rs
+++ b/src/query/phrase_prefix_query/phrase_prefix_weight.rs
@@ -1,6 +1,6 @@
 use super::{prefix_end, PhrasePrefixScorer};
-use crate::core::SegmentReader;
 use crate::fieldnorm::FieldNormReader;
+use crate::index::SegmentReader;
 use crate::postings::SegmentPostings;
 use crate::query::bm25::Bm25Weight;
 use crate::query::explanation::does_not_match;
@@ -157,8 +157,8 @@ impl Weight for PhrasePrefixWeight {

 #[cfg(test)]
 mod tests {
-    use crate::core::Index;
    use crate::docset::TERMINATED;
+    use crate::index::Index;
    use crate::query::{EnableScoring, PhrasePrefixQuery, Query};
    use crate::schema::{Schema, TEXT};
    use crate::{DocSet, IndexWriter, Term};
--- a/src/query/phrase_query/mod.rs
+++ b/src/query/phrase_query/mod.rs
@@ -14,7 +14,7 @@ pub mod tests {

    use super::*;
    use crate::collector::tests::{TEST_COLLECTOR_WITHOUT_SCORE, TEST_COLLECTOR_WITH_SCORE};
-    use crate::core::Index;
+    use crate::index::Index;
    use crate::query::{EnableScoring, QueryParser, Weight};
    use crate::schema::{Schema, Term, TEXT};
    use crate::{assert_nearly_equals, DocAddress, DocId, IndexWriter, TERMINATED};
--- a/src/query/phrase_query/phrase_weight.rs
+++ b/src/query/phrase_query/phrase_weight.rs
@@ -1,6 +1,6 @@
 use super::PhraseScorer;
-use crate::core::SegmentReader;
 use crate::fieldnorm::FieldNormReader;
+use crate::index::SegmentReader;
 use crate::postings::SegmentPostings;
 use crate::query::bm25::Bm25Weight;
 use crate::query::explanation::does_not_match;
--- a/src/query/query_parser/query_parser.rs
+++ b/src/query/query_parser/query_parser.rs
@@ -13,7 +13,7 @@ use super::logical_ast::*;
 use crate::core::json_utils::{
    convert_to_fast_value_and_get_term, set_string_and_get_terms, JsonTermWriter,
 };
-use crate::core::Index;
+use crate::index::Index;
 use crate::query::range_query::{is_type_valid_for_fastfield_range_query, RangeQuery};
 use crate::query::{
    AllQuery, BooleanQuery, BoostQuery, EmptyQuery, FuzzyTermQuery, Occur, PhrasePrefixQuery,
--- a/src/query/range_query/range_query.rs
+++ b/src/query/range_query/range_query.rs
@@ -7,8 +7,8 @@ use common::{BinarySerializable, BitSet};

 use super::map_bound;
 use super::range_query_u64_fastfield::FastFieldRangeWeight;
-use crate::core::SegmentReader;
 use crate::error::TantivyError;
+use crate::index::SegmentReader;
 use crate::query::explanation::does_not_match;
 use crate::query::range_query::range_query_ip_fastfield::IPFastFieldRangeWeight;
 use crate::query::range_query::{is_type_valid_for_fastfield_range_query, map_bound_res};
--- a/src/query/term_query/term_weight.rs
+++ b/src/query/term_query/term_weight.rs
@@ -1,7 +1,7 @@
 use super::term_scorer::TermScorer;
-use crate::core::SegmentReader;
 use crate::docset::{DocSet, BUFFER_LEN};
 use crate::fieldnorm::FieldNormReader;
+use crate::index::SegmentReader;
 use crate::postings::SegmentPostings;
 use crate::query::bm25::Bm25Weight;
 use crate::query::explanation::does_not_match;
--- a/src/query/weight.rs
+++ b/src/query/weight.rs
@@ -1,6 +1,6 @@
 use super::Scorer;
-use crate::core::SegmentReader;
 use crate::docset::BUFFER_LEN;
+use crate::index::SegmentReader;
 use crate::query::Explanation;
 use crate::{DocId, DocSet, Score, TERMINATED};

--- a/src/schema/document/de.rs
+++ b/src/schema/document/de.rs
@@ -889,7 +889,7 @@ mod tests {

    #[test]
    fn test_array_serialize() {
-        let elements = vec![serde_json::Value::Null, serde_json::Value::Null];
+        let elements = [serde_json::Value::Null, serde_json::Value::Null];
        let result = serialize_value(ReferenceValue::Array(elements.iter()));
        let value = deserialize_value(result);
        assert_eq!(
@@ -900,7 +900,7 @@ mod tests {
            ]),
        );

-        let elements = vec![
+        let elements = [
            serde_json::Value::String("Hello, world".into()),
            serde_json::Value::String("Some demo".into()),
        ];
@@ -914,12 +914,12 @@ mod tests {
            ]),
        );

-        let elements = vec![];
+        let elements = [];
        let result = serialize_value(ReferenceValue::Array(elements.iter()));
        let value = deserialize_value(result);
        assert_eq!(value, crate::schema::OwnedValue::Array(vec![]));

-        let elements = vec![
+        let elements = [
            serde_json::Value::Null,
            serde_json::Value::String("Hello, world".into()),
            serde_json::Value::Number(12345.into()),
--- a/src/schema/document/se.rs
+++ b/src/schema/document/se.rs
@@ -453,7 +453,7 @@ mod tests {

    #[test]
    fn test_array_serialize() {
-        let elements = vec![serde_json::Value::Null, serde_json::Value::Null];
+        let elements = [serde_json::Value::Null, serde_json::Value::Null];
        let result = serialize_value(ReferenceValue::Array(elements.iter()));
        let expected = binary_repr!(
            collection type_codes::ARRAY_CODE,
@@ -466,7 +466,7 @@ mod tests {
            "Expected serialized value to match the binary representation"
        );

-        let elements = vec![
+        let elements = [
            serde_json::Value::String("Hello, world".into()),
            serde_json::Value::String("Some demo".into()),
        ];
@@ -482,7 +482,7 @@ mod tests {
            "Expected serialized value to match the binary representation"
        );

-        let elements = vec![];
+        let elements = [];
        let result = serialize_value(ReferenceValue::Array(elements.iter()));
        let expected = binary_repr!(
            collection type_codes::ARRAY_CODE,
@@ -493,7 +493,7 @@ mod tests {
            "Expected serialized value to match the binary representation"
        );

-        let elements = vec![
+        let elements = [
            serde_json::Value::Null,
            serde_json::Value::String("Hello, world".into()),
            serde_json::Value::Number(12345.into()),
--- a/src/snippet/mod.rs
+++ b/src/snippet/mod.rs
@@ -743,11 +743,12 @@ Survey in 2016, 2017, and 2018."#;

    #[test]
    fn test_collapse_overlapped_ranges() {
-        assert_eq!(&collapse_overlapped_ranges(&[0..1, 2..3,]), &[0..1, 2..3]);
-        assert_eq!(&collapse_overlapped_ranges(&[0..1, 1..2,]), &[0..1, 1..2]);
-        assert_eq!(&collapse_overlapped_ranges(&[0..2, 1..2,]), &[0..2]);
-        assert_eq!(&collapse_overlapped_ranges(&[0..2, 1..3,]), &[0..3]);
-        assert_eq!(&collapse_overlapped_ranges(&[0..3, 1..2,]), &[0..3]);
+        #![allow(clippy::single_range_in_vec_init)]
+        assert_eq!(&collapse_overlapped_ranges(&[0..1, 2..3]), &[0..1, 2..3]);
+        assert_eq!(&collapse_overlapped_ranges(&[0..1, 1..2]), &[0..1, 1..2]);
+        assert_eq!(&collapse_overlapped_ranges(&[0..2, 1..2]), &[0..2]);
+        assert_eq!(&collapse_overlapped_ranges(&[0..2, 1..3]), &[0..3]);
+        assert_eq!(&collapse_overlapped_ranges(&[0..3, 1..2]), &[0..3]);
    }

    #[test]
--- a/src/space_usage/mod.rs
+++ b/src/space_usage/mod.rs
@@ -290,7 +290,7 @@ impl FieldUsage {

 #[cfg(test)]
 mod test {
-    use crate::core::Index;
+    use crate::index::Index;
    use crate::schema::{Field, Schema, FAST, INDEXED, STORED, TEXT};
    use crate::space_usage::PerFieldSpaceUsage;
    use crate::{IndexWriter, Term};
--- a/src/store/mod.rs
+++ b/src/store/mod.rs
@@ -129,10 +129,7 @@ pub mod tests {
            );
        }

-        for (_, doc) in store
-            .iter::<TantivyDocument>(Some(&alive_bitset))
-            .enumerate()
-        {
+        for doc in store.iter::<TantivyDocument>(Some(&alive_bitset)) {
            let doc = doc?;
            let title_content = doc.get_first(field_title).unwrap().as_str().unwrap();
            if !title_content.starts_with("Doc ") {
--- a/stacker/Cargo.toml
+++ b/stacker/Cargo.toml
@@ -11,6 +11,7 @@ description = "term hashmap used for indexing"
 murmurhash32 = "0.3"
 common = { version = "0.6", path = "../common/", package = "tantivy-common" }
 ahash = { version = "0.8.3", default-features = false, optional = true }
+rand_distr = "0.4.3"

 [[bench]]
 harness = false
--- a/stacker/fuzz_test/Cargo.toml
+++ b/stacker/fuzz_test/Cargo.toml
@@ -0,0 +1,15 @@
+[package]
+name = "fuzz_test"
+version = "0.1.0"
+edition = "2021"
+
+# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
+
+[dependencies]
+ahash = "0.8.7"
+rand = "0.8.5"
+rand_distr = "0.4.3"
+tantivy-stacker = { version = "0.2.0", path = ".." }
+
+[workspace]
+
--- a/stacker/fuzz_test/src/main.rs
+++ b/stacker/fuzz_test/src/main.rs
@@ -0,0 +1,45 @@
+use ahash::AHashMap;
+use rand::{rngs::StdRng, Rng, SeedableRng};
+use rand_distr::Exp;
+use tantivy_stacker::ArenaHashMap;
+
+fn main() {
+    for _ in 0..1_000_000 {
+        let seed: u64 = rand::random();
+        test_with_seed(seed);
+    }
+}
+
+fn test_with_seed(seed: u64) {
+    let mut hash_map = AHashMap::new();
+    let mut arena_hashmap = ArenaHashMap::default();
+    let mut rng = StdRng::seed_from_u64(seed);
+    let key_count = rng.gen_range(1_000..=1_000_000);
+    let exp = Exp::new(0.05).unwrap();
+
+    for _ in 0..key_count {
+        let key_length = rng.sample::<f32, _>(exp).min(u16::MAX as f32).max(1.0) as usize;
+
+        let key: Vec<u8> = (0..key_length).map(|_| rng.gen()).collect();
+
+        arena_hashmap.mutate_or_create(&key, |current_count| {
+            let count: u64 = current_count.unwrap_or(0);
+            count + 1
+        });
+        hash_map.entry(key).and_modify(|e| *e += 1).or_insert(1);
+    }
+
+    println!(
+        "Seed: {} \t {:.2}MB",
+        seed,
+        arena_hashmap.memory_arena.len() as f32 / 1024.0 / 1024.0
+    );
+    // Check the contents of the ArenaHashMap
+    for (key, addr) in arena_hashmap.iter() {
+        let count: u64 = arena_hashmap.read(addr);
+        let count_expected = hash_map
+            .get(key)
+            .unwrap_or_else(|| panic!("NOT FOUND: Key: {:?}, Count: {}", key, count));
+        assert_eq!(count, *count_expected);
+    }
+}
--- a/stacker/src/memory_arena.rs
+++ b/stacker/src/memory_arena.rs
@@ -113,6 +113,15 @@ impl MemoryArena {
        self.pages.len() * PAGE_SIZE
    }

+    /// Returns the number of bytes allocated in the arena.
+    pub fn len(&self) -> usize {
+        self.pages.len().saturating_sub(1) * PAGE_SIZE + self.pages.last().unwrap().len
+    }
+
+    pub fn is_empty(&self) -> bool {
+        self.len() == 0
+    }
+
    #[inline]
    pub fn write_at<Item: Copy + 'static>(&mut self, addr: Addr, val: Item) {
        let dest = self.slice_mut(addr, std::mem::size_of::<Item>());
@@ -189,6 +198,11 @@ struct Page {

 impl Page {
    fn new(page_id: usize) -> Page {
+        // We use 32-bits addresses.
+        // - 20 bits for the in-page addressing
+        // - 12 bits for the page id.
+        // This limits us to 2^12 - 1=4095 for the page id.
+        assert!(page_id < 4096);
        Page {
            page_id,
            len: 0,
@@ -238,6 +252,7 @@ impl Page {
 mod tests {

    use super::MemoryArena;
+    use crate::memory_arena::PAGE_SIZE;

    #[test]
    fn test_arena_allocate_slice() {
@@ -255,6 +270,31 @@ mod tests {
        assert_eq!(arena.slice(addr_b, b.len()), b);
    }

+    #[test]
+    fn test_arena_allocate_end_of_page() {
+        let mut arena = MemoryArena::default();
+
+        // A big block
+        let len_a = PAGE_SIZE - 2;
+        let addr_a = arena.allocate_space(len_a);
+        *arena.slice_mut(addr_a, len_a).last_mut().unwrap() = 1;
+
+        // Single bytes
+        let addr_b = arena.allocate_space(1);
+        arena.slice_mut(addr_b, 1)[0] = 2;
+
+        let addr_c = arena.allocate_space(1);
+        arena.slice_mut(addr_c, 1)[0] = 3;
+
+        let addr_d = arena.allocate_space(1);
+        arena.slice_mut(addr_d, 1)[0] = 4;
+
+        assert_eq!(arena.slice(addr_a, len_a)[len_a - 1], 1);
+        assert_eq!(arena.slice(addr_b, 1)[0], 2);
+        assert_eq!(arena.slice(addr_c, 1)[0], 3);
+        assert_eq!(arena.slice(addr_d, 1)[0], 4);
+    }
+
    #[derive(Clone, Copy, Debug, Eq, PartialEq)]
    struct MyTest {
        pub a: usize,
--- a/stacker/src/shared_arena_hashmap.rs
+++ b/stacker/src/shared_arena_hashmap.rs
@@ -295,6 +295,8 @@ impl SharedArenaHashMap {
    /// will be in charge of returning a default value.
    /// If the key already as an associated value, then it will be passed
    /// `Some(previous_value)`.
+    ///
+    /// The key will be truncated to u16::MAX bytes.
    #[inline]
    pub fn mutate_or_create<V>(
        &mut self,
@@ -308,6 +310,8 @@ impl SharedArenaHashMap {
        if self.is_saturated() {
            self.resize();
        }
+        // Limit the key size to u16::MAX
+        let key = &key[..std::cmp::min(key.len(), u16::MAX as usize)];
        let hash = self.get_hash(key);
        let mut probe = self.probe(hash);
        let mut bucket = probe.next_probe();
@@ -379,6 +383,36 @@ mod tests {
        }
        assert_eq!(vanilla_hash_map.len(), 2);
    }
+
+    #[test]
+    fn test_long_key_truncation() {
+        // Keys longer than u16::MAX are truncated.
+        let mut memory_arena = MemoryArena::default();
+        let mut hash_map: SharedArenaHashMap = SharedArenaHashMap::default();
+        let key1 = (0..u16::MAX as usize).map(|i| i as u8).collect::<Vec<_>>();
+        hash_map.mutate_or_create(&key1, &mut memory_arena, |opt_val: Option<u32>| {
+            assert_eq!(opt_val, None);
+            4u32
+        });
+        // Due to truncation, this key is the same as key1
+        let key2 = (0..u16::MAX as usize + 1)
+            .map(|i| i as u8)
+            .collect::<Vec<_>>();
+        hash_map.mutate_or_create(&key2, &mut memory_arena, |opt_val: Option<u32>| {
+            assert_eq!(opt_val, Some(4));
+            3u32
+        });
+        let mut vanilla_hash_map = HashMap::new();
+        let iter_values = hash_map.iter(&memory_arena);
+        for (key, addr) in iter_values {
+            let val: u32 = memory_arena.read(addr);
+            vanilla_hash_map.insert(key.to_owned(), val);
+            assert_eq!(key.len(), key1[..].len());
+            assert_eq!(key, &key1[..])
+        }
+        assert_eq!(vanilla_hash_map.len(), 1); // Both map to the same key
+    }
+
    #[test]
    fn test_empty_hashmap() {
        let memory_arena = MemoryArena::default();
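
The comment added to `Page::new` in the `stacker/src/memory_arena.rs` hunk above describes a 32-bit address split into 12 bits of page id and 20 bits of in-page offset, which is why the new assertion caps `page_id` at 4095. A minimal sketch of that layout follows. The constant and function names (`NUM_BITS_PAGE_ADDR`, `MAX_PAGES`, `pack_addr`, `unpack_addr`) are hypothetical and chosen for illustration; they are not claimed to match the crate's internals.

```rust
// Hypothetical sketch of the 32-bit address layout described in the
// `Page::new` comment: 12 bits of page id above 20 bits of in-page offset.
// All names here are illustrative and are not taken from the stacker crate.
const NUM_BITS_PAGE_ADDR: u32 = 20;
const PAGE_SIZE: u32 = 1 << NUM_BITS_PAGE_ADDR; // 2^20 bytes per page
const MAX_PAGES: u32 = 1 << (32 - NUM_BITS_PAGE_ADDR); // 2^12 = 4096 pages

/// Packs a page id and an in-page offset into one 32-bit address.
fn pack_addr(page_id: u32, local_addr: u32) -> u32 {
    // Same bound as the `assert!(page_id < 4096)` added in the diff.
    assert!(page_id < MAX_PAGES, "page id does not fit in 12 bits");
    assert!(local_addr < PAGE_SIZE, "offset does not fit in 20 bits");
    (page_id << NUM_BITS_PAGE_ADDR) | local_addr
}

/// Splits a packed address back into (page id, in-page offset).
fn unpack_addr(addr: u32) -> (u32, u32) {
    (addr >> NUM_BITS_PAGE_ADDR, addr & (PAGE_SIZE - 1))
}
```

Under these assumptions, `unpack_addr(pack_addr(3, 42))` yields `(3, 42)`, and a page id of 4096 would overflow into bits that do not exist, which is the failure the new assertion turns into an early panic.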
Author	SHA1	Message	Date
Raphaël Marinier	0890503fc2	Speed up searches by removing repeated memsets coming from vec.resize() Also, reserve exactly the size needed, which is surprisingly needed to get the full speedup of ~5% on a good fraction of the queries.	2024-03-12 17:50:23 +01:00
trinity-1686a	f6b0cc1aab	allow some mixing of occur and bool in strict query parser (#2323 ) * allow some mixing of occur and bool in strict query parser * allow all mixing of binary and occur in strict parser	2024-03-07 15:17:48 +01:00
PSeitz	7e41d31c6e	agg: support to deserialize f64 from string (#2311 ) * agg: support to deserialize f64 from string * remove visit_string * disallow NaN	2024-03-05 05:49:41 +01:00
Adam Reichold	40aa4abfe5	Make FacetCounts defaultable and cloneable. (#2322 )	2024-03-05 04:11:11 +01:00
dependabot[bot]	2650317622	Update fs4 requirement from 0.7.0 to 0.8.0 (#2321 ) Updates the requirements on [fs4](https://github.com/al8n/fs4-rs) to permit the latest version. - [Release notes](https://github.com/al8n/fs4-rs/releases) - [Commits](https://github.com/al8n/fs4-rs/commits) --- updated-dependencies: - dependency-name: fs4 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-02-27 03:38:04 +01:00
Paul Masurel	6739357314	Removing split_size and adding split_size and shard_size as segment_size (#2320 ) aliases.	2024-02-26 11:35:22 +01:00
PSeitz	d57622d54b	support bool type in term aggregation (#2318 ) * support bool type in term aggregation * add Bool to Intermediate Key	2024-02-20 03:22:22 +01:00
PSeitz	f745dbc054	fix Clone for TopNComputer, add top_hits bench (#2315 ) * fix Clone for TopNComputer, add top_hits bench add top_hits agg bench test aggregation::agg_bench::bench::bench_aggregation_terms_many_with_sub_agg ... bench: 123,475,175 ns/iter (+/- 30,608,889) test aggregation::agg_bench::bench::bench_aggregation_terms_many_with_sub_agg_multi ... bench: 194,170,414 ns/iter (+/- 36,495,516) test aggregation::agg_bench::bench::bench_aggregation_terms_many_with_sub_agg_opt ... bench: 179,742,809 ns/iter (+/- 29,976,507) test aggregation::agg_bench::bench::bench_aggregation_terms_many_with_sub_agg_sparse ... bench: 27,592,534 ns/iter (+/- 2,672,370) test aggregation::agg_bench::bench::bench_aggregation_terms_many_with_top_hits_agg ... bench: 552,851,227 ns/iter (+/- 71,975,886) test aggregation::agg_bench::bench::bench_aggregation_terms_many_with_top_hits_agg_multi ... bench: 558,616,384 ns/iter (+/- 100,890,124) test aggregation::agg_bench::bench::bench_aggregation_terms_many_with_top_hits_agg_opt ... bench: 554,031,368 ns/iter (+/- 165,452,650) test aggregation::agg_bench::bench::bench_aggregation_terms_many_with_top_hits_agg_sparse ... bench: 46,435,919 ns/iter (+/- 13,681,935) * add comment	2024-02-20 03:22:00 +01:00
PSeitz	79b041f81f	clippy (#2314 )	2024-02-13 05:56:31 +01:00
PSeitz	0e16ed9ef7	Fix serde for TopNComputer (#2313 ) * Fix serde for TopNComputer The top hits aggregation changed the TopNComputer to be serializable, but capacity needs to be carried over, as it contains logic which is checked against when pushing elements (capacity == 0 is not allowed). * use serde from deser * remove pub, clippy	2024-02-07 12:52:06 +01:00
mochi	88a3275dbb	add shared search executor (#2312 )	2024-02-05 09:33:00 +01:00
PSeitz	1223a87eb2	add fuzz test for hashmap (#2310 )	2024-01-31 10:30:21 +01:00
PSeitz	48630ceec9	move into new index module (#2259 ) move core modules to index module	2024-01-31 10:30:04 +01:00
Adam Reichold	72002e8a89	Make test builds Clippy clean. (#2277 )	2024-01-31 02:47:06 +01:00
trinity-1686a	3c9297dd64	report if posting list was actually loaded when warming it up (#2309 )	2024-01-29 15:23:16 +01:00
Tushar	0e04ec3136	feat(aggregators/metric): Add a top_hits aggregator (#2198 ) * feat(aggregators/metric): Implement a top_hits aggregator * fix: Expose get_fields * fix: Serializer for top_hits request Also removes the extraneous third-party serialization helper. * chore: Avert panic on parsing invalid top_hits query * refactor: Allow multiple field names from aggregations * perf: Replace binary heap with TopNComputer * fix: Avoid comparator inversion by ComparableDoc * fix: Rank missing field values lower than present values * refactor: Make KeyOrder a struct * feat: Rough attempt at docvalue_fields * feat: Complete stab at docvalue_fields - Rename "SearchResult" => "Retrieval" - Revert Vec => HashMap for aggregation accessors. - Split accessors for core aggregation and field retrieval. - Resolve globbed field names in docvalue_fields retrieval. - Handle strings/bytes and other column types with DynamicColumn * test(unit): Add tests for top_hits aggregator * fix: docfield_value field globbing * test(unit): Include dynamic fields * fix: Value -> OwnedValue * fix: Use OwnedValue's native Null variant * chore: Improve readability of test asserts * chore: Remove DocAddress from top_hits result * docs: Update aggregator doc * revert: accidental doc test * chore: enable time macros only for tests * chore: Apply suggestions from review * chore: Apply suggestions from review * fix: Retrieve all values for fields * test(unit): Update for multi-value retrieval * chore: Assert term existence * feat: Include all columns for a column name Since a (name, type) constitutes a unique column. * fix: Resolve json fields Introduces a translation step to bridge the difference between ColumnarReader's null `\0` separated json field keys to the common `.` separated used by SegmentReader. Although, this should probably be the default behavior for ColumnarReader's public API perhaps. * chore: Address review on mutability * chore: s/segment_id/segment_ordinal instances of SegmentOrdinal * chore: Revert erroneous grammar change	2024-01-26 16:46:41 +01:00
Paul Masurel	9b7f3a55cf	Bumped census version	2024-01-26 19:32:02 +09:00
PSeitz	1dacdb6c85	add histogram agg test on empty index (#2306 )	2024-01-23 16:27:34 +01:00
François Massot	30483310ca	Minor improvement of README.md (#2305 ) * Update README.md * Remove useless paragraph * Wording.	2024-01-19 17:46:48 +09:00
Tushar	e1d18b5114	chore: Expose TopDocs::order_by_u64_field again (#2282 )	2024-01-18 05:58:24 +01:00
trinity-1686a	108f30ba23	allow newline where we allow space in query parser (#2302 ) fix regression from the new parser	2024-01-17 14:38:35 +01:00
PSeitz	5943ee46bd	Truncate keys to u16::MAX in term hashmap (#2299 ) Truncate keys to u16::MAX, instead of e.g. storing 0 bytes for keys with length u16::MAX + 1. The term hashmap has a hidden API contract to only accept terms with length up to u16::MAX.	2024-01-11 10:19:12 +01:00
PSeitz	f95a76293f	add memory arena test (#2298 ) * add memory arena test * add assert * Update stacker/src/memory_arena.rs Co-authored-by: Paul Masurel <paul@quickwit.io> --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2024-01-11 07:18:48 +01:00
Paul Masurel	014328e378	Fix bug that can cause `get_docids_for_value_range` to panic. (#2295 ) * Fix bug that can cause `get_docids_for_value_range` to panic. When `selected_docid_range.end == num_rows`, we would get a panic as we try to access a non-existing blockmeta. This PR accepts calls to rank with any value. For any value above num_rows we simply return non_null_rows. Fixes #2293 * add tests, merge variables --------- Co-authored-by: Pascal Seitz <pascal.seitz@gmail.com>	2024-01-09 14:52:20 +01:00
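
Commit 5943ee46bd in the table above truncates term hashmap keys to `u16::MAX` bytes, so two keys that share their first 65535 bytes collapse into one entry. The sketch below illustrates that effect with a standard `HashMap` standing in for the arena hashmap. The slice expression mirrors the one added to `mutate_or_create` in the diff, while `truncate_key` and `count_keys` are hypothetical helper names introduced only for this example.

```rust
use std::collections::HashMap;

/// Truncates a key to at most `u16::MAX` bytes, mirroring the slice
/// expression added to `mutate_or_create` in the diff.
/// (`truncate_key` is a hypothetical helper name.)
fn truncate_key(key: &[u8]) -> &[u8] {
    &key[..key.len().min(u16::MAX as usize)]
}

/// Hypothetical stand-in for the arena hashmap: counts how often each
/// key occurs once truncation has been applied.
fn count_keys(keys: &[Vec<u8>]) -> HashMap<Vec<u8>, u64> {
    let mut counts = HashMap::new();
    for key in keys {
        // Keys that differ only after byte 65535 land on the same entry.
        *counts.entry(truncate_key(key).to_vec()).or_insert(0) += 1;
    }
    counts
}
```

Calling `count_keys` with one key of exactly `u16::MAX` bytes and a second key that extends it by one byte produces a single entry with a count of 2, the same situation that `test_long_key_truncation` checks against the real hashmap.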