Hotfix 0.15.2

fix store reader iterator, take 2
use github actions for tests
2026-01-04 08:12:54 +00:00 · 2021-06-16 22:15:55 +09:00 · 2021-06-16 22:13:19 +09:00 · 2021-06-14 12:51:46 +02:00 · 2021-06-14 18:45:38 +09:00 · 2021-06-14 11:22:58 +02:00
64 changed files with 1746 additions and 952 deletions
--- a/.github/workflows/test.yml
+++ b/.github/workflows/test.yml
@@ -0,0 +1,24 @@
+name: Rust
+
+on:
+  push:
+    branches: [ main ]
+  pull_request:
+    branches: [ main ]
+
+env:
+  CARGO_TERM_COLOR: always
+
+jobs:
+  build:
+
+    runs-on: ubuntu-latest
+
+    steps:
+    - uses: actions/checkout@v2
+    - name: Build
+      run: cargo build --verbose --workspace
+    - name: Run tests
+      run: cargo test --verbose --workspace
+    - name: Check Formatting
+      run: cargo fmt --all -- --check
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,3 +1,11 @@
+Tantivy 0.15.2
+========================
+- Major bugfix. DocStore still panics when a deleted doc is at the beginning of a block. (@appaquet) #1088
+
+Tantivy 0.15.1
+=========================
+- Major bugfix. DocStore panics when first block is deleted. (@appaquet) #1077
+
 Tantivy 0.15.0
 =========================
 - API Changes. Using Range instead of (start, end) in the API and internals (`FileSlice`, `OwnedBytes`, `Snippets`, ...)
@@ -8,11 +16,19 @@ Tantivy 0.15.0
 - Bugfix consistent tie break handling in facet's topk (@hardikpnsp) #357
 - Date field support for range queries (@rihardsk) #516
 - Added lz4-flex as the default compression scheme in tantivy (@PSeitz) #1009
- Renamed a lot of symbols to avoid all uppercasing on acronyms, as per new clippy recommendation. For instance, RAMDirectory -> RamDirectory. (@pmasurel)
+- Renamed a lot of symbols to avoid all uppercasing on acronyms, as per new clippy recommendation. For instance, RAMDirectory -> RamDirectory. (@fulmicoton)
 - Simplified positions index format (@fulmicoton) #1022
 - Moved bitpacking to bitpacker subcrate and add BlockedBitpacker, which bitpacks blocks of 128 elements (@PSeitz) #1030
 - Added support for more-like-this query in tantivy (@evanxg852000) #1011
- Added support for sorting an index, e.g presorting documents in an index by a timestamp field. This can heavily improve performance for certain scenarios, by utilizing the sorted data (Top-n optimizations). #1026
+- Added support for sorting an index, e.g. presorting documents in an index by a timestamp field. This can heavily improve performance in certain scenarios by utilizing the sorted data (Top-n optimizations) (@PSeitz). #1026
+- Add iterator over documents in doc store (@PSeitz). #1044
+- Fix log merge policy (@PSeitz). #1043
+- Add detection to avoid small doc store blocks on merge (@PSeitz). #1054
+- Make doc store compression dynamic (@PSeitz). #1060
+- Switch to json for footer version handling (@PSeitz). #1060
+- Updated TermMerger implementation to rely on the union feature of the FST (@scampi) #469
+- Add boolean marking whether position is required in the query_terms API call (@fulmicoton). #1070
+

 Tantivy 0.14.0
 =========================
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -1,6 +1,6 @@
 [package]
 name = "tantivy"
-version = "0.14.0"
+version = "0.15.2"
 authors = ["Paul Masurel <paul.masurel@gmail.com>"]
 license = "MIT"
 categories = ["database-implementations", "data-structures"]
@@ -20,8 +20,7 @@ once_cell = "1.7.2"
 regex ={ version = "1.5.4", default-features = false, features = ["std"] }
 tantivy-fst = "0.3"
 memmap = {version = "0.7", optional=true}
-lz4_flex = { version = "0.7.5", default-features = false, features = ["checked-decode"], optional = true }
-lz4 = { version = "1.23.2", optional = true }
+lz4_flex = { version = "0.8.0", default-features = false, features = ["checked-decode"], optional = true }
 brotli = { version = "3.3", optional = true }
 snap = { version = "1.0.5", optional = true }
 tempfile = { version = "3.2", optional = true }
@@ -34,7 +33,7 @@ levenshtein_automata = "0.2"
 uuid = { version = "0.8.2", features = ["v4", "serde"] }
 crossbeam = "0.8"
 futures = { version = "0.3.15", features = ["thread-pool"] }
-tantivy-query-grammar = { version="0.14.0", path="./query-grammar" }
+tantivy-query-grammar = { version="0.15.0", path="./query-grammar" }
 tantivy-bitpacker = { version="0.1", path="./bitpacker" }
 stable_deref_trait = "1.2"
 rust-stemmers = "1.2"
@@ -77,12 +76,13 @@ debug-assertions = true
 overflow-checks = true

 [features]
-default = ["mmap", "lz4-block-compression" ]
+default = ["mmap", "lz4-compression" ]
 mmap = ["fs2", "tempfile", "memmap"]
+
 brotli-compression = ["brotli"]
-lz4-compression = ["lz4"]
-lz4-block-compression = ["lz4_flex"]
+lz4-compression = ["lz4_flex"]
 snappy-compression = ["snap"]
+
 failpoints = ["fail/failpoints"]
 unstable = [] # useful for benches.
 wasm-bindgen = ["uuid/wasm-bindgen"]
--- a/appveyor.yml
+++ b/appveyor.yml
@@ -18,5 +18,6 @@ install:
 build: false

 test_script:
-  - REM SET RUST_LOG=tantivy,test & cargo test --all --verbose --no-default-features --features lz4-block-compression --features mmap
+  - REM SET RUST_LOG=tantivy,test & cargo test --all --verbose --no-default-features --features lz4-compression --features mmap
+  - REM SET RUST_LOG=tantivy,test & cargo test test_store --verbose --no-default-features --features lz4-compression --features snappy-compression --features brotli-compression --features mmap
  - REM SET RUST_BACKTRACE=1 & cargo build --examples
--- a/bitpacker/Cargo.toml
+++ b/bitpacker/Cargo.toml
@@ -2,6 +2,13 @@
 name = "tantivy-bitpacker"
 version = "0.1.0"
 edition = "2018"
+authors = ["Paul Masurel <paul.masurel@gmail.com>"]
+license = "MIT"
+categories = []
+description = """Tantivy-sub crate: bitpacking"""
+repository = "https://github.com/tantivy-search/tantivy"
+keywords = []
+

 # See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

--- a/bitpacker/src/bitpacker.rs
+++ b/bitpacker/src/bitpacker.rs
@@ -17,6 +17,7 @@ impl BitPacker {
        }
    }

+    #[inline]
    pub fn write<TWrite: io::Write>(
        &mut self,
        val: u64,
@@ -79,6 +80,7 @@ impl BitUnpacker {
        }
    }

+    #[inline]
    pub fn get(&self, idx: u64, data: &[u8]) -> u64 {
        if self.num_bits == 0 {
            return 0u64;
--- a/bitpacker/src/blocked_bitpacker.rs
+++ b/bitpacker/src/blocked_bitpacker.rs
@@ -80,6 +80,7 @@ impl BlockedBitpacker {
                * std::mem::size_of_val(&self.buffer.get(0).cloned().unwrap_or_default())
    }

+    #[inline]
    pub fn add(&mut self, val: u64) {
        self.buffer.push(val);
        if self.buffer.len() == BLOCK_SIZE as usize {
@@ -122,6 +123,7 @@ impl BlockedBitpacker {
                .resize(self.compressed_blocks.len() + 8, 0); // add padding for bitpacker
        }
    }
+    #[inline]
    pub fn get(&self, idx: usize) -> u64 {
        let metadata_pos = idx / BLOCK_SIZE as usize;
        let pos_in_block = idx % BLOCK_SIZE as usize;
--- a/examples/custom_collector.rs
+++ b/examples/custom_collector.rs
@@ -10,7 +10,7 @@
 // ---
 // Importing tantivy...
 use tantivy::collector::{Collector, SegmentCollector};
-use tantivy::fastfield::FastFieldReader;
+use tantivy::fastfield::{DynamicFastFieldReader, FastFieldReader};
 use tantivy::query::QueryParser;
 use tantivy::schema::Field;
 use tantivy::schema::{Schema, FAST, INDEXED, TEXT};
@@ -98,7 +98,7 @@ impl Collector for StatsCollector {
 }

 struct StatsSegmentCollector {
-    fast_field_reader: FastFieldReader<u64>,
+    fast_field_reader: DynamicFastFieldReader<u64>,
    stats: Stats,
 }

--- a/examples/deleting_updating_documents.rs
+++ b/examples/deleting_updating_documents.rs
@@ -90,7 +90,7 @@ fn main() -> tantivy::Result<()> {

    let frankenstein_isbn = Term::from_field_text(isbn, "978-9176370711");

-    // Oops our frankenstein doc seems mispelled
+    // Oops our frankenstein doc seems misspelled
    let frankenstein_doc_misspelled = extract_doc_given_isbn(&reader, &frankenstein_isbn)?.unwrap();
    assert_eq!(
        schema.to_json(&frankenstein_doc_misspelled),
--- a/examples/faceted_search.rs
+++ b/examples/faceted_search.rs
@@ -92,7 +92,7 @@ fn main() -> tantivy::Result<()> {

    // Check the reference doc for different ways to create a `Facet` object.
    {
-        let facet = Facet::from_text("/Felidae/Pantherinae");
+        let facet = Facet::from("/Felidae/Pantherinae");
        let facet_term = Term::from_facet(classification, &facet);
        let facet_term_query = TermQuery::new(facet_term, IndexRecordOption::Basic);
        let mut facet_collector = FacetCollector::for_field(classification);
--- a/query-grammar/Cargo.toml
+++ b/query-grammar/Cargo.toml
@@ -1,6 +1,6 @@
 [package]
 name = "tantivy-query-grammar"
-version = "0.14.0"
+version = "0.15.0"
 authors = ["Paul Masurel <paul.masurel@gmail.com>"]
 license = "MIT"
 categories = ["database-implementations", "data-structures"]
--- a/src/collector/facet_collector.rs
+++ b/src/collector/facet_collector.rs
@@ -539,10 +539,10 @@ mod tests {
        let index = Index::create_in_ram(schema);
        let mut index_writer = index.writer_for_tests().unwrap();
        index_writer.add_document(doc!(
-            facet_field => Facet::from_text(&"/subjects/A/a"),
-            facet_field => Facet::from_text(&"/subjects/B/a"),
-            facet_field => Facet::from_text(&"/subjects/A/b"),
-            facet_field => Facet::from_text(&"/subjects/B/b"),
+            facet_field => Facet::from_text(&"/subjects/A/a").unwrap(),
+            facet_field => Facet::from_text(&"/subjects/B/a").unwrap(),
+            facet_field => Facet::from_text(&"/subjects/A/b").unwrap(),
+            facet_field => Facet::from_text(&"/subjects/B/b").unwrap(),
        ));
        index_writer.commit().unwrap();
        let reader = index.reader().unwrap();
@@ -563,16 +563,16 @@ mod tests {
        let index = Index::create_in_ram(schema);
        let mut index_writer = index.writer_for_tests()?;
        index_writer.add_document(doc!(
-            facet_field => Facet::from_text(&"/A/A"),
+            facet_field => Facet::from_text(&"/A/A").unwrap(),
        ));
        index_writer.add_document(doc!(
-            facet_field => Facet::from_text(&"/A/B"),
+            facet_field => Facet::from_text(&"/A/B").unwrap(),
        ));
        index_writer.add_document(doc!(
-            facet_field => Facet::from_text(&"/A/C/A"),
+            facet_field => Facet::from_text(&"/A/C/A").unwrap(),
        ));
        index_writer.add_document(doc!(
-            facet_field => Facet::from_text(&"/D/C/A"),
+            facet_field => Facet::from_text(&"/D/C/A").unwrap(),
        ));
        index_writer.commit()?;
        let reader = index.reader()?;
@@ -580,7 +580,7 @@ mod tests {
        assert_eq!(searcher.num_docs(), 4);

        let count_facet = |facet_str: &str| {
-            let term = Term::from_facet(facet_field, &Facet::from_text(facet_str));
+            let term = Term::from_facet(facet_field, &Facet::from_text(facet_str).unwrap());
            searcher
                .search(&TermQuery::new(term, IndexRecordOption::Basic), &Count)
                .unwrap()
--- a/src/collector/filter_collector_wrapper.rs
+++ b/src/collector/filter_collector_wrapper.rs
@@ -12,7 +12,7 @@
 use std::marker::PhantomData;

 use crate::collector::{Collector, SegmentCollector};
-use crate::fastfield::{FastFieldReader, FastValue};
+use crate::fastfield::{DynamicFastFieldReader, FastFieldReader, FastValue};
 use crate::schema::Field;
 use crate::{Score, SegmentReader, TantivyError};

@@ -155,7 +155,7 @@ where
    TPredicate: 'static,
    TPredicateValue: FastValue,
 {
-    fast_field_reader: FastFieldReader<TPredicateValue>,
+    fast_field_reader: DynamicFastFieldReader<TPredicateValue>,
    segment_collector: TSegmentCollector,
    predicate: TPredicate,
    t_predicate_value: PhantomData<TPredicateValue>,
--- a/src/collector/histogram_collector.rs
+++ b/src/collector/histogram_collector.rs
@@ -1,5 +1,5 @@
 use crate::collector::{Collector, SegmentCollector};
-use crate::fastfield::{FastFieldReader, FastValue};
+use crate::fastfield::{DynamicFastFieldReader, FastFieldReader, FastValue};
 use crate::schema::{Field, Type};
 use crate::{DocId, Score};
 use fastdivide::DividerU64;
@@ -84,7 +84,7 @@ impl HistogramComputer {
 }
 pub struct SegmentHistogramCollector {
    histogram_computer: HistogramComputer,
-    ff_reader: FastFieldReader<u64>,
+    ff_reader: DynamicFastFieldReader<u64>,
 }

 impl SegmentCollector for SegmentHistogramCollector {
--- a/src/collector/tests.rs
+++ b/src/collector/tests.rs
@@ -1,6 +1,7 @@
 use super::*;
 use crate::core::SegmentReader;
 use crate::fastfield::BytesFastFieldReader;
+use crate::fastfield::DynamicFastFieldReader;
 use crate::fastfield::FastFieldReader;
 use crate::schema::Field;
 use crate::DocId;
@@ -162,7 +163,7 @@ pub struct FastFieldTestCollector {

 pub struct FastFieldSegmentCollector {
    vals: Vec<u64>,
-    reader: FastFieldReader<u64>,
+    reader: DynamicFastFieldReader<u64>,
 }

 impl FastFieldTestCollector {
--- a/src/collector/top_score_collector.rs
+++ b/src/collector/top_score_collector.rs
@@ -4,7 +4,7 @@ use crate::collector::tweak_score_top_collector::TweakedScoreTopCollector;
 use crate::collector::{
    CustomScorer, CustomSegmentScorer, ScoreSegmentTweaker, ScoreTweaker, SegmentCollector,
 };
-use crate::fastfield::FastFieldReader;
+use crate::fastfield::{DynamicFastFieldReader, FastFieldReader};
 use crate::query::Weight;
 use crate::schema::Field;
 use crate::DocAddress;
@@ -129,7 +129,7 @@ impl fmt::Debug for TopDocs {
 }

 struct ScorerByFastFieldReader {
-    ff_reader: FastFieldReader<u64>,
+    ff_reader: DynamicFastFieldReader<u64>,
 }

 impl CustomSegmentScorer<u64> for ScorerByFastFieldReader {
@@ -151,7 +151,7 @@ impl CustomScorer<u64> for ScorerByField {
        // mapping is monotonic, so it is sufficient to compute our top-K docs.
        //
        // The conversion will then happen only on the top-K docs.
-        let ff_reader: FastFieldReader<u64> = segment_reader
+        let ff_reader = segment_reader
            .fast_fields()
            .typed_fast_field_reader(self.field)?;
        Ok(ScorerByFastFieldReader { ff_reader })
@@ -401,6 +401,7 @@ impl TopDocs {
    /// # use tantivy::query::QueryParser;
    /// use tantivy::SegmentReader;
    /// use tantivy::collector::TopDocs;
+    /// use tantivy::fastfield::FastFieldReader;
    /// use tantivy::schema::Field;
    ///
    /// fn create_schema() -> Schema {
@@ -508,6 +509,7 @@ impl TopDocs {
    /// use tantivy::SegmentReader;
    /// use tantivy::collector::TopDocs;
    /// use tantivy::schema::Field;
+    /// use tantivy::fastfield::FastFieldReader;
    ///
    /// # fn create_schema() -> Schema {
    /// #    let mut schema_builder = Schema::builder();
--- a/src/common/mod.rs
+++ b/src/common/mod.rs
@@ -8,7 +8,7 @@ pub use self::bitset::BitSet;
 pub(crate) use self::bitset::TinySet;
 pub(crate) use self::composite_file::{CompositeFile, CompositeWrite};
 pub use self::counting_writer::CountingWriter;
-pub use self::serialize::{BinarySerializable, FixedSize};
+pub use self::serialize::{BinarySerializable, DeserializeFrom, FixedSize};
 pub use self::vint::{
    read_u32_vint, read_u32_vint_no_advance, serialize_vint_u32, write_u32_vint, VInt,
 };
--- a/src/common/serialize.rs
+++ b/src/common/serialize.rs
@@ -14,6 +14,20 @@ pub trait BinarySerializable: fmt::Debug + Sized {
    fn deserialize<R: Read>(reader: &mut R) -> io::Result<Self>;
 }

+pub trait DeserializeFrom<T: BinarySerializable> {
+    fn deserialize(&mut self) -> io::Result<T>;
+}
+
+/// Implement deserialize from &[u8] for all types which implement BinarySerializable.
+///
+/// TryFrom would actually be preferable, but it is not possible because of the orphan
+/// rules (it is not completely clear whether this could be resolved).
+impl<T: BinarySerializable> DeserializeFrom<T> for &[u8] {
+    fn deserialize(&mut self) -> io::Result<T> {
+        T::deserialize(self)
+    }
+}
+
 /// `FixedSize` marks a `BinarySerializable` as
 /// always serializing to the same size.
 pub trait FixedSize: BinarySerializable {
@@ -61,6 +75,11 @@ impl<Left: BinarySerializable, Right: BinarySerializable> BinarySerializable for
        Ok((Left::deserialize(reader)?, Right::deserialize(reader)?))
    }
 }
+impl<Left: BinarySerializable + FixedSize, Right: BinarySerializable + FixedSize> FixedSize
+    for (Left, Right)
+{
+    const SIZE_IN_BYTES: usize = Left::SIZE_IN_BYTES + Right::SIZE_IN_BYTES;
+}

 impl BinarySerializable for u32 {
    fn serialize<W: Write>(&self, writer: &mut W) -> io::Result<()> {
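The new `DeserializeFrom` blanket impl lets a byte slice be read back into any `BinarySerializable` type, and the tuple `FixedSize` impl is what allows `extract_footer` to compute `<(u32, u32)>::SIZE_IN_BYTES`. The following is a simplified, std-only sketch of how these pieces fit together. It re-declares trimmed-down versions of the traits (no `fmt::Debug` bound, and a little-endian `u32` encoding is assumed), so it illustrates the pattern rather than reproducing tantivy's code.

```rust
use std::io::{self, Read, Write};

/// Simplified stand-in for tantivy's `BinarySerializable`.
pub trait BinarySerializable: Sized {
    fn serialize<W: Write>(&self, writer: &mut W) -> io::Result<()>;
    fn deserialize<R: Read>(reader: &mut R) -> io::Result<Self>;
}

/// Marks types that always serialize to the same number of bytes.
pub trait FixedSize: BinarySerializable {
    const SIZE_IN_BYTES: usize;
}

impl BinarySerializable for u32 {
    fn serialize<W: Write>(&self, writer: &mut W) -> io::Result<()> {
        // Assumption: little-endian encoding.
        writer.write_all(&self.to_le_bytes())
    }
    fn deserialize<R: Read>(reader: &mut R) -> io::Result<Self> {
        let mut buf = [0u8; 4];
        reader.read_exact(&mut buf)?;
        Ok(u32::from_le_bytes(buf))
    }
}

impl FixedSize for u32 {
    const SIZE_IN_BYTES: usize = 4;
}

impl<Left: BinarySerializable, Right: BinarySerializable> BinarySerializable for (Left, Right) {
    fn serialize<W: Write>(&self, writer: &mut W) -> io::Result<()> {
        self.0.serialize(writer)?;
        self.1.serialize(writer)
    }
    fn deserialize<R: Read>(reader: &mut R) -> io::Result<Self> {
        Ok((Left::deserialize(reader)?, Right::deserialize(reader)?))
    }
}

// A pair of fixed-size values is itself fixed-size: the sizes add up.
impl<Left: FixedSize, Right: FixedSize> FixedSize for (Left, Right) {
    const SIZE_IN_BYTES: usize = Left::SIZE_IN_BYTES + Right::SIZE_IN_BYTES;
}

/// Lets a byte slice deserialize "itself" into `T`, advancing past the consumed bytes.
pub trait DeserializeFrom<T: BinarySerializable> {
    fn deserialize(&mut self) -> io::Result<T>;
}

impl<T: BinarySerializable> DeserializeFrom<T> for &[u8] {
    fn deserialize(&mut self) -> io::Result<T> {
        // `&[u8]` implements `Read`, so `self` (a `&mut &[u8]`) is a valid reader.
        T::deserialize(self)
    }
}
```

Because `Read` is implemented for `&[u8]`, each call consumes bytes from the front of the slice: a pair is decoded with a single `deserialize()` call, and a slice that is too short yields an `UnexpectedEof` error instead of a panic.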
--- a/src/core/index.rs
+++ b/src/core/index.rs
@@ -76,7 +76,7 @@ fn load_metas(
 /// );
 ///
 /// let schema = schema_builder.build();
-/// let settings = IndexSettings{sort_by_field: Some(IndexSortByField{field:"number".to_string(), order:Order::Asc})};
+/// let settings = IndexSettings{sort_by_field: Some(IndexSortByField{field:"number".to_string(), order:Order::Asc}), ..Default::default()};
 /// let index = Index::builder().schema(schema).settings(settings).create_in_ram();
 ///
 /// ```
@@ -173,7 +173,7 @@ impl IndexBuilder {
            &directory,
        )?;
        let mut metas = IndexMeta::with_schema(self.get_expect_schema()?);
-        metas.index_settings = self.index_settings.clone();
+        metas.index_settings = self.index_settings;
        let index = Index::open_from_metas(directory, &metas, SegmentMetaInventory::default());
        Ok(index)
    }
@@ -460,6 +460,13 @@ impl Index {
    pub fn settings(&self) -> &IndexSettings {
        &self.settings
    }
+
+    /// Mutable accessor to the index settings
+    ///
+    pub fn settings_mut(&mut self) -> &mut IndexSettings {
+        &mut self.settings
+    }
+
    /// Accessor to the index schema
    ///
    /// The schema is actually cloned.
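Adding `docstore_compression` to `IndexSettings` is why the doc example now ends with `..Default::default()`: struct update syntax fills every field that is not listed explicitly, so code that only cares about sorting keeps compiling when new settings are added. The sketch below uses simplified, hypothetical mirrors of `IndexSettings`, `IndexSortByField`, `Order` and `Compressor` (only a few variants, with lz4 assumed as the default to match the updated `meta.json` test), plus a toy `Index` exposing the new `settings_mut()` accessor.

```rust
/// Hypothetical, simplified mirrors of the types touched in this diff.
#[derive(Debug, Clone, PartialEq)]
pub enum Order {
    Asc,
    Desc,
}

#[derive(Debug, Clone, PartialEq)]
pub struct IndexSortByField {
    pub field: String,
    pub order: Order,
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Compressor {
    Lz4,
    Brotli,
    Snappy,
}

impl Default for Compressor {
    // Assumption: lz4 is the default, as the updated meta.json test suggests.
    fn default() -> Self {
        Compressor::Lz4
    }
}

#[derive(Debug, Clone, Default, PartialEq)]
pub struct IndexSettings {
    pub sort_by_field: Option<IndexSortByField>,
    pub docstore_compression: Compressor,
}

/// Sort by the `number` field and leave every other setting at its default.
pub fn sorted_settings() -> IndexSettings {
    IndexSettings {
        sort_by_field: Some(IndexSortByField {
            field: "number".to_string(),
            order: Order::Asc,
        }),
        // Fields not listed above (here `docstore_compression`) come from `Default`.
        ..Default::default()
    }
}

/// Toy index holding its settings, with the accessor pair from this diff.
pub struct Index {
    settings: IndexSettings,
}

impl Index {
    pub fn settings(&self) -> &IndexSettings {
        &self.settings
    }

    pub fn settings_mut(&mut self) -> &mut IndexSettings {
        &mut self.settings
    }
}
```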
--- a/src/core/index_meta.rs
+++ b/src/core/index_meta.rs
@@ -1,7 +1,7 @@
 use super::SegmentComponent;
-use crate::core::SegmentId;
 use crate::schema::Schema;
 use crate::Opstamp;
+use crate::{core::SegmentId, store::Compressor};
 use census::{Inventory, TrackedObject};
 use serde::{Deserialize, Serialize};
 use std::path::PathBuf;
@@ -233,7 +233,11 @@ impl InnerSegmentMeta {
 pub struct IndexSettings {
    /// Sorts the documents by information
    /// provided in `IndexSortByField`
+    #[serde(skip_serializing_if = "Option::is_none")]
    pub sort_by_field: Option<IndexSortByField>,
+    /// The `Compressor` used to compress the doc store.
+    #[serde(default)]
+    pub docstore_compression: Compressor,
 }
 /// Settings to presort the documents in an index
 ///
@@ -255,6 +259,17 @@ pub enum Order {
    /// Descending Order
    Desc,
 }
+impl Order {
+    /// Returns true if the order is ascending
+    pub fn is_asc(&self) -> bool {
+        self == &Order::Asc
+    }
+    /// Returns true if the order is descending
+    pub fn is_desc(&self) -> bool {
+        self == &Order::Desc
+    }
+}
+
 /// Meta information about the `Index`.
 ///
 /// This object is serialized on disk in the `meta.json` file.
@@ -369,6 +384,7 @@ mod tests {
                    field: "text".to_string(),
                    order: Order::Asc,
                }),
+                ..Default::default()
            },
            segments: Vec::new(),
            schema,
@@ -378,7 +394,7 @@ mod tests {
        let json = serde_json::ser::to_string(&index_metas).expect("serialization failed");
        assert_eq!(
            json,
-            r#"{"index_settings":{"sort_by_field":{"field":"text","order":"Asc"}},"segments":[],"schema":[{"name":"text","type":"text","options":{"indexing":{"record":"position","tokenizer":"default"},"stored":false}}],"opstamp":0}"#
+            r#"{"index_settings":{"sort_by_field":{"field":"text","order":"Asc"},"docstore_compression":"lz4"},"segments":[],"schema":[{"name":"text","type":"text","options":{"indexing":{"record":"position","tokenizer":"default"},"stored":false}}],"opstamp":0}"#
        );
    }
 }
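`is_asc` and `is_desc` compare against a variant with `==`, so they rely on `Order` deriving `PartialEq`; `matches!(self, Order::Asc)` would work without that derive. As a hedged illustration of where such helpers are handy, `compare_by` and `sort_values` below are hypothetical and not part of this diff: they turn an `Order` into a comparator over `u64` fast-field values.

```rust
use std::cmp::Ordering;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Order {
    Asc,
    Desc,
}

impl Order {
    pub fn is_asc(&self) -> bool {
        self == &Order::Asc
    }
    pub fn is_desc(&self) -> bool {
        self == &Order::Desc
    }
}

/// Hypothetical helper: compares two values so that sorting with it
/// follows `order` (descending simply swaps the operands).
pub fn compare_by(order: Order, a: u64, b: u64) -> Ordering {
    if order.is_asc() {
        a.cmp(&b)
    } else {
        b.cmp(&a)
    }
}

/// Hypothetical helper: sorts values in place according to `order`.
pub fn sort_values(order: Order, values: &mut [u64]) {
    values.sort_by(|a, b| compare_by(order, *a, *b));
}
```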
--- a/src/directory/file_slice.rs
+++ b/src/directory/file_slice.rs
@@ -147,6 +147,13 @@ impl FileSlice {
        self.slice(from_offset..self.len())
    }

+    /// Returns a slice of the last `from_offset` bytes.
+    ///
+    /// Equivalent to `.slice(self.len() - from_offset..self.len())`
+    pub fn slice_from_end(&self, from_offset: usize) -> FileSlice {
+        self.slice(self.len() - from_offset..self.len())
+    }
+
    /// Like `.slice(...)` but enforcing only the `to`
    /// boundary.
    ///
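`slice_from_end` computes `self.len() - from_offset`, which underflows when `from_offset` exceeds the slice length (a panic in debug builds), so callers have to check the length first. The sketch below expresses the same range arithmetic on a plain `&[u8]`; `range_from_end` and this checked `slice_from_end` are hypothetical variants that return `None` instead of underflowing, not the `FileSlice` API.

```rust
use std::ops::Range;

/// Range covering the last `from_offset` bytes of a buffer of length `len`,
/// or `None` if `from_offset > len` (instead of underflowing).
pub fn range_from_end(len: usize, from_offset: usize) -> Option<Range<usize>> {
    let start = len.checked_sub(from_offset)?;
    Some(start..len)
}

/// Checked counterpart of `FileSlice::slice_from_end` on a byte slice.
pub fn slice_from_end(data: &[u8], from_offset: usize) -> Option<&[u8]> {
    range_from_end(data.len(), from_offset).map(|range| &data[range])
}
```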
--- a/src/directory/footer.rs
+++ b/src/directory/footer.rs
@@ -1,69 +1,45 @@
-use crate::common::{BinarySerializable, CountingWriter, FixedSize, HasLen, VInt};
 use crate::directory::error::Incompatibility;
 use crate::directory::FileSlice;
-use crate::directory::{AntiCallToken, TerminatingWrite};
-use crate::Version;
+use crate::{
+    common::{BinarySerializable, CountingWriter, DeserializeFrom, FixedSize, HasLen},
+    directory::{AntiCallToken, TerminatingWrite},
+    Version, INDEX_FORMAT_VERSION,
+};
 use crc32fast::Hasher;
+use serde::{Deserialize, Serialize};
 use std::io;
 use std::io::Write;

-const FOOTER_MAX_LEN: usize = 10_000;
+const FOOTER_MAX_LEN: u32 = 50_000;
+
+/// Magic number written at the very end of every footer, used to detect
+/// corruption or a footer written in an older format.
+const FOOTER_MAGIC_NUMBER: u32 = 1337;

 type CrcHashU32 = u32;

-#[derive(Debug, Clone, PartialEq)]
+/// A Footer is appended to every file
+#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
 pub struct Footer {
    pub version: Version,
-    pub meta: String,
-    pub versioned_footer: VersionedFooter,
-}
-
-/// Serialises the footer to a byte-array
-/// - versioned_footer_len : 4 bytes
-///-  versioned_footer: variable bytes
-/// - meta_len: 4 bytes
-/// - meta: variable bytes
-/// - version_len: 4 bytes
-/// - version json: variable bytes
-impl BinarySerializable for Footer {
-    fn serialize<W: io::Write>(&self, writer: &mut W) -> io::Result<()> {
-        BinarySerializable::serialize(&self.versioned_footer, writer)?;
-        BinarySerializable::serialize(&self.meta, writer)?;
-        let version_string =
-            serde_json::to_string(&self.version).map_err(|_err| io::ErrorKind::InvalidInput)?;
-        BinarySerializable::serialize(&version_string, writer)?;
-        Ok(())
-    }
-
-    fn deserialize<R: io::Read>(reader: &mut R) -> io::Result<Self> {
-        let versioned_footer = VersionedFooter::deserialize(reader)?;
-        let meta = String::deserialize(reader)?;
-        let version_json = String::deserialize(reader)?;
-        let version = serde_json::from_str(&version_json)?;
-        Ok(Footer {
-            version,
-            meta,
-            versioned_footer,
-        })
-    }
+    pub crc: CrcHashU32,
 }

 impl Footer {
-    pub fn new(versioned_footer: VersionedFooter) -> Self {
+    pub fn new(crc: CrcHashU32) -> Self {
        let version = crate::VERSION.clone();
-        let meta = version.to_string();
-        Footer {
-            version,
-            meta,
-            versioned_footer,
-        }
+        Footer { version, crc }
    }

+    pub fn crc(&self) -> CrcHashU32 {
+        self.crc
+    }
    pub fn append_footer<W: io::Write>(&self, mut write: &mut W) -> io::Result<()> {
        let mut counting_write = CountingWriter::wrap(&mut write);
-        self.serialize(&mut counting_write)?;
-        let written_len = counting_write.written_bytes();
-        (written_len as u32).serialize(write)?;
+        counting_write.write_all(serde_json::to_string(&self)?.as_ref())?;
+        let footer_payload_len = counting_write.written_bytes();
+        BinarySerializable::serialize(&(footer_payload_len as u32), write)?;
+        BinarySerializable::serialize(&FOOTER_MAGIC_NUMBER, write)?;
        Ok(())
    }

@@ -77,12 +53,47 @@ impl Footer {
                ),
            ));
        }
-        let (body_footer, footer_len_file) = file.split_from_end(u32::SIZE_IN_BYTES);
-        let mut footer_len_bytes = footer_len_file.read_bytes()?;
-        let footer_len = u32::deserialize(&mut footer_len_bytes)? as usize;
-        let (body, footer) = body_footer.split_from_end(footer_len);
-        let mut footer_bytes = footer.read_bytes()?;
-        let footer = Footer::deserialize(&mut footer_bytes)?;
+
+        let footer_metadata_len = <(u32, u32)>::SIZE_IN_BYTES;
+        let (footer_len, footer_magic_byte): (u32, u32) = file
+            .slice_from_end(footer_metadata_len)
+            .read_bytes()?
+            .as_ref()
+            .deserialize()?;
+
+        if footer_magic_byte != FOOTER_MAGIC_NUMBER {
+            return Err(io::Error::new(
+                io::ErrorKind::InvalidData,
+                "Footer magic number mismatch. The file is corrupted, or the index was created using an old tantivy version which is no longer supported. Please use tantivy 0.15 or above to recreate the index.",
+            ));
+        }
+
+        if footer_len > FOOTER_MAX_LEN {
+            return Err(io::Error::new(
+                io::ErrorKind::InvalidData,
+                format!(
+                    "Footer seems invalid as it suggests a footer len of {}. File is corrupted, \
+            or the index was created with a different & old version of tantivy.",
+                    footer_len
+                ),
+            ));
+        }
+        let total_footer_size = footer_len as usize + footer_metadata_len;
+        if file.len() < total_footer_size {
+            return Err(io::Error::new(
+                io::ErrorKind::UnexpectedEof,
+                format!(
+                    "File corrupted. The file is smaller than its footer (total footer size: {} bytes).",
+                    total_footer_size
+                ),
+            ));
+        }
+
+        let footer: Footer = serde_json::from_slice(&file.read_bytes_slice(
+            file.len() - total_footer_size..file.len() - footer_metadata_len,
+        )?)?;
+
+        let body = file.slice_to(file.len() - total_footer_size);
        Ok((footer, body))
    }
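With this change every file ends with `[footer JSON][payload_len: u32][magic: u32]`: `append_footer` writes the serde_json-encoded `Footer`, then the payload length, then `FOOTER_MAGIC_NUMBER`, and `extract_footer` reads the last 8 bytes and validates them before touching the payload. The sketch below reproduces that layout and the magic, maximum-length and file-size checks using only the standard library. It treats the payload as opaque bytes instead of deserializing it with serde_json, assumes little-endian `u32` encoding, and adds a minimum-length guard so the arithmetic cannot underflow; the functions are illustrative, not tantivy's API.

```rust
use std::io;

const FOOTER_MAGIC_NUMBER: u32 = 1337;
const FOOTER_MAX_LEN: u32 = 50_000;
/// Two trailing u32 values: payload length, then magic number.
const FOOTER_METADATA_LEN: usize = 8;

/// Appends `payload` (in tantivy, the JSON-encoded footer), its length and the magic number.
pub fn append_footer(file: &mut Vec<u8>, payload: &[u8]) {
    file.extend_from_slice(payload);
    file.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    file.extend_from_slice(&FOOTER_MAGIC_NUMBER.to_le_bytes());
}

/// Splits a file into `(footer_payload, body)`, validating the trailing metadata first.
pub fn extract_footer(file: &[u8]) -> io::Result<(&[u8], &[u8])> {
    if file.len() < FOOTER_METADATA_LEN {
        return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "file too short for footer metadata"));
    }
    let meta = &file[file.len() - FOOTER_METADATA_LEN..];
    let footer_len = u32::from_le_bytes([meta[0], meta[1], meta[2], meta[3]]);
    let magic = u32::from_le_bytes([meta[4], meta[5], meta[6], meta[7]]);
    if magic != FOOTER_MAGIC_NUMBER {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "footer magic number mismatch"));
    }
    if footer_len > FOOTER_MAX_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "footer length too large"));
    }
    let total_footer_size = footer_len as usize + FOOTER_METADATA_LEN;
    if file.len() < total_footer_size {
        return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "file smaller than its footer"));
    }
    let body_end = file.len() - total_footer_size;
    let payload = &file[body_end..file.len() - FOOTER_METADATA_LEN];
    Ok((payload, &file[..body_end]))
}
```

Checking the magic number before the length means a file written with the previous footer format, whose last four bytes held the footer length, will in practice be rejected with the dedicated magic-number error rather than parsed with a bogus length.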

@@ -90,151 +101,16 @@ impl Footer {
    /// Has to be called after `extract_footer` to make sure it's not accessing uninitialised memory
    pub fn is_compatible(&self) -> Result<(), Incompatibility> {
        let library_version = crate::version();
-        match &self.versioned_footer {
-            VersionedFooter::V1 {
-                crc32: _crc,
-                store_compression,
-            } => {
-                if &library_version.store_compression != store_compression {
-                    return Err(Incompatibility::CompressionMismatch {
-                        library_compression_format: library_version.store_compression.to_string(),
-                        index_compression_format: store_compression.to_string(),
-                    });
-                }
-                Ok(())
-            }
-            VersionedFooter::V2 {
-                crc32: _crc,
-                store_compression,
-            } => {
-                if &library_version.store_compression != store_compression {
-                    return Err(Incompatibility::CompressionMismatch {
-                        library_compression_format: library_version.store_compression.to_string(),
-                        index_compression_format: store_compression.to_string(),
-                    });
-                }
-                Ok(())
-            }
-            VersionedFooter::V3 {
-                crc32: _crc,
-                store_compression,
-            } => {
-                if &library_version.store_compression != store_compression {
-                    return Err(Incompatibility::CompressionMismatch {
-                        library_compression_format: library_version.store_compression.to_string(),
-                        index_compression_format: store_compression.to_string(),
-                    });
-                }
-                Ok(())
-            }
-            VersionedFooter::UnknownVersion => Err(Incompatibility::IndexMismatch {
+        if self.version.index_format_version < 4
+            || self.version.index_format_version > INDEX_FORMAT_VERSION
+        {
+            return Err(Incompatibility::IndexMismatch {
                library_version: library_version.clone(),
                index_version: self.version.clone(),
-            }),
+            });
        }
-    }
-}
-
-/// Footer that includes a crc32 hash that enables us to checksum files in the index
-#[derive(Debug, Clone, PartialEq)]
-pub enum VersionedFooter {
-    UnknownVersion,
-    V1 {
-        crc32: CrcHashU32,
-        store_compression: String,
-    },
-    // Introduction of the Block WAND information.
-    V2 {
-        crc32: CrcHashU32,
-        store_compression: String,
-    },
-    // Block wand max termfred on 1 byte
-    V3 {
-        crc32: CrcHashU32,
-        store_compression: String,
-    },
-}
-
-impl BinarySerializable for VersionedFooter {
-    fn serialize<W: io::Write>(&self, writer: &mut W) -> io::Result<()> {
-        let mut buf = Vec::new();
-        match self {
-            VersionedFooter::V3 {
-                crc32,
-                store_compression: compression,
-            } => {
-                // Serializes a valid `VersionedFooter` or panics if the version is unknown
-                // [   version    |   crc_hash  | compression_mode ]
-                // [    0..4      |     4..8    |     variable     ]
-                BinarySerializable::serialize(&3u32, &mut buf)?;
-                BinarySerializable::serialize(crc32, &mut buf)?;
-                BinarySerializable::serialize(compression, &mut buf)?;
-            }
-            VersionedFooter::V2 { .. }
-            | VersionedFooter::V1 { .. }
-            | VersionedFooter::UnknownVersion => {
-                return Err(io::Error::new(
-                    io::ErrorKind::InvalidInput,
-                    "Cannot serialize an unknown versioned footer ",
-                ));
-            }
-        }
-        BinarySerializable::serialize(&VInt(buf.len() as u64), writer)?;
-        assert!(buf.len() <= FOOTER_MAX_LEN);
-        writer.write_all(&buf[..])?;
        Ok(())
    }
-
-    fn deserialize<R: io::Read>(reader: &mut R) -> io::Result<Self> {
-        let len = VInt::deserialize(reader)?.0 as usize;
-        if len > FOOTER_MAX_LEN {
-            return Err(io::Error::new(
-                io::ErrorKind::InvalidData,
-                format!(
-                    "Footer seems invalid as it suggests a footer len of {}. File is corrupted, \
-            or the index was created with a different & old version of tantivy.",
-                    len
-                ),
-            ));
-        }
-        let mut buf = vec![0u8; len];
-        reader.read_exact(&mut buf[..])?;
-        let mut cursor = &buf[..];
-        let version = u32::deserialize(&mut cursor)?;
-        if version > 3 {
-            return Ok(VersionedFooter::UnknownVersion);
-        }
-        let crc32 = u32::deserialize(&mut cursor)?;
-        let store_compression = String::deserialize(&mut cursor)?;
-        Ok(if version == 1 {
-            VersionedFooter::V1 {
-                crc32,
-                store_compression,
-            }
-        } else if version == 2 {
-            VersionedFooter::V2 {
-                crc32,
-                store_compression,
-            }
-        } else {
-            assert_eq!(version, 3);
-            VersionedFooter::V3 {
-                crc32,
-                store_compression,
-            }
-        })
-    }
-}
-
-impl VersionedFooter {
-    pub fn crc(&self) -> Option<CrcHashU32> {
-        match self {
-            VersionedFooter::V3 { crc32, .. } => Some(*crc32),
-            VersionedFooter::V2 { crc32, .. } => Some(*crc32),
-            VersionedFooter::V1 { crc32, .. } => Some(*crc32),
-            VersionedFooter::UnknownVersion { .. } => None,
-        }
-    }
 }

 pub(crate) struct FooterProxy<W: TerminatingWrite> {
@@ -268,10 +144,7 @@ impl<W: TerminatingWrite> Write for FooterProxy<W> {
 impl<W: TerminatingWrite> TerminatingWrite for FooterProxy<W> {
    fn terminate_ref(&mut self, _: AntiCallToken) -> io::Result<()> {
        let crc32 = self.hasher.take().unwrap().finalize();
-        let footer = Footer::new(VersionedFooter::V3 {
-            crc32,
-            store_compression: crate::store::COMPRESSION.to_string(),
-        });
+        let footer = Footer::new(crc32);
        let mut writer = self.writer.take().unwrap();
        footer.append_footer(&mut writer)?;
        writer.terminate()
@@ -281,140 +154,75 @@ impl<W: TerminatingWrite> TerminatingWrite for FooterProxy<W> {
 #[cfg(test)]
 mod tests {

-    use super::CrcHashU32;
-    use super::FooterProxy;
-    use crate::common::{BinarySerializable, VInt};
-    use crate::directory::footer::{Footer, VersionedFooter};
-    use crate::directory::TerminatingWrite;
-    use byteorder::{ByteOrder, LittleEndian};
-    use regex::Regex;
+    use crate::directory::footer::Footer;
+    use crate::directory::OwnedBytes;
+    use crate::{
+        common::BinarySerializable,
+        directory::{footer::FOOTER_MAGIC_NUMBER, FileSlice},
+    };
    use std::io;

    #[test]
-    fn test_versioned_footer() {
-        let mut vec = Vec::new();
-        let footer_proxy = FooterProxy::new(&mut vec);
-        assert!(footer_proxy.terminate().is_ok());
-        if crate::store::COMPRESSION == "lz4" {
-            assert_eq!(vec.len(), 158);
-        } else if crate::store::COMPRESSION == "snappy" {
-            assert_eq!(vec.len(), 167);
-        } else if crate::store::COMPRESSION == "lz4_block" {
-            assert_eq!(vec.len(), 176);
-        }
-        let footer = Footer::deserialize(&mut &vec[..]).unwrap();
-        assert!(matches!(
-           footer.versioned_footer,
-           VersionedFooter::V3 { store_compression, .. }
-           if store_compression == crate::store::COMPRESSION
-        ));
-        assert_eq!(&footer.version, crate::version());
+    fn test_deserialize_footer() {
+        let mut buf: Vec<u8> = vec![];
+        let footer = Footer::new(123);
+        footer.append_footer(&mut buf).unwrap();
+        let owned_bytes = OwnedBytes::new(buf);
+        let fileslice = FileSlice::new(Box::new(owned_bytes));
+        let (footer_deser, _body) = Footer::extract_footer(fileslice).unwrap();
+        assert_eq!(footer_deser.crc(), footer.crc());
    }
-
    #[test]
-    fn test_serialize_deserialize_footer() {
-        let mut buffer = Vec::new();
-        let crc32 = 123456u32;
-        let footer: Footer = Footer::new(VersionedFooter::V3 {
-            crc32,
-            store_compression: "lz4".to_string(),
-        });
-        footer.serialize(&mut buffer).unwrap();
-        let footer_deser = Footer::deserialize(&mut &buffer[..]).unwrap();
-        assert_eq!(footer_deser, footer);
+    fn test_deserialize_footer_missing_magic_byte() {
+        let mut buf: Vec<u8> = vec![];
+        BinarySerializable::serialize(&0_u32, &mut buf).unwrap();
+        let wrong_magic_byte: u32 = 5555;
+        BinarySerializable::serialize(&wrong_magic_byte, &mut buf).unwrap();
+
+        let owned_bytes = OwnedBytes::new(buf);
+
+        let fileslice = FileSlice::new(Box::new(owned_bytes));
+        let err = Footer::extract_footer(fileslice).unwrap_err();
+        assert_eq!(
+            err.to_string(),
+            "Footer magic byte mismatch. File corrupted or index was created using old an tantivy version which \
+            is not supported anymore. Please use tantivy 0.15 or above to recreate the index."
+        );
    }
-
    #[test]
-    fn footer_length() {
-        let crc32 = 1111111u32;
-        let versioned_footer = VersionedFooter::V3 {
-            crc32,
-            store_compression: "lz4".to_string(),
-        };
-        let mut buf = Vec::new();
-        versioned_footer.serialize(&mut buf).unwrap();
-        assert_eq!(buf.len(), 13);
-        let footer = Footer::new(versioned_footer);
-        let regex_ptn = Regex::new(
-            "tantivy v[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.{0,10}, index_format v[0-9]{1,5}",
-        )
-        .unwrap();
-        assert!(regex_ptn.is_match(&footer.meta));
-    }
+    fn test_deserialize_footer_wrong_filesize() {
+        let mut buf: Vec<u8> = vec![];
+        BinarySerializable::serialize(&100_u32, &mut buf).unwrap();
+        BinarySerializable::serialize(&FOOTER_MAGIC_NUMBER, &mut buf).unwrap();

-    #[test]
-    fn versioned_footer_from_bytes() {
-        let v_footer_bytes = vec![
-            // versionned footer length
-            12 | 128,
-            // index format version
-            3,
-            0,
-            0,
-            0,
-            // crc 32
-            12,
-            35,
-            89,
-            18,
-            // compression format
-            3 | 128,
-            b'l',
-            b'z',
-            b'4',
-        ];
-        let mut cursor = &v_footer_bytes[..];
-        let versioned_footer = VersionedFooter::deserialize(&mut cursor).unwrap();
-        assert!(cursor.is_empty());
-        let expected_crc: u32 = LittleEndian::read_u32(&v_footer_bytes[5..9]) as CrcHashU32;
-        let expected_versioned_footer: VersionedFooter = VersionedFooter::V3 {
-            crc32: expected_crc,
-            store_compression: "lz4".to_string(),
-        };
-        assert_eq!(versioned_footer, expected_versioned_footer);
-        let mut buffer = Vec::new();
-        assert!(versioned_footer.serialize(&mut buffer).is_ok());
-        assert_eq!(&v_footer_bytes[..], &buffer[..]);
-    }
+        let owned_bytes = OwnedBytes::new(buf);

-    #[test]
-    fn versioned_footer_panic() {
-        let v_footer_bytes = vec![6u8 | 128u8, 3u8, 0u8, 0u8, 1u8, 0u8, 0u8];
-        let mut b = &v_footer_bytes[..];
-        let versioned_footer = VersionedFooter::deserialize(&mut b).unwrap();
-        assert!(b.is_empty());
-        let expected_versioned_footer = VersionedFooter::UnknownVersion;
-        assert_eq!(versioned_footer, expected_versioned_footer);
-        let mut buf = Vec::new();
-        assert!(versioned_footer.serialize(&mut buf).is_err());
-    }
-
-    #[test]
-    #[cfg(not(feature = "lz4"))]
-    fn compression_mismatch() {
-        let crc32 = 1111111u32;
-        let versioned_footer = VersionedFooter::V1 {
-            crc32,
-            store_compression: "lz4".to_string(),
-        };
-        let footer = Footer::new(versioned_footer);
-        let res = footer.is_compatible();
-        assert!(res.is_err());
+        let fileslice = FileSlice::new(Box::new(owned_bytes));
+        let err = Footer::extract_footer(fileslice).unwrap_err();
+        assert_eq!(err.kind(), io::ErrorKind::UnexpectedEof);
+        assert_eq!(
+            err.to_string(),
+            "File corrupted. The file is smaller than it\'s footer bytes (len=108)."
+        );
    }

    #[test]
    fn test_deserialize_too_large_footer() {
-        let mut buf = vec![];
-        assert!(FooterProxy::new(&mut buf).terminate().is_ok());
-        let mut long_len_buf = [0u8; 10];
-        let num_bytes = VInt(super::FOOTER_MAX_LEN as u64 + 1u64).serialize_into(&mut long_len_buf);
-        buf[0..num_bytes].copy_from_slice(&long_len_buf[..num_bytes]);
-        let err = Footer::deserialize(&mut &buf[..]).unwrap_err();
+        let mut buf: Vec<u8> = vec![];
+
+        let footer_length = super::FOOTER_MAX_LEN + 1;
+        BinarySerializable::serialize(&footer_length, &mut buf).unwrap();
+        BinarySerializable::serialize(&FOOTER_MAGIC_NUMBER, &mut buf).unwrap();
+
+        let owned_bytes = OwnedBytes::new(buf);
+
+        let fileslice = FileSlice::new(Box::new(owned_bytes));
+        let err = Footer::extract_footer(fileslice).unwrap_err();
        assert_eq!(err.kind(), io::ErrorKind::InvalidData);
        assert_eq!(
            err.to_string(),
-            "Footer seems invalid as it suggests a footer len of 10001. File is corrupted, \
-            or the index was created with a different & old version of tantivy."
+            "Footer seems invalid as it suggests a footer len of 50001. File is corrupted, \
+            or the index was created with a different & old version of tantivy."
        );
    }
 }
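The new tests pin down the footer trailer that `Footer::extract_footer` expects. A declared length of 100 is reported as `len=108`, which suggests the file ends with `[footer payload][payload length: u32][magic number: u32]`. An 8-byte file declaring 50001 bytes yields `InvalidData` rather than `UnexpectedEof`, which suggests the checks run in this order: magic number, then maximum length, then file size.

The sketch below is std-only and is built on those inferences. It assumes little-endian `u32`s, as used by `BinarySerializable`. It also assumes a payload cap of 50 000, read off the expected error message. `MAGIC` is a made-up value because the real `FOOTER_MAGIC_NUMBER` is not shown in this diff. `append_footer` and `extract_footer` here are hypothetical free functions, not tantivy's methods.

```rust
use std::io;

// Hypothetical magic value; the real `FOOTER_MAGIC_NUMBER` is not shown in this diff.
const MAGIC: u32 = 1337;
// Inferred from the test expecting "footer len of 50001" for `FOOTER_MAX_LEN + 1`.
const FOOTER_MAX_LEN: u32 = 50_000;
// Trailer = payload length (u32) + magic number (u32).
const TRAILER_LEN: usize = 8;

fn read_u32_le(bytes: &[u8]) -> u32 {
    u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]])
}

/// Appends `payload`, then its length, then the magic number.
fn append_footer(buf: &mut Vec<u8>, payload: &[u8]) {
    buf.extend_from_slice(payload);
    buf.extend_from_slice(&(payload.len() as u32).to_le_bytes());
    buf.extend_from_slice(&MAGIC.to_le_bytes());
}

/// Splits `file` into `(body, footer_payload)` after validating the trailer.
fn extract_footer(file: &[u8]) -> io::Result<(&[u8], &[u8])> {
    if file.len() < TRAILER_LEN {
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "file too small to hold a footer trailer",
        ));
    }
    let trailer = &file[file.len() - TRAILER_LEN..];
    let payload_len = read_u32_le(&trailer[0..4]);
    let magic = read_u32_le(&trailer[4..8]);
    if magic != MAGIC {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "footer magic number mismatch",
        ));
    }
    // Checked before the file size, so an absurd declared length reads as corruption.
    if payload_len > FOOTER_MAX_LEN {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("footer len of {} is too large", payload_len),
        ));
    }
    let footer_len = payload_len as usize + TRAILER_LEN;
    if file.len() < footer_len {
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            format!("file is smaller than its footer bytes (len={})", footer_len),
        ));
    }
    let body_end = file.len() - footer_len;
    Ok((&file[..body_end], &file[body_end..file.len() - TRAILER_LEN]))
}
```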
--- a/src/directory/managed_directory.rs
+++ b/src/directory/managed_directory.rs
@@ -245,11 +245,7 @@ impl ManagedDirectory {
        let mut hasher = Hasher::new();
        hasher.update(bytes.as_slice());
        let crc = hasher.finalize();
-        Ok(footer
-            .versioned_footer
-            .crc()
-            .map(|v| v == crc)
-            .unwrap_or(false))
+        Ok(footer.crc() == crc)
    }

    /// List files for which checksum does not match content
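With the versioned footer gone, `ManagedDirectory` compares `footer.crc()` directly against the CRC-32 of the file body. `FooterProxy` computes that same CRC while the bytes are being written.

The std-only sketch below covers both halves. Judging by the `Hasher::new()` / `update` / `finalize` calls, tantivy uses `crc32fast::Hasher` here, and the sketch assumes that hasher produces standard CRC-32 (IEEE). `Crc32`, `HashingWriter`, and `validate_checksum` are hypothetical names. The bitwise loop trades speed for brevity.

```rust
use std::io::{self, Write};

/// Bitwise CRC-32 (IEEE, reflected polynomial 0xEDB88320).
struct Crc32 {
    state: u32,
}

impl Crc32 {
    fn new() -> Self {
        Crc32 { state: 0xFFFF_FFFF }
    }

    fn update(&mut self, data: &[u8]) {
        for &byte in data {
            self.state ^= byte as u32;
            for _ in 0..8 {
                // All ones if the low bit is set, zero otherwise.
                let mask = (self.state & 1).wrapping_neg();
                self.state = (self.state >> 1) ^ (0xEDB8_8320 & mask);
            }
        }
    }

    fn finalize(&self) -> u32 {
        !self.state
    }
}

/// Hypothetical analogue of `FooterProxy`: hashes bytes as they pass through.
struct HashingWriter<W: Write> {
    inner: W,
    hasher: Crc32,
}

impl<W: Write> HashingWriter<W> {
    fn new(inner: W) -> Self {
        HashingWriter { inner, hasher: Crc32::new() }
    }

    /// Returns the inner writer and the CRC of everything written so far.
    fn finish(self) -> (W, u32) {
        let crc = self.hasher.finalize();
        (self.inner, crc)
    }
}

impl<W: Write> Write for HashingWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let n = self.inner.write(buf)?;
        // Hash only the bytes the inner writer actually accepted.
        self.hasher.update(&buf[..n]);
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}

/// Mirrors the simplified check `footer.crc() == crc`.
fn validate_checksum(body: &[u8], stored_crc: u32) -> bool {
    let mut hasher = Crc32::new();
    hasher.update(body);
    hasher.finalize() == stored_crc
}
```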
--- a/src/directory/mmap_directory.rs
+++ b/src/directory/mmap_directory.rs
@@ -593,7 +593,7 @@ mod tests {

            let mut index_writer = index.writer_for_tests().unwrap();
            let mut log_merge_policy = LogMergePolicy::default();
-            log_merge_policy.set_min_merge_size(3);
+            log_merge_policy.set_min_num_segments(3);
            index_writer.set_merge_policy(Box::new(log_merge_policy));
            for _num_commits in 0..10 {
                for _ in 0..10 {
--- a/src/fastfield/bytes/reader.rs
+++ b/src/fastfield/bytes/reader.rs
@@ -1,7 +1,7 @@
+use crate::directory::FileSlice;
 use crate::directory::OwnedBytes;
-use crate::fastfield::FastFieldReader;
+use crate::fastfield::{BitpackedFastFieldReader, FastFieldReader, MultiValueLength};
 use crate::DocId;
-use crate::{directory::FileSlice, fastfield::MultiValueLength};

 /// Reader for byte array fast fields
 ///
@@ -15,13 +15,13 @@ use crate::{directory::FileSlice, fastfield::MultiValueLength};
 /// and the start index for the next document, and keeping the bytes in between.
 #[derive(Clone)]
 pub struct BytesFastFieldReader {
-    idx_reader: FastFieldReader<u64>,
+    idx_reader: BitpackedFastFieldReader<u64>,
    values: OwnedBytes,
 }

 impl BytesFastFieldReader {
    pub(crate) fn open(
-        idx_reader: FastFieldReader<u64>,
+        idx_reader: BitpackedFastFieldReader<u64>,
        values_file: FileSlice,
    ) -> crate::Result<BytesFastFieldReader> {
        let values = values_file.read_bytes()?;
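`BytesFastFieldReader` pairs an offset index (`idx_reader`) with the raw `values` bytes. Per its doc comment, a document's bytes lie between its own start offset and the next document's start offset.

The sketch below replaces the bitpacked `idx_reader` with a plain `Vec<u64>` of `num_docs + 1` offsets. It also includes a `MultiValueLength` impl, since the reader imports that trait. `BytesColumn` and `from_docs` are hypothetical names, and the trait is a local copy of the one declared in `src/fastfield/mod.rs`.

```rust
type DocId = u32;

/// Local copy of the `MultiValueLength` trait from `src/fastfield/mod.rs`.
trait MultiValueLength {
    fn get_len(&self, doc_id: DocId) -> u64;
    fn get_total_len(&self) -> u64;
}

/// Hypothetical stand-in for `BytesFastFieldReader`:
/// `offsets[doc]..offsets[doc + 1]` delimits the bytes of `doc` in `values`.
struct BytesColumn {
    offsets: Vec<u64>,
    values: Vec<u8>,
}

impl BytesColumn {
    fn from_docs(docs: &[&[u8]]) -> Self {
        let mut offsets = Vec::with_capacity(docs.len() + 1);
        let mut values = Vec::new();
        offsets.push(0);
        for doc in docs {
            values.extend_from_slice(doc);
            // Start offset of the next document.
            offsets.push(values.len() as u64);
        }
        BytesColumn { offsets, values }
    }

    fn get_bytes(&self, doc: DocId) -> &[u8] {
        let start = self.offsets[doc as usize] as usize;
        let stop = self.offsets[doc as usize + 1] as usize;
        &self.values[start..stop]
    }
}

impl MultiValueLength for BytesColumn {
    fn get_len(&self, doc_id: DocId) -> u64 {
        self.offsets[doc_id as usize + 1] - self.offsets[doc_id as usize]
    }

    fn get_total_len(&self) -> u64 {
        self.values.len() as u64
    }
}
```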
--- a/src/fastfield/bytes/writer.rs
+++ b/src/fastfield/bytes/writer.rs
@@ -1,8 +1,11 @@
 use std::io;

+use crate::fastfield::serializer::FastFieldSerializer;
 use crate::schema::{Document, Field, Value};
 use crate::DocId;
-use crate::{fastfield::serializer::FastFieldSerializer, indexer::doc_id_mapping::DocIdMapping};
+use crate::{
+    fastfield::serializer::CompositeFastFieldSerializer, indexer::doc_id_mapping::DocIdMapping,
+};

 /// Writer for byte array (as in, any number of bytes per document) fast fields
 ///
@@ -104,7 +107,7 @@ impl BytesFastFieldWriter {
    /// Serializes the fast field values by pushing them to the `FastFieldSerializer`.
    pub fn serialize(
        &self,
-        serializer: &mut FastFieldSerializer,
+        serializer: &mut CompositeFastFieldSerializer,
        doc_id_map: Option<&DocIdMapping>,
    ) -> io::Result<()> {
        // writing the offset index
--- a/src/fastfield/facet_reader.rs
+++ b/src/fastfield/facet_reader.rs
@@ -95,7 +95,7 @@ mod tests {
        let schema = schema_builder.build();
        let index = Index::create_in_ram(schema);
        let mut index_writer = index.writer_for_tests()?;
-        index_writer.add_document(doc!(facet_field=>Facet::from_text("/a/b")));
+        index_writer.add_document(doc!(facet_field=>Facet::from_text("/a/b").unwrap()));
        index_writer.commit()?;
        let searcher = index.reader()?.searcher();
        let facet_reader = searcher
@@ -118,7 +118,7 @@ mod tests {
        let schema = schema_builder.build();
        let index = Index::create_in_ram(schema);
        let mut index_writer = index.writer_for_tests()?;
-        index_writer.add_document(doc!(facet_field=>Facet::from_text("/a/b")));
+        index_writer.add_document(doc!(facet_field=>Facet::from_text("/a/b").unwrap()));
        index_writer.commit()?;
        let searcher = index.reader()?.searcher();
        let facet_reader = searcher
@@ -141,7 +141,7 @@ mod tests {
        let schema = schema_builder.build();
        let index = Index::create_in_ram(schema);
        let mut index_writer = index.writer_for_tests()?;
-        index_writer.add_document(doc!(facet_field=>Facet::from_text("/a/b")));
+        index_writer.add_document(doc!(facet_field=>Facet::from_text("/a/b").unwrap()));
        index_writer.commit()?;
        let searcher = index.reader()?.searcher();
        let facet_reader = searcher
@@ -164,7 +164,7 @@ mod tests {
        let schema = schema_builder.build();
        let index = Index::create_in_ram(schema);
        let mut index_writer = index.writer_for_tests()?;
-        index_writer.add_document(doc!(facet_field=>Facet::from_text("/a/b")));
+        index_writer.add_document(doc!(facet_field=>Facet::from_text("/a/b").unwrap()));
        index_writer.commit()?;
        let searcher = index.reader()?.searcher();
        let facet_reader = searcher
@@ -187,7 +187,7 @@ mod tests {
        let schema = schema_builder.build();
        let index = Index::create_in_ram(schema);
        let mut index_writer = index.writer_for_tests()?;
-        index_writer.add_document(doc!(facet_field=>Facet::from_text("/a/b")));
+        index_writer.add_document(doc!(facet_field=>Facet::from_text("/a/b").unwrap()));
        index_writer.add_document(Document::default());
        index_writer.commit()?;
        let searcher = index.reader()?.searcher();
--- a/src/fastfield/mod.rs
+++ b/src/fastfield/mod.rs
@@ -29,8 +29,11 @@ pub use self::delete::DeleteBitSet;
 pub use self::error::{FastFieldNotAvailableError, Result};
 pub use self::facet_reader::FacetReader;
 pub use self::multivalued::{MultiValuedFastFieldReader, MultiValuedFastFieldWriter};
+pub use self::reader::BitpackedFastFieldReader;
+pub use self::reader::DynamicFastFieldReader;
 pub use self::reader::FastFieldReader;
 pub use self::readers::FastFieldReaders;
+pub use self::serializer::CompositeFastFieldSerializer;
 pub use self::serializer::FastFieldSerializer;
 pub use self::writer::{FastFieldsWriter, IntFastFieldWriter};
 use crate::schema::Cardinality;
@@ -57,7 +60,7 @@ mod writer;
 pub trait MultiValueLength {
    /// returns the num of values associated to a doc_id
    fn get_len(&self, doc_id: DocId) -> u64;
-    /// returns the sum of num of all values for all doc_ids
+    /// returns the total number of values across all doc_ids
    fn get_total_len(&self) -> u64;
 }

@@ -211,7 +214,7 @@ mod tests {
    use super::*;
    use crate::common::CompositeFile;
    use crate::directory::{Directory, RamDirectory, WritePtr};
-    use crate::fastfield::FastFieldReader;
+    use crate::fastfield::BitpackedFastFieldReader;
    use crate::merge_policy::NoMergePolicy;
    use crate::schema::Field;
    use crate::schema::Schema;
@@ -236,7 +239,7 @@ mod tests {

    #[test]
    pub fn test_fastfield() {
-        let test_fastfield = FastFieldReader::<u64>::from(vec![100, 200, 300]);
+        let test_fastfield = BitpackedFastFieldReader::<u64>::from(vec![100, 200, 300]);
        assert_eq!(test_fastfield.get(0), 100);
        assert_eq!(test_fastfield.get(1), 200);
        assert_eq!(test_fastfield.get(2), 300);
@@ -254,7 +257,7 @@ mod tests {
        let directory: RamDirectory = RamDirectory::create();
        {
            let write: WritePtr = directory.open_write(Path::new("test")).unwrap();
-            let mut serializer = FastFieldSerializer::from_write(write).unwrap();
+            let mut serializer = CompositeFastFieldSerializer::from_write(write).unwrap();
            let mut fast_field_writers = FastFieldsWriter::from_schema(&SCHEMA);
            fast_field_writers.add_document(&doc!(*FIELD=>13u64));
            fast_field_writers.add_document(&doc!(*FIELD=>14u64));
@@ -268,7 +271,7 @@ mod tests {
        assert_eq!(file.len(), 36 as usize);
        let composite_file = CompositeFile::open(&file)?;
        let file = composite_file.open_read(*FIELD).unwrap();
-        let fast_field_reader = FastFieldReader::<u64>::open(file)?;
+        let fast_field_reader = BitpackedFastFieldReader::<u64>::open(file)?;
        assert_eq!(fast_field_reader.get(0), 13u64);
        assert_eq!(fast_field_reader.get(1), 14u64);
        assert_eq!(fast_field_reader.get(2), 2u64);
@@ -281,7 +284,7 @@ mod tests {
        let directory: RamDirectory = RamDirectory::create();
        {
            let write: WritePtr = directory.open_write(Path::new("test"))?;
-            let mut serializer = FastFieldSerializer::from_write(write)?;
+            let mut serializer = CompositeFastFieldSerializer::from_write(write)?;
            let mut fast_field_writers = FastFieldsWriter::from_schema(&SCHEMA);
            fast_field_writers.add_document(&doc!(*FIELD=>4u64));
            fast_field_writers.add_document(&doc!(*FIELD=>14_082_001u64));
@@ -300,7 +303,7 @@ mod tests {
        {
            let fast_fields_composite = CompositeFile::open(&file)?;
            let data = fast_fields_composite.open_read(*FIELD).unwrap();
-            let fast_field_reader = FastFieldReader::<u64>::open(data)?;
+            let fast_field_reader = BitpackedFastFieldReader::<u64>::open(data)?;
            assert_eq!(fast_field_reader.get(0), 4u64);
            assert_eq!(fast_field_reader.get(1), 14_082_001u64);
            assert_eq!(fast_field_reader.get(2), 3_052u64);
@@ -321,7 +324,7 @@ mod tests {

        {
            let write: WritePtr = directory.open_write(Path::new("test")).unwrap();
-            let mut serializer = FastFieldSerializer::from_write(write).unwrap();
+            let mut serializer = CompositeFastFieldSerializer::from_write(write).unwrap();
            let mut fast_field_writers = FastFieldsWriter::from_schema(&SCHEMA);
            for _ in 0..10_000 {
                fast_field_writers.add_document(&doc!(*FIELD=>100_000u64));
@@ -336,7 +339,7 @@ mod tests {
        {
            let fast_fields_composite = CompositeFile::open(&file).unwrap();
            let data = fast_fields_composite.open_read(*FIELD).unwrap();
-            let fast_field_reader = FastFieldReader::<u64>::open(data)?;
+            let fast_field_reader = BitpackedFastFieldReader::<u64>::open(data)?;
            for doc in 0..10_000 {
                assert_eq!(fast_field_reader.get(doc), 100_000u64);
            }
@@ -351,7 +354,7 @@ mod tests {

        {
            let write: WritePtr = directory.open_write(Path::new("test")).unwrap();
-            let mut serializer = FastFieldSerializer::from_write(write).unwrap();
+            let mut serializer = CompositeFastFieldSerializer::from_write(write).unwrap();
            let mut fast_field_writers = FastFieldsWriter::from_schema(&SCHEMA);
            // forcing the amplitude to be high
            fast_field_writers.add_document(&doc!(*FIELD=>0u64));
@@ -368,7 +371,7 @@ mod tests {
        {
            let fast_fields_composite = CompositeFile::open(&file)?;
            let data = fast_fields_composite.open_read(*FIELD).unwrap();
-            let fast_field_reader = FastFieldReader::<u64>::open(data)?;
+            let fast_field_reader = BitpackedFastFieldReader::<u64>::open(data)?;
            assert_eq!(fast_field_reader.get(0), 0u64);
            for doc in 1..10_001 {
                assert_eq!(
@@ -390,7 +393,7 @@ mod tests {
        let schema = schema_builder.build();
        {
            let write: WritePtr = directory.open_write(Path::new("test")).unwrap();
-            let mut serializer = FastFieldSerializer::from_write(write).unwrap();
+            let mut serializer = CompositeFastFieldSerializer::from_write(write).unwrap();
            let mut fast_field_writers = FastFieldsWriter::from_schema(&schema);
            for i in -100i64..10_000i64 {
                let mut doc = Document::default();
@@ -407,7 +410,7 @@ mod tests {
        {
            let fast_fields_composite = CompositeFile::open(&file)?;
            let data = fast_fields_composite.open_read(i64_field).unwrap();
-            let fast_field_reader = FastFieldReader::<i64>::open(data)?;
+            let fast_field_reader = BitpackedFastFieldReader::<i64>::open(data)?;

            assert_eq!(fast_field_reader.min_value(), -100i64);
            assert_eq!(fast_field_reader.max_value(), 9_999i64);
@@ -433,7 +436,7 @@ mod tests {

        {
            let write: WritePtr = directory.open_write(Path::new("test")).unwrap();
-            let mut serializer = FastFieldSerializer::from_write(write).unwrap();
+            let mut serializer = CompositeFastFieldSerializer::from_write(write).unwrap();
            let mut fast_field_writers = FastFieldsWriter::from_schema(&schema);
            let doc = Document::default();
            fast_field_writers.add_document(&doc);
@@ -447,7 +450,7 @@ mod tests {
        {
            let fast_fields_composite = CompositeFile::open(&file).unwrap();
            let data = fast_fields_composite.open_read(i64_field).unwrap();
-            let fast_field_reader = FastFieldReader::<i64>::open(data)?;
+            let fast_field_reader = BitpackedFastFieldReader::<i64>::open(data)?;
            assert_eq!(fast_field_reader.get(0u32), 0i64);
        }
        Ok(())
@@ -468,7 +471,7 @@ mod tests {
        let directory = RamDirectory::create();
        {
            let write: WritePtr = directory.open_write(Path::new("test"))?;
-            let mut serializer = FastFieldSerializer::from_write(write)?;
+            let mut serializer = CompositeFastFieldSerializer::from_write(write)?;
            let mut fast_field_writers = FastFieldsWriter::from_schema(&SCHEMA);
            for &x in &permutation {
                fast_field_writers.add_document(&doc!(*FIELD=>x));
@@ -480,7 +483,7 @@ mod tests {
        {
            let fast_fields_composite = CompositeFile::open(&file)?;
            let data = fast_fields_composite.open_read(*FIELD).unwrap();
-            let fast_field_reader = FastFieldReader::<u64>::open(data)?;
+            let fast_field_reader = BitpackedFastFieldReader::<u64>::open(data)?;

            let mut a = 0u64;
            for _ in 0..n {
--- a/src/fastfield/multivalued/reader.rs
+++ b/src/fastfield/multivalued/reader.rs
@@ -1,6 +1,6 @@
 use std::ops::Range;

-use crate::fastfield::{FastFieldReader, FastValue, MultiValueLength};
+use crate::fastfield::{BitpackedFastFieldReader, FastFieldReader, FastValue, MultiValueLength};
 use crate::DocId;

 /// Reader for a multivalued `u64` fast field.
@@ -13,14 +13,14 @@ use crate::DocId;
 ///
 #[derive(Clone)]
 pub struct MultiValuedFastFieldReader<Item: FastValue> {
-    idx_reader: FastFieldReader<u64>,
-    vals_reader: FastFieldReader<Item>,
+    idx_reader: BitpackedFastFieldReader<u64>,
+    vals_reader: BitpackedFastFieldReader<Item>,
 }

 impl<Item: FastValue> MultiValuedFastFieldReader<Item> {
    pub(crate) fn open(
-        idx_reader: FastFieldReader<u64>,
-        vals_reader: FastFieldReader<Item>,
+        idx_reader: BitpackedFastFieldReader<u64>,
+        vals_reader: BitpackedFastFieldReader<Item>,
    ) -> MultiValuedFastFieldReader<Item> {
        MultiValuedFastFieldReader {
            idx_reader,
--- a/src/fastfield/multivalued/writer.rs
+++ b/src/fastfield/multivalued/writer.rs
@@ -1,5 +1,6 @@
-use crate::fastfield::serializer::FastSingleFieldSerializer;
-use crate::fastfield::FastFieldSerializer;
+use crate::fastfield::serializer::DynamicFastFieldSerializer;
+use crate::fastfield::serializer::FastFieldSerializer;
+use crate::fastfield::CompositeFastFieldSerializer;
 use crate::postings::UnorderedTermId;
 use crate::schema::{Document, Field};
 use crate::termdict::TermOrdinal;
@@ -134,7 +135,7 @@ impl MultiValuedFastFieldWriter {
    ///
    pub fn serialize(
        &self,
-        serializer: &mut FastFieldSerializer,
+        serializer: &mut CompositeFastFieldSerializer,
        mapping_opt: Option<&FnvHashMap<UnorderedTermId, TermOrdinal>>,
        doc_id_map: Option<&DocIdMapping>,
    ) -> io::Result<()> {
@@ -154,7 +155,7 @@ impl MultiValuedFastFieldWriter {
        }
        {
            // writing the values themselves.
-            let mut value_serializer: FastSingleFieldSerializer<'_, _>;
+            let mut value_serializer: DynamicFastFieldSerializer<'_, _>;
            match mapping_opt {
                Some(mapping) => {
                    value_serializer = serializer.new_u64_fast_field_with_idx(
--- a/src/fastfield/reader.rs
+++ b/src/fastfield/reader.rs
@@ -4,7 +4,7 @@ use crate::common::CompositeFile;
 use crate::directory::FileSlice;
 use crate::directory::OwnedBytes;
 use crate::directory::{Directory, RamDirectory, WritePtr};
-use crate::fastfield::{FastFieldSerializer, FastFieldsWriter};
+use crate::fastfield::{CompositeFastFieldSerializer, FastFieldsWriter};
 use crate::schema::Schema;
 use crate::schema::FAST;
 use crate::DocId;
@@ -14,12 +14,94 @@ use std::path::Path;
 use tantivy_bitpacker::compute_num_bits;
 use tantivy_bitpacker::BitUnpacker;

+/// FastFieldReader is the trait to access fast field data.
+pub trait FastFieldReader<Item: FastValue>: Clone {
+    /// Returns the value associated with the given document.
+    ///
+    /// This accessor should return as fast as possible.
+    ///
+    /// # Panics
+    ///
+    /// May panic if `doc` is greater than the segment's `maxdoc`.
+    fn get(&self, doc: DocId) -> Item;
+
+    /// Fills an output buffer with the fast field values
+    /// associated with the `DocId` going from
+    /// `start` to `start + output.len()`.
+    ///
+    /// Regardless of the type of `Item`, this method works by
+    /// - transmuting the output array
+    /// - extracting the `Item`s as if they were `u64`
+    /// - possibly converting the `u64` value to the right type.
+    ///
+    /// # Panics
+    ///
+    /// May panic if `start + output.len()` is greater than
+    /// the segment's `maxdoc`.
+    fn get_range(&self, start: DocId, output: &mut [Item]);
+
+    /// Returns the minimum value for this fast field.
+    ///
+    /// The min value does not take into account possible
+    /// deleted documents, and should be considered as a lower bound
+    /// of the actual minimum value.
+    fn min_value(&self) -> Item;
+
+    /// Returns the maximum value for this fast field.
+    ///
+    /// The max value does not take into account possible
+    /// deleted documents, and should be considered as an upper bound
+    /// of the actual maximum value.
+    fn max_value(&self) -> Item;
+}
+
+#[derive(Clone)]
+/// `DynamicFastFieldReader` wraps the different readers used to access
+/// the various fast field encodings.
+///
+pub enum DynamicFastFieldReader<Item: FastValue> {
+    /// Bitpacked compressed fastfield data.
+    Bitpacked(BitpackedFastFieldReader<Item>),
+}
+
+impl<Item: FastValue> DynamicFastFieldReader<Item> {
+    /// Returns the correct reader for the data, wrapped in the `DynamicFastFieldReader` enum.
+    pub fn open(file: FileSlice) -> crate::Result<DynamicFastFieldReader<Item>> {
+        Ok(DynamicFastFieldReader::Bitpacked(
+            BitpackedFastFieldReader::open(file)?,
+        ))
+    }
+}
+
+impl<Item: FastValue> FastFieldReader<Item> for DynamicFastFieldReader<Item> {
+    fn get(&self, doc: DocId) -> Item {
+        match self {
+            Self::Bitpacked(reader) => reader.get(doc),
+        }
+    }
+    fn get_range(&self, start: DocId, output: &mut [Item]) {
+        match self {
+            Self::Bitpacked(reader) => reader.get_range(start, output),
+        }
+    }
+    fn min_value(&self) -> Item {
+        match self {
+            Self::Bitpacked(reader) => reader.min_value(),
+        }
+    }
+    fn max_value(&self) -> Item {
+        match self {
+            Self::Bitpacked(reader) => reader.max_value(),
+        }
+    }
+}
+
 /// Trait for accessing a fastfield.
 ///
 /// Depending on the field type, a different
 /// fast field is required.
 #[derive(Clone)]
-pub struct FastFieldReader<Item: FastValue> {
+pub struct BitpackedFastFieldReader<Item: FastValue> {
    bytes: OwnedBytes,
    bit_unpacker: BitUnpacker,
    min_value_u64: u64,
@@ -27,7 +109,7 @@ pub struct FastFieldReader<Item: FastValue> {
    _phantom: PhantomData<Item>,
 }

-impl<Item: FastValue> FastFieldReader<Item> {
+impl<Item: FastValue> BitpackedFastFieldReader<Item> {
    /// Opens a fast field given a file.
    pub fn open(file: FileSlice) -> crate::Result<Self> {
        let mut bytes = file.read_bytes()?;
@@ -36,7 +118,7 @@ impl<Item: FastValue> FastFieldReader<Item> {
        let max_value = min_value + amplitude;
        let num_bits = compute_num_bits(amplitude);
        let bit_unpacker = BitUnpacker::new(num_bits);
-        Ok(FastFieldReader {
+        Ok(BitpackedFastFieldReader {
            bytes,
            min_value_u64: min_value,
            max_value_u64: max_value,
@@ -44,19 +126,6 @@ impl<Item: FastValue> FastFieldReader<Item> {
            _phantom: PhantomData,
        })
    }
-
-    /// Return the value associated to the given document.
-    ///
-    /// This accessor should return as fast as possible.
-    ///
-    /// # Panics
-    ///
-    /// May panic if `doc` is greater than the segment
-    // `maxdoc`.
-    pub fn get(&self, doc: DocId) -> Item {
-        self.get_u64(u64::from(doc))
-    }
-
    pub(crate) fn get_u64(&self, doc: u64) -> Item {
        Item::from_u64(self.min_value_u64 + self.bit_unpacker.get(doc, &self.bytes))
    }
@@ -78,6 +147,20 @@ impl<Item: FastValue> FastFieldReader<Item> {
            *out = self.get_u64(start + (i as u64));
        }
    }
+}
+
+impl<Item: FastValue> FastFieldReader<Item> for BitpackedFastFieldReader<Item> {
+    /// Returns the value associated with the given document.
+    ///
+    /// This accessor should return as fast as possible.
+    ///
+    /// # Panics
+    ///
+    /// May panic if `doc` is greater than the segment's
+    /// `maxdoc`.
+    fn get(&self, doc: DocId) -> Item {
+        self.get_u64(u64::from(doc))
+    }

    /// Fills an output buffer with the fast field values
    /// associated with the `DocId` going from
@@ -92,7 +175,7 @@ impl<Item: FastValue> FastFieldReader<Item> {
    ///
    /// May panic if `start + output.len()` is greater than
    /// the segment's `maxdoc`.
-    pub fn get_range(&self, start: DocId, output: &mut [Item]) {
+    fn get_range(&self, start: DocId, output: &mut [Item]) {
        self.get_range_u64(u64::from(start), output);
    }

@@ -101,7 +184,7 @@ impl<Item: FastValue> FastFieldReader<Item> {
    /// The max value does not take in account of possible
    /// deleted document, and should be considered as an upper bound
    /// of the actual maximum value.
-    pub fn min_value(&self) -> Item {
+    fn min_value(&self) -> Item {
        Item::from_u64(self.min_value_u64)
    }

@@ -110,13 +193,13 @@ impl<Item: FastValue> FastFieldReader<Item> {
    /// The max value does not take in account of possible
    /// deleted document, and should be considered as an upper bound
    /// of the actual maximum value.
-    pub fn max_value(&self) -> Item {
+    fn max_value(&self) -> Item {
        Item::from_u64(self.max_value_u64)
    }
 }

-impl<Item: FastValue> From<Vec<Item>> for FastFieldReader<Item> {
-    fn from(vals: Vec<Item>) -> FastFieldReader<Item> {
+impl<Item: FastValue> From<Vec<Item>> for BitpackedFastFieldReader<Item> {
+    fn from(vals: Vec<Item>) -> BitpackedFastFieldReader<Item> {
        let mut schema_builder = Schema::builder();
        let field = schema_builder.add_u64_field("field", FAST);
        let schema = schema_builder.build();
@@ -126,7 +209,7 @@ impl<Item: FastValue> From<Vec<Item>> for FastFieldReader<Item> {
            let write: WritePtr = directory
                .open_write(path)
                .expect("With a RamDirectory, this should never fail.");
-            let mut serializer = FastFieldSerializer::from_write(write)
+            let mut serializer = CompositeFastFieldSerializer::from_write(write)
                .expect("With a RamDirectory, this should never fail.");
            let mut fast_field_writers = FastFieldsWriter::from_schema(&schema);
            {
@@ -148,6 +231,6 @@ impl<Item: FastValue> From<Vec<Item>> for FastFieldReader<Item> {
        let field_file = composite_file
            .open_read(field)
            .expect("File component not found");
-        FastFieldReader::open(field_file).unwrap()
+        BitpackedFastFieldReader::open(field_file).unwrap()
    }
 }
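With this change `FastFieldReader` becomes a trait and `BitpackedFastFieldReader` one implementation of it. `DynamicFastFieldReader` is presumably an enum over the available variants, mirroring `DynamicFastFieldSerializer` further down. The following is a minimal, self-contained sketch of that pattern, not tantivy's code. `OffsetReader` and `DynamicReader` are hypothetical stand-ins that keep plain offsets in a `Vec<u64>` instead of bitpacked data, and the trait is trimmed to four methods. As in the diff, `get` is a provided method that forwards to `get_u64`.

```rust
/// Trimmed-down reader trait (sketch; the real trait has more methods).
pub trait FastFieldReader<Item> {
    fn get_u64(&self, doc: u64) -> Item;
    fn min_value(&self) -> Item;
    fn max_value(&self) -> Item;

    /// Provided method: widens the `u32` doc id and delegates to `get_u64`.
    fn get(&self, doc: u32) -> Item {
        self.get_u64(u64::from(doc))
    }
}

/// Hypothetical variant: stores `val - min_value` for each document.
pub struct OffsetReader {
    min_value: u64,
    max_value: u64,
    offsets: Vec<u64>,
}

impl OffsetReader {
    pub fn from_vals(vals: &[u64]) -> OffsetReader {
        let min_value = vals.iter().copied().min().unwrap_or(0);
        let max_value = vals.iter().copied().max().unwrap_or(0);
        let offsets = vals.iter().map(|&val| val - min_value).collect();
        OffsetReader { min_value, max_value, offsets }
    }
}

impl FastFieldReader<u64> for OffsetReader {
    fn get_u64(&self, doc: u64) -> u64 {
        self.min_value + self.offsets[doc as usize]
    }
    fn min_value(&self) -> u64 {
        self.min_value
    }
    fn max_value(&self) -> u64 {
        self.max_value
    }
}

/// Enum wrapping every variant; each method dispatches with a `match`.
pub enum DynamicReader {
    Offset(OffsetReader),
}

impl FastFieldReader<u64> for DynamicReader {
    fn get_u64(&self, doc: u64) -> u64 {
        match self {
            DynamicReader::Offset(reader) => reader.get_u64(doc),
        }
    }
    fn min_value(&self) -> u64 {
        match self {
            DynamicReader::Offset(reader) => reader.min_value(),
        }
    }
    fn max_value(&self) -> u64 {
        match self {
            DynamicReader::Offset(reader) => reader.max_value(),
        }
    }
}

/// Generic helper in the style of `compute_min_max_val(&impl FastFieldReader<u64>, ..)`.
pub fn sum_values(reader: &impl FastFieldReader<u64>, max_doc: u32) -> u64 {
    (0..max_doc).map(|doc| reader.get(doc)).sum()
}
```

Generic code such as `compute_min_max_val` in the merger takes `&impl FastFieldReader<u64>`, so it accepts any variant. The enum keeps dispatch static, at the cost of one `match` per call.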
--- a/src/fastfield/readers.rs
+++ b/src/fastfield/readers.rs
@@ -1,13 +1,16 @@
 use crate::common::CompositeFile;
 use crate::directory::FileSlice;
 use crate::fastfield::MultiValuedFastFieldReader;
+use crate::fastfield::{BitpackedFastFieldReader, FastFieldNotAvailableError};
 use crate::fastfield::{BytesFastFieldReader, FastValue};
-use crate::fastfield::{FastFieldNotAvailableError, FastFieldReader};
 use crate::schema::{Cardinality, Field, FieldType, Schema};
 use crate::space_usage::PerFieldSpaceUsage;
 use crate::TantivyError;

-/// Provides access to all of the FastFieldReader.
+use super::reader::DynamicFastFieldReader;
+use super::FastFieldReader;
+
+/// Provides access to all of the fast field readers.
 ///
 /// Internally, `FastFieldReaders` have preloaded fast field readers,
 /// and just wraps several `HashMap`.
@@ -100,9 +103,9 @@ impl FastFieldReaders {
    pub(crate) fn typed_fast_field_reader<TFastValue: FastValue>(
        &self,
        field: Field,
-    ) -> crate::Result<FastFieldReader<TFastValue>> {
+    ) -> crate::Result<DynamicFastFieldReader<TFastValue>> {
        let fast_field_slice = self.fast_field_data(field, 0)?;
-        FastFieldReader::open(fast_field_slice)
+        DynamicFastFieldReader::open(fast_field_slice)
    }

    pub(crate) fn typed_fast_field_multi_reader<TFastValue: FastValue>(
@@ -111,16 +114,16 @@ impl FastFieldReaders {
    ) -> crate::Result<MultiValuedFastFieldReader<TFastValue>> {
        let fast_field_slice_idx = self.fast_field_data(field, 0)?;
        let fast_field_slice_vals = self.fast_field_data(field, 1)?;
-        let idx_reader = FastFieldReader::open(fast_field_slice_idx)?;
-        let vals_reader: FastFieldReader<TFastValue> =
-            FastFieldReader::open(fast_field_slice_vals)?;
+        let idx_reader = BitpackedFastFieldReader::open(fast_field_slice_idx)?;
+        let vals_reader: BitpackedFastFieldReader<TFastValue> =
+            BitpackedFastFieldReader::open(fast_field_slice_vals)?;
        Ok(MultiValuedFastFieldReader::open(idx_reader, vals_reader))
    }

    /// Returns the `u64` fast field reader reader associated to `field`.
    ///
    /// If `field` is not a u64 fast field, this method returns an Error.
-    pub fn u64(&self, field: Field) -> crate::Result<FastFieldReader<u64>> {
+    pub fn u64(&self, field: Field) -> crate::Result<DynamicFastFieldReader<u64>> {
        self.check_type(field, FastType::U64, Cardinality::SingleValue)?;
        self.typed_fast_field_reader(field)
    }
@@ -129,14 +132,14 @@ impl FastFieldReaders {
    /// field is effectively of type `u64` or not.
    ///
    /// If not, the fastfield reader will returns the u64-value associated to the original FastValue.
-    pub fn u64_lenient(&self, field: Field) -> crate::Result<FastFieldReader<u64>> {
+    pub fn u64_lenient(&self, field: Field) -> crate::Result<DynamicFastFieldReader<u64>> {
        self.typed_fast_field_reader(field)
    }

    /// Returns the `i64` fast field reader reader associated to `field`.
    ///
    /// If `field` is not a i64 fast field, this method returns an Error.
-    pub fn i64(&self, field: Field) -> crate::Result<FastFieldReader<i64>> {
+    pub fn i64(&self, field: Field) -> crate::Result<impl FastFieldReader<i64>> {
        self.check_type(field, FastType::I64, Cardinality::SingleValue)?;
        self.typed_fast_field_reader(field)
    }
@@ -144,7 +147,7 @@ impl FastFieldReaders {
    /// Returns the `i64` fast field reader reader associated to `field`.
    ///
    /// If `field` is not a i64 fast field, this method returns an Error.
-    pub fn date(&self, field: Field) -> crate::Result<FastFieldReader<crate::DateTime>> {
+    pub fn date(&self, field: Field) -> crate::Result<impl FastFieldReader<crate::DateTime>> {
        self.check_type(field, FastType::Date, Cardinality::SingleValue)?;
        self.typed_fast_field_reader(field)
    }
@@ -152,7 +155,7 @@ impl FastFieldReaders {
    /// Returns the `f64` fast field reader reader associated to `field`.
    ///
    /// If `field` is not a f64 fast field, this method returns an Error.
-    pub fn f64(&self, field: Field) -> crate::Result<FastFieldReader<f64>> {
+    pub fn f64(&self, field: Field) -> crate::Result<impl FastFieldReader<f64>> {
        self.check_type(field, FastType::F64, Cardinality::SingleValue)?;
        self.typed_fast_field_reader(field)
    }
@@ -213,7 +216,7 @@ impl FastFieldReaders {
                )));
            }
            let fast_field_idx_file = self.fast_field_data(field, 0)?;
-            let idx_reader = FastFieldReader::open(fast_field_idx_file)?;
+            let idx_reader = BitpackedFastFieldReader::open(fast_field_idx_file)?;
            let data = self.fast_field_data(field, 1)?;
            BytesFastFieldReader::open(idx_reader, data)
        } else {
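The `i64`, `date` and `f64` accessors now return `impl FastFieldReader<...>` rather than a named type. Callers can then only use the trait's methods, so the trait has to be in scope. This is likely why test modules elsewhere in this change add `use crate::fastfield::FastFieldReader;`. The sketch below shows the effect with hypothetical names (`VecReader`, `i64_reader`, `first_value`); it is not tantivy's API.

```rust
mod fastfield {
    /// Trimmed-down version of the reader trait.
    pub trait FastFieldReader<Item> {
        fn get(&self, doc: u32) -> Item;
    }

    /// Hypothetical concrete reader; callers never see this type by name.
    pub struct VecReader(Vec<i64>);

    impl FastFieldReader<i64> for VecReader {
        fn get(&self, doc: u32) -> i64 {
            self.0[doc as usize]
        }
    }

    /// Opaque return type, in the style of `FastFieldReaders::i64`.
    pub fn i64_reader(vals: Vec<i64>) -> impl FastFieldReader<i64> {
        VecReader(vals)
    }
}

// Without this import, `reader.get(..)` below does not compile: `get` is a
// trait method, and the opaque return type exposes no inherent methods.
use fastfield::FastFieldReader;

pub fn first_value(vals: Vec<i64>) -> i64 {
    let reader = fastfield::i64_reader(vals);
    reader.get(0)
}
```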
--- a/src/fastfield/serializer.rs
+++ b/src/fastfield/serializer.rs
@@ -7,10 +7,10 @@ use std::io::{self, Write};
 use tantivy_bitpacker::compute_num_bits;
 use tantivy_bitpacker::BitPacker;

-/// `FastFieldSerializer` is in charge of serializing
+/// `CompositeFastFieldSerializer` is in charge of serializing
 /// fastfields on disk.
 ///
-/// Fast fields are encoded using bit-packing.
+/// Fast fields can use different encodings, such as bit-packing.
 ///
 /// `FastFieldWriter`s are in charge of pushing the data to
 /// the serializer.
@@ -27,16 +27,16 @@ use tantivy_bitpacker::BitPacker;
 /// * ...
 /// * `close_field()`
 /// * `close()`
-pub struct FastFieldSerializer {
+pub struct CompositeFastFieldSerializer {
    composite_write: CompositeWrite<WritePtr>,
 }

-impl FastFieldSerializer {
+impl CompositeFastFieldSerializer {
    /// Constructor
-    pub fn from_write(write: WritePtr) -> io::Result<FastFieldSerializer> {
+    pub fn from_write(write: WritePtr) -> io::Result<CompositeFastFieldSerializer> {
        // just making room for the pointer to header.
        let composite_write = CompositeWrite::wrap(write);
-        Ok(FastFieldSerializer { composite_write })
+        Ok(CompositeFastFieldSerializer { composite_write })
    }

    /// Start serializing a new u64 fast field
@@ -45,7 +45,7 @@ impl FastFieldSerializer {
        field: Field,
        min_value: u64,
        max_value: u64,
-    ) -> io::Result<FastSingleFieldSerializer<'_, CountingWriter<WritePtr>>> {
+    ) -> io::Result<DynamicFastFieldSerializer<'_, CountingWriter<WritePtr>>> {
        self.new_u64_fast_field_with_idx(field, min_value, max_value, 0)
    }

@@ -56,9 +56,9 @@ impl FastFieldSerializer {
        min_value: u64,
        max_value: u64,
        idx: usize,
-    ) -> io::Result<FastSingleFieldSerializer<'_, CountingWriter<WritePtr>>> {
+    ) -> io::Result<DynamicFastFieldSerializer<'_, CountingWriter<WritePtr>>> {
        let field_write = self.composite_write.for_field_with_idx(field, idx);
-        FastSingleFieldSerializer::open(field_write, min_value, max_value)
+        DynamicFastFieldSerializer::open(field_write, min_value, max_value)
    }

    /// Start serializing a new [u8] fast field
@@ -79,14 +79,111 @@ impl FastFieldSerializer {
    }
 }

-pub struct FastSingleFieldSerializer<'a, W: Write> {
+/// Statistics about a fast field's values, used to estimate how well
+/// each codec would compress them.
+#[derive(Debug, Clone)]
+pub struct EstimationStats {
+    min_value: u64,
+    max_value: u64,
+}
+
+/// The `FastFieldSerializer` trait is the common interface
+/// implemented by every fast field serializer variant.
+///
+/// `DynamicFastFieldSerializer` is the enum wrapping all variants.
+/// It is used to create a serializer instance.
+pub trait FastFieldSerializer {
+    /// Adds a value to the serializer.
+    fn add_val(&mut self, val: u64) -> io::Result<()>;
+    /// Finishes serializing the field.
+    fn close_field(self) -> io::Result<()>;
+}
+
+/// The `FastFieldSerializerEstimate` trait is required on all variants
+/// of fast field compression, to decide which one to choose.
+pub trait FastFieldSerializerEstimate {
+    /// Returns an estimate of the compression ratio, together with the
+    /// name of the compressor.
+    fn estimate(
+        /*fastfield_accessor: impl FastFieldReader<u64>,*/ stats: EstimationStats,
+    ) -> (f32, &'static str);
+    /// Returns the unique name of the compressor.
+    fn name() -> &'static str;
+}
+
+/// Enum wrapping every fast field serializer variant.
+pub enum DynamicFastFieldSerializer<'a, W: Write> {
+    Bitpacked(BitpackedFastFieldSerializer<'a, W>),
+}
+
+impl<'a, W: Write> DynamicFastFieldSerializer<'a, W> {
+    /// Creates a new fast field serializer.
+    ///
+    /// The codec is chosen via `FastFieldSerializerEstimate::estimate`.
+    /// Currently bitpacking is the only codec: it encodes each value as
+    /// `(val - min_value)`, using the minimum number of bits required to
+    /// represent `max_value - min_value`.
+    pub fn open(
+        write: &'a mut W,
+        min_value: u64,
+        max_value: u64,
+    ) -> io::Result<DynamicFastFieldSerializer<'a, W>> {
+        let stats = EstimationStats {
+            min_value,
+            max_value,
+        };
+        // `estimate` returns the compression ratio together with the codec name.
+        let (_ratio, name) = BitpackedFastFieldSerializer::<Vec<u8>>::estimate(stats);
+        Self::open_from_name(write, min_value, max_value, name)
+    }
+
+    /// Creates a new fast field serializer using the codec identified by `name`.
+    ///
+    /// `min_value` and `max_value` are passed to the codec; the bitpacking
+    /// codec uses them to compute the minimum number of bits required to
+    /// encode `(val - min_value)`.
+    ///
+    /// Panics if `name` does not match any known codec.
+    pub fn open_from_name(
+        write: &'a mut W,
+        min_value: u64,
+        max_value: u64,
+        name: &str,
+    ) -> io::Result<DynamicFastFieldSerializer<'a, W>> {
+        // The `W` type parameter of `BitpackedFastFieldSerializer` has to be specified
+        // here, even though `name()` does not use it.
+        let variant = if name == BitpackedFastFieldSerializer::<Vec<u8>>::name() {
+            DynamicFastFieldSerializer::Bitpacked(BitpackedFastFieldSerializer::open(
+                write, min_value, max_value,
+            )?)
+        } else {
+            panic!("unknown fastfield serializer {}", name);
+        };
+
+        Ok(variant)
+    }
+}
+impl<'a, W: Write> FastFieldSerializer for DynamicFastFieldSerializer<'a, W> {
+    fn add_val(&mut self, val: u64) -> io::Result<()> {
+        match self {
+            Self::Bitpacked(serializer) => serializer.add_val(val),
+        }
+    }
+    fn close_field(self) -> io::Result<()> {
+        match self {
+            Self::Bitpacked(serializer) => serializer.close_field(),
+        }
+    }
+}
+
+pub struct BitpackedFastFieldSerializer<'a, W: Write> {
    bit_packer: BitPacker,
    write: &'a mut W,
    min_value: u64,
    num_bits: u8,
 }

-impl<'a, W: Write> FastSingleFieldSerializer<'a, W> {
+impl<'a, W: Write> BitpackedFastFieldSerializer<'a, W> {
    /// Creates a new fast field serializer.
    ///
    /// The serializer in fact encode the values by bitpacking
@@ -99,34 +196,51 @@ impl<'a, W: Write> FastSingleFieldSerializer<'a, W> {
        write: &'a mut W,
        min_value: u64,
        max_value: u64,
-    ) -> io::Result<FastSingleFieldSerializer<'a, W>> {
+    ) -> io::Result<BitpackedFastFieldSerializer<'a, W>> {
        assert!(min_value <= max_value);
        min_value.serialize(write)?;
        let amplitude = max_value - min_value;
        amplitude.serialize(write)?;
        let num_bits = compute_num_bits(amplitude);
        let bit_packer = BitPacker::new();
-        Ok(FastSingleFieldSerializer {
+        Ok(BitpackedFastFieldSerializer {
            bit_packer,
            write,
            min_value,
            num_bits,
        })
    }
+}

+impl<'a, W: 'a + Write> FastFieldSerializer for BitpackedFastFieldSerializer<'a, W> {
    /// Pushes a new value to the currently open u64 fast field.
-    pub fn add_val(&mut self, val: u64) -> io::Result<()> {
+    fn add_val(&mut self, val: u64) -> io::Result<()> {
        let val_to_write: u64 = val - self.min_value;
        self.bit_packer
            .write(val_to_write, self.num_bits, &mut self.write)?;
        Ok(())
    }
-
-    pub fn close_field(mut self) -> io::Result<()> {
+    fn close_field(mut self) -> io::Result<()> {
        self.bit_packer.close(&mut self.write)
    }
 }

+impl<'a, W: 'a + Write> FastFieldSerializerEstimate for BitpackedFastFieldSerializer<'a, W> {
+    fn estimate(
+        /*_fastfield_accessor: impl FastFieldReader<u64>, */ stats: EstimationStats,
+    ) -> (f32, &'static str) {
+        let amplitude = stats.max_value - stats.min_value;
+        let num_bits = compute_num_bits(amplitude);
+        let num_bits_uncompressed = 64;
+        let ratio = num_bits as f32 / num_bits_uncompressed as f32;
+        let name = Self::name();
+        (ratio, name)
+    }
+    fn name() -> &'static str {
+        "Bitpacked"
+    }
+}
+
 pub struct FastBytesFieldSerializer<'a, W: Write> {
    write: &'a mut W,
 }
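`BitpackedFastFieldSerializer::estimate` rates the codec as the number of bits per value divided by 64, and `DynamicFastFieldSerializer::open` then opens the serializer by name. Below is a sketch of that selection step under stated assumptions: `compute_num_bits` here is a simplified reimplementation (the real `tantivy_bitpacker` helper may round very large widths differently), and the `Raw` codec is hypothetical, added only so there is more than one candidate to choose from.

```rust
/// Statistics available when choosing a codec (mirrors `EstimationStats`).
#[derive(Debug, Clone, Copy)]
pub struct EstimationStats {
    pub min_value: u64,
    pub max_value: u64,
}

/// Bits needed to store `amplitude` (simplified stand-in for
/// `tantivy_bitpacker::compute_num_bits`).
pub fn compute_num_bits(amplitude: u64) -> u8 {
    (64 - amplitude.leading_zeros()) as u8
}

/// Same formula as `BitpackedFastFieldSerializer::estimate`.
pub fn bitpacked_estimate(stats: EstimationStats) -> (f32, &'static str) {
    let amplitude = stats.max_value - stats.min_value;
    let ratio = compute_num_bits(amplitude) as f32 / 64.0;
    (ratio, "Bitpacked")
}

/// Hypothetical baseline codec storing raw `u64`s; only here to give a choice.
pub fn raw_estimate(_stats: EstimationStats) -> (f32, &'static str) {
    (1.0, "Raw")
}

/// Picks the codec with the lowest estimated ratio (the first one on ties).
pub fn choose_codec(stats: EstimationStats) -> &'static str {
    let estimates = [bitpacked_estimate(stats), raw_estimate(stats)];
    estimates
        .iter()
        .min_by(|a, b| a.0.partial_cmp(&b.0).expect("ratios are never NaN"))
        .map(|&(_, name)| name)
        .expect("at least one codec")
}
```

For values in `1000..=1255` the amplitude is 255, which needs 8 bits, so the estimated ratio is 8 / 64 = 0.125 and bitpacking wins over the 1.0 of raw storage. The returned name would then be handed to something like `open_from_name`.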
--- a/src/fastfield/writer.rs
+++ b/src/fastfield/writer.rs
@@ -1,6 +1,7 @@
 use super::multivalued::MultiValuedFastFieldWriter;
 use crate::common;
-use crate::fastfield::{BytesFastFieldWriter, FastFieldSerializer};
+use crate::fastfield::serializer::FastFieldSerializer;
+use crate::fastfield::{BytesFastFieldWriter, CompositeFastFieldSerializer};
 use crate::indexer::doc_id_mapping::DocIdMapping;
 use crate::postings::UnorderedTermId;
 use crate::schema::{Cardinality, Document, Field, FieldEntry, FieldType, Schema};
@@ -148,7 +149,7 @@ impl FastFieldsWriter {
    /// order to the fast field serializer.
    pub fn serialize(
        &self,
-        serializer: &mut FastFieldSerializer,
+        serializer: &mut CompositeFastFieldSerializer,
        mapping: &HashMap<Field, FnvHashMap<UnorderedTermId, TermOrdinal>>,
        doc_id_map: Option<&DocIdMapping>,
    ) -> io::Result<()> {
@@ -272,7 +273,7 @@ impl IntFastFieldWriter {
    /// Push the fast fields value to the `FastFieldWriter`.
    pub fn serialize(
        &self,
-        serializer: &mut FastFieldSerializer,
+        serializer: &mut CompositeFastFieldSerializer,
        doc_id_map: Option<&DocIdMapping>,
    ) -> io::Result<()> {
        let (min, max) = if self.val_min > self.val_max {
--- a/src/indexer/doc_id_mapping.rs
+++ b/src/indexer/doc_id_mapping.rs
@@ -8,7 +8,6 @@ use crate::{
    DocId, IndexSortByField, Order, TantivyError,
 };
 use std::cmp::Reverse;
-
 /// Struct to provide mapping from old doc_id to new doc_id and vice versa
 pub struct DocIdMapping {
    new_doc_id_to_old: Vec<DocId>,
@@ -92,6 +91,7 @@ pub(crate) fn get_doc_id_mapping_from_field(

 #[cfg(test)]
 mod tests_indexsorting {
+    use crate::fastfield::FastFieldReader;
    use crate::{collector::TopDocs, query::QueryParser, schema::*};
    use crate::{schema::Schema, DocAddress};
    use crate::{Index, IndexSettings, IndexSortByField, Order};
@@ -175,6 +175,7 @@ mod tests_indexsorting {
                        field: "my_number".to_string(),
                        order: Order::Asc,
                    }),
+                    ..Default::default()
                }),
                option.clone(),
            );
@@ -206,6 +207,7 @@ mod tests_indexsorting {
                        field: "my_number".to_string(),
                        order: Order::Desc,
                    }),
+                    ..Default::default()
                }),
                option.clone(),
            );
@@ -264,6 +266,7 @@ mod tests_indexsorting {
                    field: "my_number".to_string(),
                    order: Order::Asc,
                }),
+                ..Default::default()
            }),
            get_text_options(),
        );
@@ -288,6 +291,7 @@ mod tests_indexsorting {
                    field: "my_number".to_string(),
                    order: Order::Desc,
                }),
+                ..Default::default()
            }),
            get_text_options(),
        );
@@ -322,6 +326,7 @@ mod tests_indexsorting {
                    field: "my_number".to_string(),
                    order: Order::Asc,
                }),
+                ..Default::default()
            }),
            get_text_options(),
        );
@@ -352,6 +357,7 @@ mod tests_indexsorting {
                    field: "my_number".to_string(),
                    order: Order::Desc,
                }),
+                ..Default::default()
            }),
            get_text_options(),
        );
@@ -387,6 +393,7 @@ mod tests_indexsorting {
                    field: "my_number".to_string(),
                    order: Order::Asc,
                }),
+                ..Default::default()
            }),
            get_text_options(),
        );
--- a/src/indexer/index_writer.rs
+++ b/src/indexer/index_writer.rs
@@ -945,7 +945,7 @@ mod tests {
        let index_writer = index.writer(3_000_000).unwrap();
        assert_eq!(
            format!("{:?}", index_writer.get_merge_policy()),
-            "LogMergePolicy { min_merge_size: 8, max_merge_size: 10000000, min_layer_size: 10000, \
+            "LogMergePolicy { min_num_segments: 8, max_docs_before_merge: 10000000, min_layer_size: 10000, \
             level_log_size: 0.75 }"
        );
        let merge_policy = Box::new(NoMergePolicy::default());
--- a/src/indexer/log_merge_policy.rs
+++ b/src/indexer/log_merge_policy.rs
@@ -1,19 +1,20 @@
 use super::merge_policy::{MergeCandidate, MergePolicy};
 use crate::core::SegmentMeta;
+use itertools::Itertools;
 use std::cmp;
 use std::f64;

 const DEFAULT_LEVEL_LOG_SIZE: f64 = 0.75;
 const DEFAULT_MIN_LAYER_SIZE: u32 = 10_000;
-const DEFAULT_MIN_MERGE_SIZE: usize = 8;
-const DEFAULT_MAX_MERGE_SIZE: usize = 10_000_000;
+const DEFAULT_MIN_NUM_SEGMENTS_IN_MERGE: usize = 8;
+const DEFAULT_MAX_DOCS_BEFORE_MERGE: usize = 10_000_000;

 /// `LogMergePolicy` tries to merge segments that have a similar number of
 /// documents.
 #[derive(Debug, Clone)]
 pub struct LogMergePolicy {
-    min_merge_size: usize,
-    max_merge_size: usize,
+    min_num_segments: usize,
+    max_docs_before_merge: usize,
    min_layer_size: u32,
    level_log_size: f64,
 }
@@ -23,15 +24,16 @@ impl LogMergePolicy {
        cmp::max(self.min_layer_size, size)
    }

-    /// Set the minimum number of segment that may be merge together.
-    pub fn set_min_merge_size(&mut self, min_merge_size: usize) {
-        self.min_merge_size = min_merge_size;
+    /// Set the minimum number of segments that may be merged together.
+    pub fn set_min_num_segments(&mut self, min_num_segments: usize) {
+        self.min_num_segments = min_num_segments;
    }

    /// Set the maximum number docs in a segment for it to be considered for
-    /// merging.
-    pub fn set_max_merge_size(&mut self, max_merge_size: usize) {
-        self.max_merge_size = max_merge_size;
+    /// merging. A merged segment can still end up with more than
+    /// `max_docs_before_merge` documents, since many smaller segments may be
+    /// merged together.
+    pub fn set_max_docs_before_merge(&mut self, max_docs_before_merge: usize) {
+        self.max_docs_before_merge = max_docs_before_merge;
    }

    /// Set the minimum segment size under which all segment belong
@@ -42,7 +44,7 @@ impl LogMergePolicy {

    /// Set the ratio between two consecutive levels.
    ///
-    /// Segment are group in levels according to their sizes.
+    /// Segments are grouped in levels according to their sizes.
    /// These levels are defined as intervals of exponentially growing sizes.
    /// level_log_size define the factor by which one should multiply the limit
    /// to reach a level, in order to get the limit to reach the following
@@ -54,52 +56,43 @@ impl LogMergePolicy {

 impl MergePolicy for LogMergePolicy {
    fn compute_merge_candidates(&self, segments: &[SegmentMeta]) -> Vec<MergeCandidate> {
-        let mut size_sorted_tuples = segments
+        let mut size_sorted_segments = segments
            .iter()
-            .map(SegmentMeta::num_docs)
-            .enumerate()
-            .filter(|(_, s)| s <= &(self.max_merge_size as u32))
-            .collect::<Vec<(usize, u32)>>();
+            .filter(|segment_meta| segment_meta.num_docs() <= (self.max_docs_before_merge as u32))
+            .collect::<Vec<&SegmentMeta>>();

-        size_sorted_tuples.sort_by(|x, y| y.1.cmp(&(x.1)));
-
-        if size_sorted_tuples.len() <= 1 {
-            return Vec::new();
-        }
-
-        let size_sorted_log_tuples: Vec<_> = size_sorted_tuples
-            .into_iter()
-            .map(|(ind, num_docs)| (ind, f64::from(self.clip_min_size(num_docs)).log2()))
-            .collect();
-
-        if let Some(&(first_ind, first_score)) = size_sorted_log_tuples.first() {
-            let mut current_max_log_size = first_score;
-            let mut levels = vec![vec![first_ind]];
-            for &(ind, score) in (&size_sorted_log_tuples).iter().skip(1) {
-                if score < (current_max_log_size - self.level_log_size) {
-                    current_max_log_size = score;
-                    levels.push(Vec::new());
-                }
-                levels.last_mut().unwrap().push(ind);
-            }
-            levels
-                .iter()
-                .filter(|level| level.len() >= self.min_merge_size)
-                .map(|ind_vec| {
-                    MergeCandidate(ind_vec.iter().map(|&ind| segments[ind].id()).collect())
-                })
-                .collect()
-        } else {
+        if size_sorted_segments.len() <= 1 {
            return vec![];
        }
+        size_sorted_segments.sort_by_key(|seg| std::cmp::Reverse(seg.num_docs()));
+
+        let mut current_max_log_size = f64::MAX;
+        let mut levels = vec![];
+        for (_, merge_group) in &size_sorted_segments.into_iter().group_by(|segment| {
+            let segment_log_size = f64::from(self.clip_min_size(segment.num_docs())).log2();
+            if segment_log_size < (current_max_log_size - self.level_log_size) {
+                // update current_max_log_size to create a new group
+                current_max_log_size = segment_log_size;
+            }
+            // return current_max_log_size to be grouped to the current group
+            current_max_log_size
+        }) {
+            levels.push(merge_group.collect::<Vec<&SegmentMeta>>());
+        }
+
+        levels
+            .iter()
+            .filter(|level| level.len() >= self.min_num_segments)
+            .map(|segments| MergeCandidate(segments.iter().map(|&seg| seg.id()).collect()))
+            .collect()
    }
 }

 impl Default for LogMergePolicy {
    fn default() -> LogMergePolicy {
        LogMergePolicy {
-            min_merge_size: DEFAULT_MIN_MERGE_SIZE,
-            max_merge_size: DEFAULT_MAX_MERGE_SIZE,
+            min_num_segments: DEFAULT_MIN_NUM_SEGMENTS_IN_MERGE,
+            max_docs_before_merge: DEFAULT_MAX_DOCS_BEFORE_MERGE,
            min_layer_size: DEFAULT_MIN_LAYER_SIZE,
            level_log_size: DEFAULT_LEVEL_LOG_SIZE,
        }
@@ -109,16 +102,79 @@ impl Default for LogMergePolicy {
 #[cfg(test)]
 mod tests {
    use super::*;
-    use crate::core::{SegmentId, SegmentMeta, SegmentMetaInventory};
-    use crate::indexer::merge_policy::MergePolicy;
+    use crate::{
+        core::{SegmentId, SegmentMeta, SegmentMetaInventory},
+        schema,
+    };
+    use crate::{indexer::merge_policy::MergePolicy, schema::INDEXED};
    use once_cell::sync::Lazy;

    static INVENTORY: Lazy<SegmentMetaInventory> = Lazy::new(SegmentMetaInventory::default);

+    use crate::Index;
+
+    #[test]
+    fn create_index_test_max_merge_issue_1035() {
+        let mut schema_builder = schema::Schema::builder();
+        let int_field = schema_builder.add_u64_field("intval", INDEXED);
+        let schema = schema_builder.build();
+
+        let index = Index::create_in_ram(schema);
+
+        {
+            let mut log_merge_policy = LogMergePolicy::default();
+            log_merge_policy.set_min_num_segments(1);
+            log_merge_policy.set_max_docs_before_merge(1);
+            log_merge_policy.set_min_layer_size(0);
+
+            let mut index_writer = index.writer_for_tests().unwrap();
+            index_writer.set_merge_policy(Box::new(log_merge_policy));
+
+            // After every commit the merge checker runs. Because of `max_docs_before_merge`,
+            // it only merges segments containing a single document.
+            index_writer.add_document(doc!(int_field=>1_u64));
+            assert!(index_writer.commit().is_ok());
+
+            index_writer.add_document(doc!(int_field=>2_u64));
+            assert!(index_writer.commit().is_ok());
+
+            index_writer.add_document(doc!(int_field=>3_u64));
+            assert!(index_writer.commit().is_ok());
+
+            index_writer.add_document(doc!(int_field=>4_u64));
+            assert!(index_writer.commit().is_ok());
+
+            index_writer.add_document(doc!(int_field=>5_u64));
+            assert!(index_writer.commit().is_ok());
+
+            index_writer.add_document(doc!(int_field=>6_u64));
+            assert!(index_writer.commit().is_ok());
+
+            index_writer.add_document(doc!(int_field=>7_u64));
+            assert!(index_writer.commit().is_ok());
+
+            index_writer.add_document(doc!(int_field=>8_u64));
+            assert!(index_writer.commit().is_ok());
+        }
+
+        let _segment_ids = index
+            .searchable_segment_ids()
+            .expect("Searchable segments failed.");
+
+        let reader = index.reader().unwrap();
+        let searcher = reader.searcher();
+        let segment_readers = searcher.segment_readers();
+        for segment in segment_readers {
+            if segment.num_docs() > 2 {
+                panic!("a segment can't have more than two documents");
+            } // we don't wait for the background merge, otherwise this could be an equality check
+        }
+    }
+
    fn test_merge_policy() -> LogMergePolicy {
        let mut log_merge_policy = LogMergePolicy::default();
-        log_merge_policy.set_min_merge_size(3);
-        log_merge_policy.set_max_merge_size(100_000);
+        log_merge_policy.set_min_num_segments(3);
+        log_merge_policy.set_max_docs_before_merge(100_000);
        log_merge_policy.set_min_layer_size(2);
        log_merge_policy
    }
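The rewritten `compute_merge_candidates` drops segments above `max_docs_before_merge`, sorts the rest by decreasing document count, and walks them once: a segment starts a new level when its clipped `log2` size falls more than `level_log_size` below the size that opened the current level. Levels with at least `min_num_segments` segments become merge candidates. The sketch below reproduces that logic on plain document counts instead of `SegmentMeta`, with an explicit loop in place of `itertools::group_by`; the function names are hypothetical.

```rust
/// Groups document counts into levels, largest segments first.
/// `min_layer_size` clips small counts so they all share the bottom level.
pub fn group_into_levels(mut sizes: Vec<u32>, min_layer_size: u32, level_log_size: f64) -> Vec<Vec<u32>> {
    sizes.sort_by_key(|&size| std::cmp::Reverse(size));
    let mut levels: Vec<Vec<u32>> = Vec::new();
    // f64::MAX guarantees that the first segment opens the first level.
    let mut current_max_log_size = f64::MAX;
    for size in sizes {
        let log_size = f64::from(size.max(min_layer_size)).log2();
        if log_size < current_max_log_size - level_log_size {
            // Too small for the current level: start a new one.
            current_max_log_size = log_size;
            levels.push(Vec::new());
        }
        levels.last_mut().expect("the first segment always opens a level").push(size);
    }
    levels
}

/// Hypothetical counterpart of `compute_merge_candidates` working on document counts.
pub fn merge_candidates(
    sizes: &[u32],
    max_docs_before_merge: u32,
    min_num_segments: usize,
    min_layer_size: u32,
    level_log_size: f64,
) -> Vec<Vec<u32>> {
    let eligible: Vec<u32> = sizes
        .iter()
        .copied()
        .filter(|&size| size <= max_docs_before_merge)
        .collect();
    if eligible.len() <= 1 {
        return Vec::new();
    }
    group_into_levels(eligible, min_layer_size, level_log_size)
        .into_iter()
        .filter(|level| level.len() >= min_num_segments)
        .collect()
}
```

With the test policy values (`min_num_segments = 3`, `max_docs_before_merge = 100_000`, `min_layer_size = 2`) and the default `level_log_size` of 0.75, the counts `[10, 10, 10, 1000, 1000, 1000, 5, 200_000]` yield two candidates: the three 1000-doc segments and the three 10-doc segments. The 200_000-doc segment is too large, and the 5-doc segment ends up alone in its own level because log2(5), about 2.32, is more than 0.75 below log2(10), about 3.32.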
--- a/src/indexer/merger.rs
+++ b/src/indexer/merger.rs
@@ -1,6 +1,8 @@
 use super::doc_id_mapping::DocIdMapping;
 use crate::error::DataCorruption;
+use crate::fastfield::CompositeFastFieldSerializer;
 use crate::fastfield::DeleteBitSet;
+use crate::fastfield::DynamicFastFieldReader;
 use crate::fastfield::FastFieldReader;
 use crate::fastfield::FastFieldSerializer;
 use crate::fastfield::MultiValuedFastFieldReader;
@@ -87,7 +89,7 @@ pub struct IndexMerger {
 }

 fn compute_min_max_val(
-    u64_reader: &FastFieldReader<u64>,
+    u64_reader: &impl FastFieldReader<u64>,
    max_doc: DocId,
    delete_bitset_opt: Option<&DeleteBitSet>,
 ) -> Option<(u64, u64)> {
@@ -183,6 +185,10 @@ impl IndexMerger {
                readers.push(reader);
            }
        }
+        // Sort the segments according to the index sort setting.
+        if let Some(sort_by_field) = index_settings.sort_by_field.as_ref() {
+            readers = Self::sort_readers_by_min_sort_field(readers, sort_by_field)?;
+        }
        if max_doc >= MAX_DOC_LIMIT {
            let err_msg = format!(
                "The segment resulting from this merge would have {} docs,\
@@ -192,13 +198,37 @@ impl IndexMerger {
            return Err(crate::TantivyError::InvalidArgument(err_msg));
        }
        Ok(IndexMerger {
-            schema,
            index_settings,
+            schema,
            readers,
            max_doc,
        })
    }

+    fn sort_readers_by_min_sort_field(
+        readers: Vec<SegmentReader>,
+        sort_by_field: &IndexSortByField,
+    ) -> crate::Result<Vec<SegmentReader>> {
+        // Presort the readers by their min values, so that when their value ranges are
+        // disjoint, the regular merge logic can be used (the result is implicitly sorted).
+        let mut readers_with_min_sort_values = readers
+            .into_iter()
+            .map(|reader| {
+                let accessor = Self::get_sort_field_accessor(&reader, sort_by_field)?;
+                Ok((reader, accessor.min_value()))
+            })
+            .collect::<crate::Result<Vec<_>>>()?;
+        if sort_by_field.order.is_asc() {
+            readers_with_min_sort_values.sort_by_key(|(_, min_val)| *min_val);
+        } else {
+            readers_with_min_sort_values.sort_by_key(|(_, min_val)| std::cmp::Reverse(*min_val));
+        }
+        Ok(readers_with_min_sort_values
+            .into_iter()
+            .map(|(reader, _)| reader)
+            .collect())
+    }
+
    fn write_fieldnorms(
        &self,
        mut fieldnorms_serializer: FieldNormsSerializer,
@@ -209,9 +239,14 @@ impl IndexMerger {
        for field in fields {
            fieldnorms_data.clear();
            if let Some(doc_id_mapping) = doc_id_mapping {
+                let fieldnorms_readers: Vec<FieldNormReader> = self
+                    .readers
+                    .iter()
+                    .map(|reader| reader.get_fieldnorms_reader(field))
+                    .collect::<Result<_, _>>()?;
                for (doc_id, reader_with_ordinal) in doc_id_mapping {
                    let fieldnorms_reader =
-                        reader_with_ordinal.reader.get_fieldnorms_reader(field)?;
+                        &fieldnorms_readers[reader_with_ordinal.ordinal as usize];
                    let fieldnorm_id = fieldnorms_reader.fieldnorm_id(*doc_id);
                    fieldnorms_data.push(fieldnorm_id);
                }
@@ -232,7 +267,7 @@ impl IndexMerger {

    fn write_fast_fields(
        &self,
-        fast_field_serializer: &mut FastFieldSerializer,
+        fast_field_serializer: &mut CompositeFastFieldSerializer,
        mut term_ord_mappings: HashMap<Field, TermOrdinalMapping>,
        doc_id_mapping: &Option<Vec<(DocId, SegmentReaderWithOrdinal)>>,
    ) -> crate::Result<()> {
@@ -282,11 +317,11 @@ impl IndexMerger {
    fn write_single_fast_field(
        &self,
        field: Field,
-        fast_field_serializer: &mut FastFieldSerializer,
+        fast_field_serializer: &mut CompositeFastFieldSerializer,
        doc_id_mapping: &Option<Vec<(DocId, SegmentReaderWithOrdinal)>>,
    ) -> crate::Result<()> {
        let (min_value, max_value) = self.readers.iter().map(|reader|{
-                let u64_reader: FastFieldReader<u64> = reader
+                let u64_reader: DynamicFastFieldReader<u64> = reader
                .fast_fields()
                .typed_fast_field_reader(field)
                .expect("Failed to find a reader for single fast field. This is a tantivy bug and it should never happen.");
@@ -301,7 +336,7 @@ impl IndexMerger {
            .readers
            .iter()
            .map(|reader| {
-               let u64_reader: FastFieldReader<u64> = reader
+               let u64_reader: DynamicFastFieldReader<u64> = reader
                    .fast_fields()
                    .typed_fast_field_reader(field)
                    .expect("Failed to find a reader for single fast field. This is a tantivy bug and it should never happen.");
@@ -329,7 +364,7 @@ impl IndexMerger {
            let u64_readers = self.readers.iter()
                .filter(|reader|reader.max_doc() != reader.delete_bitset().map(|bit_set|bit_set.len() as u32).unwrap_or(0))
                .map(|reader|{
-                let u64_reader: FastFieldReader<u64> = reader
+                let u64_reader: DynamicFastFieldReader<u64> = reader
                .fast_fields()
                .typed_fast_field_reader(field)
                .expect("Failed to find a reader for single fast field. This is a tantivy bug and it should never happen.");
@@ -355,6 +390,60 @@ impl IndexMerger {
        }
    }

+    /// Checks whether the readers' value ranges on the sort field are disjoint and already in
+    /// sort order, in which case the segments can simply be stacked.
+    pub(crate) fn is_disjunct_and_sorted_on_sort_property(
+        &self,
+        sort_by_field: &IndexSortByField,
+    ) -> crate::Result<bool> {
+        let reader_and_field_accessors = self.get_reader_with_sort_field_accessor(sort_by_field)?;
+
+        let everything_is_in_order = reader_and_field_accessors
+            .into_iter()
+            .map(|reader| reader.1)
+            .tuple_windows()
+            .all(|(field_accessor1, field_accessor2)| {
+                if sort_by_field.order.is_asc() {
+                    field_accessor1.max_value() <= field_accessor2.min_value()
+                } else {
+                    field_accessor1.min_value() >= field_accessor2.max_value()
+                }
+            });
+        Ok(everything_is_in_order)
+    }
+
+    pub(crate) fn get_sort_field_accessor(
+        reader: &SegmentReader,
+        sort_by_field: &IndexSortByField,
+    ) -> crate::Result<impl FastFieldReader<u64>> {
+        let field_id = expect_field_id_for_sort_field(&reader.schema(), &sort_by_field)?; // for now expect fastfield, but not strictly required
+        let value_accessor = reader.fast_fields().u64_lenient(field_id)?;
+        Ok(value_accessor)
+    }
+    /// Pairs each reader with its sort field accessor; collected into a `Vec` to bind the lifetime.
+    pub(crate) fn get_reader_with_sort_field_accessor<'a, 'b>(
+        &'a self,
+        sort_by_field: &'b IndexSortByField,
+    ) -> crate::Result<
+        Vec<(
+            SegmentReaderWithOrdinal<'a>,
+            impl FastFieldReader<u64> + Clone,
+        )>,
+    > {
+        let reader_and_field_accessors = self
+            .readers
+            .iter()
+            .enumerate()
+            .map(Into::into)
+            .map(|reader_with_ordinal: SegmentReaderWithOrdinal| {
+                let value_accessor =
+                    Self::get_sort_field_accessor(reader_with_ordinal.reader, sort_by_field)?;
+                Ok((reader_with_ordinal, value_accessor))
+            })
+            .collect::<crate::Result<Vec<_>>>()?;
+        Ok(reader_and_field_accessors)
+    }
+
    /// Generates the doc_id mapping where position in the vec=new
    /// doc_id.
    /// ReaderWithOrdinal will include the ordinal position of the
@@ -363,42 +452,26 @@ impl IndexMerger {
        &self,
        sort_by_field: &IndexSortByField,
    ) -> crate::Result<Vec<(DocId, SegmentReaderWithOrdinal)>> {
-        let reader_and_field_accessors = self
-            .readers
-            .iter()
-            .enumerate()
-            .map(|reader| {
-                let reader_with_ordinal: SegmentReaderWithOrdinal = reader.into();
-                let field_id = expect_field_id_for_sort_field(
-                    &reader_with_ordinal.reader.schema(),
-                    &sort_by_field,
-                )?; // for now expect fastfield, but not strictly required
-                let value_accessor = reader_with_ordinal
-                    .reader
-                    .fast_fields()
-                    .u64_lenient(field_id)?;
-                Ok((reader_with_ordinal, value_accessor))
-            })
-            .collect::<crate::Result<Vec<_>>>()?; // Collecting to bind the lifetime of value_accessor into the vec, or can't be used as a reference.
-                                                  // Loading the field accessor on demand causes a 15x regression
+        let reader_and_field_accessors = self.get_reader_with_sort_field_accessor(sort_by_field)?;
+        // The accessors are loaded upfront; loading them on demand caused a 15x regression

        // create iterators over segment/sort_accessor/doc_id  tuple
-        let doc_id_reader_pair = reader_and_field_accessors
-            .iter()
-            .map(|reader_and_field_accessor| {
-                reader_and_field_accessor
-                    .0
-                    .reader
-                    .doc_ids_alive()
-                    .map(move |doc_id| {
-                        (
-                            doc_id,
-                            reader_and_field_accessor.0,
-                            &reader_and_field_accessor.1,
-                        )
-                    })
-            })
-            .collect::<Vec<_>>();
+        let doc_id_reader_pair =
+            reader_and_field_accessors
+                .iter()
+                .map(|reader_and_field_accessor| {
+                    reader_and_field_accessor
+                        .0
+                        .reader
+                        .doc_ids_alive()
+                        .map(move |doc_id| {
+                            (
+                                doc_id,
+                                reader_and_field_accessor.0,
+                                &reader_and_field_accessor.1,
+                            )
+                        })
+                });

        // create iterator tuple of (old doc_id, reader) in order of the new doc_ids
        let sorted_doc_ids: Vec<(DocId, SegmentReaderWithOrdinal)> = doc_id_reader_pair
@@ -425,7 +498,7 @@ impl IndexMerger {
    // is used to index the reader_and_field_accessors vec.
    fn write_1_n_fast_field_idx_generic(
        field: Field,
-        fast_field_serializer: &mut FastFieldSerializer,
+        fast_field_serializer: &mut CompositeFastFieldSerializer,
        doc_id_mapping: &Option<Vec<(DocId, SegmentReaderWithOrdinal)>>,
        reader_and_field_accessors: &[(&SegmentReader, impl MultiValueLength)],
    ) -> crate::Result<()> {
@@ -480,7 +553,7 @@ impl IndexMerger {
    fn write_multi_value_fast_field_idx(
        &self,
        field: Field,
-        fast_field_serializer: &mut FastFieldSerializer,
+        fast_field_serializer: &mut CompositeFastFieldSerializer,
        doc_id_mapping: &Option<Vec<(DocId, SegmentReaderWithOrdinal)>>,
    ) -> crate::Result<()> {
        let reader_and_field_accessors = self.readers.iter().map(|reader|{
@@ -502,7 +575,7 @@ impl IndexMerger {
        &self,
        field: Field,
        term_ordinal_mappings: &TermOrdinalMapping,
-        fast_field_serializer: &mut FastFieldSerializer,
+        fast_field_serializer: &mut CompositeFastFieldSerializer,
        doc_id_mapping: &Option<Vec<(DocId, SegmentReaderWithOrdinal)>>,
    ) -> crate::Result<()> {
        // Multifastfield consists in 2 fastfields.
@@ -565,7 +638,7 @@ impl IndexMerger {
    fn write_multi_fast_field(
        &self,
        field: Field,
-        fast_field_serializer: &mut FastFieldSerializer,
+        fast_field_serializer: &mut CompositeFastFieldSerializer,
        doc_id_mapping: &Option<Vec<(DocId, SegmentReaderWithOrdinal)>>,
    ) -> crate::Result<()> {
        // Multifastfield consists in 2 fastfields.
@@ -652,7 +725,7 @@ impl IndexMerger {
    fn write_bytes_fast_field(
        &self,
        field: Field,
-        fast_field_serializer: &mut FastFieldSerializer,
+        fast_field_serializer: &mut CompositeFastFieldSerializer,
        doc_id_mapping: &Option<Vec<(DocId, SegmentReaderWithOrdinal)>>,
    ) -> crate::Result<()> {
        let reader_and_field_accessors = self
@@ -798,13 +871,11 @@ impl IndexMerger {
            let mut total_doc_freq = 0;

            // Let's compute the list of non-empty posting lists
-            for heap_item in merged_terms.current_kvs() {
-                let segment_ord = heap_item.segment_ord;
-                let term_info = heap_item.streamer.value();
-                let segment_reader = &self.readers[heap_item.segment_ord];
+            for (segment_ord, term_info) in merged_terms.current_segment_ordinals_and_term_infos() {
+                let segment_reader = &self.readers[segment_ord];
                let inverted_index: &InvertedIndexReader = &*field_readers[segment_ord];
                let segment_postings = inverted_index
-                    .read_postings_from_terminfo(term_info, segment_postings_option)?;
+                    .read_postings_from_terminfo(&term_info, segment_postings_option)?;
                let delete_bitset_opt = segment_reader.delete_bitset();
                let doc_freq = if let Some(delete_bitset) = delete_bitset_opt {
                    segment_postings.doc_freq_given_deletes(delete_bitset)
@@ -943,7 +1014,23 @@ impl IndexMerger {
        } else {
            for reader in &self.readers {
                let store_reader = reader.get_store_reader()?;
-                if reader.num_deleted_docs() > 0 {
+                if reader.num_deleted_docs() > 0
+                    // If the store holds too little data, we re-encode its docs instead of stacking them,
+                    // to avoid many small blocks in the merged doc store. We stack once a store has at
+                    // least 6 checkpoints (5 full blocks + 1). In the worst case 2/7 of blocks are tiny:
+                    // [segment 1 - {1 doc}][segment 2 - {fullblock * 5}{1doc}]
+                    // => 5 * full blocks, 2 * 1 document blocks
+                    //
+                    // In a more realistic scenario, with segments of similar size, about 1/6 of the doc
+                    // store blocks would on average be half full, assuming a uniformly random fill level
+                    // (which does not strictly hold here; the exact behavior has not been analyzed).
+                    //
+                    // https://github.com/tantivy-search/tantivy/issues/1053
+                    //
+                    // take(7) bounds the walk, so we don't iterate over all checkpoints.
+                    || store_reader.block_checkpoints().take(7).count() < 6
+                    || store_reader.compressor() != store_writer.compressor()
+                {
                    for doc_bytes_res in store_reader.iter_raw(reader.delete_bitset()) {
                        let doc_bytes = doc_bytes_res?;
                        store_writer.store_bytes(&doc_bytes)?;
@@ -965,7 +1052,13 @@ impl SerializableSegment for IndexMerger {
    ) -> crate::Result<u32> {
        let doc_id_mapping = if let Some(sort_by_field) = self.index_settings.sort_by_field.as_ref()
        {
-            Some(self.generate_doc_id_mapping(sort_by_field)?)
+            // If the documents are already sorted and stackable, we skip the doc id mapping and
+            // merge as if there were no sorting
+            if self.is_disjunct_and_sorted_on_sort_property(sort_by_field)? {
+                None
+            } else {
+                Some(self.generate_doc_id_mapping(sort_by_field)?)
+            }
        } else {
            None
        };
@@ -1000,6 +1093,7 @@ mod tests {
    use crate::collector::tests::{BytesFastFieldTestCollector, FastFieldTestCollector};
    use crate::collector::{Count, FacetCollector};
    use crate::core::Index;
+    use crate::fastfield::FastFieldReader;
    use crate::query::AllQuery;
    use crate::query::BooleanQuery;
    use crate::query::Scorer;
@@ -1477,31 +1571,65 @@ mod tests {
    }
    #[test]
    fn test_merge_facets_sort_none() {
-        test_merge_facets(None)
+        test_merge_facets(None, true)
    }

    #[test]
    fn test_merge_facets_sort_asc() {
-        // the data is already sorted asc, so this should have no effect, but go through the docid
-        // mapping code
-        test_merge_facets(Some(IndexSettings {
-            sort_by_field: Some(IndexSortByField {
-                field: "intval".to_string(),
-                order: Order::Asc,
+        // In the merge case this will go through the docid mapping code
+        test_merge_facets(
+            Some(IndexSettings {
+                sort_by_field: Some(IndexSortByField {
+                    field: "intval".to_string(),
+                    order: Order::Asc,
+                }),
+                ..Default::default()
            }),
-        }));
+            true,
+        );
+        // In the merge case this will not go through the docid mapping code, because the data is
+        // sorted and disjunct
+        test_merge_facets(
+            Some(IndexSettings {
+                sort_by_field: Some(IndexSortByField {
+                    field: "intval".to_string(),
+                    order: Order::Asc,
+                }),
+                ..Default::default()
+            }),
+            false,
+        );
    }

    #[test]
    fn test_merge_facets_sort_desc() {
-        test_merge_facets(Some(IndexSettings {
-            sort_by_field: Some(IndexSortByField {
-                field: "intval".to_string(),
-                order: Order::Desc,
+        // In the merge case this will go through the docid mapping code
+        test_merge_facets(
+            Some(IndexSettings {
+                sort_by_field: Some(IndexSortByField {
+                    field: "intval".to_string(),
+                    order: Order::Desc,
+                }),
+                ..Default::default()
            }),
-        }));
+            true,
+        );
+        // In the merge case this will not go through the docid mapping code, because the data is
+        // sorted and disjunct
+        test_merge_facets(
+            Some(IndexSettings {
+                sort_by_field: Some(IndexSortByField {
+                    field: "intval".to_string(),
+                    order: Order::Desc,
+                }),
+                ..Default::default()
+            }),
+            false,
+        );
    }
-    fn test_merge_facets(index_settings: Option<IndexSettings>) {
+    // force_segment_value_overlap forces the sort field values of the segments to have overlapping
+    // min/max ranges, so that the merge cannot skip the doc id mapping by stacking the segments
+    fn test_merge_facets(index_settings: Option<IndexSettings>, force_segment_value_overlap: bool) {
        let mut schema_builder = schema::Schema::builder();
        let facet_field = schema_builder.add_facet_field("facet", INDEXED);
        let int_options = IntOptions::default()
@@ -1518,32 +1646,47 @@ mod tests {
        let mut int_val = 0;
        {
            let mut index_writer = index.writer_for_tests().unwrap();
-            let mut index_doc = |index_writer: &mut IndexWriter, doc_facets: &[&str]| {
-                let mut doc = Document::default();
-                for facet in doc_facets {
-                    doc.add_facet(facet_field, Facet::from(facet));
-                }
-                doc.add_u64(int_field, int_val);
-                int_val += 1;
-                index_writer.add_document(doc);
-            };
+            let index_doc =
+                |index_writer: &mut IndexWriter, doc_facets: &[&str], int_val: &mut u64| {
+                    let mut doc = Document::default();
+                    for facet in doc_facets {
+                        doc.add_facet(facet_field, Facet::from(facet));
+                    }
+                    doc.add_u64(int_field, *int_val);
+                    *int_val += 1;
+                    index_writer.add_document(doc);
+                };

-            index_doc(&mut index_writer, &["/top/a/firstdoc", "/top/b"]);
-            index_doc(&mut index_writer, &["/top/a/firstdoc", "/top/b", "/top/c"]);
-            index_doc(&mut index_writer, &["/top/a", "/top/b"]);
-            index_doc(&mut index_writer, &["/top/a"]);
+            index_doc(
+                &mut index_writer,
+                &["/top/a/firstdoc", "/top/b"],
+                &mut int_val,
+            );
+            index_doc(
+                &mut index_writer,
+                &["/top/a/firstdoc", "/top/b", "/top/c"],
+                &mut int_val,
+            );
+            index_doc(&mut index_writer, &["/top/a", "/top/b"], &mut int_val);
+            index_doc(&mut index_writer, &["/top/a"], &mut int_val);

-            index_doc(&mut index_writer, &["/top/b", "/top/d"]);
-            index_doc(&mut index_writer, &["/top/d"]);
-            index_doc(&mut index_writer, &["/top/e"]);
+            index_doc(&mut index_writer, &["/top/b", "/top/d"], &mut int_val);
+            if force_segment_value_overlap {
+                index_doc(&mut index_writer, &["/top/d"], &mut 0);
+                index_doc(&mut index_writer, &["/top/e"], &mut 10);
+                index_writer.commit().expect("committed");
+                index_doc(&mut index_writer, &["/top/a"], &mut 5); // 5 lies between 0 and 10, so the segments don't have disjoint ranges
+            } else {
+                index_doc(&mut index_writer, &["/top/d"], &mut int_val);
+                index_doc(&mut index_writer, &["/top/e"], &mut int_val);
+                index_writer.commit().expect("committed");
+                index_doc(&mut index_writer, &["/top/a"], &mut int_val);
+            }
+            index_doc(&mut index_writer, &["/top/b"], &mut int_val);
+            index_doc(&mut index_writer, &["/top/c"], &mut int_val);
            index_writer.commit().expect("committed");

-            index_doc(&mut index_writer, &["/top/a"]);
-            index_doc(&mut index_writer, &["/top/b"]);
-            index_doc(&mut index_writer, &["/top/c"]);
-            index_writer.commit().expect("committed");
-
-            index_doc(&mut index_writer, &["/top/e", "/top/f"]);
+            index_doc(&mut index_writer, &["/top/e", "/top/f"], &mut int_val);
            index_writer.commit().expect("committed");
        }

@@ -1828,7 +1971,7 @@ mod tests {

        // Make sure we'll attempt to merge every created segment
        let mut policy = crate::indexer::LogMergePolicy::default();
-        policy.set_min_merge_size(2);
+        policy.set_min_num_segments(2);
        writer.set_merge_policy(Box::new(policy));

        for i in 0..100 {
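The stacking shortcut in `is_disjunct_and_sorted_on_sort_property` only compares the min and max sort values of neighbouring segments. Below is a minimal standalone sketch of that check. `SegmentRange`, `SortOrder` and `is_disjoint_and_sorted` are hypothetical stand-ins for the fast field accessors and `Order`, not tantivy API.

```rust
/// Hypothetical stand-in for tantivy's `Order`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SortOrder {
    Asc,
    Desc,
}

/// Hypothetical stand-in for a segment's sort field accessor: only the
/// min and max values of the field within the segment matter here.
#[derive(Clone, Copy, Debug)]
pub struct SegmentRange {
    pub min_value: u64,
    pub max_value: u64,
}

/// Returns true when consecutive segments have non-overlapping value ranges
/// that already follow the sort order, so they could be stacked as-is.
pub fn is_disjoint_and_sorted(segments: &[SegmentRange], order: SortOrder) -> bool {
    segments.windows(2).all(|pair| {
        let (first, second) = (pair[0], pair[1]);
        match order {
            // Ascending: every value of `first` must be <= every value of `second`.
            SortOrder::Asc => first.max_value <= second.min_value,
            // Descending: every value of `first` must be >= every value of `second`.
            SortOrder::Desc => first.min_value >= second.max_value,
        }
    })
}

fn main() {
    let seg = |min_value, max_value| SegmentRange { min_value, max_value };
    // Ranges [1-3], [10-20], [50-1000] in this order: stackable for ascending order only.
    let ascending = [seg(1, 3), seg(10, 20), seg(50, 1000)];
    assert!(is_disjoint_and_sorted(&ascending, SortOrder::Asc));
    assert!(!is_disjoint_and_sorted(&ascending, SortOrder::Desc));
    // Overlapping ranges [0-10] and [5-6] always require the doc id mapping.
    let overlapping = [seg(0, 10), seg(5, 6)];
    assert!(!is_disjoint_and_sorted(&overlapping, SortOrder::Asc));
    println!("ok");
}
```

As in the diff, equal boundary values (for example `[1-3]` followed by `[3-5]` in ascending order) still count as stackable, because the comparisons use `<=` and `>=`. Note that the result also depends on the order in which the readers are listed.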
--- a/src/indexer/merger_sorted_index_test.rs
+++ b/src/indexer/merger_sorted_index_test.rs
@@ -1,5 +1,6 @@
 #[cfg(test)]
 mod tests {
+    use crate::fastfield::FastFieldReader;
    use crate::{
        collector::TopDocs,
        schema::{Cardinality, TextFieldIndexing},
@@ -39,6 +40,7 @@ mod tests {
            let mut index_writer = index.writer_for_tests().unwrap();

            index_writer.add_document(doc!(int_field=>3_u64, facet_field=> Facet::from("/crime")));
+            index_writer.add_document(doc!(int_field=>6_u64, facet_field=> Facet::from("/crime")));

            assert!(index_writer.commit().is_ok());
            index_writer.add_document(doc!(int_field=>5_u64, facet_field=> Facet::from("/fanta")));
@@ -58,7 +60,12 @@ mod tests {
        index
    }

-    fn create_test_index(index_settings: Option<IndexSettings>) -> Index {
+    // force_disjunct_segment_sort_values forces the field by which the index is sorted to have
+    // disjoint ranges between segments, e.g. segment values in [1-3], [10-20], [50-1000]
+    fn create_test_index(
+        index_settings: Option<IndexSettings>,
+        force_disjunct_segment_sort_values: bool,
+    ) -> Index {
        let mut schema_builder = schema::Schema::builder();
        let int_options = IntOptions::default()
            .set_fast(Cardinality::SingleValue)
@@ -92,6 +99,7 @@ mod tests {
        {
            let mut index_writer = index.writer_for_tests().unwrap();

+            // segment 1 - range 1-3
            index_writer.add_document(doc!(int_field=>1_u64));
            index_writer.add_document(
                doc!(int_field=>3_u64, multi_numbers => 3_u64, multi_numbers => 4_u64, bytes_field => vec![1, 2, 3], text_field => "some text", facet_field=> Facet::from("/book/crime")),
@@ -102,13 +110,26 @@ mod tests {
            );

            assert!(index_writer.commit().is_ok());
+            // segment 2 - range 1-20, with force_disjunct_segment_sort_values 10-20
            index_writer.add_document(doc!(int_field=>20_u64, multi_numbers => 20_u64));
-            index_writer.add_document(doc!(int_field=>1_u64, text_field=> "deleteme", facet_field=> Facet::from("/book/crime")));
+
+            let int_val = if force_disjunct_segment_sort_values {
+                10_u64
+            } else {
+                1
+            };
+            index_writer.add_document(doc!(int_field=>int_val, text_field=> "deleteme", facet_field=> Facet::from("/book/crime")));
            assert!(index_writer.commit().is_ok());
-            index_writer.add_document(
-                doc!(int_field=>10_u64, multi_numbers => 10_u64, multi_numbers => 11_u64, text_field=> "blubber", facet_field=> Facet::from("/book/fantasy")),
+            // segment 3 - range 5-1000, with force_disjunct_segment_sort_values 50-1000
+            let int_vals = if force_disjunct_segment_sort_values {
+                [100_u64, 50]
+            } else {
+                [10, 5]
+            };
+            index_writer.add_document( // position of this doc after delete in desc sorting = [2], in disjunct case [1]
+                doc!(int_field=>int_vals[0], multi_numbers => 10_u64, multi_numbers => 11_u64, text_field=> "blubber", facet_field=> Facet::from("/book/fantasy")),
            );
-            index_writer.add_document(doc!(int_field=>5_u64, text_field=> "deleteme"));
+            index_writer.add_document(doc!(int_field=>int_vals[1], text_field=> "deleteme"));
            index_writer.add_document(
                doc!(int_field=>1_000u64, multi_numbers => 1001_u64, multi_numbers => 1002_u64, bytes_field => vec![5, 5],text_field => "the biggest num")
            );
@@ -136,17 +157,30 @@ mod tests {
                field: "intval".to_string(),
                order: Order::Desc,
            }),
+            ..Default::default()
        }));
    }

    #[test]
-    fn test_merge_sorted_index_desc() {
-        let index = create_test_index(Some(IndexSettings {
-            sort_by_field: Some(IndexSortByField {
-                field: "intval".to_string(),
-                order: Order::Desc,
+    fn test_merge_sorted_index_desc_not_disjunct() {
+        test_merge_sorted_index_desc_(false);
+    }
+    #[test]
+    fn test_merge_sorted_index_desc_disjunct() {
+        test_merge_sorted_index_desc_(true);
+    }
+
+    fn test_merge_sorted_index_desc_(force_disjunct_segment_sort_values: bool) {
+        let index = create_test_index(
+            Some(IndexSettings {
+                sort_by_field: Some(IndexSortByField {
+                    field: "intval".to_string(),
+                    order: Order::Desc,
+                }),
+                ..Default::default()
            }),
-        }));
+            force_disjunct_segment_sort_values,
+        );

        let int_field = index.schema().get_field("intval").unwrap();
        let reader = index.reader().unwrap();
@@ -160,8 +194,13 @@ mod tests {
        assert_eq!(fast_field.get(5u32), 1u64);
        assert_eq!(fast_field.get(4u32), 2u64);
        assert_eq!(fast_field.get(3u32), 3u64);
-        assert_eq!(fast_field.get(2u32), 10u64);
-        assert_eq!(fast_field.get(1u32), 20u64);
+        if force_disjunct_segment_sort_values {
+            assert_eq!(fast_field.get(2u32), 20u64);
+            assert_eq!(fast_field.get(1u32), 100u64);
+        } else {
+            assert_eq!(fast_field.get(2u32), 10u64);
+            assert_eq!(fast_field.get(1u32), 20u64);
+        }
        assert_eq!(fast_field.get(0u32), 1_000u64);

        // test new field norm mapping
@@ -169,8 +208,13 @@ mod tests {
            let my_text_field = index.schema().get_field("text_field").unwrap();
            let fieldnorm_reader = segment_reader.get_fieldnorms_reader(my_text_field).unwrap();
            assert_eq!(fieldnorm_reader.fieldnorm(0), 3); // the biggest num
-            assert_eq!(fieldnorm_reader.fieldnorm(1), 0);
-            assert_eq!(fieldnorm_reader.fieldnorm(2), 1); // blubber
+            if force_disjunct_segment_sort_values {
+                assert_eq!(fieldnorm_reader.fieldnorm(1), 1); // blubber
+                assert_eq!(fieldnorm_reader.fieldnorm(2), 0);
+            } else {
+                assert_eq!(fieldnorm_reader.fieldnorm(1), 0);
+                assert_eq!(fieldnorm_reader.fieldnorm(2), 1); // blubber
+            }
            assert_eq!(fieldnorm_reader.fieldnorm(3), 2); // some text
            assert_eq!(fieldnorm_reader.fieldnorm(5), 0);
        }
@@ -191,13 +235,22 @@ mod tests {
            };

            assert_eq!(do_search("some"), vec![3]);
-            assert_eq!(do_search("blubber"), vec![2]);
+            if force_disjunct_segment_sort_values {
+                assert_eq!(do_search("blubber"), vec![1]);
+            } else {
+                assert_eq!(do_search("blubber"), vec![2]);
+            }
            assert_eq!(do_search("biggest"), vec![0]);
        }

        // access doc store
        {
-            let doc = searcher.doc(DocAddress::new(0, 2)).unwrap();
+            let blubber_pos = if force_disjunct_segment_sort_values {
+                1
+            } else {
+                2
+            };
+            let doc = searcher.doc(DocAddress::new(0, blubber_pos)).unwrap();
            assert_eq!(
                doc.get_first(my_text_field).unwrap().text(),
                Some("blubber")
@@ -209,12 +262,16 @@ mod tests {

    #[test]
    fn test_merge_sorted_index_asc() {
-        let index = create_test_index(Some(IndexSettings {
-            sort_by_field: Some(IndexSortByField {
-                field: "intval".to_string(),
-                order: Order::Asc,
+        let index = create_test_index(
+            Some(IndexSettings {
+                sort_by_field: Some(IndexSortByField {
+                    field: "intval".to_string(),
+                    order: Order::Asc,
+                }),
+                ..Default::default()
            }),
-        }));
+            false,
+        );

        let int_field = index.schema().get_field("intval").unwrap();
        let multi_numbers = index.schema().get_field("multi_numbers").unwrap();
@@ -315,7 +372,6 @@ mod bench_sorted_index_merge {
    use crate::IndexSortByField;
    use crate::IndexWriter;
    use crate::Order;
-    use futures::executor::block_on;
    use test::{self, Bencher};
    fn create_index(sort_by_field: Option<IndexSortByField>) -> Index {
        let mut schema_builder = Schema::builder();
@@ -323,12 +379,12 @@ mod bench_sorted_index_merge {
            .set_fast(Cardinality::SingleValue)
            .set_indexed();
        let int_field = schema_builder.add_u64_field("intval", int_options);
-        let int_field = schema_builder.add_u64_field("intval", int_options);
        let schema = schema_builder.build();

-        let index_builder = Index::builder()
-            .schema(schema)
-            .settings(IndexSettings { sort_by_field });
+        let index_builder = Index::builder().schema(schema).settings(IndexSettings {
+            sort_by_field,
+            ..Default::default()
+        });
        let index = index_builder.create_in_ram().unwrap();

        {
@@ -366,7 +422,7 @@ mod bench_sorted_index_merge {
        b.iter(|| {

            let sorted_doc_ids = doc_id_mapping.iter().map(|(doc_id, reader)|{
-            let u64_reader: FastFieldReader<u64> = reader
+            let u64_reader: crate::fastfield::DynamicFastFieldReader<u64> = reader.reader
                .fast_fields()
                .typed_fast_field_reader(field)
                .expect("Failed to find a reader for single fast field. This is a tantivy bug and it should never happen.");
@@ -391,7 +447,7 @@ mod bench_sorted_index_merge {
            order: Order::Desc,
        };
        let index = create_index(Some(sort_by_field.clone()));
-        let field = index.schema().get_field("intval").unwrap();
+        //let field = index.schema().get_field("intval").unwrap();
        let segments = index.searchable_segments().unwrap();
        let merger: IndexMerger =
            IndexMerger::open(index.schema(), index.settings().clone(), &segments[..])?;
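When the ranges overlap, the merger builds a doc id mapping instead: the position in the vector is the new doc id, and each entry points to an old doc id and its segment. The excerpt cuts off before showing how the per-segment iterators are combined, so the following is only a sketch of one way to do it. It uses a k-way merge over a `BinaryHeap` and assumes that each segment's alive docs are already sorted by the sort field and that ties go to the lower segment ordinal. `doc_id_mapping` and `SegmentDocs` are hypothetical names.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SortOrder {
    Asc,
    Desc,
}

/// Alive documents of one segment as (old doc id, sort value),
/// assumed to be already sorted in the requested order.
pub type SegmentDocs = Vec<(u32, u64)>;

/// Builds the mapping new doc id -> (old doc id, segment ordinal) by
/// k-way merging the per-segment lists. Ties go to the lower segment ordinal.
pub fn doc_id_mapping(segments: &[SegmentDocs], order: SortOrder) -> Vec<(u32, usize)> {
    // BinaryHeap is a max-heap; ascending order is mapped onto it by flipping the value.
    let key = |value: u64| match order {
        SortOrder::Asc => u64::MAX - value,
        SortOrder::Desc => value,
    };
    // Heap entries: (ordering key, reversed segment ordinal for ties, position in segment).
    let mut heap = BinaryHeap::new();
    for (segment_ord, docs) in segments.iter().enumerate() {
        if let Some(&(_, value)) = docs.first() {
            heap.push((key(value), Reverse(segment_ord), 0usize));
        }
    }
    let mut mapping = Vec::with_capacity(segments.iter().map(Vec::len).sum());
    while let Some((_, Reverse(segment_ord), pos)) = heap.pop() {
        let (old_doc_id, _) = segments[segment_ord][pos];
        mapping.push((old_doc_id, segment_ord));
        // Refill the heap with the next document of the same segment, if any.
        if let Some(&(_, value)) = segments[segment_ord].get(pos + 1) {
            heap.push((key(value), Reverse(segment_ord), pos + 1));
        }
    }
    mapping
}

fn main() {
    // Two segments, each sorted descending by sort value; their ranges overlap.
    let segments = vec![vec![(0, 10), (1, 4), (2, 0)], vec![(0, 6), (1, 5)]];
    let mapping = doc_id_mapping(&segments, SortOrder::Desc);
    // The new doc id is the index in `mapping`.
    assert_eq!(mapping, vec![(0, 0), (0, 1), (1, 1), (1, 0), (2, 0)]);
    println!("{:?}", mapping);
}
```

Only one entry per segment lives in the heap at any time, so the heap stays as small as the number of segments being merged.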
--- a/src/indexer/segment_serializer.rs
+++ b/src/indexer/segment_serializer.rs
@@ -1,6 +1,6 @@
 use crate::core::Segment;
 use crate::core::SegmentComponent;
-use crate::fastfield::FastFieldSerializer;
+use crate::fastfield::CompositeFastFieldSerializer;
 use crate::fieldnorm::FieldNormsSerializer;
 use crate::postings::InvertedIndexSerializer;
 use crate::store::StoreWriter;
@@ -10,7 +10,7 @@ use crate::store::StoreWriter;
 pub struct SegmentSerializer {
    segment: Segment,
    pub(crate) store_writer: StoreWriter,
-    fast_field_serializer: FastFieldSerializer,
+    fast_field_serializer: CompositeFastFieldSerializer,
    fieldnorms_serializer: Option<FieldNormsSerializer>,
    postings_serializer: InvertedIndexSerializer,
 }
@@ -33,15 +33,16 @@ impl SegmentSerializer {
        let store_write = segment.open_write(store_component)?;

        let fast_field_write = segment.open_write(SegmentComponent::FastFields)?;
-        let fast_field_serializer = FastFieldSerializer::from_write(fast_field_write)?;
+        let fast_field_serializer = CompositeFastFieldSerializer::from_write(fast_field_write)?;

        let fieldnorms_write = segment.open_write(SegmentComponent::FieldNorms)?;
        let fieldnorms_serializer = FieldNormsSerializer::from_write(fieldnorms_write)?;

        let postings_serializer = InvertedIndexSerializer::open(&mut segment)?;
+        let compressor = segment.index().settings().docstore_compression;
        Ok(SegmentSerializer {
            segment,
-            store_writer: StoreWriter::new(store_write),
+            store_writer: StoreWriter::new(store_write, compressor),
            fast_field_serializer,
            fieldnorms_serializer: Some(fieldnorms_serializer),
            postings_serializer,
@@ -67,7 +68,7 @@ impl SegmentSerializer {
    }

    /// Accessor to the `FastFieldSerializer`.
-    pub fn get_fast_field_serializer(&mut self) -> &mut FastFieldSerializer {
+    pub fn get_fast_field_serializer(&mut self) -> &mut CompositeFastFieldSerializer {
        &mut self.fast_field_serializer
    }

--- a/src/indexer/segment_writer.rs
+++ b/src/indexer/segment_writer.rs
@@ -345,8 +345,11 @@ fn write(
        let store_write = serializer
            .segment_mut()
            .open_write(SegmentComponent::Store)?;
-        let old_store_writer =
-            std::mem::replace(&mut serializer.store_writer, StoreWriter::new(store_write));
+        let compressor = serializer.segment().index().settings().docstore_compression;
+        let old_store_writer = std::mem::replace(
+            &mut serializer.store_writer,
+            StoreWriter::new(store_write, compressor),
+        );
        old_store_writer.close()?;
        let store_read = StoreReader::open(
            serializer
@@ -357,7 +360,6 @@ fn write(
            let doc_bytes = store_read.get_document_bytes(*old_doc_id)?;
            serializer.get_store_writer().store_bytes(&doc_bytes)?;
        }
-        // TODO delete temp store
    }
    serializer.close()?;
    Ok(())
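Since `StoreWriter::new` now receives the compressor from the index settings, the merger's doc store condition also compares the reader's and writer's compressors. A minimal sketch of that per-segment decision between re-encoding documents one by one and stacking the compressed blocks follows. The `Compressor` enum and `must_reencode_docs` are hypothetical, and any iterator stands in for `block_checkpoints()`.

```rust
/// Hypothetical stand-in for the doc store compressor setting.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Compressor {
    None,
    Lz4,
}

/// Checkpoints a store needs (e.g. 5 full blocks plus a trailing one) before
/// its compressed blocks are stacked as-is.
const MIN_CHECKPOINTS_FOR_STACKING: usize = 6;

/// Returns true if the segment's documents should be re-encoded one by one
/// instead of stacking (copying) its compressed blocks.
pub fn must_reencode_docs<I: Iterator>(
    num_deleted_docs: u32,
    block_checkpoints: I,
    reader_compressor: Compressor,
    writer_compressor: Compressor,
) -> bool {
    // Deleted docs must be filtered out, which requires decoding the blocks.
    num_deleted_docs > 0
        // Bounded walk: inspect at most 7 checkpoints instead of all of them.
        || block_checkpoints.take(MIN_CHECKPOINTS_FOR_STACKING + 1).count()
            < MIN_CHECKPOINTS_FOR_STACKING
        // Blocks written with another compressor cannot be copied verbatim.
        || reader_compressor != writer_compressor
}

fn main() {
    // A clean segment with 10 blocks and the same compressor: stack it.
    assert!(!must_reencode_docs(0, 0..10, Compressor::Lz4, Compressor::Lz4));
    // Only 3 blocks: re-encode to avoid many small blocks in the merged store.
    assert!(must_reencode_docs(0, 0..3, Compressor::Lz4, Compressor::Lz4));
    // Deleted docs or a compressor change always force re-encoding.
    assert!(must_reencode_docs(1, 0..10, Compressor::Lz4, Compressor::Lz4));
    assert!(must_reencode_docs(0, 0..10, Compressor::None, Compressor::Lz4));
    println!("ok");
}
```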
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -178,7 +178,7 @@ use once_cell::sync::Lazy;
 use serde::{Deserialize, Serialize};

 /// Index format version.
-const INDEX_FORMAT_VERSION: u32 = 3;
+const INDEX_FORMAT_VERSION: u32 = 4;

 /// Structure version for the index.
 #[derive(Clone, PartialEq, Eq, Serialize, Deserialize)]
@@ -187,7 +187,6 @@ pub struct Version {
    minor: u32,
    patch: u32,
    index_format_version: u32,
-    store_compression: String,
 }

 impl fmt::Debug for Version {
@@ -201,14 +200,13 @@ static VERSION: Lazy<Version> = Lazy::new(|| Version {
    minor: env!("CARGO_PKG_VERSION_MINOR").parse().unwrap(),
    patch: env!("CARGO_PKG_VERSION_PATCH").parse().unwrap(),
    index_format_version: INDEX_FORMAT_VERSION,
-    store_compression: crate::store::COMPRESSION.to_string(),
 });

 impl ToString for Version {
    fn to_string(&self) -> String {
        format!(
-            "tantivy v{}.{}.{}, index_format v{}, store_compression: {}",
-            self.major, self.minor, self.patch, self.index_format_version, self.store_compression
+            "tantivy v{}.{}.{}, index_format v{}",
+            self.major, self.minor, self.patch, self.index_format_version
        )
    }
 }
@@ -293,6 +291,7 @@ mod tests {
    use crate::collector::tests::TEST_COLLECTOR_WITH_SCORE;
    use crate::core::SegmentReader;
    use crate::docset::{DocSet, TERMINATED};
+    use crate::fastfield::FastFieldReader;
    use crate::query::BooleanQuery;
    use crate::schema::*;
    use crate::DocAddress;
--- a/src/query/boolean_query/boolean_query.rs
+++ b/src/query/boolean_query/boolean_query.rs
@@ -6,7 +6,7 @@ use crate::query::Weight;
 use crate::schema::IndexRecordOption;
 use crate::schema::Term;
 use crate::Searcher;
-use std::collections::BTreeSet;
+use std::collections::BTreeMap;

 /// The boolean query returns a set of documents
 /// that matches the Boolean combination of constituent subqueries.
@@ -159,9 +159,9 @@ impl Query for BooleanQuery {
        Ok(Box::new(BooleanWeight::new(sub_weights, scoring_enabled)))
    }

-    fn query_terms(&self, term_set: &mut BTreeSet<Term>) {
+    fn query_terms(&self, terms: &mut BTreeMap<Term, bool>) {
        for (_occur, subquery) in &self.subqueries {
-            subquery.query_terms(term_set);
+            subquery.query_terms(terms);
        }
    }
 }
--- a/src/query/boost_query.rs
+++ b/src/query/boost_query.rs
@@ -2,7 +2,7 @@ use crate::fastfield::DeleteBitSet;
 use crate::query::explanation::does_not_match;
 use crate::query::{Explanation, Query, Scorer, Weight};
 use crate::{DocId, DocSet, Score, Searcher, SegmentReader, Term};
-use std::collections::BTreeSet;
+use std::collections::BTreeMap;
 use std::fmt;

 /// `BoostQuery` is a wrapper over a query used to boost its score.
@@ -48,8 +48,8 @@ impl Query for BoostQuery {
        Ok(boosted_weight)
    }

-    fn query_terms(&self, term_set: &mut BTreeSet<Term>) {
-        self.query.query_terms(term_set)
+    fn query_terms(&self, terms: &mut BTreeMap<Term, bool>) {
+        self.query.query_terms(terms)
    }
 }

--- a/src/query/mod.rs
+++ b/src/query/mod.rs
@@ -66,7 +66,7 @@ mod tests {
    use crate::schema::{Schema, TEXT};
    use crate::Index;
    use crate::Term;
-    use std::collections::BTreeSet;
+    use std::collections::BTreeMap;

    #[test]
    fn test_query_terms() {
@@ -78,49 +78,49 @@ mod tests {
        let term_a = Term::from_field_text(text_field, "a");
        let term_b = Term::from_field_text(text_field, "b");
        {
-            let mut terms_set: BTreeSet<Term> = BTreeSet::new();
+            let mut terms: BTreeMap<Term, bool> = Default::default();
            query_parser
                .parse_query("a")
                .unwrap()
-                .query_terms(&mut terms_set);
-            let terms: Vec<&Term> = terms_set.iter().collect();
-            assert_eq!(vec![&term_a], terms);
+                .query_terms(&mut terms);
+            let terms: Vec<(&Term, &bool)> = terms.iter().collect();
+            assert_eq!(vec![(&term_a, &false)], terms);
        }
        {
-            let mut terms_set: BTreeSet<Term> = BTreeSet::new();
+            let mut terms: BTreeMap<Term, bool> = Default::default();
            query_parser
                .parse_query("a b")
                .unwrap()
-                .query_terms(&mut terms_set);
-            let terms: Vec<&Term> = terms_set.iter().collect();
-            assert_eq!(vec![&term_a, &term_b], terms);
+                .query_terms(&mut terms);
+            let terms: Vec<(&Term, &bool)> = terms.iter().collect();
+            assert_eq!(vec![(&term_a, &false), (&term_b, &false)], terms);
        }
        {
-            let mut terms_set: BTreeSet<Term> = BTreeSet::new();
+            let mut terms: BTreeMap<Term, bool> = Default::default();
            query_parser
                .parse_query("\"a b\"")
                .unwrap()
-                .query_terms(&mut terms_set);
-            let terms: Vec<&Term> = terms_set.iter().collect();
-            assert_eq!(vec![&term_a, &term_b], terms);
+                .query_terms(&mut terms);
+            let terms: Vec<(&Term, &bool)> = terms.iter().collect();
+            assert_eq!(vec![(&term_a, &true), (&term_b, &true)], terms);
        }
        {
-            let mut terms_set: BTreeSet<Term> = BTreeSet::new();
+            let mut terms: BTreeMap<Term, bool> = Default::default();
            query_parser
                .parse_query("a a a a a")
                .unwrap()
-                .query_terms(&mut terms_set);
-            let terms: Vec<&Term> = terms_set.iter().collect();
-            assert_eq!(vec![&term_a], terms);
+                .query_terms(&mut terms);
+            let terms: Vec<(&Term, &bool)> = terms.iter().collect();
+            assert_eq!(vec![(&term_a, &false)], terms);
        }
        {
-            let mut terms_set: BTreeSet<Term> = BTreeSet::new();
+            let mut terms: BTreeMap<Term, bool> = Default::default();
            query_parser
                .parse_query("a -b")
                .unwrap()
-                .query_terms(&mut terms_set);
-            let terms: Vec<&Term> = terms_set.iter().collect();
-            assert_eq!(vec![&term_a, &term_b], terms);
+                .query_terms(&mut terms);
+            let terms: Vec<(&Term, &bool)> = terms.iter().collect();
+            assert_eq!(vec![(&term_a, &false), (&term_b, &false)], terms);
        }
    }
 }
--- a/src/query/more_like_this/more_like_this.rs
+++ b/src/query/more_like_this/more_like_this.rs
@@ -233,10 +233,9 @@ impl MoreLikeThis {
            }
            FieldType::U64(_) => {
                for field_value in field_values {
-                    let val = field_value
-                        .value()
-                        .u64_value()
-                        .ok_or(TantivyError::InvalidArgument("invalid value".to_string()))?;
+                    let val = field_value.value().u64_value().ok_or_else(|| {
+                        TantivyError::InvalidArgument("invalid value".to_string())
+                    })?;
                    if !self.is_noise_word(val.to_string()) {
                        let term = Term::from_field_u64(field, val);
                        *term_frequencies.entry(term).or_insert(0) += 1;
@@ -249,7 +248,7 @@ impl MoreLikeThis {
                    let val = field_value
                        .value()
                        .date_value()
-                        .ok_or(TantivyError::InvalidArgument("invalid value".to_string()))?
+                        .ok_or_else(|| TantivyError::InvalidArgument("invalid value".to_string()))?
                        .timestamp();
                    if !self.is_noise_word(val.to_string()) {
                        let term = Term::from_field_i64(field, val);
@@ -259,10 +258,9 @@ impl MoreLikeThis {
            }
            FieldType::I64(_) => {
                for field_value in field_values {
-                    let val = field_value
-                        .value()
-                        .i64_value()
-                        .ok_or(TantivyError::InvalidArgument("invalid value".to_string()))?;
+                    let val = field_value.value().i64_value().ok_or_else(|| {
+                        TantivyError::InvalidArgument("invalid value".to_string())
+                    })?;
                    if !self.is_noise_word(val.to_string()) {
                        let term = Term::from_field_i64(field, val);
                        *term_frequencies.entry(term).or_insert(0) += 1;
@@ -271,10 +269,9 @@ impl MoreLikeThis {
            }
            FieldType::F64(_) => {
                for field_value in field_values {
-                    let val = field_value
-                        .value()
-                        .f64_value()
-                        .ok_or(TantivyError::InvalidArgument("invalid value".to_string()))?;
+                    let val = field_value.value().f64_value().ok_or_else(|| {
+                        TantivyError::InvalidArgument("invalid value".to_string())
+                    })?;
                    if !self.is_noise_word(val.to_string()) {
                        let term = Term::from_field_f64(field, val);
                        *term_frequencies.entry(term).or_insert(0) += 1;
@@ -306,7 +303,7 @@ impl MoreLikeThis {
        {
            return true;
        }
-        return self.stop_words.contains(&word);
+        self.stop_words.contains(&word)
    }

    /// Couputes the score for each term while ignoring not useful terms
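Several hunks above replace `ok_or(TantivyError::InvalidArgument(...))` with `ok_or_else(|| ...)`. The argument to `ok_or` is evaluated eagerly, so the error `String` was allocated even when the value was present. `ok_or_else` runs its closure only on `None`, and clippy's `or_fun_call` lint flags exactly this case. The small sketch below counts how often the error is actually built. `SketchError` and `u64_or_error` are hypothetical names used only for illustration.

```rust
use std::cell::Cell;

#[derive(Debug, PartialEq)]
pub enum SketchError {
    InvalidArgument(String),
}

/// Returns the value or an error; `built` counts how many errors were constructed.
pub fn u64_or_error(value: Option<u64>, built: &Cell<u32>) -> Result<u64, SketchError> {
    value.ok_or_else(|| {
        // Only reached when `value` is `None`.
        built.set(built.get() + 1);
        SketchError::InvalidArgument("invalid value".to_string())
    })
}
```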
--- a/src/query/phrase_query/phrase_query.rs
+++ b/src/query/phrase_query/phrase_query.rs
@@ -1,3 +1,5 @@
+use std::collections::BTreeMap;
+
 use super::PhraseWeight;
 use crate::core::searcher::Searcher;
 use crate::query::bm25::Bm25Weight;
@@ -5,7 +7,6 @@ use crate::query::Query;
 use crate::query::Weight;
 use crate::schema::IndexRecordOption;
 use crate::schema::{Field, Term};
-use std::collections::BTreeSet;

 /// `PhraseQuery` matches a specific sequence of words.
 ///
@@ -113,9 +114,9 @@ impl Query for PhraseQuery {
        Ok(Box::new(phrase_weight))
    }

-    fn query_terms(&self, term_set: &mut BTreeSet<Term>) {
-        for (_, query_term) in &self.phrase_terms {
-            term_set.insert(query_term.clone());
+    fn query_terms(&self, terms: &mut BTreeMap<Term, bool>) {
+        for (_, term) in &self.phrase_terms {
+            terms.insert(term.clone(), true);
        }
    }
 }
--- a/src/query/query.rs
+++ b/src/query/query.rs
@@ -4,7 +4,7 @@ use crate::query::Explanation;
 use crate::DocAddress;
 use crate::Term;
 use downcast_rs::impl_downcast;
-use std::collections::BTreeSet;
+use std::collections::BTreeMap;
 use std::fmt;

 /// The `Query` trait defines a set of documents and a scoring method
@@ -68,7 +68,10 @@ pub trait Query: QueryClone + Send + Sync + downcast_rs::Downcast + fmt::Debug {

    /// Extract all of the terms associated to the query and insert them in the
    /// term set given in arguments.
-    fn query_terms(&self, _term_set: &mut BTreeSet<Term>) {}
+    ///
+    /// Each term is associated with a boolean indicating whether
+    /// positions are required or not.
+    fn query_terms(&self, _terms: &mut BTreeMap<Term, bool>) {}
 }

 /// Implements `box_clone`.
@@ -95,8 +98,8 @@ impl Query for Box<dyn Query> {
        self.as_ref().count(searcher)
    }

-    fn query_terms(&self, term_set: &mut BTreeSet<Term<Vec<u8>>>) {
-        self.as_ref().query_terms(term_set);
+    fn query_terms(&self, terms: &mut BTreeMap<Term, bool>) {
+        self.as_ref().query_terms(terms);
    }
 }
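With this change, `query_terms` fills a `BTreeMap<Term, bool>` instead of a `BTreeSet<Term>`. `TermQuery` inserts its term with `false`, `PhraseQuery` inserts each of its terms with `true` (positions needed), and `BooleanQuery` and `BoostQuery` just recurse. Here is a minimal sketch of that propagation with plain `String` terms. `SketchQuery`, `TermQ`, `PhraseQ` and `BoolQ` are hypothetical stand-ins, not tantivy's types.

```rust
use std::collections::BTreeMap;

/// Toy version of the trait method: terms are plain strings here.
pub trait SketchQuery {
    /// Inserts each term with `true` if positions are required.
    fn query_terms(&self, terms: &mut BTreeMap<String, bool>);
}

pub struct TermQ(pub String);
pub struct PhraseQ(pub Vec<String>);
pub struct BoolQ(pub Vec<Box<dyn SketchQuery>>);

impl SketchQuery for TermQ {
    fn query_terms(&self, terms: &mut BTreeMap<String, bool>) {
        // A single term never needs positions.
        terms.insert(self.0.clone(), false);
    }
}

impl SketchQuery for PhraseQ {
    fn query_terms(&self, terms: &mut BTreeMap<String, bool>) {
        // Phrase matching needs positions for every term.
        for term in &self.0 {
            terms.insert(term.clone(), true);
        }
    }
}

impl SketchQuery for BoolQ {
    fn query_terms(&self, terms: &mut BTreeMap<String, bool>) {
        // Boolean combinations only forward to their sub-queries.
        for subquery in &self.0 {
            subquery.query_terms(terms);
        }
    }
}
```

Because `insert` overwrites, a term that appears both in a phrase and as a plain term clause keeps whichever flag was inserted last. If "positions needed" should win, `*terms.entry(term).or_insert(false) |= needs_positions` would make `true` sticky. That is a possible design choice, not something this diff does.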

--- a/src/query/query_parser/query_parser.rs
+++ b/src/query/query_parser/query_parser.rs
@@ -8,7 +8,7 @@ use crate::query::Query;
 use crate::query::RangeQuery;
 use crate::query::TermQuery;
 use crate::query::{AllQuery, BoostQuery};
-use crate::schema::{Facet, IndexRecordOption};
+use crate::schema::{Facet, FacetParseError, IndexRecordOption};
 use crate::schema::{Field, Schema};
 use crate::schema::{FieldType, Term};
 use crate::tokenizer::TokenizerManager;
@@ -68,6 +68,9 @@ pub enum QueryParserError {
    /// The format for the date field is not RFC 3339 compliant.
    #[error("The date field has an invalid format")]
    DateFormatError(chrono::ParseError),
+    /// The format for the facet field is invalid.
+    #[error("The facet field is malformed: {0}")]
+    FacetFormatError(FacetParseError),
 }

 impl From<ParseIntError> for QueryParserError {
@@ -88,6 +91,12 @@ impl From<chrono::ParseError> for QueryParserError {
    }
 }

+impl From<FacetParseError> for QueryParserError {
+    fn from(err: FacetParseError) -> QueryParserError {
+        QueryParserError::FacetFormatError(err)
+    }
+}
+
 /// Recursively remove empty clause from the AST
 ///
 /// Returns `None` iff the `logical_ast` ended up being empty.
@@ -358,10 +367,10 @@ impl QueryParser {
                    ))
                }
            }
-            FieldType::HierarchicalFacet(_) => {
-                let facet = Facet::from_text(phrase);
-                Ok(vec![(0, Term::from_field_text(field, facet.encoded_str()))])
-            }
+            FieldType::HierarchicalFacet(_) => match Facet::from_text(phrase) {
+                Ok(facet) => Ok(vec![(0, Term::from_field_text(field, facet.encoded_str()))]),
+                Err(e) => Err(QueryParserError::from(e)),
+            },
            FieldType::Bytes(_) => {
                let bytes = base64::decode(phrase).map_err(QueryParserError::ExpectedBase64)?;
                let term = Term::from_field_bytes(field, &bytes);
@@ -1027,6 +1036,19 @@ mod test {
            .is_ok());
    }

+    #[test]
+    pub fn test_query_parser_expected_facet() {
+        let query_parser = make_query_parser();
+        match query_parser.parse_query("facet:INVALID") {
+            Ok(_) => panic!("should never succeed"),
+            Err(e) => assert_eq!(
+                "The facet field is malformed: Failed to parse the facet string: 'INVALID'",
+                format!("{}", e)
+            ),
+        }
+        assert!(query_parser.parse_query("facet:\"/foo/bar\"").is_ok());
+    }
+
    #[test]
    pub fn test_query_parser_not_empty_but_no_tokens() {
        let query_parser = make_query_parser();
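The new `From<FacetParseError> for QueryParserError` impl also means the explicit `match` in the `HierarchicalFacet` arm could be written with `?`, since `?` calls `From::from` on the error. The std-only sketch below shows this with simplified stand-ins for both error types. `parse_facet_sketch` and `facet_term_text` are hypothetical, and the stand-in parser checks only for the leading `/`.

```rust
use std::fmt;

/// Simplified stand-in for `FacetParseError`.
#[derive(Debug, PartialEq, Eq)]
pub struct FacetParseError(pub String);

impl fmt::Display for FacetParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "Failed to parse the facet string: '{}'", self.0)
    }
}

/// Simplified stand-in for `QueryParserError`, reduced to the new variant.
#[derive(Debug, PartialEq, Eq)]
pub enum QueryParserError {
    FacetFormatError(FacetParseError),
}

impl fmt::Display for QueryParserError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            QueryParserError::FacetFormatError(err) => {
                write!(f, "The facet field is malformed: {}", err)
            }
        }
    }
}

impl From<FacetParseError> for QueryParserError {
    fn from(err: FacetParseError) -> QueryParserError {
        QueryParserError::FacetFormatError(err)
    }
}

/// Hypothetical parser: only checks the leading '/', unlike `Facet::from_text`.
fn parse_facet_sketch(text: &str) -> Result<String, FacetParseError> {
    if text.starts_with('/') {
        Ok(text.to_string())
    } else {
        Err(FacetParseError(text.to_string()))
    }
}

/// `?` converts `FacetParseError` into `QueryParserError` via `From`.
pub fn facet_term_text(phrase: &str) -> Result<String, QueryParserError> {
    let facet = parse_facet_sketch(phrase)?;
    Ok(facet)
}
```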
--- a/src/query/term_query/term_query.rs
+++ b/src/query/term_query/term_query.rs
@@ -5,7 +5,7 @@ use crate::query::{Explanation, Query};
 use crate::schema::IndexRecordOption;
 use crate::Searcher;
 use crate::Term;
-use std::collections::BTreeSet;
+use std::collections::BTreeMap;
 use std::fmt;

 /// A Term query matches all of the documents
@@ -127,7 +127,7 @@ impl Query for TermQuery {
            self.specialized_weight(searcher, scoring_enabled)?,
        ))
    }
-    fn query_terms(&self, term_set: &mut BTreeSet<Term>) {
-        term_set.insert(self.term.clone());
+    fn query_terms(&self, terms: &mut BTreeMap<Term, bool>) {
+        terms.insert(self.term.clone(), false);
    }
 }
--- a/src/schema/facet.rs
+++ b/src/schema/facet.rs
@@ -20,6 +20,14 @@ pub const FACET_SEP_BYTE: u8 = 0u8;
 /// representation of facets. (It is the null codepoint.)
 pub const FACET_SEP_CHAR: char = '\u{0}';

+/// An error enum for the facet parser.
+#[derive(Debug, PartialEq, Eq, Error)]
+pub enum FacetParseError {
+    /// The facet text representation is unparsable.
+    #[error("Failed to parse the facet string: '{0}'")]
+    FacetParseError(String),
+}
+
 /// A Facet represent a point in a given hierarchy.
 ///
 /// They are typically represented similarly to a filepath.
@@ -75,11 +83,47 @@ impl Facet {
    /// It is conceptually, if one of the steps of this path
    /// contains a `/` or a `\`, it should be escaped
    /// using an anti-slash `/`.
-    pub fn from_text<T>(path: &T) -> Facet
+    pub fn from_text<T>(path: &T) -> Result<Facet, FacetParseError>
    where
        T: ?Sized + AsRef<str>,
    {
-        From::from(path)
+        #[derive(Copy, Clone)]
+        enum State {
+            Escaped,
+            Idle,
+        }
+        let path_ref = path.as_ref();
+        if path_ref.is_empty() {
+            return Err(FacetParseError::FacetParseError(path_ref.to_string()));
+        }
+        if !path_ref.starts_with('/') {
+            return Err(FacetParseError::FacetParseError(path_ref.to_string()));
+        }
+        let mut facet_encoded = String::new();
+        let mut state = State::Idle;
+        let path_bytes = path_ref.as_bytes();
+        let mut last_offset = 1;
+        for i in 1..path_bytes.len() {
+            let c = path_bytes[i];
+            match (state, c) {
+                (State::Idle, ESCAPE_BYTE) => {
+                    facet_encoded.push_str(&path_ref[last_offset..i]);
+                    last_offset = i + 1;
+                    state = State::Escaped
+                }
+                (State::Idle, SLASH_BYTE) => {
+                    facet_encoded.push_str(&path_ref[last_offset..i]);
+                    facet_encoded.push(FACET_SEP_CHAR);
+                    last_offset = i + 1;
+                }
+                (State::Escaped, _escaped_char) => {
+                    state = State::Idle;
+                }
+                (State::Idle, _any_char) => {}
+            }
+        }
+        facet_encoded.push_str(&path_ref[last_offset..]);
+        Ok(Facet(facet_encoded))
    }

    /// Returns a `Facet` from an iterator over the different
@@ -137,39 +181,7 @@ impl Borrow<str> for Facet {

 impl<'a, T: ?Sized + AsRef<str>> From<&'a T> for Facet {
    fn from(path_asref: &'a T) -> Facet {
-        #[derive(Copy, Clone)]
-        enum State {
-            Escaped,
-            Idle,
-        }
-        let path: &str = path_asref.as_ref();
-        assert!(!path.is_empty());
-        assert!(path.starts_with('/'));
-        let mut facet_encoded = String::new();
-        let mut state = State::Idle;
-        let path_bytes = path.as_bytes();
-        let mut last_offset = 1;
-        for i in 1..path_bytes.len() {
-            let c = path_bytes[i];
-            match (state, c) {
-                (State::Idle, ESCAPE_BYTE) => {
-                    facet_encoded.push_str(&path[last_offset..i]);
-                    last_offset = i + 1;
-                    state = State::Escaped
-                }
-                (State::Idle, SLASH_BYTE) => {
-                    facet_encoded.push_str(&path[last_offset..i]);
-                    facet_encoded.push(FACET_SEP_CHAR);
-                    last_offset = i + 1;
-                }
-                (State::Escaped, _escaped_char) => {
-                    state = State::Idle;
-                }
-                (State::Idle, _any_char) => {}
-            }
-        }
-        facet_encoded.push_str(&path[last_offset..]);
-        Facet(facet_encoded)
+        Facet::from_text(path_asref).unwrap()
    }
 }

@@ -226,7 +238,7 @@ impl Debug for Facet {
 #[cfg(test)]
 mod tests {

-    use super::Facet;
+    use super::{Facet, FacetParseError};

    #[test]
    fn test_root() {
@@ -288,4 +300,12 @@ mod tests {
        let facet = Facet::from_path(v.iter());
        assert_eq!(facet.to_path_string(), "/");
    }
+
+    #[test]
+    fn test_from_text() {
+        assert_eq!(
+            Err(FacetParseError::FacetParseError("INVALID".to_string())),
+            Facet::from_text("INVALID")
+        );
+    }
 }
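`Facet::from_text` now returns `Err(FacetParseError)` for an empty or relative path instead of asserting. The `From<&T>` impl keeps the old panicking behaviour through `unwrap()`. The escape-aware scan can be sketched without tantivy as a free function that returns the encoded string. `encode_facet` is a hypothetical name, and the local constants mirror `FACET_SEP_CHAR`, `ESCAPE_BYTE` and `SLASH_BYTE`. The byte values of the last two are assumed to be `\` and `/`.

```rust
use std::fmt;

/// Separator used in the encoded form (the null codepoint, as in the diff).
const FACET_SEP_CHAR: char = '\u{0}';
/// Assumed values for the escape and separator bytes.
const ESCAPE_BYTE: u8 = b'\\';
const SLASH_BYTE: u8 = b'/';

#[derive(Debug, PartialEq, Eq)]
pub enum FacetParseError {
    FacetParseError(String),
}

impl fmt::Display for FacetParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            FacetParseError::FacetParseError(s) => {
                write!(f, "Failed to parse the facet string: '{}'", s)
            }
        }
    }
}

impl std::error::Error for FacetParseError {}

/// Encodes a `/a/b`-style path into `a\0b`, honouring `\` escapes.
pub fn encode_facet(path: &str) -> Result<String, FacetParseError> {
    #[derive(Copy, Clone)]
    enum State {
        Escaped,
        Idle,
    }
    // `starts_with('/')` also rejects the empty string.
    if !path.starts_with('/') {
        return Err(FacetParseError::FacetParseError(path.to_string()));
    }
    let mut encoded = String::new();
    let mut state = State::Idle;
    let mut last_offset = 1;
    // Skip the leading '/', then scan byte by byte.
    for (i, &c) in path.as_bytes().iter().enumerate().skip(1) {
        match (state, c) {
            (State::Idle, ESCAPE_BYTE) => {
                // Drop the backslash; the escaped char is kept by the next slice.
                encoded.push_str(&path[last_offset..i]);
                last_offset = i + 1;
                state = State::Escaped;
            }
            (State::Idle, SLASH_BYTE) => {
                // Unescaped '/' becomes the separator.
                encoded.push_str(&path[last_offset..i]);
                encoded.push(FACET_SEP_CHAR);
                last_offset = i + 1;
            }
            (State::Escaped, _) => state = State::Idle,
            (State::Idle, _) => {}
        }
    }
    encoded.push_str(&path[last_offset..]);
    Ok(encoded)
}
```

Unlike the loop in the diff, this version iterates with `enumerate().skip(1)` rather than indexing, which is what clippy's `needless_range_loop` lint would suggest. The behaviour is the same.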
--- a/src/schema/mod.rs
+++ b/src/schema/mod.rs
@@ -128,6 +128,7 @@ pub use self::schema::{Schema, SchemaBuilder};
 pub use self::value::Value;

 pub use self::facet::Facet;
+pub use self::facet::FacetParseError;
 pub(crate) use self::facet::FACET_SEP_BYTE;
 pub use self::facet_options::FacetOptions;

--- a/src/snippet/mod.rs
+++ b/src/snippet/mod.rs
@@ -7,7 +7,6 @@ use crate::{Document, Score};
 use htmlescape::encode_minimal;
 use std::cmp::Ordering;
 use std::collections::BTreeMap;
-use std::collections::BTreeSet;
 use std::ops::Range;

 const DEFAULT_MAX_NUM_CHARS: usize = 150;
@@ -239,10 +238,10 @@ impl SnippetGenerator {
        query: &dyn Query,
        field: Field,
    ) -> crate::Result<SnippetGenerator> {
-        let mut terms = BTreeSet::new();
+        let mut terms = BTreeMap::new();
        query.query_terms(&mut terms);
        let mut terms_text: BTreeMap<String, Score> = Default::default();
-        for term in terms {
+        for (term, _) in terms {
            if term.field() != field {
                continue;
            }
--- a/src/store/compression_brotli.rs
+++ b/src/store/compression_brotli.rs
@@ -1,10 +1,6 @@
 use std::io;

-/// Name of the compression scheme used in the doc store.
-///
-/// This name is appended to the version string of tantivy.
-pub const COMPRESSION: &'static str = "brotli";
-
+#[inline]
 pub fn compress(mut uncompressed: &[u8], compressed: &mut Vec<u8>) -> io::Result<()> {
    let mut params = brotli::enc::BrotliEncoderParams::default();
    params.quality = 5;
@@ -13,6 +9,7 @@ pub fn compress(mut uncompressed: &[u8], compressed: &mut Vec<u8>) -> io::Result
    Ok(())
 }

+#[inline]
 pub fn decompress(mut compressed: &[u8], decompressed: &mut Vec<u8>) -> io::Result<()> {
    decompressed.clear();
    brotli::BrotliDecompress(&mut compressed, decompressed)?;
--- a/src/store/compression_lz4.rs
+++ b/src/store/compression_lz4.rs
@@ -1,22 +0,0 @@
-use std::io::{self, Read, Write};
-
-/// Name of the compression scheme used in the doc store.
-///
-/// This name is appended to the version string of tantivy.
-pub const COMPRESSION: &str = "lz4";
-
-pub fn compress(uncompressed: &[u8], compressed: &mut Vec<u8>) -> io::Result<()> {
-    compressed.clear();
-    let mut encoder = lz4::EncoderBuilder::new().build(compressed)?;
-    encoder.write_all(&uncompressed)?;
-    let (_, encoder_result) = encoder.finish();
-    encoder_result?;
-    Ok(())
-}
-
-pub fn decompress(compressed: &[u8], decompressed: &mut Vec<u8>) -> io::Result<()> {
-    decompressed.clear();
-    let mut decoder = lz4::Decoder::new(compressed)?;
-    decoder.read_to_end(decompressed)?;
-    Ok(())
-}
--- a/src/store/compression_lz4_block.rs
+++ b/src/store/compression_lz4_block.rs
@@ -2,38 +2,46 @@ use std::io::{self};

 use core::convert::TryInto;
 use lz4_flex::{compress_into, decompress_into};
-/// Name of the compression scheme used in the doc store.
-///
-/// This name is appended to the version string of tantivy.
-pub const COMPRESSION: &str = "lz4_block";

+#[inline]
 pub fn compress(uncompressed: &[u8], compressed: &mut Vec<u8>) -> io::Result<()> {
    compressed.clear();
+    let maximum_output_size = lz4_flex::block::get_maximum_output_size(uncompressed.len());
+    compressed.reserve(maximum_output_size + 4);
+
-    compressed.extend_from_slice(&[0, 0, 0, 0]);
-    compress_into(uncompressed, compressed);
+    unsafe {
+        compressed.set_len(maximum_output_size + 4);
+    }
+    let bytes_written = compress_into(uncompressed, compressed, 4)
+        .map_err(|err| io::Error::new(io::ErrorKind::InvalidData, err.to_string()))?;
    let num_bytes = uncompressed.len() as u32;
    compressed[0..4].copy_from_slice(&num_bytes.to_le_bytes());
+    unsafe {
+        compressed.set_len(bytes_written + 4);
+    }
    Ok(())
 }

+#[inline]
 pub fn decompress(compressed: &[u8], decompressed: &mut Vec<u8>) -> io::Result<()> {
    decompressed.clear();
-    //next lz4_flex version will support slice as input parameter.
-    //this will make the usage much less ugly
    let uncompressed_size_bytes: &[u8; 4] = compressed
        .get(..4)
        .ok_or(io::ErrorKind::InvalidData)?
        .try_into()
        .unwrap();
    let uncompressed_size = u32::from_le_bytes(*uncompressed_size_bytes) as usize;
-    // reserve more than required, because blocked writes may write out of bounds, will be improved
-    // with lz4_flex 1.0
-    decompressed.reserve(uncompressed_size + 4 + 24);
+    decompressed.reserve(uncompressed_size);
    unsafe {
        decompressed.set_len(uncompressed_size);
    }
-    decompress_into(&compressed[4..], decompressed)
+    let bytes_written = decompress_into(&compressed[4..], decompressed, 0)
        .map_err(|err| io::Error::new(io::ErrorKind::InvalidData, err.to_string()))?;
+    if bytes_written != uncompressed_size {
+        return Err(io::Error::new(
+            io::ErrorKind::InvalidData,
+            "doc store block not completely decompressed, data corruption".to_string(),
+        ));
+    }
    Ok(())
 }
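The lz4 block layout used above is a 4-byte little-endian uncompressed length followed by the compressed payload. `decompress` reads the header back and returns an `InvalidData` error if the number of bytes written differs from it. The framing and that check are sketched below without `lz4_flex`. Payloads pass through unchanged (an identity "codec"), so only the header handling matches the diff, and `encode_block` and `decode_block` are hypothetical names.

```rust
use std::convert::TryInto;
use std::io;

/// Writes the 4-byte little-endian length header, then the payload.
pub fn encode_block(uncompressed: &[u8], out: &mut Vec<u8>) {
    out.clear();
    out.extend_from_slice(&(uncompressed.len() as u32).to_le_bytes());
    // Identity "codec": stands in for the lz4 block compressor.
    out.extend_from_slice(uncompressed);
}

/// Reads the header, "decompresses" and checks the resulting length.
pub fn decode_block(block: &[u8], decompressed: &mut Vec<u8>) -> io::Result<()> {
    decompressed.clear();
    let header: [u8; 4] = block
        .get(..4)
        .ok_or(io::ErrorKind::InvalidData)?
        .try_into()
        .unwrap(); // safe: `get(..4)` guarantees exactly 4 bytes
    let expected_len = u32::from_le_bytes(header) as usize;
    // Identity "codec": stands in for lz4 block decompression.
    decompressed.extend_from_slice(&block[4..]);
    if decompressed.len() != expected_len {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "doc store block not completely decompressed",
        ));
    }
    Ok(())
}
```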
--- a/src/store/compression_snap.rs
+++ b/src/store/compression_snap.rs
@@ -1,10 +1,6 @@
 use std::io::{self, Read, Write};

-/// Name of the compression scheme used in the doc store.
-///
-/// This name is appended to the version string of tantivy.
-pub const COMPRESSION: &str = "snappy";
-
+#[inline]
 pub fn compress(uncompressed: &[u8], compressed: &mut Vec<u8>) -> io::Result<()> {
    compressed.clear();
    let mut encoder = snap::write::FrameEncoder::new(compressed);
@@ -13,6 +9,7 @@ pub fn compress(uncompressed: &[u8], compressed: &mut Vec<u8>) -> io::Result<()>
    Ok(())
 }

+#[inline]
 pub fn decompress(compressed: &[u8], decompressed: &mut Vec<u8>) -> io::Result<()> {
    decompressed.clear();
    snap::read::FrameDecoder::new(compressed).read_to_end(decompressed)?;
--- a/src/store/compressors.rs
+++ b/src/store/compressors.rs
@@ -0,0 +1,134 @@
+use serde::{Deserialize, Serialize};
+use std::io;
+
+pub trait StoreCompressor {
+    fn compress(&self, uncompressed: &[u8], compressed: &mut Vec<u8>) -> io::Result<()>;
+    fn decompress(&self, compressed: &[u8], decompressed: &mut Vec<u8>) -> io::Result<()>;
+    fn get_compressor_id() -> u8;
+}
+
+/// Compressor can be used on `IndexSettings` to choose
+/// the compressor used to compress the doc store.
+///
+/// The default is `Lz4` if the `lz4-compression` feature is enabled, else Brotli, else Snappy.
+#[derive(Clone, Debug, Copy, PartialEq, Eq, Serialize, Deserialize)]
+pub enum Compressor {
+    #[serde(rename = "lz4")]
+    /// Use the lz4 compressor (block format)
+    Lz4,
+    #[serde(rename = "brotli")]
+    /// Use the brotli compressor
+    Brotli,
+    #[serde(rename = "snappy")]
+    /// Use the snap compressor
+    Snappy,
+}
+
+impl Default for Compressor {
+    fn default() -> Self {
+        if cfg!(feature = "lz4-compression") {
+            Compressor::Lz4
+        } else if cfg!(feature = "brotli-compression") {
+            Compressor::Brotli
+        } else if cfg!(feature = "snappy-compression") {
+            Compressor::Snappy
+        } else {
+            panic!(
+                "all compressor feature flags (e.g. lz4-compression) are disabled, can't choose a default compressor"
+            );
+        }
+    }
+}
+
+impl Compressor {
+    pub(crate) fn from_id(id: u8) -> Compressor {
+        match id {
+            1 => Compressor::Lz4,
+            2 => Compressor::Brotli,
+            3 => Compressor::Snappy,
+            _ => panic!("unknown compressor id {:?}", id),
+        }
+    }
+    pub(crate) fn get_id(&self) -> u8 {
+        match self {
+            Self::Lz4 => 1,
+            Self::Brotli => 2,
+            Self::Snappy => 3,
+        }
+    }
+    #[inline]
+    pub(crate) fn compress(&self, uncompressed: &[u8], compressed: &mut Vec<u8>) -> io::Result<()> {
+        match self {
+            Self::Lz4 => {
+                #[cfg(feature = "lz4-compression")]
+                {
+                    super::compression_lz4_block::compress(uncompressed, compressed)
+                }
+                #[cfg(not(feature = "lz4-compression"))]
+                {
+                    panic!("lz4-compression feature flag not activated");
+                }
+            }
+            Self::Brotli => {
+                #[cfg(feature = "brotli-compression")]
+                {
+                    super::compression_brotli::compress(uncompressed, compressed)
+                }
+                #[cfg(not(feature = "brotli-compression"))]
+                {
+                    panic!("brotli-compression feature flag not activated");
+                }
+            }
+            Self::Snappy => {
+                #[cfg(feature = "snappy-compression")]
+                {
+                    super::compression_snap::compress(uncompressed, compressed)
+                }
+                #[cfg(not(feature = "snappy-compression"))]
+                {
+                    panic!("snappy-compression feature flag not activated");
+                }
+            }
+        }
+    }
+
+    #[inline]
+    pub(crate) fn decompress(
+        &self,
+        compressed: &[u8],
+        decompressed: &mut Vec<u8>,
+    ) -> io::Result<()> {
+        match self {
+            Self::Lz4 => {
+                #[cfg(feature = "lz4-compression")]
+                {
+                    super::compression_lz4_block::decompress(compressed, decompressed)
+                }
+                #[cfg(not(feature = "lz4-compression"))]
+                {
+                    panic!("lz4-compression feature flag not activated");
+                }
+            }
+            Self::Brotli => {
+                #[cfg(feature = "brotli-compression")]
+                {
+                    super::compression_brotli::decompress(compressed, decompressed)
+                }
+                #[cfg(not(feature = "brotli-compression"))]
+                {
+                    panic!("brotli-compression feature flag not activated");
+                }
+            }
+            Self::Snappy => {
+                #[cfg(feature = "snappy-compression")]
+                {
+                    super::compression_snap::decompress(compressed, decompressed)
+                }
+                #[cfg(not(feature = "snappy-compression"))]
+                {
+                    panic!("snappy-compression feature flag not activated");
+                }
+            }
+        }
+    }
+}
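The ids written by `get_id` and read back by `from_id` must stay in sync. `from_id` panics on an unknown id, for example a footer written by a newer version or a corrupted file. A fallible alternative based on `TryFrom<u8>` is sketched below with a stand-in enum. This is a hypothetical variant, not tantivy's API, and it reuses the id mapping from the diff.

```rust
use std::convert::TryFrom;

/// Stand-in for the doc store `Compressor` enum.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Compressor {
    Lz4,
    Brotli,
    Snappy,
}

impl Compressor {
    /// Id stored in the doc store footer (same mapping as in the diff).
    pub fn get_id(self) -> u8 {
        match self {
            Compressor::Lz4 => 1,
            Compressor::Brotli => 2,
            Compressor::Snappy => 3,
        }
    }
}

impl TryFrom<u8> for Compressor {
    type Error = String;

    // Returns an error instead of panicking on an unknown id.
    fn try_from(id: u8) -> Result<Self, Self::Error> {
        match id {
            1 => Ok(Compressor::Lz4),
            2 => Ok(Compressor::Brotli),
            3 => Ok(Compressor::Snappy),
            _ => Err(format!("unknown compressor id {}", id)),
        }
    }
}
```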
--- a/src/store/footer.rs
+++ b/src/store/footer.rs
@@ -0,0 +1,69 @@
+use crate::{
+    common::{BinarySerializable, FixedSize, HasLen},
+    directory::FileSlice,
+    store::Compressor,
+};
+use std::io;
+
+#[derive(Debug, Clone, PartialEq)]
+pub struct DocStoreFooter {
+    pub offset: u64,
+    pub compressor: Compressor,
+}
+
+/// Serializes the footer to a byte array:
+/// - offset: 8 bytes
+/// - compressor id: 1 byte
+/// - reserved for future use: 15 bytes
+impl BinarySerializable for DocStoreFooter {
+    fn serialize<W: io::Write>(&self, writer: &mut W) -> io::Result<()> {
+        BinarySerializable::serialize(&self.offset, writer)?;
+        BinarySerializable::serialize(&self.compressor.get_id(), writer)?;
+        writer.write_all(&[0; 15])?;
+        Ok(())
+    }
+
+    fn deserialize<R: io::Read>(reader: &mut R) -> io::Result<Self> {
+        let offset = u64::deserialize(reader)?;
+        let compressor_id = u8::deserialize(reader)?;
+        let mut skip_buf = [0; 15];
+        reader.read_exact(&mut skip_buf)?;
+        Ok(DocStoreFooter {
+            offset,
+            compressor: Compressor::from_id(compressor_id),
+        })
+    }
+}
+
+impl FixedSize for DocStoreFooter {
+    const SIZE_IN_BYTES: usize = 24;
+}
+
+impl DocStoreFooter {
+    pub fn new(offset: u64, compressor: Compressor) -> Self {
+        DocStoreFooter { offset, compressor }
+    }
+
+    pub fn extract_footer(file: FileSlice) -> io::Result<(DocStoreFooter, FileSlice)> {
+        if file.len() < DocStoreFooter::SIZE_IN_BYTES {
+            return Err(io::Error::new(
+                io::ErrorKind::UnexpectedEof,
+                format!(
+                    "File corrupted. The file is smaller than DocStoreFooter::SIZE_IN_BYTES (len={}).",
+                    file.len()
+                ),
+            ));
+        }
+        let (body, footer_slice) = file.split_from_end(DocStoreFooter::SIZE_IN_BYTES);
+        let mut footer_bytes = footer_slice.read_bytes()?;
+        let footer = DocStoreFooter::deserialize(&mut footer_bytes)?;
+        Ok((footer, body))
+    }
+}
+
+#[test]
+fn doc_store_footer_test() {
+    // This test only safeguards changes to the footer.
+    // When the doc store footer is updated, make sure to also update the serialize/deserialize methods.
+    assert_eq!(core::mem::size_of::<DocStoreFooter>(), 16);
+}
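The footer is a fixed 24 bytes: an 8-byte offset, a 1-byte compressor id and 15 reserved zero bytes. `extract_footer` splits those 24 bytes off the end of the file and returns the rest as the body. The std-only sketch below covers that layout and split over a plain byte slice. `Footer`, `FOOTER_SIZE` and these functions are hypothetical stand-ins, and the little-endian encoding of the offset is an assumption of the sketch.

```rust
use std::convert::TryInto;
use std::io;

/// Total footer size: 8 (offset) + 1 (compressor id) + 15 (reserved).
pub const FOOTER_SIZE: usize = 24;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Footer {
    pub offset: u64,
    pub compressor_id: u8,
}

impl Footer {
    /// Encodes the footer; the offset is written little-endian (assumption).
    pub fn to_bytes(&self) -> [u8; FOOTER_SIZE] {
        let mut buf = [0u8; FOOTER_SIZE];
        buf[..8].copy_from_slice(&self.offset.to_le_bytes());
        buf[8] = self.compressor_id;
        // buf[9..24] stays zeroed: reserved for future use.
        buf
    }

    pub fn from_bytes(buf: &[u8]) -> io::Result<Footer> {
        if buf.len() != FOOTER_SIZE {
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                "footer must be exactly 24 bytes",
            ));
        }
        let offset = u64::from_le_bytes(buf[..8].try_into().unwrap());
        Ok(Footer { offset, compressor_id: buf[8] })
    }
}

/// Splits `file` into (footer, body), like `extract_footer` in the diff.
pub fn extract_footer(file: &[u8]) -> io::Result<(Footer, &[u8])> {
    if file.len() < FOOTER_SIZE {
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            format!("file too small for footer (len={})", file.len()),
        ));
    }
    let (body, footer_bytes) = file.split_at(file.len() - FOOTER_SIZE);
    Ok((Footer::from_bytes(footer_bytes)?, body))
}
```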
--- a/src/store/mod.rs
+++ b/src/store/mod.rs
@@ -33,72 +33,32 @@ and should rely on either

 !*/

+mod compressors;
+mod footer;
 mod index;
 mod reader;
 mod writer;
+pub use self::compressors::Compressor;
 pub use self::reader::StoreReader;
 pub use self::writer::StoreWriter;

-// compile_error doesn't scale very well, enum like feature flags would be great to have in Rust
-#[cfg(all(feature = "lz4", feature = "brotli"))]
-compile_error!("feature `lz4` or `brotli` must not be enabled together.");
-
-#[cfg(all(feature = "lz4_block", feature = "brotli"))]
-compile_error!("feature `lz4_block` or `brotli` must not be enabled together.");
-
-#[cfg(all(feature = "lz4_block", feature = "lz4"))]
-compile_error!("feature `lz4_block` or `lz4` must not be enabled together.");
-
-#[cfg(all(feature = "lz4_block", feature = "snap"))]
-compile_error!("feature `lz4_block` or `snap` must not be enabled together.");
-
-#[cfg(all(feature = "lz4", feature = "snap"))]
-compile_error!("feature `lz4` or `snap` must not be enabled together.");
-
-#[cfg(all(feature = "brotli", feature = "snap"))]
-compile_error!("feature `brotli` or `snap` must not be enabled together.");
-
-#[cfg(not(any(
-    feature = "lz4",
-    feature = "brotli",
-    feature = "lz4_flex",
-    feature = "snap"
-)))]
-compile_error!("all compressors are deactivated via feature-flags, check Cargo.toml for available decompressors.");
-
-#[cfg(feature = "lz4_flex")]
+#[cfg(feature = "lz4-compression")]
 mod compression_lz4_block;
-#[cfg(feature = "lz4_flex")]
-pub use self::compression_lz4_block::COMPRESSION;
-#[cfg(feature = "lz4_flex")]
-use self::compression_lz4_block::{compress, decompress};

-#[cfg(feature = "lz4")]
-mod compression_lz4;
-#[cfg(feature = "lz4")]
-pub use self::compression_lz4::COMPRESSION;
-#[cfg(feature = "lz4")]
-use self::compression_lz4::{compress, decompress};
-
-#[cfg(feature = "brotli")]
+#[cfg(feature = "brotli-compression")]
 mod compression_brotli;
-#[cfg(feature = "brotli")]
-pub use self::compression_brotli::COMPRESSION;
-#[cfg(feature = "brotli")]
-use self::compression_brotli::{compress, decompress};

-#[cfg(feature = "snap")]
+#[cfg(feature = "snappy-compression")]
 mod compression_snap;
-#[cfg(feature = "snap")]
-pub use self::compression_snap::COMPRESSION;
-#[cfg(feature = "snap")]
-use self::compression_snap::{compress, decompress};

 #[cfg(test)]
 pub mod tests {

+    use futures::executor::block_on;
+
    use super::*;
-    use crate::schema::{self, FieldValue, TextFieldIndexing};
+    use crate::fastfield::DeleteBitSet;
+    use crate::schema::{self, FieldValue, TextFieldIndexing, STORED, TEXT};
    use crate::schema::{Document, TextOptions};
    use crate::{
        directory::{Directory, RamDirectory, WritePtr},
@@ -107,28 +67,31 @@ pub mod tests {
    use crate::{schema::Schema, Index};
    use std::path::Path;

-    pub fn write_lorem_ipsum_store(writer: WritePtr, num_docs: usize) -> Schema {
-        let mut schema_builder = Schema::builder();
-        let field_body = schema_builder.add_text_field("body", TextOptions::default().set_stored());
-        let field_title =
-            schema_builder.add_text_field("title", TextOptions::default().set_stored());
-        let schema = schema_builder.build();
-        let lorem = String::from(
-            "Doc Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed \
+    const LOREM: &str = "Doc Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed \
             do eiusmod tempor incididunt ut labore et dolore magna aliqua. \
             Ut enim ad minim veniam, quis nostrud exercitation ullamco \
             laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure \
             dolor in reprehenderit in voluptate velit esse cillum dolore eu \
             fugiat nulla pariatur. Excepteur sint occaecat cupidatat non \
             proident, sunt in culpa qui officia deserunt mollit anim id est \
-             laborum.",
-        );
+             laborum.";
+
+    pub fn write_lorem_ipsum_store(
+        writer: WritePtr,
+        num_docs: usize,
+        compressor: Compressor,
+    ) -> Schema {
+        let mut schema_builder = Schema::builder();
+        let field_body = schema_builder.add_text_field("body", TextOptions::default().set_stored());
+        let field_title =
+            schema_builder.add_text_field("title", TextOptions::default().set_stored());
+        let schema = schema_builder.build();
        {
-            let mut store_writer = StoreWriter::new(writer);
+            let mut store_writer = StoreWriter::new(writer, compressor);
            for i in 0..num_docs {
                let mut fields: Vec<FieldValue> = Vec::new();
                {
-                    let field_value = FieldValue::new(field_body, From::from(lorem.clone()));
+                    let field_value = FieldValue::new(field_body, From::from(LOREM.to_string()));
                    fields.push(field_value);
                }
                {
@@ -145,16 +108,61 @@ pub mod tests {
        schema
    }

+    const NUM_DOCS: usize = 1_000;
    #[test]
-    fn test_store() -> crate::Result<()> {
+    fn test_doc_store_iter_with_delete_bug_1077() -> crate::Result<()> {
+        // this will cover deletion of the first element in a checkpoint
+        let deleted_docids = (200..300).collect::<Vec<_>>();
+        let delete_bitset = DeleteBitSet::for_test(&deleted_docids, NUM_DOCS as u32);
+
        let path = Path::new("store");
        let directory = RamDirectory::create();
        let store_wrt = directory.open_write(path)?;
-        let schema = write_lorem_ipsum_store(store_wrt, 1_000);
+        let schema = write_lorem_ipsum_store(store_wrt, NUM_DOCS, Compressor::Lz4);
        let field_title = schema.get_field("title").unwrap();
        let store_file = directory.open_read(path)?;
        let store = StoreReader::open(store_file)?;
-        for i in 0..1_000 {
+        for i in 0..NUM_DOCS as u32 {
+            assert_eq!(
+                *store
+                    .get(i)?
+                    .get_first(field_title)
+                    .unwrap()
+                    .text()
+                    .unwrap(),
+                format!("Doc {}", i)
+            );
+        }
+
+        for doc in store.iter(Some(&delete_bitset)) {
+            let doc = doc?;
+            let title_content = doc.get_first(field_title).unwrap().text().unwrap();
+            if !title_content.starts_with("Doc ") {
+                panic!("unexpected title_content {}", title_content);
+            }
+
+            let id = title_content
+                .strip_prefix("Doc ")
+                .unwrap()
+                .parse::<u32>()
+                .unwrap();
+            if delete_bitset.is_deleted(id) {
+                panic!("unexpected deleted document {}", id);
+            }
+        }
+
+        Ok(())
+    }
+
+    fn test_store(compressor: Compressor) -> crate::Result<()> {
+        let path = Path::new("store");
+        let directory = RamDirectory::create();
+        let store_wrt = directory.open_write(path)?;
+        let schema = write_lorem_ipsum_store(store_wrt, NUM_DOCS, compressor);
+        let field_title = schema.get_field("title").unwrap();
+        let store_file = directory.open_read(path)?;
+        let store = StoreReader::open(store_file)?;
+        for i in 0..NUM_DOCS as u32 {
            assert_eq!(
                *store
                    .get(i)?
@@ -174,6 +182,22 @@ pub mod tests {
        Ok(())
    }

+    #[cfg(feature = "lz4-compression")]
+    #[test]
+    fn test_store_lz4_block() -> crate::Result<()> {
+        test_store(Compressor::Lz4)
+    }
+    #[cfg(feature = "snappy-compression")]
+    #[test]
+    fn test_store_snap() -> crate::Result<()> {
+        test_store(Compressor::Snappy)
+    }
+    #[cfg(feature = "brotli-compression")]
+    #[test]
+    fn test_store_brotli() -> crate::Result<()> {
+        test_store(Compressor::Brotli)
+    }
+
    #[test]
    fn test_store_with_delete() -> crate::Result<()> {
        let mut schema_builder = schema::Schema::builder();
@@ -214,6 +238,108 @@ pub mod tests {
        }
        Ok(())
    }
+
+    #[cfg(feature = "snappy-compression")]
+    #[cfg(feature = "lz4-compression")]
+    #[test]
+    fn test_merge_with_changed_compressor() -> crate::Result<()> {
+        let mut schema_builder = schema::Schema::builder();
+
+        let text_field = schema_builder.add_text_field("text_field", TEXT | STORED);
+        let schema = schema_builder.build();
+        let index_builder = Index::builder().schema(schema);
+
+        let mut index = index_builder.create_in_ram().unwrap();
+        index.settings_mut().docstore_compression = Compressor::Lz4;
+        {
+            let mut index_writer = index.writer_for_tests().unwrap();
+            // add enough data to create enough doc store blocks to be considered for stacking
+            for _ in 0..200 {
+                index_writer.add_document(doc!(text_field=> LOREM));
+            }
+            assert!(index_writer.commit().is_ok());
+            for _ in 0..200 {
+                index_writer.add_document(doc!(text_field=> LOREM));
+            }
+            assert!(index_writer.commit().is_ok());
+        }
+        assert_eq!(
+            index.reader().unwrap().searcher().segment_readers()[0]
+                .get_store_reader()
+                .unwrap()
+                .compressor(),
+            Compressor::Lz4
+        );
+        // Changing the compressor disables stacking when merging
+        let index_settings = index.settings_mut();
+        index_settings.docstore_compression = Compressor::Snappy;
+        // Merging the segments
+        {
+            let segment_ids = index
+                .searchable_segment_ids()
+                .expect("Searchable segments failed.");
+            let mut index_writer = index.writer_for_tests().unwrap();
+            assert!(block_on(index_writer.merge(&segment_ids)).is_ok());
+            assert!(index_writer.wait_merging_threads().is_ok());
+        }
+
+        let searcher = index.reader().unwrap().searcher();
+        assert_eq!(searcher.segment_readers().len(), 1);
+        let reader = searcher.segment_readers().iter().last().unwrap();
+        let store = reader.get_store_reader().unwrap();
+
+        for doc in store.iter(reader.delete_bitset()).take(50) {
+            assert_eq!(
+                *doc?.get_first(text_field).unwrap().text().unwrap(),
+                LOREM.to_string()
+            );
+        }
+        assert_eq!(store.compressor(), Compressor::Snappy);
+
+        Ok(())
+    }
+
+    #[test]
+    fn test_merge_of_small_segments() -> crate::Result<()> {
+        let mut schema_builder = schema::Schema::builder();
+
+        let text_field = schema_builder.add_text_field("text_field", TEXT | STORED);
+        let schema = schema_builder.build();
+        let index_builder = Index::builder().schema(schema);
+
+        let index = index_builder.create_in_ram().unwrap();
+
+        {
+            let mut index_writer = index.writer_for_tests().unwrap();
+
+            index_writer.add_document(doc!(text_field=> "1"));
+            assert!(index_writer.commit().is_ok());
+            index_writer.add_document(doc!(text_field=> "2"));
+            assert!(index_writer.commit().is_ok());
+            index_writer.add_document(doc!(text_field=> "3"));
+            assert!(index_writer.commit().is_ok());
+            index_writer.add_document(doc!(text_field=> "4"));
+            assert!(index_writer.commit().is_ok());
+            index_writer.add_document(doc!(text_field=> "5"));
+            assert!(index_writer.commit().is_ok());
+        }
+        // Merging the segments
+        {
+            let segment_ids = index
+                .searchable_segment_ids()
+                .expect("Searchable segments failed.");
+            let mut index_writer = index.writer_for_tests().unwrap();
+            assert!(block_on(index_writer.merge(&segment_ids)).is_ok());
+            assert!(index_writer.wait_merging_threads().is_ok());
+        }
+
+        let searcher = index.reader().unwrap().searcher();
+        assert_eq!(searcher.segment_readers().len(), 1);
+        let reader = searcher.segment_readers().iter().last().unwrap();
+        let store = reader.get_store_reader().unwrap();
+        assert_eq!(store.block_checkpoints().count(), 1);
+        Ok(())
+    }
 }

 #[cfg(all(test, feature = "unstable"))]
@@ -222,6 +348,7 @@ mod bench {
    use super::tests::write_lorem_ipsum_store;
    use crate::directory::Directory;
    use crate::directory::RamDirectory;
+    use crate::store::Compressor;
    use crate::store::StoreReader;
    use std::path::Path;
    use test::Bencher;
@@ -232,7 +359,11 @@ mod bench {
        let directory = RamDirectory::create();
        let path = Path::new("store");
        b.iter(|| {
-            write_lorem_ipsum_store(directory.open_write(path).unwrap(), 1_000);
+            write_lorem_ipsum_store(
+                directory.open_write(path).unwrap(),
+                1_000,
+                Compressor::default(),
+            );
            directory.delete(path).unwrap();
        });
    }
@@ -241,11 +372,13 @@ mod bench {
    fn bench_store_decode(b: &mut Bencher) {
        let directory = RamDirectory::create();
        let path = Path::new("store");
-        write_lorem_ipsum_store(directory.open_write(path).unwrap(), 1_000);
+        write_lorem_ipsum_store(
+            directory.open_write(path).unwrap(),
+            1_000,
+            Compressor::default(),
+        );
        let store_file = directory.open_read(path).unwrap();
        let store = StoreReader::open(store_file).unwrap();
-        b.iter(|| {
-            store.get(12).unwrap();
-        });
+        b.iter(|| store.iter(None).collect::<Vec<_>>());
    }
 }
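The `mod.rs` changes drop the mutually exclusive compile-time feature checks in favor of a `Compressor` value chosen at runtime (through the index settings' `docstore_compression` field in the tests) and recorded in each store's footer. A minimal sketch of that dispatch follows. The id values are illustrative assumptions rather than tantivy's actual assignment, and `Identity` is a hypothetical extra variant so the sketch runs without the lz4, brotli, or snappy crates.

```rust
use std::io;

/// Hypothetical mirror of a runtime-selected doc store compressor.
/// `Lz4`, `Brotli` and `Snappy` follow the variants used in the diff;
/// `Identity` exists only so this sketch runs without external crates.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Compressor {
    Identity,
    Lz4,
    Brotli,
    Snappy,
}

impl Compressor {
    /// Id written into the footer so a reader knows how to decode blocks.
    /// The numbers are illustrative assumptions.
    fn get_id(self) -> u8 {
        match self {
            Compressor::Identity => 0,
            Compressor::Lz4 => 1,
            Compressor::Brotli => 2,
            Compressor::Snappy => 3,
        }
    }

    fn from_id(id: u8) -> Compressor {
        match id {
            0 => Compressor::Identity,
            1 => Compressor::Lz4,
            2 => Compressor::Brotli,
            3 => Compressor::Snappy,
            _ => panic!("unknown compressor id {}", id),
        }
    }

    /// Runtime dispatch replaces the old feature-gated `decompress` re-export.
    /// Only `Identity` is implemented here; real codecs would sit behind
    /// cargo features such as `lz4-compression`.
    fn decompress(self, compressed: &[u8], out: &mut Vec<u8>) -> io::Result<()> {
        out.clear();
        match self {
            Compressor::Identity => {
                out.extend_from_slice(compressed);
                Ok(())
            }
            other => Err(io::Error::new(
                io::ErrorKind::Other,
                format!("{:?} is not available in this sketch", other),
            )),
        }
    }
}
```

Because the id travels with each store, segments written with different compressors can coexist, and a merge can tell when a segment's raw blocks cannot simply be copied into the new store, the situation `test_merge_with_changed_compressor` exercises.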
--- a/src/store/reader.rs
+++ b/src/store/reader.rs
@@ -1,15 +1,17 @@
-use super::decompress;
-use super::index::SkipIndex;
-use crate::common::{BinarySerializable, HasLen};
+use super::Compressor;
+use super::{footer::DocStoreFooter, index::SkipIndex};
 use crate::directory::{FileSlice, OwnedBytes};
 use crate::schema::Document;
 use crate::space_usage::StoreSpaceUsage;
 use crate::store::index::Checkpoint;
 use crate::DocId;
-use crate::{common::VInt, fastfield::DeleteBitSet};
+use crate::{
+    common::{BinarySerializable, HasLen, VInt},
+    error::DataCorruption,
+    fastfield::DeleteBitSet,
+};
 use lru::LruCache;
 use std::io;
-use std::mem::size_of;
 use std::sync::atomic::{AtomicUsize, Ordering};
 use std::sync::{Arc, Mutex};

@@ -21,6 +23,7 @@ type BlockCache = Arc<Mutex<LruCache<usize, Block>>>;

 /// Reads document off tantivy's [`Store`](./index.html)
 pub struct StoreReader {
+    compressor: Compressor,
    data: FileSlice,
    cache: BlockCache,
    cache_hits: Arc<AtomicUsize>,
@@ -32,11 +35,14 @@ pub struct StoreReader {
 impl StoreReader {
    /// Opens a store reader
    pub fn open(store_file: FileSlice) -> io::Result<StoreReader> {
-        let (data_file, offset_index_file) = split_file(store_file)?;
+        let (footer, data_and_offset) = DocStoreFooter::extract_footer(store_file)?;
+
+        let (data_file, offset_index_file) = data_and_offset.split(footer.offset as usize);
        let index_data = offset_index_file.read_bytes()?;
        let space_usage = StoreSpaceUsage::new(data_file.len(), offset_index_file.len());
        let skip_index = SkipIndex::open(index_data);
        Ok(StoreReader {
+            compressor: footer.compressor,
            data: data_file,
            cache: Arc::new(Mutex::new(LruCache::new(LRU_CACHE_CAPACITY))),
            cache_hits: Default::default(),
@@ -50,6 +56,10 @@ impl StoreReader {
        self.skip_index.checkpoints()
    }

+    pub(crate) fn compressor(&self) -> Compressor {
+        self.compressor
+    }
+
    fn block_checkpoint(&self, doc_id: DocId) -> Option<Checkpoint> {
        self.skip_index.seek(doc_id)
    }
@@ -72,7 +82,8 @@ impl StoreReader {

        let compressed_block = self.compressed_block(checkpoint)?;
        let mut decompressed_block = vec![];
-        decompress(compressed_block.as_slice(), &mut decompressed_block)?;
+        self.compressor
+            .decompress(compressed_block.as_slice(), &mut decompressed_block)?;

        let block = OwnedBytes::new(decompressed_block);
        self.cache
@@ -152,48 +163,55 @@ impl StoreReader {
        let mut curr_checkpoint = checkpoint_block_iter.next();
        let mut curr_block = curr_checkpoint
            .as_ref()
-            .map(|checkpoint| self.read_block(&checkpoint));
+            .map(|checkpoint| self.read_block(&checkpoint).map_err(|e| e.kind())); // io::Error is not Clone, so keep only its ErrorKind
        let mut block_start_pos = 0;
        let mut num_skipped = 0;
+        let mut reset_block_pos = false;
        (0..last_docid)
            .filter_map(move |doc_id| {
                // filter_map is only used to resolve lifetime issues between the two closures on
                // the outer variables
-                let alive = delete_bitset.map_or(true, |bitset| bitset.is_alive(doc_id));
-                if !alive {
-                    // we keep the number of skipped documents to move forward in the map block
-                    num_skipped += 1;
-                }
+
                // check move to next checkpoint
-                let mut reset_block_pos = false;
                if doc_id >= curr_checkpoint.as_ref().unwrap().doc_range.end {
                    curr_checkpoint = checkpoint_block_iter.next();
                    curr_block = curr_checkpoint
                        .as_ref()
-                        .map(|checkpoint| self.read_block(&checkpoint));
+                        .map(|checkpoint| self.read_block(&checkpoint).map_err(|e| e.kind()));
                    reset_block_pos = true;
                    num_skipped = 0;
                }

+                let alive = delete_bitset.map_or(true, |bitset| bitset.is_alive(doc_id));
                if alive {
-                    let ret = Some((
-                        curr_block.as_ref().unwrap().as_ref().unwrap().clone(), // todo forward errors
-                        num_skipped,
-                        reset_block_pos,
-                    ));
+                    let ret = Some((curr_block.clone(), num_skipped, reset_block_pos));
                    // the map block will move over the num_skipped, so we reset to 0
                    num_skipped = 0;
+                    reset_block_pos = false;
                    ret
                } else {
+                    // we keep the number of skipped documents to move forward in the map block
+                    num_skipped += 1;
                    None
                }
            })
            .map(move |(block, num_skipped, reset_block_pos)| {
+                let block = block
+                    .ok_or_else(|| {
+                        DataCorruption::comment_only(
+                            "the current checkpoint in the doc store iterator is none, this should never happen",
+                        )
+                    })?
+                    .map_err(|error_kind| {
+                        std::io::Error::new(error_kind, "error when reading block in doc store")
+                    })?;
+                // this flag is set when filter_map has moved to the next block
                if reset_block_pos {
                    block_start_pos = 0;
                }
                let mut cursor = &block[block_start_pos..];
                let mut pos = 0;
+                // move forward 1 doc + num_skipped in block and return length of current doc
                let doc_length = loop {
                    let doc_length = VInt::deserialize(&mut cursor)?.val() as usize;
                    let num_bytes_read = block[block_start_pos..].len() - cursor.len();
@@ -220,14 +238,6 @@ impl StoreReader {
    }
 }

-fn split_file(data: FileSlice) -> io::Result<(FileSlice, FileSlice)> {
-    let (data, footer_len_bytes) = data.split_from_end(size_of::<u64>());
-    let serialized_offset: OwnedBytes = footer_len_bytes.read_bytes()?;
-    let mut serialized_offset_buf = serialized_offset.as_slice();
-    let offset = u64::deserialize(&mut serialized_offset_buf)? as usize;
-    Ok(data.split(offset))
-}
-
 #[cfg(test)]
 mod tests {
    use super::*;
@@ -245,7 +255,7 @@ mod tests {
        let directory = RamDirectory::create();
        let path = Path::new("store");
        let writer = directory.open_write(path)?;
-        let schema = write_lorem_ipsum_store(writer, 500);
+        let schema = write_lorem_ipsum_store(writer, 500, Compressor::default());
        let title = schema.get_field("title").unwrap();
        let store_file = directory.open_read(path)?;
        let store = StoreReader::open(store_file)?;
@@ -300,7 +310,7 @@ mod tests {
                .unwrap()
                .peek_lru()
                .map(|(&k, _)| k as usize),
-            Some(9249)
+            Some(9210)
        );

        Ok(())
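The iterator fix is easiest to see on a toy model. In the removed lines, `num_skipped` was incremented before the checkpoint switch and `reset_block_pos` was a fresh local on every call, so when the first document of a block was deleted, both its skip count and the reset flag were dropped and the next alive document was decoded at the previous block's cursor position. The sketch below reproduces the fixed ordering in plain Rust. `Block`, `encode_block`, and `iter_alive` are hypothetical, and a one-byte length prefix replaces tantivy's `VInt` to keep it short.

```rust
use std::ops::Range;

/// Hypothetical simplified block: docs stored back to back, each prefixed by
/// one length byte (tantivy uses a VInt; a u8 keeps the sketch short).
struct Block {
    doc_range: Range<u32>,
    bytes: Vec<u8>,
}

fn encode_block(doc_range: Range<u32>, docs: &[&str]) -> Block {
    let mut bytes = Vec::new();
    for doc in docs {
        assert!(doc.len() <= u8::MAX as usize, "doc too long for a u8 prefix");
        bytes.push(doc.len() as u8);
        bytes.extend_from_slice(doc.as_bytes());
    }
    Block { doc_range, bytes }
}

/// Yields alive docs. Assumes blocks cover contiguous, non-empty doc ranges.
fn iter_alive<'a>(
    blocks: &'a [Block],
    is_deleted: impl Fn(u32) -> bool + 'a,
) -> impl Iterator<Item = Vec<u8>> + 'a {
    let last_doc = blocks.last().map_or(0, |b| b.doc_range.end);
    let mut block_idx = 0;
    let mut num_skipped = 0usize;
    // Owned by the closure, so it survives calls that return None.
    let mut reset_block_pos = false;
    let mut block_start_pos = 0usize;
    (0..last_doc)
        .filter_map(move |doc_id| {
            // Switch blocks *before* consulting the delete set, so a deleted
            // first doc is counted against the new block.
            if doc_id >= blocks[block_idx].doc_range.end {
                block_idx += 1;
                reset_block_pos = true;
                num_skipped = 0;
            }
            if is_deleted(doc_id) {
                num_skipped += 1;
                return None;
            }
            let item = (block_idx, num_skipped, reset_block_pos);
            num_skipped = 0;
            reset_block_pos = false;
            Some(item)
        })
        .map(move |(idx, skip, reset)| {
            let bytes = &blocks[idx].bytes;
            if reset {
                block_start_pos = 0;
            }
            // Step over the deleted docs, then read the current one.
            for _ in 0..skip {
                block_start_pos += 1 + bytes[block_start_pos] as usize;
            }
            let len = bytes[block_start_pos] as usize;
            let start = block_start_pos + 1;
            block_start_pos = start + len;
            bytes[start..start + len].to_vec()
        })
}
```

With docs `a b c | d e f` split over two blocks and docs 1 and 3 deleted (doc 3 opens the second block), the sketch yields `a`, `c`, `e`, `f`.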
--- a/src/store/writer.rs
+++ b/src/store/writer.rs
@@ -1,6 +1,6 @@
-use super::compress;
 use super::index::SkipIndexBuilder;
 use super::StoreReader;
+use super::{compressors::Compressor, footer::DocStoreFooter};
 use crate::common::CountingWriter;
 use crate::common::{BinarySerializable, VInt};
 use crate::directory::TerminatingWrite;
@@ -21,6 +21,7 @@ const BLOCK_SIZE: usize = 16_384;
 /// The skip list index on the other hand, is built in memory.
 ///
 pub struct StoreWriter {
+    compressor: Compressor,
    doc: DocId,
    first_doc_in_block: DocId,
    offset_index_writer: SkipIndexBuilder,
@@ -34,8 +35,9 @@ impl StoreWriter {
    ///
    /// The store writer will writes blocks on disc as
    /// document are added.
-    pub fn new(writer: WritePtr) -> StoreWriter {
+    pub fn new(writer: WritePtr, compressor: Compressor) -> StoreWriter {
        StoreWriter {
+            compressor,
            doc: 0,
            first_doc_in_block: 0,
            offset_index_writer: SkipIndexBuilder::new(),
@@ -45,6 +47,10 @@ impl StoreWriter {
        }
    }

+    pub(crate) fn compressor(&self) -> Compressor {
+        self.compressor
+    }
+
    /// The memory used (inclusive childs)
    pub fn mem_usage(&self) -> usize {
        self.intermediary_buffer.capacity() + self.current_block.capacity()
@@ -125,7 +131,8 @@ impl StoreWriter {
    fn write_and_compress_block(&mut self) -> io::Result<()> {
        assert!(self.doc > 0);
        self.intermediary_buffer.clear();
-        compress(&self.current_block[..], &mut self.intermediary_buffer)?;
+        self.compressor
+            .compress(&self.current_block[..], &mut self.intermediary_buffer)?;
        let start_offset = self.writer.written_bytes() as usize;
        self.writer.write_all(&self.intermediary_buffer)?;
        let end_offset = self.writer.written_bytes() as usize;
@@ -147,8 +154,9 @@ impl StoreWriter {
            self.write_and_compress_block()?;
        }
        let header_offset: u64 = self.writer.written_bytes() as u64;
+        let footer = DocStoreFooter::new(header_offset, self.compressor);
        self.offset_index_writer.write(&mut self.writer)?;
-        header_offset.serialize(&mut self.writer)?;
+        footer.serialize(&mut self.writer)?;
        self.writer.terminate()
    }
 }
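On the write side, `finish` captures `header_offset` before the skip index is written, so the footer's offset marks where the compressed blocks end and the index begins. A std-only sketch of that layout follows. `Checkpoint` here is a simplified hypothetical, `compress` stands in for `Compressor::compress`, and the index is a naive list of block end offsets rather than the `SkipIndexBuilder` encoding. The footer uses the same assumed 24-byte layout (8-byte offset, 1-byte compressor id, 15 reserved bytes).

```rust
use std::ops::Range;

/// Hypothetical, simplified checkpoint: which docs a block holds and
/// which bytes of the file it occupies.
#[derive(Debug, Clone, PartialEq)]
struct Checkpoint {
    doc_range: Range<u32>,
    byte_range: Range<usize>,
}

/// Lays out `[compressed blocks][index][24-byte footer]`.
/// `compress` is a stand-in for `Compressor::compress`.
fn write_store(
    blocks: &[(Range<u32>, Vec<u8>)],
    compressor_id: u8,
    compress: impl Fn(&[u8]) -> Vec<u8>,
) -> (Vec<u8>, Vec<Checkpoint>) {
    let mut out: Vec<u8> = Vec::new();
    let mut checkpoints: Vec<Checkpoint> = Vec::new();
    for (doc_range, raw) in blocks {
        let start = out.len();
        out.extend_from_slice(&compress(&raw[..]));
        checkpoints.push(Checkpoint {
            doc_range: doc_range.clone(),
            byte_range: start..out.len(),
        });
    }
    // Captured before the index is written, like `header_offset` in `finish`.
    let index_offset = out.len() as u64;
    // Naive index: the end offset of each block as a little-endian u64.
    for checkpoint in &checkpoints {
        out.extend_from_slice(&(checkpoint.byte_range.end as u64).to_le_bytes());
    }
    // Footer: index offset, compressor id, 15 reserved bytes.
    out.extend_from_slice(&index_offset.to_le_bytes());
    out.push(compressor_id);
    out.extend_from_slice(&[0u8; 15]);
    (out, checkpoints)
}
```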
--- a/src/termdict/fst_termdict/streamer.rs
+++ b/src/termdict/fst_termdict/streamer.rs
@@ -78,8 +78,8 @@ pub struct TermStreamer<'a, A = AlwaysMatch>
 where
    A: Automaton,
 {
-    fst_map: &'a TermDictionary,
-    stream: Stream<'a, A>,
+    pub(crate) fst_map: &'a TermDictionary,
+    pub(crate) stream: Stream<'a, A>,
    term_ord: TermOrdinal,
    current_key: Vec<u8>,
    current_value: TermInfo,
--- a/src/termdict/merger.rs
+++ b/src/termdict/merger.rs
@@ -1,32 +1,11 @@
+use crate::postings::TermInfo;
+use crate::termdict::TermDictionary;
 use crate::termdict::TermOrdinal;
 use crate::termdict::TermStreamer;
-use std::cmp::Ordering;
-use std::collections::BinaryHeap;
-
-pub struct HeapItem<'a> {
-    pub streamer: TermStreamer<'a>,
-    pub segment_ord: usize,
-}
-
-impl<'a> PartialEq for HeapItem<'a> {
-    fn eq(&self, other: &Self) -> bool {
-        self.segment_ord == other.segment_ord
-    }
-}
-
-impl<'a> Eq for HeapItem<'a> {}
-
-impl<'a> PartialOrd for HeapItem<'a> {
-    fn partial_cmp(&self, other: &HeapItem<'a>) -> Option<Ordering> {
-        Some(self.cmp(other))
-    }
-}
-
-impl<'a> Ord for HeapItem<'a> {
-    fn cmp(&self, other: &HeapItem<'a>) -> Ordering {
-        (&other.streamer.key(), &other.segment_ord).cmp(&(&self.streamer.key(), &self.segment_ord))
-    }
-}
+use tantivy_fst::map::OpBuilder;
+use tantivy_fst::map::Union;
+use tantivy_fst::raw::IndexedValue;
+use tantivy_fst::Streamer;

 /// Given a list of sorted term streams,
 /// returns an iterator over sorted unique terms.
@@ -34,61 +13,50 @@ impl<'a> Ord for HeapItem<'a> {
 /// The item yield is actually a pair with
 /// - the term
 /// - a slice with the ordinal of the segments containing
-/// the terms.
+/// the term.
 pub struct TermMerger<'a> {
-    heap: BinaryHeap<HeapItem<'a>>,
-    current_streamers: Vec<HeapItem<'a>>,
+    dictionaries: Vec<&'a TermDictionary>,
+    union: Union<'a>,
+    current_key: Vec<u8>,
+    current_segment_and_term_ordinals: Vec<IndexedValue>,
 }

 impl<'a> TermMerger<'a> {
    /// Stream of merged term dictionary
    ///
    pub fn new(streams: Vec<TermStreamer<'a>>) -> TermMerger<'a> {
+        let mut op_builder = OpBuilder::new();
+        let mut dictionaries = vec![];
+        for streamer in streams {
+            op_builder.push(streamer.stream);
+            dictionaries.push(streamer.fst_map);
+        }
        TermMerger {
-            heap: BinaryHeap::new(),
-            current_streamers: streams
-                .into_iter()
-                .enumerate()
-                .map(|(ord, streamer)| HeapItem {
-                    streamer,
-                    segment_ord: ord,
-                })
-                .collect(),
+            dictionaries,
+            union: op_builder.union(),
+            current_key: vec![],
+            current_segment_and_term_ordinals: vec![],
        }
    }

-    pub(crate) fn matching_segments<'b: 'a>(
-        &'b self,
-    ) -> impl 'b + Iterator<Item = (usize, TermOrdinal)> {
-        self.current_streamers
+    pub fn matching_segments<'b: 'a>(&'b self) -> impl 'b + Iterator<Item = (usize, TermOrdinal)> {
+        self.current_segment_and_term_ordinals
            .iter()
-            .map(|heap_item| (heap_item.segment_ord, heap_item.streamer.term_ord()))
-    }
-
-    fn advance_segments(&mut self) {
-        let streamers = &mut self.current_streamers;
-        let heap = &mut self.heap;
-        for mut heap_item in streamers.drain(..) {
-            if heap_item.streamer.advance() {
-                heap.push(heap_item);
-            }
-        }
+            .map(|iv| (iv.index, iv.value))
    }

    /// Advance the term iterator to the next term.
    /// Returns true if there is indeed another term
    /// False if there is none.
    pub fn advance(&mut self) -> bool {
-        self.advance_segments();
-        if let Some(head) = self.heap.pop() {
-            self.current_streamers.push(head);
-            while let Some(next_streamer) = self.heap.peek() {
-                if self.current_streamers[0].streamer.key() != next_streamer.streamer.key() {
-                    break;
-                }
-                let next_heap_it = self.heap.pop().unwrap(); // safe : we peeked beforehand
-                self.current_streamers.push(next_heap_it);
-            }
+        if let Some((k, values)) = self.union.next() {
+            self.current_key.clear();
+            self.current_key.extend_from_slice(k);
+            self.current_segment_and_term_ordinals.clear();
+            self.current_segment_and_term_ordinals
+                .extend_from_slice(values);
+            self.current_segment_and_term_ordinals
+                .sort_by_key(|iv| iv.index);
            true
        } else {
            false
@@ -101,16 +69,85 @@ impl<'a> TermMerger<'a> {
    /// iff advance() has been called before
    /// and "true" was returned.
    pub fn key(&self) -> &[u8] {
-        self.current_streamers[0].streamer.key()
+        &self.current_key
    }

-    /// Returns the sorted list of segment ordinals
-    /// that include the current term.
+    /// Returns an iterator over (segment ordinal, TermInfo) pairs, sorted by segment ordinal.
    ///
    /// This method may be called
    /// iff advance() has been called before
    /// and "true" was returned.
-    pub fn current_kvs(&self) -> &[HeapItem<'a>] {
-        &self.current_streamers[..]
+    pub fn current_segment_ordinals_and_term_infos<'b: 'a>(
+        &'b self,
+    ) -> impl 'b + Iterator<Item = (usize, TermInfo)> {
+        self.current_segment_and_term_ordinals
+            .iter()
+            .map(move |iv| {
+                (
+                    iv.index,
+                    self.dictionaries[iv.index].term_info_from_ord(iv.value),
+                )
+            })
+    }
+}
+
+#[cfg(all(test, feature = "unstable"))]
+mod bench {
+    use super::TermMerger;
+    use crate::directory::FileSlice;
+    use crate::postings::TermInfo;
+    use crate::termdict::{TermDictionary, TermDictionaryBuilder};
+    use rand::distributions::Alphanumeric;
+    use rand::{thread_rng, Rng};
+    use test::{self, Bencher};
+
+    fn make_term_info(term_ord: u64) -> TermInfo {
+        let offset = |term_ord: u64| (term_ord * 100 + term_ord * term_ord) as usize;
+        TermInfo {
+            doc_freq: term_ord as u32,
+            postings_range: offset(term_ord)..offset(term_ord + 1),
+            positions_range: offset(term_ord)..offset(term_ord + 1),
+        }
+    }
+
+    /// Create a dictionary of random strings.
+    fn rand_dict(num_terms: usize) -> crate::Result<TermDictionary> {
+        let buffer: Vec<u8> = {
+            let mut terms = vec![];
+            for _i in 0..num_terms {
+                let rand_string: String = thread_rng()
+                    .sample_iter(&Alphanumeric)
+                    .take(thread_rng().gen_range(30..42))
+                    .map(char::from)
+                    .collect();
+                terms.push(rand_string);
+            }
+            terms.sort();
+
+            let mut term_dictionary_builder = TermDictionaryBuilder::create(Vec::new())?;
+            for i in 0..num_terms {
+                term_dictionary_builder.insert(terms[i].as_bytes(), &make_term_info(i as u64))?;
+            }
+            term_dictionary_builder.finish()?
+        };
+        let file = FileSlice::from(buffer);
+        TermDictionary::open(file)
+    }
+
+    #[bench]
+    fn bench_termmerger(b: &mut Bencher) -> crate::Result<()> {
+        let dict1 = rand_dict(100_000)?;
+        let dict2 = rand_dict(100_000)?;
+        b.iter(|| -> crate::Result<u32> {
+            let stream1 = dict1.stream()?;
+            let stream2 = dict2.stream()?;
+            let mut merger = TermMerger::new(vec![stream1, stream2]);
+            let mut count = 0;
+            while merger.advance() {
+                count += 1;
+            }
+            Ok(count)
+        });
+        Ok(())
    }
 }
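The new `TermMerger` hands the k-way merge to `tantivy_fst`'s `OpBuilder::union` and then sorts each term's `IndexedValue`s by segment index. Since the FST crate is not available in a std-only snippet, the sketch below produces the same output shape with a `BinaryHeap` k-way merge, the technique the removed `HeapItem` code used. `merge_terms` is hypothetical, and term ordinals are simply positions in each input list.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Merges sorted, duplicate-free term lists (one per segment). For each
/// distinct term, returns the (segment ordinal, term ordinal) pairs sorted by
/// segment ordinal, the shape `TermMerger::matching_segments` yields.
fn merge_terms(segments: &[Vec<&str>]) -> Vec<(String, Vec<(usize, u64)>)> {
    // Entries are (term, segment ordinal, term ordinal); `Reverse` turns the
    // max-heap into a min-heap, and ties on the term pop in segment order.
    let mut heap = BinaryHeap::new();
    for (seg_ord, terms) in segments.iter().enumerate() {
        if let Some(first) = terms.first() {
            heap.push(Reverse((*first, seg_ord, 0u64)));
        }
    }
    let mut merged: Vec<(String, Vec<(usize, u64)>)> = Vec::new();
    while let Some(Reverse((term, seg_ord, term_ord))) = heap.pop() {
        // Refill the heap from the segment that was just consumed.
        let next = term_ord as usize + 1;
        if let Some(next_term) = segments[seg_ord].get(next) {
            heap.push(Reverse((*next_term, seg_ord, next as u64)));
        }
        let same_as_last = merged.last().map_or(false, |(last, _)| last == term);
        if same_as_last {
            merged.last_mut().unwrap().1.push((seg_ord, term_ord));
        } else {
            merged.push((term.to_string(), vec![(seg_ord, term_ord)]));
        }
    }
    merged
}
```

The FST union avoids re-materializing each key into a heap entry, which is presumably what the `bench_termmerger` benchmark above is meant to measure.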
Author	SHA1	Message	Date
Paul Masurel	6847af74ad	Hotfix 0.15.2	2021-06-16 22:15:55 +09:00
Andre-Philippe Paquet	5baa91fdf3	fix store reader iterator, take 2	2021-06-16 22:13:19 +09:00
PSeitz	5209238c1b	use github actions for tests	2021-06-14 12:51:46 +02:00
Paul Masurel	7ef25ec400	Bump to 0.15.1 to publish bugfix	2021-06-14 18:45:38 +09:00
PSeitz	221e7cbb55	Merge pull request #1076 from appaquet/fix/store-reader-iterator Fix panic in store reader raw document iterator during segment merge	2021-06-14 11:22:58 +02:00
Pascal Seitz	873ac1a3ac	cleanup import	2021-06-14 10:31:45 +02:00
Pascal Seitz	ebe55a7ae1	refactor test, fixes #1077 replace test with smaller test in doc_store	2021-06-14 10:10:05 +02:00
Bernard Swart	9f32d40b27	Misspelling of misspelled was fixed (#1078)	2021-06-14 16:29:12 +09:00
Andre-Philippe Paquet	8ae10a930a	fix formatting	2021-06-13 17:23:40 -04:00
Andre-Philippe Paquet	473a346814	remove debugging	2021-06-13 16:49:44 -04:00
Andre-Philippe Paquet	3a8a0fe79a	add fuzzy merge test	2021-06-13 16:42:24 -04:00
Andre-Philippe Paquet	511dc8f87f	fix store reader iterator	2021-06-13 16:00:13 -04:00
Paul Masurel	3901295329	Bumped query-grammar version	2021-06-07 10:00:14 +09:00
Paul Masurel	f5918c6c74	Completed bitpacker README	2021-06-07 09:57:17 +09:00
Paul Masurel	abe6b4baec	Bumped tantivy version to 0.15	2021-06-07 09:52:48 +09:00
Paul Masurel	6e4b61154f	Issue/1070 (#1071) Add a boolean flag in the Query::query_terms informing on whether position information is required. Closes #1070	2021-06-03 22:33:20 +09:00
PSeitz	2aad0ced77	add inline to bitpacker (#1064)	2021-05-31 23:15:41 +09:00
Stéphane Campinas	41ea14840d	add benchmark of term streams merge (#1024) * add benchmark of term streams merge * use union based on FST for merging the term dictionaries * Rename TermMerger benchmark	2021-05-31 23:15:01 +09:00
PSeitz	dff0ffd38a	prepare for multiple fastfield codecs (#1063) * prepare for multiple fastfield codecs prepare for multiple fastfield codecs by wrapping the codecs in an enum #1042 * add FastFieldSerializer trait, add DynamicFastFieldSerializer add FastFieldSerializer trait add DynamicFastFieldSerializer enum to wrap all implementors of the FastFieldSerializer trait * add estimation for fastfield bitpacker	2021-05-31 23:14:14 +09:00
PSeitz	8d32c3ba3a	Change Footer version handling, Make compression dynamic (#1060) Change Footer version handling, Make compression dynamic Change Footer version handling Simplify version handling by switching to JSON instead of binary serialization. fixes #1058 Make compression dynamic Instead of choosing the compression during compile time via a feature flag, you can now have multiple compression algorithms enabled and decide during runtime which one to choose via IndexSettings. Changing the compression algorithm on an index is also supported. The information which algorithm was used in the doc store is stored in the DocStoreFooter. The default is the lz4 block format. fixes #904 Handle merging of different compressors Fix feature flag names Add doc store test for all compressors	2021-05-28 14:57:20 +09:00
Moriyoshi Koizumi	4afba005f9	Provide a means to deal with malformed facet text representation for the query parser (#1056) * Provide a means to deal with malformed facet text representation for the query parser. * Specific error enum for the facet parse error.	2021-05-27 12:16:49 +09:00
PSeitz	85fb0cc20a	cache field norm reader in merge (#1061)	2021-05-25 21:48:02 +09:00
PSeitz	5ef2d56ec2	Avoid docstore stacking for small segments, fixes #1053 (#1055)	2021-05-24 15:38:49 +09:00
Paul Masurel	fd8e5bdf57	Rename more like this	2021-05-21 16:32:39 +09:00
PSeitz	4f8481a1e4	Detect if segments are stackable with sorting, fixes #1038 (#1054) * Detect if segments are stackable with sorting, fixes #1038 Detect if segments are stackable when their data ranges on the sort property are disjunct. Presort segments by their min value on merge, to enable easier stacking. * move code to function	2021-05-21 15:23:17 +09:00
PSeitz	bcd72e5c14	fix and refactor log merge policy, fixes #1035 (#1043) * fix and refactor log merge policy, fixes #1035 fixes a bug in log merge policy where an index was wrongly referenced by its index * cleanup * fix sort order, improve method names * use itertools groupby, fix serialization test * minor improvements * update names	2021-05-19 10:48:46 +09:00
PSeitz	249bc6cf72	upgrade lz4_flex to 0.8 (#1049) * upgrade lz4_flex to 0.8 * fix set_len	2021-05-19 10:46:01 +09:00
PSeitz	1c0af5765d	fix doc store iter error handling, fixes #1047 (#1051)	2021-05-18 21:43:57 +09:00
Paul Masurel	7ba771ed1b	Replaced RawDocument by OwnedBytes (#1046)	2021-05-18 14:33:36 +09:00