From a1782dd17281335a2c51bc98e0eb8f2e0f1fcbae Mon Sep 17 00:00:00 2001
From: PSeitz
Date: Wed, 25 Aug 2021 07:55:50 +0100
Subject: [PATCH] Update index_sorting.md

---
 doc/src/index_sorting.md | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/doc/src/index_sorting.md b/doc/src/index_sorting.md
index 7da55e75b..0f7cf3c0d 100644
--- a/doc/src/index_sorting.md
+++ b/doc/src/index_sorting.md
@@ -1,9 +1,10 @@
 - [Index Sorting](#index-sorting)
-  + [Motivation](#motivation)
+  + [Why Sorting](#why-sorting)
     * [Compression](#compression)
     * [Top-N Optimization](#top-n-optimization)
     * [Pruning](#pruning)
+    * [Other](#other)
   + [Usage](#usage)
 
 # Index Sorting
 
@@ -32,8 +33,12 @@ Let's say we want all documents and want to apply the filter `>= 2010-08-11`. Wh
 Note: Tantivy 0.16 does not do this optimization yet.
 
+###### Other?
+
+In principle, many other algorithms could exploit the monotonically increasing order of the sorted values (aggregations, maybe?).
+
 ## Usage
 
-The index sorting can be configured setting `sort_by_field` on `IndexSettings` and passing it to a `IndexBuilder`. As of tantvy 0.16 only fast fields are allowed to be used.
+Index sorting can be configured by setting [`sort_by_field`](https://github.com/tantivy-search/tantivy/blob/000d76b11a139a84b16b9b95060a1c93e8b9851c/src/core/index_meta.rs#L238) on `IndexSettings` and passing it to an `IndexBuilder`. As of tantivy 0.16, only fast fields can be used as the sort field.
 
 ```
 let settings = IndexSettings {
@@ -48,3 +53,9 @@ index_builder = index_builder.settings(settings);
 let index = index_builder.create_in_ram().unwrap();
 ```
 
+## Implementation details
+
+Index sorting is applied in the serialization step. In general there are two serialization steps: [finishing a single segment](https://github.com/tantivy-search/tantivy/blob/000d76b11a139a84b16b9b95060a1c93e8b9851c/src/indexer/segment_writer.rs#L338) and [merging multiple segments](https://github.com/tantivy-search/tantivy/blob/000d76b11a139a84b16b9b95060a1c93e8b9851c/src/indexer/merger.rs#L1073).
+
+In both cases we generate a docid mapping that reflects the sort order. This mapping is then used when serializing the different components (doc store, fast fields, posting lists, fieldnorms, facets).
+
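
Note on the docid mapping described in "Implementation details": the idea can be illustrated with a small standalone sketch. It is not tantivy's actual code; `build_doc_id_mapping`, `remap`, and `Order` are hypothetical names, and the sketch assumes the sort field is a `u64` fast field with one value per document. It builds a `new_doc_id -> old_doc_id` permutation from the sort values and then applies the same permutation to another per-document component.

```rust
use std::cmp::Reverse;

/// Sort direction for the (hypothetical) sort field.
#[derive(Clone, Copy)]
enum Order {
    Asc,
    Desc,
}

/// Builds a mapping `new_doc_id -> old_doc_id` from per-document sort values.
/// `sort_values[old_doc_id]` is assumed to hold the fast field value of that doc.
fn build_doc_id_mapping(sort_values: &[u64], order: Order) -> Vec<u32> {
    let mut new_to_old: Vec<u32> = (0..sort_values.len() as u32).collect();
    // `sort_by_key` is stable, so docs with equal values keep their relative order.
    match order {
        Order::Asc => new_to_old.sort_by_key(|&old| sort_values[old as usize]),
        Order::Desc => new_to_old.sort_by_key(|&old| Reverse(sort_values[old as usize])),
    }
    new_to_old
}

/// Writes out any per-document component in the new doc order.
fn remap<T: Clone>(component: &[T], new_to_old: &[u32]) -> Vec<T> {
    new_to_old
        .iter()
        .map(|&old| component[old as usize].clone())
        .collect()
}

fn main() {
    // Sort values (dates encoded as integers) indexed by old doc id.
    let dates = [20100811u64, 20090101, 20110305];
    // Another per-document component, e.g. a stored field.
    let titles = ["b", "a", "c"];

    let mapping = build_doc_id_mapping(&dates, Order::Asc);
    let sorted_titles = remap(&titles, &mapping);

    println!("mapping: {:?}", mapping);
    println!("titles in new doc order: {:?}", sorted_titles);
}
```

Computing the permutation once and reusing it for every component keeps all components aligned on the same new doc ids, which matches the description of one mapping shared across doc store, fast fields, and posting lists.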