From 798f7dbf679ff767227dfed44c6eb4bfa647a241 Mon Sep 17 00:00:00 2001
From: Pascal Seitz
Date: Mon, 23 Aug 2021 17:36:41 +0100
Subject: [PATCH] add sorting to book

---
 doc/src/SUMMARY.md       |  1 +
 doc/src/index_sorting.md | 55 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 56 insertions(+)
 create mode 100644 doc/src/index_sorting.md

diff --git a/doc/src/SUMMARY.md b/doc/src/SUMMARY.md
index a280d19b7..ee9bfc7cc 100644
--- a/doc/src/SUMMARY.md
+++ b/doc/src/SUMMARY.md
@@ -7,6 +7,7 @@
 - [Segments](./basis.md)
 - [Defining your schema](./schema.md)
 - [Facetting](./facetting.md)
+- [Index Sorting](./index_sorting.md)
 - [Innerworkings](./innerworkings.md)
 - [Inverted index](./inverted_index.md)
 - [Best practise](./inverted_index.md)
diff --git a/doc/src/index_sorting.md b/doc/src/index_sorting.md
new file mode 100644
index 000000000..502268150
--- /dev/null
+++ b/doc/src/index_sorting.md
@@ -0,0 +1,55 @@
+
+- [Index Sorting](#index-sorting)
+  + [Motivation](#motivation)
+    * [Compression](#compression)
+    * [Top-N Optimization](#top-n-optimization)
+    * [Pruning](#pruning)
+  + [Usage](#usage)
+
+# Index Sorting
+
+Tantivy allows you to sort the index according to a property.
+
+### Motivation
+
+Presorting an index has several advantages:
+
+- Compression
+- Top-N Optimization
+- Pruning
+
+##### Compression
+
+Sorted data is easier to compress. E.g. the number sequence [5, 2, 3, 1, 4] would be sorted to [1, 2, 3, 4, 5].
+If we apply delta encoding, the unsorted list becomes [5, -3, 1, -2, 3], whereas the sorted list becomes [1, 1, 1, 1, 1].
+This mainly affects the compression of the fast field of the sorted property; everything else is likely unaffected.
+
+##### Top-N Optimization
+
+When data is presorted by a field and search queries request sorting by the same field, we can leverage the natural order of the documents.
+E.g. if the data is sorted by timestamp and we want the top n newest docs containing a term, we can simply leverage the order of the docids.
+
+Note: Tantivy 0.16 does not do this optimization yet.
+
+##### Pruning
+
+Let's say we want all documents and want to apply the filter `>= 2010-08-11`. When the data is sorted, we could look up the docid range in the fast field and use it as the filter.
+
+Note: Tantivy 0.16 does not do this optimization yet.
+
+### Usage
+Index sorting can be configured by setting `sort_by_field` on `IndexSettings` and passing it to an `IndexBuilder`. As of Tantivy 0.16, only fast fields can be used for sorting.
+
+```
+let settings = IndexSettings {
+    sort_by_field: Some(IndexSortByField {
+        field: "intval".to_string(),
+        order: Order::Desc,
+    }),
+    ..Default::default()
+};
+let mut index_builder = Index::builder().schema(schema);
+index_builder = index_builder.settings(settings);
+let index = index_builder.create_in_ram().unwrap();
+```
+
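
Note on the Compression section: the delta-encoding figures quoted there can be reproduced with a minimal standalone sketch like the one below. The helper `delta_encode` is a hypothetical name used only for illustration; it is not part of Tantivy's API, and Tantivy's fast field codecs may encode values differently.

```rust
/// Hypothetical helper (not a Tantivy function): delta-encodes a sequence.
/// The first value is stored relative to 0, and every later value is stored
/// as the difference to its predecessor.
fn delta_encode(values: &[i64]) -> Vec<i64> {
    let mut previous = 0i64;
    values
        .iter()
        .map(|&value| {
            let delta = value - previous;
            previous = value;
            delta
        })
        .collect()
}

fn main() {
    let unsorted: Vec<i64> = vec![5, 2, 3, 1, 4];
    let mut sorted = unsorted.clone();
    sorted.sort();

    // Deltas of the unsorted input vary in sign and magnitude.
    println!("{:?}", delta_encode(&unsorted));
    // Deltas of the sorted input are small and uniform, which compresses well.
    println!("{:?}", delta_encode(&sorted));
}
```

The sorted input yields a run of identical small deltas, which is the property the Compression section relies on.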