Merge pull request #1146 from tantivy-search/sorting_doc

add sorting to book
PSeitz
2021-08-23 17:37:54 +01:00
committed by GitHub
2 changed files with 56 additions and 0 deletions

@@ -7,6 +7,7 @@
- [Segments](./basis.md)
- [Defining your schema](./schema.md)
- [Facetting](./facetting.md)
- [Index Sorting](./index_sorting.md)
- [Innerworkings](./innerworkings.md)
- [Inverted index](./inverted_index.md)
- [Best practise](./inverted_index.md)

doc/src/index_sorting.md Normal file

@@ -0,0 +1,55 @@
- [Index Sorting](#index-sorting)
    + [Motivation](#motivation)
        * [Compression](#compression)
        * [Top-N Optimization](#top-n-optimization)
        * [Pruning](#pruning)
    + [Usage](#usage)
# Index Sorting
Tantivy allows you to sort the index according to a property.
### Motivation
Presorting an index has several advantages:
- Compression
- Top-N Optimization
- Pruning
##### Compression
Sorted data is easier to compress. E.g. the number sequence [5, 2, 3, 1, 4] would be sorted to [1, 2, 3, 4, 5].
With delta encoding, the unsorted list becomes [5, -3, 1, -2, 3], whereas the sorted list becomes [1, 1, 1, 1, 1].
The compression gain mainly affects the fast field of the sorted property; everything else is likely unaffected.
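
The delta encoding step above can be sketched in plain Rust. This is only an illustration of the arithmetic, not Tantivy's fast field codec; `delta_encode` is a hypothetical helper name.

```rust
/// Hypothetical helper (not a Tantivy API): keep the first value as-is,
/// then store each value as the difference to its predecessor.
fn delta_encode(values: &[i64]) -> Vec<i64> {
    let mut prev = 0;
    values
        .iter()
        .map(|&v| {
            let delta = v - prev;
            prev = v;
            delta
        })
        .collect()
}
```

Calling it on `[5, 2, 3, 1, 4]` yields `[5, -3, 1, -2, 3]`, while the sorted input `[1, 2, 3, 4, 5]` yields `[1, 1, 1, 1, 1]`. The sorted deltas are small and uniform, so they fit in fewer bits.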
##### Top-N Optimization
When data is presorted by a field and search queries request sorting by the same field, we can leverage the natural order of the documents.
E.g. if the data is sorted by timestamp and we want the top n newest docs containing a term, we can simply leverage the order of the docids.
Note: Tantivy 0.16 does not do this optimization yet.
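
A minimal sketch of the idea, assuming the index is sorted by timestamp in descending order (so docid 0 is the newest document) and that the docids matching a term are available in ascending order. `top_n_newest` is a hypothetical helper, not a Tantivy API.

```rust
/// Hypothetical helper: `matching_docids` holds the docids that contain the
/// term, in ascending order. With the index sorted by timestamp descending,
/// ascending docid order is already newest-first, so the top n are simply
/// the first n matches and no scoring or sorting pass is needed.
fn top_n_newest(matching_docids: &[u32], n: usize) -> Vec<u32> {
    matching_docids.iter().copied().take(n).collect()
}
```

The search can stop after collecting n matches instead of visiting every matching document.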
##### Pruning
Say we want all documents that match the filter `>= 2010-08-11`. When the data is sorted by that field, we could do a lookup in the fast field to find the matching docid range and use that range as the filter.
Note: Tantivy 0.16 does not do this optimization yet.
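
The lookup could be sketched as a binary search, assuming the fast field values are available as a slice indexed by docid and sorted in ascending order. `docid_range_at_or_after` and the integer date encoding are assumptions made for illustration, not Tantivy APIs.

```rust
use std::ops::Range;

/// Hypothetical helper: `sorted_values[docid]` is the field value of `docid`,
/// and the slice is sorted ascending because the index is sorted by that field.
/// Returns the docid range of all documents with a value `>= threshold`.
fn docid_range_at_or_after(sorted_values: &[u64], threshold: u64) -> Range<u32> {
    // Binary search for the first docid whose value is not below the threshold.
    let start = sorted_values.partition_point(|&v| v < threshold);
    start as u32..sorted_values.len() as u32
}
```

With dates encoded as integers such as `20100811`, the filter `>= 2010-08-11` reduces to one binary search and a contiguous docid range, instead of a comparison per document.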
### Usage
Index sorting is configured by setting `sort_by_field` on `IndexSettings` and passing the settings to an `IndexBuilder`. As of Tantivy 0.16, only fast fields can be used as the sort field.
```
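// Assumes these types are re-exported at the crate root and that
// `schema` is an existing Schema that defines `intval` as a fast field.
use tantivy::{Index, IndexSettings, IndexSortByField, Order};

// Sort the index by `intval` in descending order; keep all other defaults.
let settings = IndexSettings {
    sort_by_field: Some(IndexSortByField {
        field: "intval".to_string(),
        order: Order::Desc,
    }),
    ..Default::default()
};
// Attach the schema and settings, then create an in-memory index.
let mut index_builder = Index::builder().schema(schema);
index_builder = index_builder.settings(settings);
let index = index_builder.create_in_ram().unwrap();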
```