mirror of
https://github.com/quickwit-oss/tantivy.git
synced 2026-01-04 08:12:54 +00:00
Update index_sorting.md
This commit is contained in:
@@ -1,9 +1,10 @@
|
||||
|
||||
- [Index Sorting](#index-sorting)
|
||||
+ [Motivation](#motivation)
|
||||
+ [Why Sorting](#why-sorting)
|
||||
* [Compression](#compression)
|
||||
* [Top-N Optimization](#top-n-optimization)
|
||||
* [Pruning](#pruning)
|
||||
* [Other](#other)
|
||||
+ [Usage](#usage)
|
||||
|
||||
# Index Sorting
|
||||
@@ -32,8 +33,12 @@ Let's say we want all documents and want to apply the filter `>= 2010-08-11`. Wh
|
||||
|
||||
Note: Tantivy 0.16 does not do this optimization yet.
|
||||
|
||||
###### Other?
|
||||
|
||||
In principle there are many algorithms possible that exploit the monotonically increasing nature. (aggregations maybe?)
|
||||
|
||||
## Usage
|
||||
The index sorting can be configured setting `sort_by_field` on `IndexSettings` and passing it to a `IndexBuilder`. As of tantvy 0.16 only fast fields are allowed to be used.
|
||||
The index sorting can be configured setting [`sort_by_field`](https://github.com/tantivy-search/tantivy/blob/000d76b11a139a84b16b9b95060a1c93e8b9851c/src/core/index_meta.rs#L238) on `IndexSettings` and passing it to a `IndexBuilder`. As of tantvy 0.16 only fast fields are allowed to be used.
|
||||
|
||||
```
|
||||
let settings = IndexSettings {
|
||||
@@ -48,3 +53,9 @@ index_builder = index_builder.settings(settings);
|
||||
let index = index_builder.create_in_ram().unwrap();
|
||||
```
|
||||
|
||||
## Implementation details
|
||||
|
||||
Sorting an index is applied in the serialization step. In general there are two serialization steps: [Finishing a single segment](https://github.com/tantivy-search/tantivy/blob/000d76b11a139a84b16b9b95060a1c93e8b9851c/src/indexer/segment_writer.rs#L338) and [merging multiple segments](https://github.com/tantivy-search/tantivy/blob/000d76b11a139a84b16b9b95060a1c93e8b9851c/src/indexer/merger.rs#L1073).
|
||||
|
||||
In both cases we generate a docid mapping reflecting the sort. This mapping is used when serializing the different components (doc store, fastfields, posting list, normfield, facets).
|
||||
|
||||
|
||||
Reference in New Issue
Block a user