Compare commits


19 Commits

Author SHA1 Message Date
Paul Masurel
784717749f Removing unused imports. 2021-02-05 23:04:17 +09:00
Paul Masurel
945bcc5bd3 Bump tantivy-grammar version 2021-02-05 22:58:21 +09:00
Paul Masurel
51aa9c319e Bumped version to 0.14 2021-02-05 22:55:26 +09:00
Paul Masurel
74d8d2946b Merge pull request #980 from lengyijun/patch-8
Update segment_postings.rs
2021-02-05 22:52:29 +09:00
lyj
0a160cc16e Update segment_postings.rs 2021-02-05 21:32:25 +08:00
Paul Masurel
f099f97daa Merge pull request #979 from slckl/main
FacetCounts are now pub use in tantivy::collector (Closes #978)
2021-02-05 17:05:20 +09:00
alif
769e9ba14d added simple docs to FacetCounts now-public API 2021-02-05 09:18:20 +02:00
alif
a482c0e966 pub use FacetCounts in tantivy::collector module 2021-02-05 09:00:48 +02:00
Paul Masurel
86d92a72e7 Renaming MultiValueIntFastField* to MultiValuedIntFastField* 2021-01-21 22:47:00 +09:00
Paul Masurel
ef618a5999 Made fast field reader clonable. 2021-01-21 22:15:24 +09:00
Paul Masurel
94d3d7a89a Rename FastFieldReaders::load_all 2021-01-21 18:38:48 +09:00
Paul Masurel
aa9e79f957 Clippy warnings. 2021-01-21 18:23:20 +09:00
Paul Masurel
84a2f534db Merge pull request #976 from tantivy-search/issue/fastfield_no_load
Fast fields are not loaded when a segment is opened.
2021-01-21 18:14:55 +09:00
Paul Masurel
1b4be24dca Fast fields are not loaded when a segment is opened.
They are instead loaded lazily when they are requested.
2021-01-21 18:13:08 +09:00
Paul Masurel
824ccc37ae Merge pull request #975 from jamescorbett/patch-1
Change from serde::export to std::marker
2021-01-12 10:04:23 +09:00
Paul Masurel
5231651020 Closes #974 2021-01-12 10:03:37 +09:00
James Corbett
fa2c6f80c7 Change from serde::export to std::marker
For some reason, under a Docker build only, I get a build error saying that `serde::export` is private. This fixes it for me.

```
error[E0603]: module `export` is private
   --> /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/tantivy-0.13.2/src/collector/top_collector.rs:5:12
    |
5   | use serde::export::PhantomData;
    |            ^^^^^^ private module
    |
note: the module `export` is defined here
   --> /usr/local/cargo/registry/src/github.com-1ecc6299db9ec823/serde-1.0.119/src/lib.rs:275:5
    |
275 | use self::__private as export;
    |     ^^^^^^^^^^^^^^^^^^^^^^^^^
```
2021-01-12 00:25:54 +00:00
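The fix described in this commit replaces an import from a module that serde treats as private with the standard-library path. A minimal sketch of the pattern follows; the `Tagged` struct is a hypothetical example, not a tantivy type, and only illustrates that `std::marker::PhantomData` is sufficient on its own:

```rust
use std::marker::PhantomData;

/// Hypothetical example type: carries a type parameter `T`
/// without actually storing a value of type `T`.
struct Tagged<T> {
    id: u32,
    _marker: PhantomData<T>,
}

impl<T> Tagged<T> {
    fn new(id: u32) -> Self {
        Tagged {
            id,
            _marker: PhantomData,
        }
    }
}

fn main() {
    let tagged: Tagged<String> = Tagged::new(7);
    // PhantomData is zero-sized, so the struct is as large as its `u32` field.
    assert_eq!(
        std::mem::size_of::<Tagged<String>>(),
        std::mem::size_of::<u32>()
    );
    println!("id = {}", tagged.id);
}
```

Importing from `std::marker` avoids depending on how another crate happens to organize its internal re-exports.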
Paul Masurel
43c7b3bfec Bugfix in the RAMDirectory.
There was a state where the meta.json was empty.
2021-01-11 14:11:42 +09:00
Paul Masurel
b17a10546a Minor change in unit test. 2021-01-11 11:33:59 +09:00
29 changed files with 273 additions and 386 deletions


@@ -1,22 +1,23 @@
Tantivy 0.14.0 Tantivy 0.14.0
========================= =========================
- Remove dependency to atomicwrites #833 .Implemented by @pmasurel upon suggestion and research from @asafigan). - Remove dependency to atomicwrites #833 .Implemented by @fulmicoton upon suggestion and research from @asafigan).
- Migrated tantivy error from the now deprecated `failure` crate to `thiserror` #760. (@hirevo) - Migrated tantivy error from the now deprecated `failure` crate to `thiserror` #760. (@hirevo)
- API Change. Accessing the typed value off a `Schema::Value` now returns an Option instead of panicking if the type does not match. - API Change. Accessing the typed value off a `Schema::Value` now returns an Option instead of panicking if the type does not match.
- Large API Change in the Directory API. Tantivy used to assume that all files could be somehow memory mapped. After this change, Directory return a `FileSlice` that can be reduced and eventually read into an `OwnedBytes` object. Long and blocking io operation are still required by they do not span over the entire file. - Large API Change in the Directory API. Tantivy used to assume that all files could be somehow memory mapped. After this change, Directory return a `FileSlice` that can be reduced and eventually read into an `OwnedBytes` object. Long and blocking io operation are still required by they do not span over the entire file.
- Added support for Brotli compression in the DocStore. (@ppodolsky) - Added support for Brotli compression in the DocStore. (@ppodolsky)
- Added helper for building intersections and unions in BooleanQuery (@guilload) - Added helper for building intersections and unions in BooleanQuery (@guilload)
- Bugfix in `Query::explain` - Bugfix in `Query::explain`
- Removed dependency on `notify` #924. Replaced with `FileWatcher` struct that polls meta file every 500ms in background thread. (@halvorboe @guilload) - Removed dependency on `notify` #924. Replaced with `FileWatcher` struct that polls meta file every 500ms in background thread. (@halvorboe @guilload)
- Added `FilterCollector`, which wraps another collector and filters docs using a predicate over a fast field (@barrotsteindev) - Added `FilterCollector`, which wraps another collector and filters docs using a predicate over a fast field (@barrotsteindev)
- Simplified the encoding of the skip reader struct. BlockWAND max tf is now encoded over a single byte. (@pmasurel) - Simplified the encoding of the skip reader struct. BlockWAND max tf is now encoded over a single byte. (@fulmicoton)
- `FilterCollector` now supports all Fast Field value types (@barrotsteindev) - `FilterCollector` now supports all Fast Field value types (@barrotsteindev)
- FastField are not all loaded when opening the segment reader. (@fulmicoton)
This version breaks compatibility and requires users to reindex everything. This version breaks compatibility and requires users to reindex everything.
Tantivy 0.13.2 Tantivy 0.13.2
=================== ===================
Bugfix. Acquiring a facet reader on a segment that does not contain any Bugfix. Acquiring a facet reader on a segment that does not contain any
doc with this facet returns `None`. (#896) doc with this facet returns `None`. (#896)
Tantivy 0.13.1 Tantivy 0.13.1
@@ -27,7 +28,7 @@ Updated misc dependency versions.
Tantivy 0.13.0 Tantivy 0.13.0
====================== ======================
Tantivy 0.13 introduce a change in the index format that will require Tantivy 0.13 introduce a change in the index format that will require
you to reindex your index (BlockWAND information are added in the skiplist). you to reindex your index (BlockWAND information are added in the skiplist).
The index size increase is minor as this information is only added for The index size increase is minor as this information is only added for
full blocks. full blocks.
If you have a massive index for which reindexing is not an option, please contact me If you have a massive index for which reindexing is not an option, please contact me
@@ -36,7 +37,7 @@ so that we can discuss possible solutions.
- Bugfix in `FuzzyTermQuery` not matching terms by prefix when it should (@Peachball) - Bugfix in `FuzzyTermQuery` not matching terms by prefix when it should (@Peachball)
- Relaxed constraints on the custom/tweak score functions. At the segment level, they can be mut, and they are not required to be Sync + Send. - Relaxed constraints on the custom/tweak score functions. At the segment level, they can be mut, and they are not required to be Sync + Send.
- `MMapDirectory::open` does not return a `Result` anymore. - `MMapDirectory::open` does not return a `Result` anymore.
- Change in the DocSet and Scorer API. (@fulmicoton). - Change in the DocSet and Scorer API. (@fulmicoton).
A freshly created DocSet point directly to their first doc. A sentinel value called TERMINATED marks the end of a DocSet. A freshly created DocSet point directly to their first doc. A sentinel value called TERMINATED marks the end of a DocSet.
`.advance()` returns the new DocId. `Scorer::skip(target)` has been replaced by `Scorer::seek(target)` and returns the resulting DocId. `.advance()` returns the new DocId. `Scorer::skip(target)` has been replaced by `Scorer::seek(target)` and returns the resulting DocId.
As a result, iterating through DocSet now looks as follows As a result, iterating through DocSet now looks as follows
@@ -50,7 +51,7 @@ while doc != TERMINATED {
The change made it possible to greatly simplify a lot of the docset's code. The change made it possible to greatly simplify a lot of the docset's code.
- Misc internal optimization and introduction of the `Scorer::for_each_pruning` function. (@fulmicoton) - Misc internal optimization and introduction of the `Scorer::for_each_pruning` function. (@fulmicoton)
- Added an offset option to the Top(.*)Collectors. (@robyoung) - Added an offset option to the Top(.*)Collectors. (@robyoung)
- Added Block WAND. Performance on TOP-K on term-unions should be greatly increased. (@fulmicoton, and special thanks - Added Block WAND. Performance on TOP-K on term-unions should be greatly increased. (@fulmicoton, and special thanks
to the PISA team for answering all my questions!) to the PISA team for answering all my questions!)
Tantivy 0.12.0 Tantivy 0.12.0
@@ -58,14 +59,14 @@ Tantivy 0.12.0
- Removing static dispatch in tokenizers for simplicity. (#762) - Removing static dispatch in tokenizers for simplicity. (#762)
- Added backward iteration for `TermDictionary` stream. (@halvorboe) - Added backward iteration for `TermDictionary` stream. (@halvorboe)
- Fixed a performance issue when searching for the posting lists of a missing term (@audunhalland) - Fixed a performance issue when searching for the posting lists of a missing term (@audunhalland)
- Added a configurable maximum number of docs (10M by default) for a segment to be considered for merge (@hntd187, landed by @halvorboe #713) - Added a configurable maximum number of docs (10M by default) for a segment to be considered for merge (@hntd187, landed by @halvorboe #713)
- Important Bugfix #777, causing tantivy to retain memory mapping. (diagnosed by @poljar) - Important Bugfix #777, causing tantivy to retain memory mapping. (diagnosed by @poljar)
- Added support for field boosting. (#547, @fulmicoton) - Added support for field boosting. (#547, @fulmicoton)
## How to update? ## How to update?
Crates relying on custom tokenizer, or registering tokenizer in the manager will require some Crates relying on custom tokenizer, or registering tokenizer in the manager will require some
minor changes. Check https://github.com/tantivy-search/tantivy/blob/master/examples/custom_tokenizer.rs minor changes. Check https://github.com/tantivy-search/tantivy/blob/main/examples/custom_tokenizer.rs
to check for some code sample. to check for some code sample.
Tantivy 0.11.3 Tantivy 0.11.3
@@ -101,7 +102,7 @@ Tantivy 0.11.0
## How to update? ## How to update?
- The index format is changed. You are required to reindex your data to use tantivy 0.11. - The index format is changed. You are required to reindex your data to use tantivy 0.11.
- `Box<dyn BoxableTokenizer>` has been replaced by a `BoxedTokenizer` struct. - `Box<dyn BoxableTokenizer>` has been replaced by a `BoxedTokenizer` struct.
- Regex are now compiled when the `RegexQuery` instance is built. As a result, it can now return - Regex are now compiled when the `RegexQuery` instance is built. As a result, it can now return
an error and handling the `Result` is required. an error and handling the `Result` is required.
@@ -125,26 +126,26 @@ Tantivy 0.10.0
*Tantivy 0.10.0 index format is compatible with the index format in 0.9.0.* *Tantivy 0.10.0 index format is compatible with the index format in 0.9.0.*
- Added an API to easily tweak or entirely replace the - Added an API to easily tweak or entirely replace the
default score. See `TopDocs::tweak_score`and `TopScore::custom_score` (@pmasurel) default score. See `TopDocs::tweak_score`and `TopScore::custom_score` (@fulmicoton)
- Added an ASCII folding filter (@drusellers) - Added an ASCII folding filter (@drusellers)
- Bugfix in `query.count` in presence of deletes (@pmasurel) - Bugfix in `query.count` in presence of deletes (@fulmicoton)
- Added `.explain(...)` in `Query` and `Weight` to (@pmasurel) - Added `.explain(...)` in `Query` and `Weight` to (@fulmicoton)
- Added an efficient way to `delete_all_documents` in `IndexWriter` (@petr-tik). - Added an efficient way to `delete_all_documents` in `IndexWriter` (@petr-tik).
All segments are simply removed. All segments are simply removed.
Minor Minor
--------- ---------
- Switched to Rust 2018 (@uvd) - Switched to Rust 2018 (@uvd)
- Small simplification of the code. - Small simplification of the code.
Calling .freq() or .doc() when .advance() has never been called Calling .freq() or .doc() when .advance() has never been called
on segment postings should panic from now on. on segment postings should panic from now on.
- Tokens exceeding `u16::max_value() - 4` chars are discarded silently instead of panicking. - Tokens exceeding `u16::max_value() - 4` chars are discarded silently instead of panicking.
- Fast fields are now preloaded when the `SegmentReader` is created. - Fast fields are now preloaded when the `SegmentReader` is created.
- `IndexMeta` is now public. (@hntd187) - `IndexMeta` is now public. (@hntd187)
- `IndexWriter` `add_document`, `delete_term`. `IndexWriter` is `Sync`, making it possible to use it with a ` - `IndexWriter` `add_document`, `delete_term`. `IndexWriter` is `Sync`, making it possible to use it with a `
Arc<RwLock<IndexWriter>>`. `add_document` and `delete_term` can Arc<RwLock<IndexWriter>>`. `add_document` and `delete_term` can
only require a read lock. (@pmasurel) only require a read lock. (@fulmicoton)
- Introducing `Opstamp` as an expressive type alias for `u64`. (@petr-tik) - Introducing `Opstamp` as an expressive type alias for `u64`. (@petr-tik)
- Stamper now relies on `AtomicU64` on all platforms (@petr-tik) - Stamper now relies on `AtomicU64` on all platforms (@petr-tik)
- Bugfix - Files get deleted slightly earlier - Bugfix - Files get deleted slightly earlier
@@ -158,7 +159,7 @@ Your program should be usable as is.
Fast fields used to be accessed directly from the `SegmentReader`. Fast fields used to be accessed directly from the `SegmentReader`.
The API changed, you are now required to acquire your fast field reader via the The API changed, you are now required to acquire your fast field reader via the
`segment_reader.fast_fields()`, and use one of the typed method: `segment_reader.fast_fields()`, and use one of the typed method:
- `.u64()`, `.i64()` if your field is single-valued ; - `.u64()`, `.i64()` if your field is single-valued ;
- `.u64s()`, `.i64s()` if your field is multi-valued ; - `.u64s()`, `.i64s()` if your field is multi-valued ;
- `.bytes()` if your field is bytes fast field. - `.bytes()` if your field is bytes fast field.
@@ -167,16 +168,16 @@ The API changed, you are now required to acquire your fast field reader via the
Tantivy 0.9.0 Tantivy 0.9.0
===================== =====================
*0.9.0 index format is not compatible with the *0.9.0 index format is not compatible with the
previous index format.* previous index format.*
- MAJOR BUGFIX : - MAJOR BUGFIX :
Some `Mmap` objects were being leaked, and would never get released. (@fulmicoton) Some `Mmap` objects were being leaked, and would never get released. (@fulmicoton)
- Removed most unsafe (@fulmicoton) - Removed most unsafe (@fulmicoton)
- Indexer memory footprint improved. (VInt comp, inlining the first block. (@fulmicoton) - Indexer memory footprint improved. (VInt comp, inlining the first block. (@fulmicoton)
- Stemming in other language possible (@pentlander) - Stemming in other language possible (@pentlander)
- Segments with no docs are deleted earlier (@barrotsteindev) - Segments with no docs are deleted earlier (@barrotsteindev)
- Added grouped add and delete operations. - Added grouped add and delete operations.
They are guaranteed to happen together (i.e. they cannot be split by a commit). They are guaranteed to happen together (i.e. they cannot be split by a commit).
In addition, adds are guaranteed to happen on the same segment. (@elbow-jason) In addition, adds are guaranteed to happen on the same segment. (@elbow-jason)
- Removed `INT_STORED` and `INT_INDEXED`. It is now possible to use `STORED` and `INDEXED` - Removed `INT_STORED` and `INT_INDEXED`. It is now possible to use `STORED` and `INDEXED`
for int fields. (@fulmicoton) for int fields. (@fulmicoton)
@@ -190,26 +191,26 @@ tantivy 0.9 brought some API breaking change.
To update from tantivy 0.8, you will need to go through the following steps. To update from tantivy 0.8, you will need to go through the following steps.
- `schema::INT_INDEXED` and `schema::INT_STORED` should be replaced by `schema::INDEXED` and `schema::INT_STORED`. - `schema::INT_INDEXED` and `schema::INT_STORED` should be replaced by `schema::INDEXED` and `schema::INT_STORED`.
- The index now does not hold the pool of searcher anymore. You are required to create an intermediary object called - The index now does not hold the pool of searcher anymore. You are required to create an intermediary object called
`IndexReader` for this. `IndexReader` for this.
```rust ```rust
// create the reader. You typically need to create 1 reader for the entire // create the reader. You typically need to create 1 reader for the entire
// lifetime of you program. // lifetime of you program.
let reader = index.reader()?; let reader = index.reader()?;
// Acquire a searcher (previously `index.searcher()`) is now written: // Acquire a searcher (previously `index.searcher()`) is now written:
let searcher = reader.searcher(); let searcher = reader.searcher();
// With the default setting of the reader, you are not required to // With the default setting of the reader, you are not required to
// call `index.load_searchers()` anymore. // call `index.load_searchers()` anymore.
// //
// The IndexReader will pick up that change automatically, regardless // The IndexReader will pick up that change automatically, regardless
// of whether the update was done in a different process or not. // of whether the update was done in a different process or not.
// If this behavior is not wanted, you can create your reader with // If this behavior is not wanted, you can create your reader with
// the `ReloadPolicy::Manual`, and manually decide when to reload the index // the `ReloadPolicy::Manual`, and manually decide when to reload the index
// by calling `reader.reload()?`. // by calling `reader.reload()?`.
``` ```
@@ -224,7 +225,7 @@ Tantivy 0.8.1
===================== =====================
Hotfix of #476. Hotfix of #476.
Merge was reflecting deletes before commit was passed. Merge was reflecting deletes before commit was passed.
Thanks @barrotsteindev for reporting the bug. Thanks @barrotsteindev for reporting the bug.
@@ -232,7 +233,7 @@ Tantivy 0.8.0
===================== =====================
*No change in the index format* *No change in the index format*
- API Breaking change in the collector API. (@jwolfe, @fulmicoton) - API Breaking change in the collector API. (@jwolfe, @fulmicoton)
- Multithreaded search (@jwolfe, @fulmicoton) - Multithreaded search (@jwolfe, @fulmicoton)
Tantivy 0.7.1 Tantivy 0.7.1
@@ -260,7 +261,7 @@ Tantivy 0.6.1
- Exclusive `field:{startExcl to endExcl}` - Exclusive `field:{startExcl to endExcl}`
- Mixed `field:[startIncl to endExcl}` and vice versa - Mixed `field:[startIncl to endExcl}` and vice versa
- Unbounded `field:[start to *]`, `field:[* to end]` - Unbounded `field:[start to *]`, `field:[* to end]`
Tantivy 0.6 Tantivy 0.6
========================== ==========================
@@ -268,10 +269,10 @@ Tantivy 0.6
Special thanks to @drusellers and @jason-wolfe for their contributions Special thanks to @drusellers and @jason-wolfe for their contributions
to this release! to this release!
- Removed C code. Tantivy is now pure Rust. (@pmasurel) - Removed C code. Tantivy is now pure Rust. (@fulmicoton)
- BM25 (@pmasurel) - BM25 (@fulmicoton)
- Approximate field norms encoded over 1 byte. (@pmasurel) - Approximate field norms encoded over 1 byte. (@fulmicoton)
- Compiles on stable rust (@pmasurel) - Compiles on stable rust (@fulmicoton)
- Add &[u8] fastfield for associating arbitrary bytes to each document (@jason-wolfe) (#270) - Add &[u8] fastfield for associating arbitrary bytes to each document (@jason-wolfe) (#270)
- Completely uncompressed - Completely uncompressed
- Internally: One u64 fast field for indexes, one fast field for the bytes themselves. - Internally: One u64 fast field for indexes, one fast field for the bytes themselves.
@@ -279,7 +280,7 @@ to this release!
- Add Stopword Filter support (@drusellers) - Add Stopword Filter support (@drusellers)
- Add a FuzzyTermQuery (@drusellers) - Add a FuzzyTermQuery (@drusellers)
- Add a RegexQuery (@drusellers) - Add a RegexQuery (@drusellers)
- Various performance improvements (@pmasurel)_ - Various performance improvements (@fulmicoton)_
Tantivy 0.5.2 Tantivy 0.5.2


@@ -1,6 +1,6 @@
[package] [package]
name = "tantivy" name = "tantivy"
version = "0.14.0-dev" version = "0.14.0"
authors = ["Paul Masurel <paul.masurel@gmail.com>"] authors = ["Paul Masurel <paul.masurel@gmail.com>"]
license = "MIT" license = "MIT"
categories = ["database-implementations", "data-structures"] categories = ["database-implementations", "data-structures"]
@@ -33,7 +33,7 @@ levenshtein_automata = "0.2"
uuid = { version = "0.8", features = ["v4", "serde"] } uuid = { version = "0.8", features = ["v4", "serde"] }
crossbeam = "0.8" crossbeam = "0.8"
futures = {version = "0.3", features=["thread-pool"] } futures = {version = "0.3", features=["thread-pool"] }
tantivy-query-grammar = { version="0.14.0-dev", path="./query-grammar" } tantivy-query-grammar = { version="0.14.0", path="./query-grammar" }
stable_deref_trait = "1" stable_deref_trait = "1"
rust-stemmers = "1" rust-stemmers = "1"
downcast-rs = "1" downcast-rs = "1"


@@ -1,9 +1,9 @@
[![Build Status](https://travis-ci.org/tantivy-search/tantivy.svg?branch=master)](https://travis-ci.org/tantivy-search/tantivy) [![Build Status](https://travis-ci.org/tantivy-search/tantivy.svg?branch=main)](https://travis-ci.org/tantivy-search/tantivy)
[![codecov](https://codecov.io/gh/tantivy-search/tantivy/branch/master/graph/badge.svg)](https://codecov.io/gh/tantivy-search/tantivy) [![codecov](https://codecov.io/gh/tantivy-search/tantivy/branch/main/graph/badge.svg)](https://codecov.io/gh/tantivy-search/tantivy)
[![Join the chat at https://gitter.im/tantivy-search/tantivy](https://badges.gitter.im/tantivy-search/tantivy.svg)](https://gitter.im/tantivy-search/tantivy?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge) [![Join the chat at https://gitter.im/tantivy-search/tantivy](https://badges.gitter.im/tantivy-search/tantivy.svg)](https://gitter.im/tantivy-search/tantivy?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Build status](https://ci.appveyor.com/api/projects/status/r7nb13kj23u8m9pj/branch/master?svg=true)](https://ci.appveyor.com/project/fulmicoton/tantivy/branch/master) [![Build status](https://ci.appveyor.com/api/projects/status/r7nb13kj23u8m9pj/branch/main?svg=true)](https://ci.appveyor.com/project/fulmicoton/tantivy/branch/main)
[![Crates.io](https://img.shields.io/crates/v/tantivy.svg)](https://crates.io/crates/tantivy) [![Crates.io](https://img.shields.io/crates/v/tantivy.svg)](https://crates.io/crates/tantivy)
![Tantivy](https://tantivy-search.github.io/logo/tantivy-logo.png) ![Tantivy](https://tantivy-search.github.io/logo/tantivy-logo.png)


@@ -14,7 +14,7 @@ use tantivy::fastfield::FastFieldReader;
use tantivy::query::QueryParser; use tantivy::query::QueryParser;
use tantivy::schema::Field; use tantivy::schema::Field;
use tantivy::schema::{Schema, FAST, INDEXED, TEXT}; use tantivy::schema::{Schema, FAST, INDEXED, TEXT};
use tantivy::{doc, Index, Score, SegmentReader, TantivyError}; use tantivy::{doc, Index, Score, SegmentReader};
#[derive(Default)] #[derive(Default)]
struct Stats { struct Stats {
@@ -72,16 +72,7 @@ impl Collector for StatsCollector {
_segment_local_id: u32, _segment_local_id: u32,
segment_reader: &SegmentReader, segment_reader: &SegmentReader,
) -> tantivy::Result<StatsSegmentCollector> { ) -> tantivy::Result<StatsSegmentCollector> {
let fast_field_reader = segment_reader let fast_field_reader = segment_reader.fast_fields().u64(self.field)?;
.fast_fields()
.u64(self.field)
.ok_or_else(|| {
let field_name = segment_reader.schema().get_field_name(self.field);
TantivyError::SchemaError(format!(
"Field {:?} is not a u64 fast field.",
field_name
))
})?;
Ok(StatsSegmentCollector { Ok(StatsSegmentCollector {
fast_field_reader, fast_field_reader,
stats: Stats::default(), stats: Stats::default(),
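The hunk above drops the `ok_or_else` boilerplate because the fast field accessor now returns a `Result` that the caller can propagate with `?`. The sketch below illustrates that pattern with standard-library types only. `FastFields`, `ReaderError`, and `sum_field` are hypothetical stand-ins, not tantivy's actual types or signatures:

```rust
use std::collections::HashMap;

/// Hypothetical error type standing in for a schema error.
#[derive(Debug, PartialEq)]
enum ReaderError {
    NotAFastField(u32),
}

/// Hypothetical stand-in for a set of fast field columns keyed by field id.
struct FastFields {
    columns: HashMap<u32, Vec<u64>>,
}

impl FastFields {
    /// Returns the column for `field`, or an error naming the field.
    /// The error is built here, so callers do not need `ok_or_else`.
    fn u64(&self, field: u32) -> Result<&[u64], ReaderError> {
        self.columns
            .get(&field)
            .map(|column| column.as_slice())
            .ok_or(ReaderError::NotAFastField(field))
    }
}

/// Caller side: a single `?` replaces the hand-written error mapping.
fn sum_field(fast_fields: &FastFields, field: u32) -> Result<u64, ReaderError> {
    let column = fast_fields.u64(field)?;
    Ok(column.iter().sum())
}

fn main() {
    let mut columns = HashMap::new();
    columns.insert(0u32, vec![1u64, 2, 3]);
    let fast_fields = FastFields { columns };

    assert_eq!(sum_field(&fast_fields, 0), Ok(6));
    assert_eq!(
        sum_field(&fast_fields, 1),
        Err(ReaderError::NotAFastField(1))
    );
}
```

Centralizing the error construction in the accessor keeps the message consistent across every call site.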


@@ -1,6 +1,6 @@
[package] [package]
name = "tantivy-query-grammar" name = "tantivy-query-grammar"
version = "0.14.0-dev" version = "0.14.0"
authors = ["Paul Masurel <paul.masurel@gmail.com>"] authors = ["Paul Masurel <paul.masurel@gmail.com>"]
license = "MIT" license = "MIT"
categories = ["database-implementations", "data-structures"] categories = ["database-implementations", "data-structures"]


@@ -398,6 +398,8 @@ impl<'a> Iterator for FacetChildIterator<'a> {
} }
impl FacetCounts { impl FacetCounts {
/// Returns an iterator over all of the facet count pairs inside this result.
/// See the documentation for `FacetCollector` for a usage example.
pub fn get<T>(&self, facet_from: T) -> FacetChildIterator<'_> pub fn get<T>(&self, facet_from: T) -> FacetChildIterator<'_>
where where
Facet: From<T>, Facet: From<T>,
@@ -417,6 +419,8 @@ impl FacetCounts {
FacetChildIterator { underlying } FacetChildIterator { underlying }
} }
/// Returns a vector of top `k` facets with their counts, sorted highest-to-lowest by counts.
/// See the documentation for `FacetCollector` for a usage example.
pub fn top_k<T>(&self, facet: T, k: usize) -> Vec<(&Facet, u64)> pub fn top_k<T>(&self, facet: T, k: usize) -> Vec<(&Facet, u64)>
where where
Facet: From<T>, Facet: From<T>,


@@ -124,13 +124,7 @@ where
let fast_field_reader = segment_reader let fast_field_reader = segment_reader
.fast_fields() .fast_fields()
.typed_fast_field_reader(self.field) .typed_fast_field_reader(self.field)?;
.ok_or_else(|| {
TantivyError::SchemaError(format!(
"{:?} is not declared as a fast field in the schema.",
self.field
))
})?;
let segment_collector = self let segment_collector = self
.collector .collector


@@ -109,6 +109,7 @@ pub use self::tweak_score_top_collector::{ScoreSegmentTweaker, ScoreTweaker};
mod facet_collector; mod facet_collector;
pub use self::facet_collector::FacetCollector; pub use self::facet_collector::FacetCollector;
pub use self::facet_collector::FacetCounts;
use crate::query::Weight; use crate::query::Weight;
mod docset_collector; mod docset_collector;


@@ -240,12 +240,7 @@ impl Collector for BytesFastFieldTestCollector {
_segment_local_id: u32, _segment_local_id: u32,
segment_reader: &SegmentReader, segment_reader: &SegmentReader,
) -> crate::Result<BytesFastFieldSegmentCollector> { ) -> crate::Result<BytesFastFieldSegmentCollector> {
let reader = segment_reader let reader = segment_reader.fast_fields().bytes(self.field)?;
.fast_fields()
.bytes(self.field)
.ok_or_else(|| {
crate::TantivyError::InvalidArgument("Field is not a bytes fast field.".to_string())
})?;
Ok(BytesFastFieldSegmentCollector { Ok(BytesFastFieldSegmentCollector {
vals: Vec::new(), vals: Vec::new(),
reader, reader,


@@ -2,9 +2,9 @@ use crate::DocAddress;
use crate::DocId; use crate::DocId;
use crate::SegmentLocalId; use crate::SegmentLocalId;
use crate::SegmentReader; use crate::SegmentReader;
use serde::export::PhantomData;
use std::cmp::Ordering; use std::cmp::Ordering;
use std::collections::BinaryHeap; use std::collections::BinaryHeap;
use std::marker::PhantomData;
/// Contains a feature (field, score, etc.) of a document along with the document address. /// Contains a feature (field, score, etc.) of a document along with the document address.
/// ///


@@ -146,15 +146,14 @@ impl CustomScorer<u64> for ScorerByField {
type Child = ScorerByFastFieldReader; type Child = ScorerByFastFieldReader;
fn segment_scorer(&self, segment_reader: &SegmentReader) -> crate::Result<Self::Child> { fn segment_scorer(&self, segment_reader: &SegmentReader) -> crate::Result<Self::Child> {
let ff_reader = segment_reader // We interpret this field as u64, regardless of its type, that way,
// we avoid needless conversion. Regardless of the fast field type, the
// mapping is monotonic, so it is sufficient to compute our top-K docs.
//
// The conversion will then happen only on the top-K docs.
let ff_reader: FastFieldReader<u64> = segment_reader
.fast_fields() .fast_fields()
.u64_lenient(self.field) .typed_fast_field_reader(self.field)?;
.ok_or_else(|| {
crate::TantivyError::SchemaError(format!(
"Field requested ({:?}) is not a fast field.",
self.field
))
})?;
Ok(ScorerByFastFieldReader { ff_reader }) Ok(ScorerByFastFieldReader { ff_reader })
} }
} }
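The comment added in this hunk relies on the mapping from the field's native type to `u64` being monotonic, so that ranking by the raw `u64` gives the same top-K as ranking by the typed value. The sketch below illustrates this for `i64`, assuming a sign-bit flip as the mapping. The function names and the mapping itself are assumptions made for the example, not a statement of what the library does internally:

```rust
/// Maps an `i64` to a `u64` by flipping the sign bit.
/// This mapping preserves ordering (assumed here for illustration).
fn i64_to_u64(val: i64) -> u64 {
    (val as u64) ^ (1u64 << 63)
}

/// Inverse of `i64_to_u64`.
fn u64_to_i64(val: u64) -> i64 {
    (val ^ (1u64 << 63)) as i64
}

/// Returns the `k` largest values, comparing only their `u64` images.
/// The conversion back to `i64` happens on the `k` survivors only.
fn top_k(values: &[i64], k: usize) -> Vec<i64> {
    let mut mapped: Vec<u64> = values.iter().copied().map(i64_to_u64).collect();
    mapped.sort_unstable_by(|a, b| b.cmp(a));
    mapped.truncate(k);
    mapped.into_iter().map(u64_to_i64).collect()
}

fn main() {
    let revenues = [-5i64, 92_000_000, 0, 144_000_000, -1];
    assert_eq!(top_k(&revenues, 2), vec![144_000_000, 92_000_000]);
    // Ordering is preserved across the sign boundary.
    assert!(i64_to_u64(-1) < i64_to_u64(0));
}
```

Because the order of the mapped values matches the order of the originals, the per-document conversion can be skipped during collection.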
@@ -232,7 +231,7 @@ impl TopDocs {
/// # let title = schema_builder.add_text_field("title", TEXT); /// # let title = schema_builder.add_text_field("title", TEXT);
/// # let rating = schema_builder.add_u64_field("rating", FAST); /// # let rating = schema_builder.add_u64_field("rating", FAST);
/// # let schema = schema_builder.build(); /// # let schema = schema_builder.build();
/// # /// #
/// # let index = Index::create_in_ram(schema); /// # let index = Index::create_in_ram(schema);
/// # let mut index_writer = index.writer_with_num_threads(1, 10_000_000)?; /// # let mut index_writer = index.writer_with_num_threads(1, 10_000_000)?;
/// # index_writer.add_document(doc!(title => "The Name of the Wind", rating => 92u64)); /// # index_writer.add_document(doc!(title => "The Name of the Wind", rating => 92u64));
@@ -262,7 +261,7 @@ impl TopDocs {
/// let top_books_by_rating = TopDocs /// let top_books_by_rating = TopDocs
/// ::with_limit(10) /// ::with_limit(10)
/// .order_by_u64_field(rating_field); /// .order_by_u64_field(rating_field);
/// ///
/// // ... and here are our documents. Note this is a simple vec. /// // ... and here are our documents. Note this is a simple vec.
/// // The `u64` in the pair is the value of our fast field for /// // The `u64` in the pair is the value of our fast field for
/// // each documents. /// // each documents.
@@ -272,13 +271,13 @@ impl TopDocs {
/// // query. /// // query.
/// let resulting_docs: Vec<(u64, DocAddress)> = /// let resulting_docs: Vec<(u64, DocAddress)> =
/// searcher.search(query, &top_books_by_rating)?; /// searcher.search(query, &top_books_by_rating)?;
/// ///
/// Ok(resulting_docs) /// Ok(resulting_docs)
/// } /// }
/// ``` /// ```
/// ///
/// # See also /// # See also
/// ///
/// To confortably work with `u64`s, `i64`s, `f64`s, or `date`s, please refer to /// To confortably work with `u64`s, `i64`s, `f64`s, or `date`s, please refer to
/// [.order_by_fast_field(...)](#method.order_by_fast_field) method. /// [.order_by_fast_field(...)](#method.order_by_fast_field) method.
pub fn order_by_u64_field( pub fn order_by_u64_field(
@@ -290,7 +289,7 @@ impl TopDocs {
/// Set top-K to rank documents by a given fast field. /// Set top-K to rank documents by a given fast field.
/// ///
/// If the field is not a fast field, or its field type does not match the generic type, this method does not panic, /// If the field is not a fast field, or its field type does not match the generic type, this method does not panic,
/// but an explicit error will be returned at the moment of collection. /// but an explicit error will be returned at the moment of collection.
/// ///
/// Note that this method is a generic. The requested fast field type will be often /// Note that this method is a generic. The requested fast field type will be often
@@ -314,7 +313,7 @@ impl TopDocs {
/// # let title = schema_builder.add_text_field("company", TEXT); /// # let title = schema_builder.add_text_field("company", TEXT);
/// # let rating = schema_builder.add_i64_field("revenue", FAST); /// # let rating = schema_builder.add_i64_field("revenue", FAST);
/// # let schema = schema_builder.build(); /// # let schema = schema_builder.build();
/// # /// #
/// # let index = Index::create_in_ram(schema); /// # let index = Index::create_in_ram(schema);
/// # let mut index_writer = index.writer_with_num_threads(1, 10_000_000)?; /// # let mut index_writer = index.writer_with_num_threads(1, 10_000_000)?;
/// # index_writer.add_document(doc!(title => "MadCow Inc.", rating => 92_000_000i64)); /// # index_writer.add_document(doc!(title => "MadCow Inc.", rating => 92_000_000i64));
@@ -343,7 +342,7 @@ impl TopDocs {
/// let top_company_by_revenue = TopDocs /// let top_company_by_revenue = TopDocs
/// ::with_limit(2) /// ::with_limit(2)
/// .order_by_fast_field(revenue_field); /// .order_by_fast_field(revenue_field);
/// ///
/// // ... and here are our documents. Note this is a simple vec. /// // ... and here are our documents. Note this is a simple vec.
/// // The `i64` in the pair is the value of our fast field for /// // The `i64` in the pair is the value of our fast field for
/// // each document. /// // each document.
@@ -353,7 +352,7 @@ impl TopDocs {
/// // query. /// // query.
/// let resulting_docs: Vec<(i64, DocAddress)> = /// let resulting_docs: Vec<(i64, DocAddress)> =
/// searcher.search(query, &top_company_by_revenue)?; /// searcher.search(query, &top_company_by_revenue)?;
/// ///
/// Ok(resulting_docs) /// Ok(resulting_docs)
/// } /// }
/// ``` /// ```
@@ -392,7 +391,7 @@ impl TopDocs {
/// ///
/// In the following example we will tweak our ranking a bit by /// In the following example we will tweak our ranking a bit by
/// boosting popular products a notch. /// boosting popular products a notch.
/// ///
/// In a more serious application, this tweaking could involve running a /// In a more serious application, this tweaking could involve running a
/// learning-to-rank model over various features /// learning-to-rank model over various features
/// ///
@@ -523,7 +522,7 @@ impl TopDocs {
/// # let index = Index::create_in_ram(schema); /// # let index = Index::create_in_ram(schema);
/// # let mut index_writer = index.writer_with_num_threads(1, 10_000_000)?; /// # let mut index_writer = index.writer_with_num_threads(1, 10_000_000)?;
/// # let product_name = index.schema().get_field("product_name").unwrap(); /// # let product_name = index.schema().get_field("product_name").unwrap();
/// # /// #
/// let popularity: Field = index.schema().get_field("popularity").unwrap(); /// let popularity: Field = index.schema().get_field("popularity").unwrap();
/// let boosted: Field = index.schema().get_field("boosted").unwrap(); /// let boosted: Field = index.schema().get_field("boosted").unwrap();
/// # index_writer.add_document(doc!(boosted=>1u64, product_name => "The Diary of Muadib", popularity => 1u64)); /// # index_writer.add_document(doc!(boosted=>1u64, product_name => "The Diary of Muadib", popularity => 1u64));
@@ -557,7 +556,7 @@ impl TopDocs {
/// segment_reader.fast_fields().u64(popularity).unwrap(); /// segment_reader.fast_fields().u64(popularity).unwrap();
/// let boosted_reader = /// let boosted_reader =
/// segment_reader.fast_fields().u64(boosted).unwrap(); /// segment_reader.fast_fields().u64(boosted).unwrap();
/// ///
/// // We can now define our actual scoring function /// // We can now define our actual scoring function
/// move |doc: DocId| { /// move |doc: DocId| {
/// let popularity: u64 = popularity_reader.get(doc); /// let popularity: u64 = popularity_reader.get(doc);
@@ -994,9 +993,7 @@ mod tests {
let segment = searcher.segment_reader(0); let segment = searcher.segment_reader(0);
let top_collector = TopDocs::with_limit(4).order_by_u64_field(size); let top_collector = TopDocs::with_limit(4).order_by_u64_field(size);
let err = top_collector.for_segment(0, segment).err().unwrap(); let err = top_collector.for_segment(0, segment).err().unwrap();
assert!( assert!(matches!(err, crate::TantivyError::SchemaError(_)));
matches!(err, crate::TantivyError::SchemaError(msg) if msg == "Field requested (Field(0)) is not a fast field.")
);
Ok(()) Ok(())
} }
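
Note on the test change above: the assertion now checks only the error variant and no longer compares the exact message string. A minimal standalone sketch of the two styles follows. `SketchError` and both helper functions are hypothetical names used for illustration; they are not tantivy types.

```rust
/// Hypothetical error enum standing in for a library error type.
#[derive(Debug)]
pub enum SketchError {
    SchemaError(String),
    IoError(String),
}

/// Variant-only check: stays valid when the message wording changes.
pub fn is_schema_error(err: &SketchError) -> bool {
    matches!(err, SketchError::SchemaError(_))
}

/// Variant-and-message check: breaks as soon as the wording changes.
pub fn is_schema_error_with_message(err: &SketchError, expected: &str) -> bool {
    matches!(err, SketchError::SchemaError(msg) if msg.as_str() == expected)
}
```

The variant-only form is the less brittle of the two, which is presumably why the test was relaxed once the error messages were reworded.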

View File

@@ -35,12 +35,21 @@ fn load_metas(
inventory: &SegmentMetaInventory, inventory: &SegmentMetaInventory,
) -> crate::Result<IndexMeta> { ) -> crate::Result<IndexMeta> {
let meta_data = directory.atomic_read(&META_FILEPATH)?; let meta_data = directory.atomic_read(&META_FILEPATH)?;
let meta_string = String::from_utf8_lossy(&meta_data); let meta_string = String::from_utf8(meta_data).map_err(|_utf8_err| {
error!("Meta data is not valid utf8.");
DataCorruption::new(
META_FILEPATH.to_path_buf(),
"Meta file does not contain valid utf8 file.".to_string(),
)
})?;
IndexMeta::deserialize(&meta_string, &inventory) IndexMeta::deserialize(&meta_string, &inventory)
.map_err(|e| { .map_err(|e| {
DataCorruption::new( DataCorruption::new(
META_FILEPATH.to_path_buf(), META_FILEPATH.to_path_buf(),
format!("Meta file cannot be deserialized. {:?}.", e), format!(
"Meta file cannot be deserialized. {:?}. Content: {:?}",
e, meta_string
),
) )
}) })
.map_err(From::from) .map_err(From::from)
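
Note on the hunk above: it swaps `String::from_utf8_lossy` for `String::from_utf8`, so invalid bytes in the meta file surface as an error instead of being silently replaced with U+FFFD. A minimal standalone sketch of that difference, using only the standard library, is given below. `MetaError` and `decode_meta` are hypothetical names for illustration and are not part of tantivy.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for a data-corruption error; not tantivy's type.
#[derive(Debug, PartialEq)]
pub struct MetaError {
    pub path: PathBuf,
    pub comment: String,
}

/// Decodes raw meta bytes strictly: invalid UTF-8 is reported, not patched over.
pub fn decode_meta(path: &Path, raw: Vec<u8>) -> Result<String, MetaError> {
    String::from_utf8(raw).map_err(|_utf8_err| MetaError {
        path: path.to_path_buf(),
        comment: "Meta file does not contain valid utf8.".to_string(),
    })
}
```

With the lossy variant, a corrupted file would reach the deserializer as a string containing replacement characters, and the resulting error would point at the JSON rather than at the encoding.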

View File

@@ -114,12 +114,7 @@ impl SegmentReader {
field_entry.name() field_entry.name()
))); )));
} }
let term_ords_reader = self.fast_fields().u64s(field).ok_or_else(|| { let term_ords_reader = self.fast_fields().u64s(field)?;
DataCorruption::comment_only(format!(
"Cannot find data for hierarchical facet {:?}",
field_entry.name()
))
})?;
let termdict = self let termdict = self
.termdict_composite .termdict_composite
.open_read(field) .open_read(field)
@@ -183,8 +178,10 @@ impl SegmentReader {
let fast_fields_data = segment.open_read(SegmentComponent::FASTFIELDS)?; let fast_fields_data = segment.open_read(SegmentComponent::FASTFIELDS)?;
let fast_fields_composite = CompositeFile::open(&fast_fields_data)?; let fast_fields_composite = CompositeFile::open(&fast_fields_data)?;
let fast_field_readers = let fast_field_readers = Arc::new(FastFieldReaders::new(
Arc::new(FastFieldReaders::load_all(&schema, &fast_fields_composite)?); schema.clone(),
fast_fields_composite,
)?);
let fieldnorm_data = segment.open_read(SegmentComponent::FIELDNORMS)?; let fieldnorm_data = segment.open_read(SegmentComponent::FIELDNORMS)?;
let fieldnorm_readers = FieldNormReaders::open(fieldnorm_data)?; let fieldnorm_readers = FieldNormReaders::open(fieldnorm_data)?;
@@ -310,7 +307,7 @@ impl SegmentReader {
} }
/// Returns an iterator that will iterate over the alive document ids /// Returns an iterator that will iterate over the alive document ids
pub fn doc_ids_alive<'a>(&'a self) -> impl Iterator<Item = DocId> + 'a { pub fn doc_ids_alive(&self) -> impl Iterator<Item = DocId> + '_ {
(0u32..self.max_doc).filter(move |doc| !self.is_deleted(*doc)) (0u32..self.max_doc).filter(move |doc| !self.is_deleted(*doc))
} }
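
Note on the `doc_ids_alive` signature change above: replacing the named lifetime `'a` with the anonymous lifetime `'_` does not change behavior; the returned iterator still borrows `self`. A small sketch showing both spellings side by side follows. `Segment` is a hypothetical stand-in, not tantivy's `SegmentReader`.

```rust
/// Hypothetical stand-in for a segment with a deletion mask.
pub struct Segment {
    max_doc: u32,
    deleted: Vec<bool>,
}

impl Segment {
    pub fn new(deleted: Vec<bool>) -> Segment {
        Segment {
            max_doc: deleted.len() as u32,
            deleted,
        }
    }

    fn is_deleted(&self, doc: u32) -> bool {
        self.deleted[doc as usize]
    }

    /// Explicit lifetime: the iterator borrows `self` for `'a`.
    pub fn doc_ids_alive_explicit<'a>(&'a self) -> impl Iterator<Item = u32> + 'a {
        (0u32..self.max_doc).filter(move |doc| !self.is_deleted(*doc))
    }

    /// Anonymous lifetime: same borrow, less noise in the signature.
    pub fn doc_ids_alive(&self) -> impl Iterator<Item = u32> + '_ {
        (0u32..self.max_doc).filter(move |doc| !self.is_deleted(*doc))
    }
}
```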

View File

@@ -226,13 +226,9 @@ impl Directory for RAMDirectory {
))); )));
let path_buf = PathBuf::from(path); let path_buf = PathBuf::from(path);
// Reserve the path to prevent calls to .write() to succeed. self.fs.write().unwrap().write(path_buf, data);
self.fs.write().unwrap().write(path_buf.clone(), &[]);
let mut vec_writer = VecWriter::new(path_buf, self.clone()); if path == *META_FILEPATH {
vec_writer.write_all(data)?;
vec_writer.flush()?;
if path == Path::new(&*META_FILEPATH) {
let _ = self.fs.write().unwrap().watch_router.broadcast(); let _ = self.fs.write().unwrap().watch_router.broadcast();
} }
Ok(()) Ok(())

View File

@@ -1,4 +1,4 @@
use super::MultiValueIntFastFieldReader; use super::MultiValuedFastFieldReader;
use crate::error::DataCorruption; use crate::error::DataCorruption;
use crate::schema::Facet; use crate::schema::Facet;
use crate::termdict::TermDictionary; use crate::termdict::TermDictionary;
@@ -20,7 +20,7 @@ use std::str;
/// list of facets. This ordinal is segment local and /// list of facets. This ordinal is segment local and
/// only makes sense for a given segment. /// only makes sense for a given segment.
pub struct FacetReader { pub struct FacetReader {
term_ords: MultiValueIntFastFieldReader<u64>, term_ords: MultiValuedFastFieldReader<u64>,
term_dict: TermDictionary, term_dict: TermDictionary,
buffer: Vec<u8>, buffer: Vec<u8>,
} }
@@ -29,12 +29,12 @@ impl FacetReader {
/// Creates a new `FacetReader`. /// Creates a new `FacetReader`.
/// ///
/// A facet reader just wraps : /// A facet reader just wraps :
/// - a `MultiValueIntFastFieldReader` that makes it possible to /// - a `MultiValuedFastFieldReader` that makes it possible to
/// access the list of facet ords for a given document. /// access the list of facet ords for a given document.
/// - a `TermDictionary` that helps associating a facet to /// - a `TermDictionary` that helps associating a facet to
/// an ordinal and vice versa. /// an ordinal and vice versa.
pub fn new( pub fn new(
term_ords: MultiValueIntFastFieldReader<u64>, term_ords: MultiValuedFastFieldReader<u64>,
term_dict: TermDictionary, term_dict: TermDictionary,
) -> FacetReader { ) -> FacetReader {
FacetReader { FacetReader {

View File

@@ -28,7 +28,7 @@ pub use self::delete::write_delete_bitset;
pub use self::delete::DeleteBitSet; pub use self::delete::DeleteBitSet;
pub use self::error::{FastFieldNotAvailableError, Result}; pub use self::error::{FastFieldNotAvailableError, Result};
pub use self::facet_reader::FacetReader; pub use self::facet_reader::FacetReader;
pub use self::multivalued::{MultiValueIntFastFieldReader, MultiValueIntFastFieldWriter}; pub use self::multivalued::{MultiValuedFastFieldReader, MultiValuedFastFieldWriter};
pub use self::reader::FastFieldReader; pub use self::reader::FastFieldReader;
pub use self::readers::FastFieldReaders; pub use self::readers::FastFieldReaders;
pub use self::serializer::FastFieldSerializer; pub use self::serializer::FastFieldSerializer;

View File

@@ -1,8 +1,8 @@
mod reader; mod reader;
mod writer; mod writer;
pub use self::reader::MultiValueIntFastFieldReader; pub use self::reader::MultiValuedFastFieldReader;
pub use self::writer::MultiValueIntFastFieldWriter; pub use self::writer::MultiValuedFastFieldWriter;
#[cfg(test)] #[cfg(test)]
mod tests { mod tests {

View File

@@ -10,29 +10,22 @@ use crate::DocId;
/// The `idx_reader` associated, for each document, the index of its first value. /// The `idx_reader` associated, for each document, the index of its first value.
/// ///
#[derive(Clone)] #[derive(Clone)]
pub struct MultiValueIntFastFieldReader<Item: FastValue> { pub struct MultiValuedFastFieldReader<Item: FastValue> {
idx_reader: FastFieldReader<u64>, idx_reader: FastFieldReader<u64>,
vals_reader: FastFieldReader<Item>, vals_reader: FastFieldReader<Item>,
} }
impl<Item: FastValue> MultiValueIntFastFieldReader<Item> { impl<Item: FastValue> MultiValuedFastFieldReader<Item> {
pub(crate) fn open( pub(crate) fn open(
idx_reader: FastFieldReader<u64>, idx_reader: FastFieldReader<u64>,
vals_reader: FastFieldReader<Item>, vals_reader: FastFieldReader<Item>,
) -> MultiValueIntFastFieldReader<Item> { ) -> MultiValuedFastFieldReader<Item> {
MultiValueIntFastFieldReader { MultiValuedFastFieldReader {
idx_reader, idx_reader,
vals_reader, vals_reader,
} }
} }
pub(crate) fn into_u64s_reader(self) -> MultiValueIntFastFieldReader<u64> {
MultiValueIntFastFieldReader {
idx_reader: self.idx_reader,
vals_reader: self.vals_reader.into_u64_reader(),
}
}
/// Returns `(start, stop)`, such that the values associated /// Returns `(start, stop)`, such that the values associated
/// to the given document are `start..stop`. /// to the given document are `start..stop`.
fn range(&self, doc: DocId) -> (u64, u64) { fn range(&self, doc: DocId) -> (u64, u64) {

View File

@@ -18,7 +18,7 @@ use std::io;
/// in your schema /// in your schema
/// - add your document simply by calling `.add_document(...)`. /// - add your document simply by calling `.add_document(...)`.
/// ///
/// The `MultiValueIntFastFieldWriter` can be acquired from the /// The `MultiValuedFastFieldWriter` can be acquired from the
/// fastfield writer, by calling [`.get_multivalue_writer(...)`](./struct.FastFieldsWriter.html#method.get_multivalue_writer). /// fastfield writer, by calling [`.get_multivalue_writer(...)`](./struct.FastFieldsWriter.html#method.get_multivalue_writer).
/// ///
/// Once acquired, writing is done by calling calls to /// Once acquired, writing is done by calling calls to
@@ -29,17 +29,17 @@ use std::io;
/// This makes it possible to push unordered term ids, /// This makes it possible to push unordered term ids,
/// during indexing and remap them to their respective /// during indexing and remap them to their respective
/// term ids when the segment is getting serialized. /// term ids when the segment is getting serialized.
pub struct MultiValueIntFastFieldWriter { pub struct MultiValuedFastFieldWriter {
field: Field, field: Field,
vals: Vec<UnorderedTermId>, vals: Vec<UnorderedTermId>,
doc_index: Vec<u64>, doc_index: Vec<u64>,
is_facet: bool, is_facet: bool,
} }
impl MultiValueIntFastFieldWriter { impl MultiValuedFastFieldWriter {
/// Creates a new `IntFastFieldWriter` /// Creates a new `IntFastFieldWriter`
pub(crate) fn new(field: Field, is_facet: bool) -> Self { pub(crate) fn new(field: Field, is_facet: bool) -> Self {
MultiValueIntFastFieldWriter { MultiValuedFastFieldWriter {
field, field,
vals: Vec::new(), vals: Vec::new(),
doc_index: Vec::new(), doc_index: Vec::new(),
@@ -47,7 +47,7 @@ impl MultiValueIntFastFieldWriter {
} }
} }
/// Access the field associated to the `MultiValueIntFastFieldWriter` /// Access the field associated to the `MultiValuedFastFieldWriter`
pub fn field(&self) -> Field { pub fn field(&self) -> Field {
self.field self.field
} }

View File

@@ -42,24 +42,6 @@ impl<Item: FastValue> FastFieldReader<Item> {
}) })
} }
pub(crate) fn into_u64_reader(self) -> FastFieldReader<u64> {
FastFieldReader {
bit_unpacker: self.bit_unpacker,
min_value_u64: self.min_value_u64,
max_value_u64: self.max_value_u64,
_phantom: PhantomData,
}
}
pub(crate) fn cast<TFastValue: FastValue>(self) -> FastFieldReader<TFastValue> {
FastFieldReader {
bit_unpacker: self.bit_unpacker,
min_value_u64: self.min_value_u64,
max_value_u64: self.max_value_u64,
_phantom: PhantomData,
}
}
/// Return the value associated to the given document. /// Return the value associated to the given document.
/// ///
/// This accessor should return as fast as possible. /// This accessor should return as fast as possible.

View File

@@ -1,28 +1,22 @@
use crate::common::CompositeFile; use crate::common::CompositeFile;
use crate::fastfield::MultiValueIntFastFieldReader; use crate::directory::FileSlice;
use crate::fastfield::MultiValuedFastFieldReader;
use crate::fastfield::{BytesFastFieldReader, FastValue}; use crate::fastfield::{BytesFastFieldReader, FastValue};
use crate::fastfield::{FastFieldNotAvailableError, FastFieldReader}; use crate::fastfield::{FastFieldNotAvailableError, FastFieldReader};
use crate::schema::{Cardinality, Field, FieldType, Schema}; use crate::schema::{Cardinality, Field, FieldType, Schema};
use crate::space_usage::PerFieldSpaceUsage; use crate::space_usage::PerFieldSpaceUsage;
use std::collections::HashMap; use crate::TantivyError;
/// Provides access to all of the FastFieldReader. /// Provides access to all of the FastFieldReader.
/// ///
/// Internally, `FastFieldReaders` have preloaded fast field readers, /// Internally, `FastFieldReaders` have preloaded fast field readers,
/// and just wraps several `HashMap`. /// and just wraps several `HashMap`.
#[derive(Clone)]
pub struct FastFieldReaders { pub struct FastFieldReaders {
fast_field_i64: HashMap<Field, FastFieldReader<i64>>, schema: Schema,
fast_field_u64: HashMap<Field, FastFieldReader<u64>>,
fast_field_f64: HashMap<Field, FastFieldReader<f64>>,
fast_field_date: HashMap<Field, FastFieldReader<crate::DateTime>>,
fast_field_i64s: HashMap<Field, MultiValueIntFastFieldReader<i64>>,
fast_field_u64s: HashMap<Field, MultiValueIntFastFieldReader<u64>>,
fast_field_f64s: HashMap<Field, MultiValueIntFastFieldReader<f64>>,
fast_field_dates: HashMap<Field, MultiValueIntFastFieldReader<crate::DateTime>>,
fast_bytes: HashMap<Field, BytesFastFieldReader>,
fast_fields_composite: CompositeFile, fast_fields_composite: CompositeFile,
} }
#[derive(Eq, PartialEq, Debug)]
enum FastType { enum FastType {
I64, I64,
U64, U64,
@@ -50,236 +44,167 @@ fn type_and_cardinality(field_type: &FieldType) -> Option<(FastType, Cardinality
} }
impl FastFieldReaders { impl FastFieldReaders {
pub(crate) fn load_all( pub(crate) fn new(
schema: &Schema, schema: Schema,
fast_fields_composite: &CompositeFile, fast_fields_composite: CompositeFile,
) -> crate::Result<FastFieldReaders> { ) -> crate::Result<FastFieldReaders> {
let mut fast_field_readers = FastFieldReaders { Ok(FastFieldReaders {
fast_field_i64: Default::default(), fast_fields_composite,
fast_field_u64: Default::default(), schema,
fast_field_f64: Default::default(), })
fast_field_date: Default::default(),
fast_field_i64s: Default::default(),
fast_field_u64s: Default::default(),
fast_field_f64s: Default::default(),
fast_field_dates: Default::default(),
fast_bytes: Default::default(),
fast_fields_composite: fast_fields_composite.clone(),
};
for (field, field_entry) in schema.fields() {
let field_type = field_entry.field_type();
if let FieldType::Bytes(bytes_option) = field_type {
if !bytes_option.is_fast() {
continue;
}
let fast_field_idx_file = fast_fields_composite
.open_read_with_idx(field, 0)
.ok_or_else(|| FastFieldNotAvailableError::new(field_entry))?;
let idx_reader = FastFieldReader::open(fast_field_idx_file)?;
let data = fast_fields_composite
.open_read_with_idx(field, 1)
.ok_or_else(|| FastFieldNotAvailableError::new(field_entry))?;
let bytes_fast_field_reader = BytesFastFieldReader::open(idx_reader, data)?;
fast_field_readers
.fast_bytes
.insert(field, bytes_fast_field_reader);
} else if let Some((fast_type, cardinality)) = type_and_cardinality(field_type) {
match cardinality {
Cardinality::SingleValue => {
if let Some(fast_field_data) = fast_fields_composite.open_read(field) {
match fast_type {
FastType::U64 => {
let fast_field_reader = FastFieldReader::open(fast_field_data)?;
fast_field_readers
.fast_field_u64
.insert(field, fast_field_reader);
}
FastType::I64 => {
let fast_field_reader =
FastFieldReader::open(fast_field_data.clone())?;
fast_field_readers
.fast_field_i64
.insert(field, fast_field_reader);
}
FastType::F64 => {
let fast_field_reader =
FastFieldReader::open(fast_field_data.clone())?;
fast_field_readers
.fast_field_f64
.insert(field, fast_field_reader);
}
FastType::Date => {
let fast_field_reader =
FastFieldReader::open(fast_field_data.clone())?;
fast_field_readers
.fast_field_date
.insert(field, fast_field_reader);
}
}
} else {
return Err(From::from(FastFieldNotAvailableError::new(field_entry)));
}
}
Cardinality::MultiValues => {
let idx_opt = fast_fields_composite.open_read_with_idx(field, 0);
let data_opt = fast_fields_composite.open_read_with_idx(field, 1);
if let (Some(fast_field_idx), Some(fast_field_data)) = (idx_opt, data_opt) {
let idx_reader = FastFieldReader::open(fast_field_idx)?;
match fast_type {
FastType::I64 => {
let vals_reader = FastFieldReader::open(fast_field_data)?;
let multivalued_int_fast_field =
MultiValueIntFastFieldReader::open(idx_reader, vals_reader);
fast_field_readers
.fast_field_i64s
.insert(field, multivalued_int_fast_field);
}
FastType::U64 => {
let vals_reader = FastFieldReader::open(fast_field_data)?;
let multivalued_int_fast_field =
MultiValueIntFastFieldReader::open(idx_reader, vals_reader);
fast_field_readers
.fast_field_u64s
.insert(field, multivalued_int_fast_field);
}
FastType::F64 => {
let vals_reader = FastFieldReader::open(fast_field_data)?;
let multivalued_int_fast_field =
MultiValueIntFastFieldReader::open(idx_reader, vals_reader);
fast_field_readers
.fast_field_f64s
.insert(field, multivalued_int_fast_field);
}
FastType::Date => {
let vals_reader = FastFieldReader::open(fast_field_data)?;
let multivalued_int_fast_field =
MultiValueIntFastFieldReader::open(idx_reader, vals_reader);
fast_field_readers
.fast_field_dates
.insert(field, multivalued_int_fast_field);
}
}
} else {
return Err(From::from(FastFieldNotAvailableError::new(field_entry)));
}
}
}
}
}
Ok(fast_field_readers)
} }
pub(crate) fn space_usage(&self) -> PerFieldSpaceUsage { pub(crate) fn space_usage(&self) -> PerFieldSpaceUsage {
self.fast_fields_composite.space_usage() self.fast_fields_composite.space_usage()
} }
/// Returns the `u64` fast field reader reader associated to `field`. fn fast_field_data(&self, field: Field, idx: usize) -> crate::Result<FileSlice> {
/// self.fast_fields_composite
/// If `field` is not a u64 fast field, this method returns `None`. .open_read_with_idx(field, idx)
pub fn u64(&self, field: Field) -> Option<FastFieldReader<u64>> { .ok_or_else(|| {
self.fast_field_u64.get(&field).cloned() let field_name = self.schema.get_field_entry(field).name();
TantivyError::SchemaError(format!("Field({}) data was not found", field_name))
})
} }
/// If the field is a u64-fast field return the associated reader. fn check_type(
/// If the field is a i64-fast field, return the associated u64 reader. Values are &self,
/// mapped from i64 to u64 using a (well the, it is unique) monotonic mapping. /// field: Field,
/// expected_fast_type: FastType,
/// This method is useful when merging segment reader. expected_cardinality: Cardinality,
pub(crate) fn u64_lenient(&self, field: Field) -> Option<FastFieldReader<u64>> { ) -> crate::Result<()> {
if let Some(u64_ff_reader) = self.u64(field) { let field_entry = self.schema.get_field_entry(field);
return Some(u64_ff_reader); let (fast_type, cardinality) =
type_and_cardinality(field_entry.field_type()).ok_or_else(|| {
crate::TantivyError::SchemaError(format!(
"Field {:?} is not a fast field.",
field_entry.name()
))
})?;
if fast_type != expected_fast_type {
return Err(crate::TantivyError::SchemaError(format!(
"Field {:?} is of type {:?}, expected {:?}.",
field_entry.name(),
fast_type,
expected_fast_type
)));
} }
if let Some(i64_ff_reader) = self.i64(field) { if cardinality != expected_cardinality {
return Some(i64_ff_reader.into_u64_reader()); return Err(crate::TantivyError::SchemaError(format!(
"Field {:?} is of cardinality {:?}, expected {:?}.",
field_entry.name(),
cardinality,
expected_cardinality
)));
} }
if let Some(f64_ff_reader) = self.f64(field) { Ok(())
return Some(f64_ff_reader.into_u64_reader());
}
if let Some(date_ff_reader) = self.date(field) {
return Some(date_ff_reader.into_u64_reader());
}
None
} }
pub(crate) fn typed_fast_field_reader<TFastValue: FastValue>( pub(crate) fn typed_fast_field_reader<TFastValue: FastValue>(
&self, &self,
field: Field, field: Field,
) -> Option<FastFieldReader<TFastValue>> { ) -> crate::Result<FastFieldReader<TFastValue>> {
self.u64_lenient(field) let fast_field_slice = self.fast_field_data(field, 0)?;
.map(|fast_field_reader| fast_field_reader.cast()) FastFieldReader::open(fast_field_slice)
}
pub(crate) fn typed_fast_field_multi_reader<TFastValue: FastValue>(
&self,
field: Field,
) -> crate::Result<MultiValuedFastFieldReader<TFastValue>> {
let fast_field_slice_idx = self.fast_field_data(field, 0)?;
let fast_field_slice_vals = self.fast_field_data(field, 1)?;
let idx_reader = FastFieldReader::open(fast_field_slice_idx)?;
let vals_reader: FastFieldReader<TFastValue> =
FastFieldReader::open(fast_field_slice_vals)?;
Ok(MultiValuedFastFieldReader::open(idx_reader, vals_reader))
}
/// Returns the `u64` fast field reader reader associated to `field`.
///
/// If `field` is not a u64 fast field, this method returns `None`.
pub fn u64(&self, field: Field) -> crate::Result<FastFieldReader<u64>> {
self.check_type(field, FastType::U64, Cardinality::SingleValue)?;
self.typed_fast_field_reader(field)
} }
/// Returns the `i64` fast field reader reader associated to `field`. /// Returns the `i64` fast field reader reader associated to `field`.
/// ///
/// If `field` is not a i64 fast field, this method returns `None`. /// If `field` is not a i64 fast field, this method returns `None`.
pub fn i64(&self, field: Field) -> Option<FastFieldReader<i64>> { pub fn i64(&self, field: Field) -> crate::Result<FastFieldReader<i64>> {
self.fast_field_i64.get(&field).cloned() self.check_type(field, FastType::I64, Cardinality::SingleValue)?;
self.typed_fast_field_reader(field)
} }
/// Returns the `i64` fast field reader reader associated to `field`. /// Returns the `i64` fast field reader reader associated to `field`.
/// ///
/// If `field` is not a i64 fast field, this method returns `None`. /// If `field` is not a i64 fast field, this method returns `None`.
pub fn date(&self, field: Field) -> Option<FastFieldReader<crate::DateTime>> { pub fn date(&self, field: Field) -> crate::Result<FastFieldReader<crate::DateTime>> {
self.fast_field_date.get(&field).cloned() self.check_type(field, FastType::Date, Cardinality::SingleValue)?;
self.typed_fast_field_reader(field)
} }
/// Returns the `f64` fast field reader reader associated to `field`. /// Returns the `f64` fast field reader reader associated to `field`.
/// ///
/// If `field` is not a f64 fast field, this method returns `None`. /// If `field` is not a f64 fast field, this method returns `None`.
pub fn f64(&self, field: Field) -> Option<FastFieldReader<f64>> { pub fn f64(&self, field: Field) -> crate::Result<FastFieldReader<f64>> {
self.fast_field_f64.get(&field).cloned() self.check_type(field, FastType::F64, Cardinality::SingleValue)?;
self.typed_fast_field_reader(field)
} }
/// Returns a `u64s` multi-valued fast field reader reader associated to `field`. /// Returns a `u64s` multi-valued fast field reader reader associated to `field`.
/// ///
/// If `field` is not a u64 multi-valued fast field, this method returns `None`. /// If `field` is not a u64 multi-valued fast field, this method returns `None`.
pub fn u64s(&self, field: Field) -> Option<MultiValueIntFastFieldReader<u64>> { pub fn u64s(&self, field: Field) -> crate::Result<MultiValuedFastFieldReader<u64>> {
self.fast_field_u64s.get(&field).cloned() self.check_type(field, FastType::U64, Cardinality::MultiValues)?;
} self.typed_fast_field_multi_reader(field)
/// If the field is a u64s-fast field return the associated reader.
/// If the field is a i64s-fast field, return the associated u64s reader. Values are
/// mapped from i64 to u64 using a (well the, it is unique) monotonic mapping.
///
/// This method is useful when merging segment reader.
pub(crate) fn u64s_lenient(&self, field: Field) -> Option<MultiValueIntFastFieldReader<u64>> {
if let Some(u64s_ff_reader) = self.u64s(field) {
return Some(u64s_ff_reader);
}
if let Some(i64s_ff_reader) = self.i64s(field) {
return Some(i64s_ff_reader.into_u64s_reader());
}
if let Some(f64s_ff_reader) = self.f64s(field) {
return Some(f64s_ff_reader.into_u64s_reader());
}
None
} }
/// Returns a `i64s` multi-valued fast field reader reader associated to `field`. /// Returns a `i64s` multi-valued fast field reader reader associated to `field`.
/// ///
/// If `field` is not a i64 multi-valued fast field, this method returns `None`. /// If `field` is not a i64 multi-valued fast field, this method returns `None`.
pub fn i64s(&self, field: Field) -> Option<MultiValueIntFastFieldReader<i64>> { pub fn i64s(&self, field: Field) -> crate::Result<MultiValuedFastFieldReader<i64>> {
self.fast_field_i64s.get(&field).cloned() self.check_type(field, FastType::I64, Cardinality::MultiValues)?;
self.typed_fast_field_multi_reader(field)
} }
/// Returns a `f64s` multi-valued fast field reader reader associated to `field`. /// Returns a `f64s` multi-valued fast field reader reader associated to `field`.
/// ///
/// If `field` is not a f64 multi-valued fast field, this method returns `None`. /// If `field` is not a f64 multi-valued fast field, this method returns `None`.
pub fn f64s(&self, field: Field) -> Option<MultiValueIntFastFieldReader<f64>> { pub fn f64s(&self, field: Field) -> crate::Result<MultiValuedFastFieldReader<f64>> {
self.fast_field_f64s.get(&field).cloned() self.check_type(field, FastType::F64, Cardinality::MultiValues)?;
self.typed_fast_field_multi_reader(field)
} }
/// Returns a `crate::DateTime` multi-valued fast field reader reader associated to `field`. /// Returns a `crate::DateTime` multi-valued fast field reader reader associated to `field`.
/// ///
/// If `field` is not a `crate::DateTime` multi-valued fast field, this method returns `None`. /// If `field` is not a `crate::DateTime` multi-valued fast field, this method returns `None`.
pub fn dates(&self, field: Field) -> Option<MultiValueIntFastFieldReader<crate::DateTime>> { pub fn dates(
self.fast_field_dates.get(&field).cloned() &self,
field: Field,
) -> crate::Result<MultiValuedFastFieldReader<crate::DateTime>> {
self.check_type(field, FastType::Date, Cardinality::MultiValues)?;
self.typed_fast_field_multi_reader(field)
} }
/// Returns the `bytes` fast field reader associated to `field`. /// Returns the `bytes` fast field reader associated to `field`.
/// ///
/// If `field` is not a bytes fast field, returns `None`. /// If `field` is not a bytes fast field, returns `None`.
pub fn bytes(&self, field: Field) -> Option<BytesFastFieldReader> { pub fn bytes(&self, field: Field) -> crate::Result<BytesFastFieldReader> {
self.fast_bytes.get(&field).cloned() let field_entry = self.schema.get_field_entry(field);
if let FieldType::Bytes(bytes_option) = field_entry.field_type() {
if !bytes_option.is_fast() {
return Err(crate::TantivyError::SchemaError(format!(
"Field {:?} is not a fast field.",
field_entry.name()
)));
}
let fast_field_idx_file = self.fast_field_data(field, 0)?;
let idx_reader = FastFieldReader::open(fast_field_idx_file)?;
let data = self.fast_field_data(field, 1)?;
BytesFastFieldReader::open(idx_reader, data)
} else {
Err(FastFieldNotAvailableError::new(field_entry).into())
}
} }
} }
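
Note on the `FastFieldReaders` rewrite above: the struct no longer preloads one `HashMap` per type. It keeps the schema and the composite file, validates the requested type and cardinality against the schema on each call, and only then opens the data. The sketch below illustrates that pattern under simplifying assumptions. `LazyReaders`, its `u32` field ids, the `String` error type, and the in-memory `Vec<u64>` data are all hypothetical and stand in for tantivy's real types.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum FastType {
    U64,
    I64,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Cardinality {
    SingleValue,
    MultiValues,
}

/// Hypothetical stand-in: schema information plus raw per-field data.
pub struct LazyReaders {
    // field id -> (name, declared type, declared cardinality)
    schema: HashMap<u32, (String, FastType, Cardinality)>,
    // field id -> stored values (a real reader would hold a file slice)
    data: HashMap<u32, Vec<u64>>,
}

impl LazyReaders {
    /// Construction does no per-field work; nothing is opened yet.
    pub fn new(
        schema: HashMap<u32, (String, FastType, Cardinality)>,
        data: HashMap<u32, Vec<u64>>,
    ) -> LazyReaders {
        LazyReaders { schema, data }
    }

    /// Validates the request against the schema before touching any data.
    fn check_type(
        &self,
        field: u32,
        expected_type: FastType,
        expected_cardinality: Cardinality,
    ) -> Result<(), String> {
        let (name, fast_type, cardinality) = self
            .schema
            .get(&field)
            .ok_or_else(|| format!("Field {} is not a fast field.", field))?;
        if *fast_type != expected_type {
            return Err(format!(
                "Field {:?} is of type {:?}, expected {:?}.",
                name, fast_type, expected_type
            ));
        }
        if *cardinality != expected_cardinality {
            return Err(format!(
                "Field {:?} is of cardinality {:?}, expected {:?}.",
                name, cardinality, expected_cardinality
            ));
        }
        Ok(())
    }

    /// Looks up the data only when requested, and only after the type check.
    pub fn u64(&self, field: u32) -> Result<&[u64], String> {
        self.check_type(field, FastType::U64, Cardinality::SingleValue)?;
        self.data
            .get(&field)
            .map(|values| values.as_slice())
            .ok_or_else(|| format!("Field({}) data was not found", field))
    }
}
```

Returning a `Result` instead of an `Option` lets the caller distinguish a wrong type, a wrong cardinality, and missing data, which the previous `None` return could not express.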

View File

@@ -1,4 +1,4 @@
use super::multivalued::MultiValueIntFastFieldWriter; use super::multivalued::MultiValuedFastFieldWriter;
use crate::common; use crate::common;
use crate::common::BinarySerializable; use crate::common::BinarySerializable;
use crate::common::VInt; use crate::common::VInt;
@@ -13,7 +13,7 @@ use std::io;
/// The fastfieldswriter regroup all of the fast field writers. /// The fastfieldswriter regroup all of the fast field writers.
pub struct FastFieldsWriter { pub struct FastFieldsWriter {
single_value_writers: Vec<IntFastFieldWriter>, single_value_writers: Vec<IntFastFieldWriter>,
multi_values_writers: Vec<MultiValueIntFastFieldWriter>, multi_values_writers: Vec<MultiValuedFastFieldWriter>,
bytes_value_writers: Vec<BytesFastFieldWriter>, bytes_value_writers: Vec<BytesFastFieldWriter>,
} }
@@ -46,14 +46,14 @@ impl FastFieldsWriter {
single_value_writers.push(fast_field_writer); single_value_writers.push(fast_field_writer);
} }
Some(Cardinality::MultiValues) => { Some(Cardinality::MultiValues) => {
let fast_field_writer = MultiValueIntFastFieldWriter::new(field, false); let fast_field_writer = MultiValuedFastFieldWriter::new(field, false);
multi_values_writers.push(fast_field_writer); multi_values_writers.push(fast_field_writer);
} }
None => {} None => {}
} }
} }
FieldType::HierarchicalFacet => { FieldType::HierarchicalFacet => {
let fast_field_writer = MultiValueIntFastFieldWriter::new(field, true); let fast_field_writer = MultiValuedFastFieldWriter::new(field, true);
multi_values_writers.push(fast_field_writer); multi_values_writers.push(fast_field_writer);
} }
FieldType::Bytes(bytes_option) => { FieldType::Bytes(bytes_option) => {
@@ -87,7 +87,7 @@ impl FastFieldsWriter {
 pub fn get_multivalue_writer(
 &mut self,
 field: Field,
-) -> Option<&mut MultiValueIntFastFieldWriter> {
+) -> Option<&mut MultiValuedFastFieldWriter> {
 // TODO optimize
 self.multi_values_writers
 .iter_mut()


@@ -7,7 +7,7 @@ use crate::fastfield::BytesFastFieldReader;
 use crate::fastfield::DeleteBitSet;
 use crate::fastfield::FastFieldReader;
 use crate::fastfield::FastFieldSerializer;
-use crate::fastfield::MultiValueIntFastFieldReader;
+use crate::fastfield::MultiValuedFastFieldReader;
 use crate::fieldnorm::FieldNormsSerializer;
 use crate::fieldnorm::FieldNormsWriter;
 use crate::fieldnorm::{FieldNormReader, FieldNormReaders};
@@ -246,7 +246,7 @@ impl IndexMerger {
 for reader in &self.readers {
 let u64_reader: FastFieldReader<u64> = reader
 .fast_fields()
-.u64_lenient(field)
+.typed_fast_field_reader(field)
 .expect("Failed to find a reader for single fast field. This is a tantivy bug and it should never happen.");
 if let Some((seg_min_val, seg_max_val)) =
 compute_min_max_val(&u64_reader, reader.max_doc(), reader.delete_bitset())
@@ -290,7 +290,7 @@ impl IndexMerger {
 fast_field_serializer: &mut FastFieldSerializer,
 ) -> crate::Result<()> {
 let mut total_num_vals = 0u64;
-let mut u64s_readers: Vec<MultiValueIntFastFieldReader<u64>> = Vec::new();
+let mut u64s_readers: Vec<MultiValuedFastFieldReader<u64>> = Vec::new();
 // In the first pass, we compute the total number of vals.
 //
@@ -298,9 +298,8 @@ impl IndexMerger {
 // what should be the bit length use for bitpacking.
 for reader in &self.readers {
 let u64s_reader = reader.fast_fields()
-.u64s_lenient(field)
+.typed_fast_field_multi_reader(field)
 .expect("Failed to find index for multivalued field. This is a bug in tantivy, please report.");
 if let Some(delete_bitset) = reader.delete_bitset() {
 for doc in 0u32..reader.max_doc() {
 if delete_bitset.is_alive(doc) {
@@ -353,7 +352,7 @@ impl IndexMerger {
 for (segment_ord, segment_reader) in self.readers.iter().enumerate() {
 let term_ordinal_mapping: &[TermOrdinal] =
 term_ordinal_mappings.get_segment(segment_ord);
-let ff_reader: MultiValueIntFastFieldReader<u64> = segment_reader
+let ff_reader: MultiValuedFastFieldReader<u64> = segment_reader
 .fast_fields()
 .u64s(field)
 .expect("Could not find multivalued u64 fast value reader.");
@@ -397,8 +396,10 @@ impl IndexMerger {
 // We go through a complete first pass to compute the minimum and the
 // maximum value and initialize our Serializer.
 for reader in &self.readers {
-let ff_reader: MultiValueIntFastFieldReader<u64> =
-reader.fast_fields().u64s_lenient(field).expect(
+let ff_reader: MultiValuedFastFieldReader<u64> = reader
+.fast_fields()
+.typed_fast_field_multi_reader(field)
+.expect(
 "Failed to find multivalued fast field reader. This is a bug in \
 tantivy. Please report.",
 );
@@ -445,11 +446,7 @@ impl IndexMerger {
 let mut bytes_readers: Vec<BytesFastFieldReader> = Vec::new();
 for reader in &self.readers {
-let bytes_reader = reader.fast_fields().bytes(field).ok_or_else(|| {
-crate::TantivyError::InvalidArgument(
-"Bytes fast field {:?} not found in segment.".to_string(),
-)
-})?;
+let bytes_reader = reader.fast_fields().bytes(field)?;
 if let Some(delete_bitset) = reader.delete_bitset() {
 for doc in 0u32..reader.max_doc() {
 if delete_bitset.is_alive(doc) {
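
Review note: the new `bytes(field)?` call presumably works because the accessor now returns a `Result` whose error converts into the crate error type (an earlier hunk ends with `Err(FastFieldNotAvailableError::new(field_entry).into())`). The sketch below illustrates that pattern with stand-in types only: `FieldNotAvailable`, `MergeError`, `lookup_bytes` and `merge_bytes` are hypothetical names, not tantivy's API. It also shows that the removed message never interpolated its `{:?}` placeholder, because `to_string()` on a string literal performs no formatting.

```rust
/// Hypothetical stand-in for a "fast field not available" error.
#[derive(Debug, PartialEq)]
struct FieldNotAvailable {
    field_name: String,
}

/// Hypothetical stand-in for the crate-wide error type.
#[derive(Debug, PartialEq)]
enum MergeError {
    InvalidArgument(String),
}

// This conversion is what lets `?` lift the specific error into the
// crate-wide error without an explicit `ok_or_else`.
impl From<FieldNotAvailable> for MergeError {
    fn from(err: FieldNotAvailable) -> Self {
        MergeError::InvalidArgument(format!(
            "fast field {:?} is not available",
            err.field_name
        ))
    }
}

/// Hypothetical accessor that returns a `Result` instead of an `Option`.
fn lookup_bytes(field_name: &str, available: &[&str]) -> Result<Vec<u8>, FieldNotAvailable> {
    if available.contains(&field_name) {
        Ok(field_name.as_bytes().to_vec())
    } else {
        Err(FieldNotAvailable {
            field_name: field_name.to_string(),
        })
    }
}

/// Caller in the style of the new code: `?` converts and propagates the error.
fn merge_bytes(field_name: &str, available: &[&str]) -> Result<usize, MergeError> {
    let bytes = lookup_bytes(field_name, available)?;
    Ok(bytes.len())
}

/// The removed message, reproduced as written: `to_string()` does not format,
/// so the literal `{:?}` stays in the text.
fn old_style_message() -> String {
    "Bytes fast field {:?} not found in segment.".to_string()
}
```

With this shape, the error text is built once, next to the error type, rather than at every call site.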


@@ -96,7 +96,7 @@
 //! A good place for you to get started is to check out
 //! the example code (
 //! [literate programming](https://tantivy-search.github.io/examples/basic_search.html) /
-//! [source code](https://github.com/tantivy-search/tantivy/blob/master/examples/basic_search.rs))
+//! [source code](https://github.com/tantivy-search/tantivy/blob/main/examples/basic_search.rs))
 #[cfg_attr(test, macro_use)]
 extern crate serde_json;
@@ -866,39 +866,39 @@ mod tests {
 let searcher = reader.searcher();
 let segment_reader: &SegmentReader = searcher.segment_reader(0);
 {
-let fast_field_reader_opt = segment_reader.fast_fields().u64(text_field);
-assert!(fast_field_reader_opt.is_none());
+let fast_field_reader_res = segment_reader.fast_fields().u64(text_field);
+assert!(fast_field_reader_res.is_err());
 }
 {
 let fast_field_reader_opt = segment_reader.fast_fields().u64(stored_int_field);
-assert!(fast_field_reader_opt.is_none());
+assert!(fast_field_reader_opt.is_err());
 }
 {
 let fast_field_reader_opt = segment_reader.fast_fields().u64(fast_field_signed);
-assert!(fast_field_reader_opt.is_none());
+assert!(fast_field_reader_opt.is_err());
 }
 {
 let fast_field_reader_opt = segment_reader.fast_fields().u64(fast_field_float);
-assert!(fast_field_reader_opt.is_none());
+assert!(fast_field_reader_opt.is_err());
 }
 {
 let fast_field_reader_opt = segment_reader.fast_fields().u64(fast_field_unsigned);
-assert!(fast_field_reader_opt.is_some());
+assert!(fast_field_reader_opt.is_ok());
 let fast_field_reader = fast_field_reader_opt.unwrap();
 assert_eq!(fast_field_reader.get(0), 4u64)
 }
 {
-let fast_field_reader_opt = segment_reader.fast_fields().i64(fast_field_signed);
-assert!(fast_field_reader_opt.is_some());
-let fast_field_reader = fast_field_reader_opt.unwrap();
+let fast_field_reader_res = segment_reader.fast_fields().i64(fast_field_signed);
+assert!(fast_field_reader_res.is_ok());
+let fast_field_reader = fast_field_reader_res.unwrap();
 assert_eq!(fast_field_reader.get(0), 4i64)
 }
 {
-let fast_field_reader_opt = segment_reader.fast_fields().f64(fast_field_float);
-assert!(fast_field_reader_opt.is_some());
-let fast_field_reader = fast_field_reader_opt.unwrap();
+let fast_field_reader_res = segment_reader.fast_fields().f64(fast_field_float);
+assert!(fast_field_reader_res.is_ok());
+let fast_field_reader = fast_field_reader_res.unwrap();
 assert_eq!(fast_field_reader.get(0), 4f64)
 }
 Ok(())


@@ -132,7 +132,7 @@ impl PositionReader {
 "offset arguments should be increasing."
 );
 let delta_to_block_offset = offset as i64 - self.block_offset as i64;
-if delta_to_block_offset < 0 || delta_to_block_offset >= 128 {
+if !(0..128).contains(&delta_to_block_offset) {
 // The first position is not within the first block.
 // We need to decompress the first block.
 let delta_to_anchor_offset = offset - self.anchor_offset;
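
Review note: the rewritten condition looks like a pure style change (the form Clippy suggests for manual range checks) and should not alter behavior. A minimal sketch to check the equivalence, assuming the block length is the literal `128` seen in the diff; `BLOCK_LEN`, `outside_block_old` and `outside_block_new` are hypothetical names introduced only for this comparison.

```rust
/// Block length, assumed from the literal `128` in the diff.
const BLOCK_LEN: i64 = 128;

/// Condition as written before the change.
fn outside_block_old(delta: i64) -> bool {
    delta < 0 || delta >= BLOCK_LEN
}

/// Condition as written after the change: a half-open range `0..128`
/// contains exactly the deltas 0, 1, ..., 127.
fn outside_block_new(delta: i64) -> bool {
    !(0..BLOCK_LEN).contains(&delta)
}
```

Both forms treat `0` and `127` as inside the block and `-1` and `128` as outside it.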


@@ -1,14 +1,11 @@
 use crate::common::HasLen;
-use crate::directory::FileSlice;
 use crate::docset::DocSet;
 use crate::fastfield::DeleteBitSet;
 use crate::positions::PositionReader;
 use crate::postings::compression::COMPRESSION_BLOCK_SIZE;
-use crate::postings::serializer::PostingsSerializer;
 use crate::postings::BlockSearcher;
 use crate::postings::BlockSegmentPostings;
 use crate::postings::Postings;
-use crate::schema::IndexRecordOption;
 use crate::{DocId, TERMINATED};
 /// `SegmentPostings` represents the inverted list or postings associated to
@@ -68,7 +65,11 @@ impl SegmentPostings {
 /// It serializes the doc ids using tantivy's codec
 /// and returns a `SegmentPostings` object that embeds a
 /// buffer with the serialized data.
+#[cfg(test)]
 pub fn create_from_docs(docs: &[u32]) -> SegmentPostings {
+use crate::directory::FileSlice;
+use crate::postings::serializer::PostingsSerializer;
+use crate::schema::IndexRecordOption;
 let mut buffer = Vec::new();
 {
 let mut postings_serializer =
@@ -97,6 +98,9 @@ impl SegmentPostings {
 doc_and_tfs: &[(u32, u32)],
 fieldnorms: Option<&[u32]>,
 ) -> SegmentPostings {
+use crate::directory::FileSlice;
+use crate::postings::serializer::PostingsSerializer;
+use crate::schema::IndexRecordOption;
 use crate::fieldnorm::FieldNormReader;
 use crate::Score;
 let mut buffer: Vec<u8> = Vec::new();


@@ -28,8 +28,7 @@ pub struct Checkpoint {
 impl Checkpoint {
 pub(crate) fn follows(&self, other: &Checkpoint) -> bool {
-(self.start_doc == other.end_doc) &&
-(self.start_offset == other.end_offset)
+(self.start_doc == other.end_doc) && (self.start_offset == other.end_offset)
 }
 }
@@ -96,7 +95,7 @@ mod tests {
 Checkpoint {
 start_doc: 0,
 end_doc: 3,
-start_offset: 4,
+start_offset: 0,
 end_offset: 9,
 },
 Checkpoint {
@@ -201,19 +200,21 @@ mod tests {
 Ok(())
 }
-fn integrate_delta(mut vals: Vec<u64>) -> Vec<u64> {
+fn integrate_delta(vals: Vec<u64>) -> Vec<u64> {
+let mut output = Vec::with_capacity(vals.len() + 1);
+output.push(0u64);
 let mut prev = 0u64;
-for val in vals.iter_mut() {
-let new_val = *val + prev;
+for val in vals {
+let new_val = val + prev;
 prev = new_val;
-*val = new_val;
+output.push(new_val);
 }
-vals
+output
 }
 // Generates a sequence of n valid checkpoints, with n < max_len.
 fn monotonic_checkpoints(max_len: usize) -> BoxedStrategy<Vec<Checkpoint>> {
-(1..max_len)
+(0..max_len)
 .prop_flat_map(move |len: usize| {
 (
 proptest::collection::vec(1u64..20u64, len as usize).prop_map(integrate_delta),
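
Review note: the new `integrate_delta` returns `n + 1` running totals for `n` deltas, with a leading `0`. One plausible reading is that the totals act as boundaries, so `n` deltas describe `n` contiguous ranges and `len = 0` (now allowed by `0..max_len`) describes none. The rest of `monotonic_checkpoints` is not visible in this diff, so the pairing step below is an assumption: `checkpoints_from_boundaries` is a hypothetical helper, and the `Checkpoint` here is a simplified stand-in whose field types are guessed.

```rust
/// Same logic as the new `integrate_delta` in the diff: n deltas become
/// n + 1 running totals, starting at 0.
fn integrate_delta(vals: Vec<u64>) -> Vec<u64> {
    let mut output = Vec::with_capacity(vals.len() + 1);
    output.push(0u64);
    let mut prev = 0u64;
    for val in vals {
        let new_val = val + prev;
        prev = new_val;
        output.push(new_val);
    }
    output
}

/// Simplified stand-in for the crate's `Checkpoint` (field types are assumed).
#[derive(Debug, Clone, PartialEq)]
struct Checkpoint {
    start_doc: u64,
    end_doc: u64,
    start_offset: u64,
    end_offset: u64,
}

impl Checkpoint {
    /// Mirrors the `follows` check shown earlier in the diff.
    fn follows(&self, other: &Checkpoint) -> bool {
        (self.start_doc == other.end_doc) && (self.start_offset == other.end_offset)
    }
}

/// Hypothetical helper: pair consecutive boundaries into checkpoints.
/// With k + 1 boundaries in each slice this yields k checkpoints.
fn checkpoints_from_boundaries(docs: &[u64], offsets: &[u64]) -> Vec<Checkpoint> {
    docs.windows(2)
        .zip(offsets.windows(2))
        .map(|(d, o)| Checkpoint {
            start_doc: d[0],
            end_doc: d[1],
            start_offset: o[0],
            end_offset: o[1],
        })
        .collect()
}
```

Under this reading every generated checkpoint follows its predecessor by construction, and the first one starts at doc `0` and offset `0`.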


@@ -35,11 +35,11 @@ struct Layer {
 }
 impl Layer {
-fn cursor<'a>(&'a self) -> impl Iterator<Item = Checkpoint> + 'a {
+fn cursor(&self) -> impl Iterator<Item = Checkpoint> + '_ {
 self.cursor_at_offset(0u64)
 }
-fn cursor_at_offset<'a>(&'a self, start_offset: u64) -> impl Iterator<Item = Checkpoint> + 'a {
+fn cursor_at_offset(&self, start_offset: u64) -> impl Iterator<Item = Checkpoint> + '_ {
 let data = &self.data.as_slice();
 LayerCursor {
 remaining: &data[start_offset as usize..],
@@ -77,7 +77,7 @@ impl SkipIndex {
 SkipIndex { layers }
 }
-pub(crate) fn checkpoints<'a>(&'a self) -> impl Iterator<Item = Checkpoint> + 'a {
+pub(crate) fn checkpoints(&self) -> impl Iterator<Item = Checkpoint> + '_ {
 self.layers
 .last()
 .into_iter()


@@ -46,7 +46,7 @@ impl StoreReader {
 })
 }
-pub(crate) fn block_checkpoints<'a>(&'a self) -> impl Iterator<Item = Checkpoint> + 'a {
+pub(crate) fn block_checkpoints(&self) -> impl Iterator<Item = Checkpoint> + '_ {
 self.skip_index.checkpoints()
 }
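
Review note: replacing `<'a>(&'a self) ... + 'a` with `(&self) ... + '_` in these signatures should be a readability-only change, since the anonymous lifetime `'_` refers to the same borrow of `self`. A minimal sketch of the two spellings side by side; `DemoLayer` and both method names are hypothetical and only stand in for the types touched by the diff.

```rust
/// Hypothetical container standing in for `Layer` / `SkipIndex` / `StoreReader`.
struct DemoLayer {
    data: Vec<u8>,
}

impl DemoLayer {
    /// Old spelling: a named lifetime ties the returned iterator to `self`.
    fn cursor_explicit<'a>(&'a self) -> impl Iterator<Item = u8> + 'a {
        self.data.iter().copied()
    }

    /// New spelling: the anonymous lifetime `'_` expresses the same borrow.
    fn cursor_elided(&self) -> impl Iterator<Item = u8> + '_ {
        self.data.iter().copied()
    }
}
```

Callers see no difference: in both cases the iterator cannot outlive the value it borrows from.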