Tantivy used to assume that all files could somehow be memory mapped. After this change, `Directory` returns a `FileSlice` that can be narrowed and eventually read into an `OwnedBytes` object. Long, blocking IO operations are still required, but they no longer span the entire file.
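The idea can be illustrated with a minimal, self-contained sketch. It does not use tantivy's actual types: `FileHandle`, `RamFile`, and `LazySlice` are hypothetical names invented here, and the real `FileSlice` / `OwnedBytes` signatures may differ. The point is only that narrowing a slice is free, and that the blocking read covers just the narrowed range.

```rust
use std::io;
use std::ops::Range;
use std::sync::Arc;

/// Hypothetical storage backend: all it can do is read a byte range.
pub trait FileHandle: Send + Sync {
    fn len(&self) -> usize;
    fn read_range(&self, range: Range<usize>) -> io::Result<Vec<u8>>;
}

/// In-memory backend, used here only for illustration.
pub struct RamFile(pub Vec<u8>);

impl FileHandle for RamFile {
    fn len(&self) -> usize {
        self.0.len()
    }

    fn read_range(&self, range: Range<usize>) -> io::Result<Vec<u8>> {
        self.0
            .get(range)
            .map(|bytes| bytes.to_vec())
            .ok_or_else(|| io::Error::new(io::ErrorKind::UnexpectedEof, "range out of bounds"))
    }
}

/// A lazy view over part of a file. Creating or narrowing it performs no IO.
#[derive(Clone)]
pub struct LazySlice {
    file: Arc<dyn FileHandle>,
    range: Range<usize>,
}

impl LazySlice {
    pub fn new(file: Arc<dyn FileHandle>) -> Self {
        let range = 0..file.len();
        LazySlice { file, range }
    }

    pub fn len(&self) -> usize {
        self.range.len()
    }

    /// Narrows the view. `range` is relative to the current view.
    pub fn slice(&self, range: Range<usize>) -> Self {
        assert!(range.start <= range.end && range.end <= self.len());
        let start = self.range.start + range.start;
        let end = self.range.start + range.end;
        LazySlice { file: self.file.clone(), range: start..end }
    }

    /// The only blocking call: reads just the bytes covered by the view.
    pub fn read_bytes(&self) -> io::Result<Vec<u8>> {
        self.file.read_range(self.range.clone())
    }
}
```

A caller would typically narrow the slice down to, say, a footer or a single block before calling `read_bytes`, so a backend that cannot be memory mapped never has to load the whole file.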
* add support for indexed bytes fast field
* remove backup code file
* refine test cases
* Simplified unit test. Renamed it, as it tests the storable part, not the indexed part.
* Small refactoring and added unit test. If the field is multivalued, we only retain the first FAST value.
Co-authored-by: Raul <raul.tang.lc@gmail.com>
`go_to_first_doc` was typically calling `seek` with a target smaller than
the current doc.
Since `SegmentPostings` typically does a linear search over the full block,
regardless of the current position, this could make our segment postings
go backward.
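The failure mode can be reproduced with a toy model. This is only a sketch: `BlockPostings`, `seek_unguarded`, and the guard in `seek` are hypothetical and are not taken from tantivy's `SegmentPostings`. The guard shown is one possible way to avoid the problem, not necessarily the fix applied in this commit.

```rust
/// Sentinel returned once the postings are exhausted (value chosen for this sketch).
pub const TERMINATED: u32 = u32::MAX;

/// Toy postings list backed by a single sorted block of doc ids.
pub struct BlockPostings {
    block: Vec<u32>,
    cursor: usize,
}

impl BlockPostings {
    pub fn new(block: Vec<u32>) -> Self {
        BlockPostings { block, cursor: 0 }
    }

    /// Current doc, or TERMINATED when the cursor is past the end.
    pub fn doc(&self) -> u32 {
        self.block.get(self.cursor).copied().unwrap_or(TERMINATED)
    }

    pub fn advance(&mut self) -> u32 {
        self.cursor = (self.cursor + 1).min(self.block.len());
        self.doc()
    }

    /// Linear search over the whole block, ignoring the cursor.
    /// A target below the current doc moves the cursor backward.
    pub fn seek_unguarded(&mut self, target: u32) -> u32 {
        self.cursor = self
            .block
            .iter()
            .position(|&doc| doc >= target)
            .unwrap_or(self.block.len());
        self.doc()
    }

    /// Guarded variant: a no-op when we already sit at or past `target`.
    pub fn seek(&mut self, target: u32) -> u32 {
        if self.doc() >= target {
            return self.doc();
        }
        self.seek_unguarded(target)
    }
}
```

With the cursor on doc 9 of `[2, 5, 9, 14]`, `seek_unguarded(3)` lands on 5, which is behind the current position, while the guarded `seek(3)` stays on 9.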
* Do nothing when combining score values of excluded scores.
* Add test case for two excluded.
* Test score for two excluded terms.
* Use TopDocs in test_boolean_query_two_excluded
- Change in the DocSet and Scorer API. (@fulmicoton)
A freshly created DocSet points directly to its first doc. A sentinel value called TERMINATED marks the end of a DocSet.
`.advance()` returns the new DocId. `Scorer::skip(target)` has been replaced by `Scorer::seek(target)` and returns the resulting DocId.
As a result, iterating through a DocSet now looks as follows:
```rust
let mut doc = docset.doc();
while doc != TERMINATED {
    // ...
    doc = docset.advance();
}
```
The change made it possible to greatly simplify a lot of the docset's code.
- Misc internal optimization and introduction of the `Scorer::for_each_pruning` function. (@fulmicoton)
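To make the DocSet contract described above concrete, here is a self-contained sketch. It is not tantivy's code: the `DocSet` trait is reduced to three methods, `VecDocSet` and `collect_docs` are hypothetical helpers, and the value of `TERMINATED` is an assumption made for this example.

```rust
pub type DocId = u32;

/// Sentinel marking the end of a DocSet (value assumed for this sketch).
pub const TERMINATED: DocId = u32::MAX;

/// Reduced version of the contract: a DocSet starts on its first doc.
pub trait DocSet {
    /// Current doc, or TERMINATED once exhausted.
    fn doc(&self) -> DocId;

    /// Moves to the next doc and returns it.
    fn advance(&mut self) -> DocId;

    /// Moves to the first doc >= `target` and returns it.
    /// Default implementation: advance until the target is reached.
    fn seek(&mut self, target: DocId) -> DocId {
        let mut doc = self.doc();
        while doc < target {
            doc = self.advance();
        }
        doc
    }
}

/// Hypothetical DocSet over a sorted vector of doc ids (all below TERMINATED).
pub struct VecDocSet {
    docs: Vec<DocId>,
    cursor: usize,
}

impl VecDocSet {
    pub fn new(docs: Vec<DocId>) -> Self {
        VecDocSet { docs, cursor: 0 }
    }
}

impl DocSet for VecDocSet {
    fn doc(&self) -> DocId {
        self.docs.get(self.cursor).copied().unwrap_or(TERMINATED)
    }

    fn advance(&mut self) -> DocId {
        if self.cursor < self.docs.len() {
            self.cursor += 1;
        }
        self.doc()
    }
}

/// Drains a DocSet using the iteration pattern from the changelog entry.
pub fn collect_docs<D: DocSet>(mut docset: D) -> Vec<DocId> {
    let mut out = Vec::new();
    let mut doc = docset.doc();
    while doc != TERMINATED {
        out.push(doc);
        doc = docset.advance();
    }
    out
}
```

Because an empty DocSet already reports `TERMINATED` from `doc()`, the loop needs no special case for it, and `seek` on an exhausted DocSet returns immediately.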
* Alternative take on boosted queries
* Fixing unit test
* Added boosting to the query grammar.
* Made BoostQuery public.
* Added support for boosting field in QueryParser
Closes #547
* Add a doctest to BooleanQuery
Closes #446
Mark a function that is only used in tests to be compiled for tests only
Fix doc-comments in a couple of related files
* Minor corrections
remove whitespace, fix typos, add explicit dyn marker
* WIP: BooleanQuery doc test
Trying to nest several BooleanQueries together
* Addressed old review
Rust 2018 edition + make function available to everyone
* Box the previous query to resolve the type error
* Rework wording in DocAddress document strings
* Reworded and restructured the docstring
* small docs cleanup
* only compile a regex once per RegexQuery
Building a `Regex` is an expensive operation. Users of `RegexQuery`
need to cache and reuse regexes when searching across multiple fields.
This is the first step towards allowing that: we can store the `Regex`
directly in the `RegexQuery`, instead of the string pattern.
* RegexQuery: account for possible failure in the constructor
When building a regex from a str pattern, we have to account for the
possibility that the pattern is invalid. Before the previous commit, the
failure would happen in the `specialized_weight` method. Now that we
store a compiled `Regex` in `RegexQuery`, `specialized_weight` doesn't
fail anymore, and we can fail early while constructing `RegexQuery` if
the pattern is invalid.
This is a breaking change for users of `RegexQuery::new`.
* add RegexQuery::from_regex method
This builds a `RegexQuery` from an already compiled `Regex`. The use of
`Into<Arc<Regex>>` is to allow the caller to either simply pass a
`Regex`, or an `Arc<Regex>`, in case it needs to be cached and shared on
the caller's side.
* Using an Arc in AutomatonWeight
Closes #639
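The constructor design described in these commits can be sketched without depending on any regex crate. Everything below is hypothetical: `CompiledPattern` stands in for a compiled `Regex` (its "compilation" only checks that parentheses are balanced), and `PatternQuery`, `from_pattern`, and `from_compiled` are illustrative names, not the actual `RegexQuery` API.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for an expensive-to-build compiled regex.
#[derive(Debug)]
pub struct CompiledPattern {
    pub source: String,
}

impl CompiledPattern {
    /// Pretend compilation: only rejects unbalanced parentheses.
    pub fn compile(pattern: &str) -> Result<Self, String> {
        let mut depth = 0i32;
        for c in pattern.chars() {
            match c {
                '(' => depth += 1,
                ')' => depth -= 1,
                _ => {}
            }
            if depth < 0 {
                return Err(format!("unbalanced ')' in {:?}", pattern));
            }
        }
        if depth != 0 {
            return Err(format!("unbalanced '(' in {:?}", pattern));
        }
        Ok(CompiledPattern { source: pattern.to_string() })
    }
}

/// Hypothetical query that stores the compiled pattern, not the string.
pub struct PatternQuery {
    pub pattern: Arc<CompiledPattern>,
    pub field: u32,
}

impl PatternQuery {
    /// Fails early, at construction time, if the pattern is invalid.
    pub fn from_pattern(pattern: &str, field: u32) -> Result<Self, String> {
        let compiled = CompiledPattern::compile(pattern)?;
        Ok(Self::from_compiled(compiled, field))
    }

    /// Accepts either an owned `CompiledPattern` or a shared `Arc<CompiledPattern>`.
    pub fn from_compiled<T: Into<Arc<CompiledPattern>>>(pattern: T, field: u32) -> Self {
        PatternQuery { pattern: pattern.into(), field }
    }
}
```

A caller searching several fields can compile once, wrap the result in an `Arc`, and pass clones of that `Arc` to each query. The `Into<Arc<_>>` bound works because the standard library provides `From<T> for Arc<T>` alongside the reflexive `From<Arc<T>> for Arc<T>`.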