Commit Graph

312 Commits

Author SHA1 Message Date
Ype Kingma
7d773abc92 Boolean query: do not combine excluded scores. (#840)
* Do nothing when combining score values of excluded scores.

* Add test case for two excluded.

* Test score for two excluded terms.

* Use TopDocs in test_boolean_query_two_excluded
2020-06-08 20:01:19 +09:00
Paul Masurel
c0f5645cd9 Move for_each functions from Scorer to Weight. (#836)
* Move for_each functions from Scorer to Weight.

* Specialized foreach / foreach_pruning for union of termscorer.
2020-06-01 11:31:18 +09:00
Paul Masurel
baf015fc57 Simplification of the segment postings seek implementation. (#834) 2020-05-27 08:49:47 +09:00
Paul Masurel
7275ebdf3c Skiprefactoring skipabsolute (#831)
Simplification of the way we handle positions.
2020-05-25 09:51:23 +09:00
Paul Masurel
8f8f34499f Updated CHANGELOG with the TopCollector offset information and cargo fmt. 2020-05-20 22:26:54 +09:00
Paul Masurel
e25284bafe Major change in the DocSet/Scorer API (#824)
- Change in the DocSet and Scorer API. (@fulmicoton). 
A freshly created DocSet point directly to their first doc. A sentinel value called TERMINATED marks the end of a DocSet.
`.advance()` returns the new DocId. `Scorer::skip(target)` has been replaced by `Scorer::seek(target)` and returns the resulting DocId.
As a result, iterating through DocSet now looks as follows
```rust
let mut doc = docset.doc();
while doc != TERMINATED {
   // ...
   doc = docset.advance();
}
```
The change made it possible to greatly simplify a lot of the docset's code.
- Misc internal optimization and introduction of the `Scorer::for_each_pruning` function. (@fulmicoton)
2020-05-16 16:33:36 +09:00
Rob Young
9de1360538 Minor doc and test improvements around fuzzy querying (#825) 2020-05-16 10:59:24 +09:00
Paul Masurel
c0be461191 Removing tantivy-fst conf and removing warning. (#813) 2020-04-18 20:19:23 +09:00
Paul Masurel
186d7fc20e Fix build 2020-04-01 09:32:45 +09:00
Paul Masurel
cfbdef5186 Using tantivy-fst version 0.3. 2020-03-31 23:24:54 +09:00
Paul Masurel
d04368b1d4 Closes #788. OR not working when using conjunction by default. (#802) 2020-03-31 21:13:50 +09:00
Chen Xu
b167058028 Fix prefix option for FuzzyTermQuery (#797)
* Fix prefix option for FuzzyTermQuery

* Update changelog
2020-03-19 20:19:32 +09:00
Paul Masurel
262957717b unit test fix and use of matches 2020-03-15 00:20:17 +09:00
Paul Masurel
486b8fa9c5 Removing serde-derive dependency (#786) 2020-03-06 23:33:58 +09:00
Paul Masurel
6542dd5337 Removing parenthesis. 2020-03-01 09:41:53 +09:00
Paul Masurel
7d6cfa58e1 [WIP] Alternative take on boosted queries (#772)
* Alternative take on boosted queries

* Fixing unit test

* Added boosting to the query grammar.

* Made BoostQuery public.

* Added support for boosting field in QueryParser

Closes #547
2020-02-19 11:04:38 +09:00
Paul Masurel
ae14022bf0 Removed use::Result. (#771) 2020-01-31 18:47:02 +09:00
Paul Masurel
811fd0cb9e Dynamic analyzer (#755)
* Removed generics in tokenizers

* lowercaser

* Added TokenizerExt

* Introducing BoxedTokenizer

* Introducing BoxXXXXX helper struct

* Closes #762.

* Introducing a TextAnalyzer
2020-01-29 18:23:37 +09:00
Paul Masurel
c1400f25a7 Handle facet search in the QueryParser. (#741)
Closes #738
2019-12-25 17:43:33 +09:00
Paul Masurel
ef3eddf3da clippy first stab (#711) 2019-11-22 13:09:35 +09:00
Paul Masurel
7b21b3f25a Refactoring around Field (#673)
* Refactoring around Field

Removing the contract about the order of the field, and the
field id allocation.

* Update delete_queue.rs

* Update field.rs
2019-10-25 09:06:44 +09:00
petr-tik
4a8f7712f3 Add a doctest to BooleanQuery (#630)
* Add a doctest to BooleanQuery

Closes #446

Mark a function that is only used in tests to be compiled for tests only

Fix doc-comments in a couple of related files

* Minor corrections

remove whitespace, fix typos, add explicit dyn marker

* WIP: BooleanQuery doc test

Trying to nest several BooleanQueries together

* Addressed old review

rust 2018 edition + make function available to everyone

* Box the previous query to resolve the type error

* Rework wording in DocAdress document strings

* Reworded and restructured the docstring
2019-10-07 10:05:12 +09:00
Paul Masurel
2f867aad17 Fix bench (#663)
* fmt

* Fixing bench compilation
2019-10-04 17:07:49 +09:00
Paul Masurel
5c6580eb15 fmt (#661) 2019-10-04 12:10:01 +09:00
fdb-hiroshima
1a817f117f fix documentation error (#654)
Union missdocumented as doing an intersection
Union and Intersection can hold more than 2 DocSets
2019-09-11 17:12:08 +09:00
Paul Masurel
4b9c1dce69 Moving queyr grammar to a different crate. (#645) 2019-09-05 09:37:28 +09:00
Paul Masurel
c1635c13f6 RegexQuery performance: make it possible to cache Regexes - remastered by fulmicoton (Closes #639) (#641)
* small docs cleanup

* only compile a regex once per RegexQuery

Building a `Regex` is an expensive operation. Users of `RegexQuery`
need to cache and reuse regexes when searching across multiple fields.

This is the first step towards allowing that: we can store the `Regex`
directly in the `RegexQuery`, instead of the string pattern.

* RegexQuery: account for possible failure in the constructor

When building a regex from a str pattern, we have to account for the
possibility that the pattern is invalid. Before the previous commit, the
failure would happen in the `specialized_weight` method. Now that we
store a compiled `Regex` in `RegexQuery`, `specialized_weight` doesn't
fail anymore, and we can fail early while constructing `RegexQuery` if
the pattern is invalid.

This is a breaking change for users of `RegexQuery::new`.

* add RegexQuery::from_regex method

This builds a `RegexQuery` from an already compiled `Regex`. The use of
`Into<Arc<Regex>>` is to allow the caller to either simply pass a
`Regex`, or an `Arc<Regex>`, in case it needs to be cached and shared on
the caller's side.

* Using an Arc in AutomatonWeight

Closes #639
2019-08-22 16:14:01 +09:00
Joshua Dutton
9f74786db2 Update import statements in examples, doctests (#633)
Update import statements to edition 2018, including removing
`extern crate` and  `#[macro_use]`. Alphabetize the statements.
2019-08-19 07:26:35 +09:00
Paul Masurel
b3b0138b82 Change for tantivy-py
Schema.convert_named_doc
Better Debug string for Terms and TermQueries
2019-08-14 17:44:25 +09:00
petr-tik
028b0a749c Elastic unbounded range query (#624)
* Tidy up

fmt

remove unneccessary -> Result<()> followed by run.unwrap() in a test

* Adding support for elasticsearch-style unbounded queries

Extend the UserInputBound to include Unbounded, so we can reuse formatting and
internal query format

* Still working on elastic-style range queries

Fixes #498

Merge the elastic_range into range

Reformat to make code easier to follow, use optional() macro to return Some

* Fixed bugs

Made the range parser insensitive to whitespace between the ":" and the range.

Removed optional parsing of field.

Added a unit test for the range parser.

Derived PartialEq to compare the results of parsing as structs, instead of
strings. Found a bug with that unit test - "*}" was parsed as an
UserInputBound::Exclusive, instead of UserInputBound::Unbounded. Added an early
detection-and-return for * in the original range parser

* Correct failing test

Assume that we will use "{*" for Unbounded ranges

* Add a note in the changelog

cargo-fmt

* Moved parenthesis to a newline to make nested if-else more visible
2019-08-12 08:24:47 +09:00
Paul Masurel
941f06eb9f Added Schema.from_named_doc 2019-08-11 16:50:32 +09:00
Paul Masurel
04832a86eb WTF is this file doing here (#622) 2019-08-08 21:54:10 +09:00
fdb-hiroshima
beb8e990cd fix parsing neg float in range query (#621)
fix #620
2019-08-08 20:41:04 +09:00
Paul Masurel
001af3876f cargo fmt 2019-08-08 18:07:19 +09:00
Paul Masurel
f428f344da Various bugfix in the query parser (#619) 2019-08-08 17:48:21 +09:00
Paul Masurel
143f78eced Trying to fix #609 (#616) 2019-08-06 20:33:30 +09:00
Paul Masurel
280ea1209c Changes required for python binding (#610) 2019-08-01 17:26:21 +09:00
petr-tik
0154dbe477 Replace unwrap with match and proper Error handling (#606)
* Replace unwrap with match and proper Error handling

* Replaced 'magic' values with a documented variable

Didn't like the unexplained 0..3 range, thought it was best as a variable

Calculating Levenshtein distance is expensive, so best explain why we should
keep it low
2019-07-31 08:16:02 +09:00
fdb-hiroshima
6eb4e08636 add support for float (#603)
* add basic support for float

as for i64, they are mapped to u64 for indexing
query parser don't work yet

* Update value.rs

* implement support for float in query parser

* Update README.md
2019-07-27 17:57:33 +09:00
Paul Masurel
c3231ca252 Added phrase query tests (#601) 2019-07-22 13:43:00 +09:00
Paul Masurel
498057c5b7 Refactor deletes (#597)
* Refactor deletes

* Removing generation from SegmentUpdater. These have been obsolete for a long time

* Number literal clippy

* Removed clippy useless allow statement
2019-07-17 13:06:44 +09:00
Paul Masurel
4867be3d3b Kompass master (#590)
* Use once_cell in place of lazy_static

* Minor changes
2019-07-10 19:24:54 +09:00
Paul Masurel
0bc2c64a53 2018 (#585)
* removing macro import for fail-rs

* Downcast-rs

* matches
2019-07-07 17:09:04 +09:00
Paul Masurel
462774b15c Tiqb feature/2018 (#583)
* rust 2018

* Added CHANGELOG comment
2019-07-01 10:01:46 +09:00
Kirill Zaborsky
f52b1e68d1 Fix typo (#573) 2019-06-27 09:57:37 +09:00
Paul Masurel
4822940b19 Issue/36 (#559)
* Added explanation

* Explain

* Splitting weight and idf

* Added comments

Closes #36
2019-06-06 10:03:54 +09:00
Paul Masurel
66b4615e4e Issue/542 (#543)
* Closes 542.

Fast fields are all loaded when the segment reader is created.
2019-05-05 13:52:43 +09:00
Paul Masurel
b7c2d0de97 Clippy2 (#534)
* Clippy comments

Clippy complaints that about the cast of &[u32] to a *const __m128i,
because of the lack of alignment constraints.

This commit passes the OutputBuffer object (which enforces proper
    alignment) instead of `&[u32]`.

* Clippy. Block alignment

* Code simplification

* Added comment. Code simplification

* Removed the extraneous freq block len hack.
2019-04-24 12:31:32 +09:00
Paul Masurel
d823163d52 Closes #527. (#529)
Fixing the bug that affects the result of `query.count()` in presence of
deletes.
2019-04-19 09:19:50 +09:00
Paul Masurel
a8cc5208f1 Linear simd (#519)
* linear simd search within block
2019-03-20 22:10:05 +09:00