Commit Graph

2101 Commits

Author SHA1 Message Date
Pascal Seitz
b7d0dd154a fmt 2022-11-14 14:49:15 +08:00
PSeitz
ce10fab20f Apply suggestions from code review
Co-authored-by: Paul Masurel <paul@quickwit.io>
2022-11-14 14:21:53 +08:00
Pascal Seitz
e034328a8b Improve position_to_docid, refactor, add tests 2022-11-14 14:21:53 +08:00
Pascal Seitz
f811d1616b add support for ip range query on multivalue fastfields 2022-11-14 14:21:52 +08:00
PSeitz
c665b16ff0 Merge pull request #1672 from quickwit-oss/allow_range_without_indexed
Allow range query on fastfield without INDEXED
2022-11-14 12:45:12 +08:00
PSeitz
3b5f810051 Merge pull request #1677 from quickwit-oss/switch_to_u32
switch total_num_val to u32
2022-11-14 12:01:40 +08:00
trinity-1686a
5765c261aa allow warming up of the full posting list (#1673)
* allow warming up of the full posting list

* cargo fmt
2022-11-14 10:27:56 +09:00
Pascal Seitz
fb9f03118d switch total_num_val to u32 2022-11-11 17:35:52 +08:00
Pascal Seitz
9e8a0c2cca Allow range query on fastfield without INDEXED 2022-11-10 15:56:08 +08:00
Paul Masurel
3edf0a2724 Using the manual reload policy in IndexWriter. (#1667) 2022-11-09 11:20:41 +01:00
Adam Reichold
a4b759d2fe Include stop word lists from Lucene and the Snowball project (#1666) 2022-11-09 16:57:35 +09:00
PSeitz
3e9c806890 Merge pull request #1665 from quickwit-oss/fix_num_vals
fix num_vals on u128 value index after merge
2022-11-07 21:46:02 +08:00
Pascal Seitz
c69a873dd3 fix num_vals on value index after merge 2022-11-07 21:05:21 +08:00
Pascal Seitz
38ad46e580 fix clippy 2022-11-07 16:09:55 +08:00
Pascal Seitz
6e636c9cea fix num_vals in multivalue index after merge 2022-11-07 15:00:52 +08:00
PSeitz
509a265659 add docstore version (#1652)
* add docstore version

closes #1589

* assert for docstore version
2022-11-04 10:19:16 +09:00
PSeitz
5b2cea1b97 Merge pull request #1656 from quickwit-oss/multival_offset_index
move multivalue index to own file
2022-11-02 14:03:06 +08:00
PSeitz
0f98d91a39 Merge pull request #1646 from quickwit-oss/no_score_calls
No score calls if score is not requested
2022-11-01 20:09:32 +08:00
PSeitz
2af6b01c17 Update src/query/boolean_query/boolean_weight.rs
Co-authored-by: Paul Masurel <paul@quickwit.io>
2022-11-01 16:13:00 +08:00
Adam Reichold
c32ab66bbd Small improvements to StopWorldFilter (#1657)
* Do not copy the whole set of stop words for each stream

* Make construction of StopWordFilter more flexible.
2022-11-01 16:47:34 +09:00
PSeitz
3f3a6f9990 Merge pull request #1653 from quickwit-oss/faster_hash
switch to fx hashmap
2022-11-01 14:53:18 +08:00
Pascal Seitz
83325d8f3f move multivalue index to own file
start_doc parameter in positions to docids
2022-11-01 10:36:13 +08:00
PSeitz
4e46f4f8c4 Merge pull request #1649 from adamreichold/split-compound-words
RFC: Add dictionary-based SplitCompoundWords token filter.
2022-10-27 17:12:48 +08:00
Pascal Seitz
43df356010 rename to docset 2022-10-27 16:53:38 +08:00
PSeitz
6647362464 Merge pull request #1648 from adamreichold/stemmer-todo-alloc
Avoid unconditional allocation in StemmerTokenStream.
2022-10-27 16:50:41 +08:00
Pascal Seitz
279b1b28d3 switch to fx hashmap 2022-10-27 16:19:59 +08:00
PSeitz
7a80851e36 Merge pull request #1645 from quickwit-oss/ip_field_range_query
add ip range query benchmark, add seek behaviour
2022-10-27 16:13:52 +08:00
Adam Reichold
cd952429d2 Add dictionary-based SplitCompoundWords token filter. 2022-10-27 08:30:33 +02:00
Adam Reichold
bbb058d976 Replace FNV by rustc-hash
Both construction have similar goals but rustc-hash ist better suited for
contemporary CPU as it works one word at a time instead of byte per byte.
2022-10-27 00:35:09 +02:00
Adam Reichold
5f7d027a52 Avoid unconditional allocation in StemmerTokenStream.
This fixes the TODO in two ways: If the stemmer already yields an owned string,
it is used directly as the new text of the token. Otherwise, a temporary buffer
is used to copy the stemmed text (just as before) and then swapping it into the
token to reuse its existing buffer.
2022-10-26 18:11:15 +02:00
Pascal Seitz
dfab201191 for_each_docset to iterate without score 2022-10-26 17:25:05 +08:00
PSeitz
0c2bd36fe3 Panic on duplicate field names (#1647)
fixes #1601
2022-10-26 16:17:33 +09:00
Pascal Seitz
af839753e0 No score calls if score is not requested 2022-10-26 12:18:35 +08:00
Pascal Seitz
fec2b63571 improve bench by adding more blanks in compact space 2022-10-25 22:09:01 +08:00
Pascal Seitz
6213ea476a pass positions parameter 2022-10-25 17:44:51 +08:00
Pascal Seitz
5e159c26bf add ip range query benchmark, add seek behaviour 2022-10-25 15:57:19 +08:00
Pascal Seitz
e772d3170d switch get_val() to u32
Fixes #1638
2022-10-24 19:05:57 +08:00
Pascal Seitz
02328b0151 fix proptest 2022-10-24 17:46:06 +08:00
Pascal Seitz
7cc775256c add comments, rename 2022-10-24 17:08:37 +08:00
Pascal Seitz
07b40f8b8b add proptest 2022-10-24 16:52:55 +08:00
PSeitz
9b6b6be5b9 Apply suggestions from code review
Co-authored-by: Paul Masurel <paul@quickwit.io>
2022-10-24 16:00:38 +08:00
Pascal Seitz
6bb73a527f add range query via ip fast field 2022-10-24 16:00:38 +08:00
Paul Masurel
c24157f28b Bumping version format. (#1640)
The docstore format has changed in a non-compatible manner.
2022-10-21 15:35:35 +09:00
Pascal Seitz
791350091c switch num_vals() to u32
fixes #1630
2022-10-20 19:44:28 +08:00
Paul Masurel
483b1d13d4 Added unit test for long tokens (#1635)
* Bugfix on long tokens and multivalue text fields.

Fixes a minor bug for the strong edge case
in which a tokenizer would emit tokens where
the last token does not cover the last position.

More importantly, this adds unit tests.

Closes #1634

* Update src/indexer/segment_writer.rs

Co-authored-by: PSeitz <PSeitz@users.noreply.github.com>

Co-authored-by: PSeitz <PSeitz@users.noreply.github.com>
2022-10-20 15:05:37 +09:00
PSeitz
8de7fa9d95 Merge pull request #1631 from quickwit-oss/high_positions
add test for phrase search on multi text field
2022-10-20 10:26:00 +08:00
Paul Masurel
94313b62f8 Hotfix issue/1629 - position broken (#1633)
* Bugfix position broken.

For Field with several FieldValues, with a
value that contained no token at all, the token position
was reinitialized to 0.

As a result, PhraseQueries can show some false positives.
In addition, after the computation of the position delta, we can
underflow u32, and end up with gigantic delta.

We haven't been able to actually explain the bug in 1629, but it
is assumed that in some corner case these delta can cause a panic.

Closes #1629
2022-10-20 11:03:55 +09:00
Pascal Seitz
f2b2628feb add test for phrase search on multi text field 2022-10-19 16:29:56 +08:00
PSeitz
449f595832 Merge pull request #1628 from quickwit-oss/skip_index_deser
faster skipindex deserialization, larger blocksize on sort
2022-10-19 11:05:20 +08:00
Pascal Seitz
a4485f7611 faster skipindex deserialization, larger blocksize on sort 2022-10-18 19:32:23 +08:00