Commit Graph

2730 Commits

Author SHA1 Message Date
Paul Masurel
8ca12a5683 Added stop word filter to CHANGELOG.md 2022-11-09 17:00:45 +09:00
Adam Reichold
a4b759d2fe Include stop word lists from Lucene and the Snowball project (#1666) 2022-11-09 16:57:35 +09:00
PSeitz
3e9c806890 Merge pull request #1665 from quickwit-oss/fix_num_vals
fix num_vals on u128 value index after merge
2022-11-07 21:46:02 +08:00
Pascal Seitz
c69a873dd3 fix num_vals on value index after merge 2022-11-07 21:05:21 +08:00
PSeitz
666afcf641 Merge pull request #1663 from PSeitz/fix_clippy
fix clippy
2022-11-07 18:11:20 +08:00
Pascal Seitz
38ad46e580 fix clippy 2022-11-07 16:09:55 +08:00
PSeitz
e948889f4c Merge pull request #1662 from quickwit-oss/fix_num_vals
fix num_vals in multivalue index after merge
2022-11-07 15:57:32 +08:00
Pascal Seitz
6e636c9cea fix num_vals in multivalue index after merge 2022-11-07 15:00:52 +08:00
PSeitz
5a610efbc1 Merge pull request #1661 from quickwit-oss/upgrade_criterion
update criterion to 0.4
2022-11-04 14:45:34 +08:00
Pascal Seitz
500a0d5e48 update criterion to 0.4 2022-11-04 13:26:29 +08:00
PSeitz
509a265659 add docstore version (#1652)
* add docstore version

closes #1589

* assert for docstore version
2022-11-04 10:19:16 +09:00
PSeitz
5b2cea1b97 Merge pull request #1656 from quickwit-oss/multival_offset_index
move multivalue index to own file
2022-11-02 14:03:06 +08:00
PSeitz
a5a80ffaea Update fastfield_codecs/src/column.rs
Co-authored-by: Paul Masurel <paul@quickwit.io>
2022-11-02 06:37:27 +01:00
PSeitz
0f98d91a39 Merge pull request #1646 from quickwit-oss/no_score_calls
No score calls if score is not requested
2022-11-01 20:09:32 +08:00
PSeitz
2af6b01c17 Update src/query/boolean_query/boolean_weight.rs
Co-authored-by: Paul Masurel <paul@quickwit.io>
2022-11-01 16:13:00 +08:00
Adam Reichold
c32ab66bbd Small improvements to StopWorldFilter (#1657)
* Do not copy the whole set of stop words for each stream

* Make construction of StopWordFilter more flexible.
2022-11-01 16:47:34 +09:00
PSeitz
3f3a6f9990 Merge pull request #1653 from quickwit-oss/faster_hash
switch to fx hashmap
2022-11-01 14:53:18 +08:00
Pascal Seitz
83325d8f3f move multivalue index to own file
start_doc parameter in positions to docids
2022-11-01 10:36:13 +08:00
PSeitz
4e46f4f8c4 Merge pull request #1649 from adamreichold/split-compound-words
RFC: Add dictionary-based SplitCompoundWords token filter.
2022-10-27 17:12:48 +08:00
Pascal Seitz
43df356010 rename to docset 2022-10-27 16:53:38 +08:00
PSeitz
6647362464 Merge pull request #1648 from adamreichold/stemmer-todo-alloc
Avoid unconditional allocation in StemmerTokenStream.
2022-10-27 16:50:41 +08:00
Pascal Seitz
279b1b28d3 switch to fx hashmap 2022-10-27 16:19:59 +08:00
PSeitz
7a80851e36 Merge pull request #1645 from quickwit-oss/ip_field_range_query
add ip range query benchmark, add seek behaviour
2022-10-27 16:13:52 +08:00
Adam Reichold
cd952429d2 Add dictionary-based SplitCompoundWords token filter. 2022-10-27 08:30:33 +02:00
PSeitz
d777c964da Merge pull request #1650 from adamreichold/fnv-rustc-hash
Replace FNV by rustc-hash
2022-10-27 12:11:26 +08:00
Adam Reichold
bbb058d976 Replace FNV by rustc-hash
Both construction have similar goals but rustc-hash ist better suited for
contemporary CPU as it works one word at a time instead of byte per byte.
2022-10-27 00:35:09 +02:00
Adam Reichold
5f7d027a52 Avoid unconditional allocation in StemmerTokenStream.
This fixes the TODO in two ways: If the stemmer already yields an owned string,
it is used directly as the new text of the token. Otherwise, a temporary buffer
is used to copy the stemmed text (just as before) and then swapping it into the
token to reuse its existing buffer.
2022-10-26 18:11:15 +02:00
Pascal Seitz
dfab201191 for_each_docset to iterate without score 2022-10-26 17:25:05 +08:00
PSeitz
0c2bd36fe3 Panic on duplicate field names (#1647)
fixes #1601
2022-10-26 16:17:33 +09:00
Pascal Seitz
af839753e0 No score calls if score is not requested 2022-10-26 12:18:35 +08:00
Pascal Seitz
fec2b63571 improve bench by adding more blanks in compact space 2022-10-25 22:09:01 +08:00
Pascal Seitz
6213ea476a pass positions parameter 2022-10-25 17:44:51 +08:00
Pascal Seitz
5e159c26bf add ip range query benchmark, add seek behaviour 2022-10-25 15:57:19 +08:00
PSeitz
a5e59ab598 Merge pull request #1644 from quickwit-oss/get_val_u32
switch get_val() to u32
2022-10-24 19:30:03 +08:00
Pascal Seitz
e772d3170d switch get_val() to u32
Fixes #1638
2022-10-24 19:05:57 +08:00
PSeitz
8c2ba7bd55 Merge pull request #1637 from quickwit-oss/ip_field_range_query
add range query via ip fast field
2022-10-24 18:10:47 +08:00
Pascal Seitz
02328b0151 fix proptest 2022-10-24 17:46:06 +08:00
Pascal Seitz
7cc775256c add comments, rename 2022-10-24 17:08:37 +08:00
Pascal Seitz
07b40f8b8b add proptest 2022-10-24 16:52:55 +08:00
PSeitz
9b6b6be5b9 Apply suggestions from code review
Co-authored-by: Paul Masurel <paul@quickwit.io>
2022-10-24 16:00:38 +08:00
Pascal Seitz
6bb73a527f add range query via ip fast field 2022-10-24 16:00:38 +08:00
PSeitz
03885d0f3c Merge pull request #1643 from quickwit-oss/range_query_parser
allow more characters in range query
2022-10-24 15:09:47 +08:00
Pascal Seitz
f2e5135870 allow more characters in range query
closes #1642
2022-10-21 18:05:15 +08:00
Paul Masurel
c24157f28b Bumping version format. (#1640)
The docstore format has changed in a non-compatible manner.
2022-10-21 15:35:35 +09:00
PSeitz
873382cdcb Merge pull request #1639 from quickwit-oss/num_vals_u32
switch num_vals() to u32
2022-10-21 12:36:50 +08:00
Pascal Seitz
791350091c switch num_vals() to u32
fixes #1630
2022-10-20 19:44:28 +08:00
Paul Masurel
483b1d13d4 Added unit test for long tokens (#1635)
* Bugfix on long tokens and multivalue text fields.

Fixes a minor bug for the strong edge case
in which a tokenizer would emit tokens where
the last token does not cover the last position.

More importantly, this adds unit tests.

Closes #1634

* Update src/indexer/segment_writer.rs

Co-authored-by: PSeitz <PSeitz@users.noreply.github.com>

Co-authored-by: PSeitz <PSeitz@users.noreply.github.com>
2022-10-20 15:05:37 +09:00
PSeitz
8de7fa9d95 Merge pull request #1631 from quickwit-oss/high_positions
add test for phrase search on multi text field
2022-10-20 10:26:00 +08:00
Paul Masurel
94313b62f8 Hotfix issue/1629 - position broken (#1633)
* Bugfix position broken.

For Field with several FieldValues, with a
value that contained no token at all, the token position
was reinitialized to 0.

As a result, PhraseQueries can show some false positives.
In addition, after the computation of the position delta, we can
underflow u32, and end up with gigantic delta.

We haven't been able to actually explain the bug in 1629, but it
is assumed that in some corner case these delta can cause a panic.

Closes #1629
2022-10-20 11:03:55 +09:00
Pascal Seitz
f2b2628feb add test for phrase search on multi text field 2022-10-19 16:29:56 +08:00