PSeitz
5a610efbc1
Merge pull request #1661 from quickwit-oss/upgrade_criterion
...
update criterion to 0.4
2022-11-04 14:45:34 +08:00
Pascal Seitz
500a0d5e48
update criterion to 0.4
2022-11-04 13:26:29 +08:00
PSeitz
509a265659
add docstore version (#1652)
...
* add docstore version
closes #1589
* assert for docstore version
2022-11-04 10:19:16 +09:00
PSeitz
5b2cea1b97
Merge pull request #1656 from quickwit-oss/multival_offset_index
...
move multivalue index to own file
2022-11-02 14:03:06 +08:00
PSeitz
a5a80ffaea
Update fastfield_codecs/src/column.rs
...
Co-authored-by: Paul Masurel <paul@quickwit.io>
2022-11-02 06:37:27 +01:00
PSeitz
0f98d91a39
Merge pull request #1646 from quickwit-oss/no_score_calls
...
No score calls if score is not requested
2022-11-01 20:09:32 +08:00
PSeitz
2af6b01c17
Update src/query/boolean_query/boolean_weight.rs
...
Co-authored-by: Paul Masurel <paul@quickwit.io>
2022-11-01 16:13:00 +08:00
Adam Reichold
c32ab66bbd
Small improvements to StopWordFilter (#1657)
...
* Do not copy the whole set of stop words for each stream
* Make construction of StopWordFilter more flexible.
2022-11-01 16:47:34 +09:00
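One way to avoid copying the stop word set for every stream is to keep it behind a reference-counted pointer, so each stream only clones the pointer. The sketch below illustrates that idea only; the type and method names (`StopWordFilter::new`, `stream`, `StopWordStream`) are hypothetical and are not taken from the commit.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// Hypothetical filter: the stop word set is built once and shared.
#[derive(Clone)]
struct StopWordFilter {
    words: Arc<HashSet<String>>,
}

impl StopWordFilter {
    /// Accepts any iterator of words, which keeps construction flexible.
    fn new<W: IntoIterator<Item = String>>(words: W) -> Self {
        Self { words: Arc::new(words.into_iter().collect()) }
    }

    /// Creating a stream clones the Arc (a pointer copy), not the set.
    fn stream(&self, tokens: Vec<String>) -> StopWordStream {
        StopWordStream { words: Arc::clone(&self.words), tokens: tokens.into_iter() }
    }
}

/// Hypothetical stream that skips tokens found in the shared set.
struct StopWordStream {
    words: Arc<HashSet<String>>,
    tokens: std::vec::IntoIter<String>,
}

impl Iterator for StopWordStream {
    type Item = String;

    fn next(&mut self) -> Option<String> {
        let words = &self.words;
        self.tokens.find(|token| !words.contains(token))
    }
}

fn main() {
    let filter = StopWordFilter::new(vec!["the".to_string(), "a".to_string()]);
    let tokens = vec!["the", "quick", "a", "fox"].into_iter().map(String::from).collect();
    let kept: Vec<String> = filter.stream(tokens).collect();
    println!("{:?}", kept);
}
```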
PSeitz
3f3a6f9990
Merge pull request #1653 from quickwit-oss/faster_hash
...
switch to fx hashmap
2022-11-01 14:53:18 +08:00
Pascal Seitz
83325d8f3f
move multivalue index to own file
...
start_doc parameter in positions to docids
2022-11-01 10:36:13 +08:00
PSeitz
4e46f4f8c4
Merge pull request #1649 from adamreichold/split-compound-words
...
RFC: Add dictionary-based SplitCompoundWords token filter.
2022-10-27 17:12:48 +08:00
Pascal Seitz
43df356010
rename to docset
2022-10-27 16:53:38 +08:00
PSeitz
6647362464
Merge pull request #1648 from adamreichold/stemmer-todo-alloc
...
Avoid unconditional allocation in StemmerTokenStream.
2022-10-27 16:50:41 +08:00
Pascal Seitz
279b1b28d3
switch to fx hashmap
2022-10-27 16:19:59 +08:00
PSeitz
7a80851e36
Merge pull request #1645 from quickwit-oss/ip_field_range_query
...
add ip range query benchmark, add seek behaviour
2022-10-27 16:13:52 +08:00
Adam Reichold
cd952429d2
Add dictionary-based SplitCompoundWords token filter.
2022-10-27 08:30:33 +02:00
PSeitz
d777c964da
Merge pull request #1650 from adamreichold/fnv-rustc-hash
...
Replace FNV by rustc-hash
2022-10-27 12:11:26 +08:00
Adam Reichold
bbb058d976
Replace FNV by rustc-hash
...
Both constructions have similar goals, but rustc-hash is better suited for
contemporary CPUs as it works one word at a time instead of byte by byte.
2022-10-27 00:35:09 +02:00
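To illustrate the byte-per-byte versus word-at-a-time difference, here is a minimal sketch of both styles. `fnv1a` follows the standard 64-bit FNV-1a definition. `fx_style` is a simplified approximation of the rotate/xor/multiply step used by rustc-hash on 64-bit targets; its tail handling is an assumption made for brevity and it will not reproduce the crate's exact output.

```rust
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// 64-bit FNV-1a: one xor and one multiply per input byte.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash = FNV_OFFSET;
    for &byte in bytes {
        hash ^= byte as u64;
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

const FX_SEED: u64 = 0x517c_c1b7_2722_0a95;

/// Fx-style mixing step: rotate, xor with a whole word, multiply.
fn fx_mix(hash: u64, word: u64) -> u64 {
    (hash.rotate_left(5) ^ word).wrapping_mul(FX_SEED)
}

/// Simplified Fx-style hash: consumes 8 bytes per step, then the leftover
/// bytes one at a time (an assumption, not rustc-hash's exact tail logic).
fn fx_style(bytes: &[u8]) -> u64 {
    let mut hash = 0u64;
    let mut chunks = bytes.chunks_exact(8);
    for chunk in &mut chunks {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(chunk);
        hash = fx_mix(hash, u64::from_le_bytes(buf));
    }
    for &byte in chunks.remainder() {
        hash = fx_mix(hash, byte as u64);
    }
    hash
}

fn main() {
    let key = b"a reasonably long term";
    println!("fnv1a    = {:016x}", fnv1a(key));
    println!("fx_style = {:016x}", fx_style(key));
}
```

For a 22-byte key, `fnv1a` performs 22 multiply steps, while `fx_style` performs 2 word steps plus 6 tail steps.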
Adam Reichold
5f7d027a52
Avoid unconditional allocation in StemmerTokenStream.
...
This fixes the TODO in two ways: If the stemmer already yields an owned string,
it is used directly as the new text of the token. Otherwise, the stemmed text is
copied into a temporary buffer (just as before), which is then swapped into the
token so that the token's existing buffer is reused.
2022-10-26 18:11:15 +02:00
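The two cases described above can be sketched with `Cow` and `mem::swap`. The `stem` function and the `Token` struct below are hypothetical stand-ins, not the real stemmer or token type; only the owned/borrowed handling is the point.

```rust
use std::borrow::Cow;
use std::mem;

/// Hypothetical stand-in for a stemmer that may or may not allocate.
fn stem(word: &str) -> Cow<'_, str> {
    if let Some(stripped) = word.strip_suffix("ing") {
        Cow::Borrowed(stripped) // a sub-slice of the input, no allocation
    } else if word.chars().any(char::is_uppercase) {
        Cow::Owned(word.to_lowercase()) // freshly allocated string
    } else {
        Cow::Borrowed(word)
    }
}

/// Hypothetical token type holding its own text buffer.
struct Token {
    text: String,
}

fn stem_in_place(token: &mut Token, buffer: &mut String) {
    match stem(&token.text) {
        // Owned result: take it directly as the new token text.
        Cow::Owned(stemmed) => token.text = stemmed,
        // Borrowed result: copy into the scratch buffer, then swap so the
        // token's old allocation becomes the scratch buffer for next time.
        Cow::Borrowed(stemmed) => {
            buffer.clear();
            buffer.push_str(stemmed);
            mem::swap(&mut token.text, buffer);
        }
    }
}

fn main() {
    let mut buffer = String::new();
    let mut token = Token { text: "running".to_string() };
    stem_in_place(&mut token, &mut buffer);
    println!("{}", token.text);
}
```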
Pascal Seitz
dfab201191
for_each_docset to iterate without score
2022-10-26 17:25:05 +08:00
PSeitz
0c2bd36fe3
Panic on duplicate field names (#1647)
...
fixes #1601
2022-10-26 16:17:33 +09:00
Pascal Seitz
af839753e0
No score calls if score is not requested
2022-10-26 12:18:35 +08:00
Pascal Seitz
fec2b63571
improve bench by adding more blanks in compact space
2022-10-25 22:09:01 +08:00
Pascal Seitz
6213ea476a
pass positions parameter
2022-10-25 17:44:51 +08:00
Pascal Seitz
5e159c26bf
add ip range query benchmark, add seek behaviour
2022-10-25 15:57:19 +08:00
PSeitz
a5e59ab598
Merge pull request #1644 from quickwit-oss/get_val_u32
...
switch get_val() to u32
2022-10-24 19:30:03 +08:00
Pascal Seitz
e772d3170d
switch get_val() to u32
...
Fixes #1638
2022-10-24 19:05:57 +08:00
PSeitz
8c2ba7bd55
Merge pull request #1637 from quickwit-oss/ip_field_range_query
...
add range query via ip fast field
2022-10-24 18:10:47 +08:00
Pascal Seitz
02328b0151
fix proptest
2022-10-24 17:46:06 +08:00
Pascal Seitz
7cc775256c
add comments, rename
2022-10-24 17:08:37 +08:00
Pascal Seitz
07b40f8b8b
add proptest
2022-10-24 16:52:55 +08:00
PSeitz
9b6b6be5b9
Apply suggestions from code review
...
Co-authored-by: Paul Masurel <paul@quickwit.io>
2022-10-24 16:00:38 +08:00
Pascal Seitz
6bb73a527f
add range query via ip fast field
2022-10-24 16:00:38 +08:00
PSeitz
03885d0f3c
Merge pull request #1643 from quickwit-oss/range_query_parser
...
allow more characters in range query
2022-10-24 15:09:47 +08:00
Pascal Seitz
f2e5135870
allow more characters in range query
...
closes #1642
2022-10-21 18:05:15 +08:00
Paul Masurel
c24157f28b
Bumping version format. (#1640)
...
The docstore format has changed in a non-compatible manner.
2022-10-21 15:35:35 +09:00
PSeitz
873382cdcb
Merge pull request #1639 from quickwit-oss/num_vals_u32
...
switch num_vals() to u32
2022-10-21 12:36:50 +08:00
Pascal Seitz
791350091c
switch num_vals() to u32
...
fixes #1630
2022-10-20 19:44:28 +08:00
Paul Masurel
483b1d13d4
Added unit test for long tokens (#1635)
...
* Bugfix on long tokens and multivalue text fields.
Fixes a minor bug for the strong edge case
in which a tokenizer would emit tokens where
the last token does not cover the last position.
More importantly, this adds unit tests.
Closes #1634
* Update src/indexer/segment_writer.rs
Co-authored-by: PSeitz <PSeitz@users.noreply.github.com>
2022-10-20 15:05:37 +09:00
PSeitz
8de7fa9d95
Merge pull request #1631 from quickwit-oss/high_positions
...
add test for phrase search on multi text field
2022-10-20 10:26:00 +08:00
Paul Masurel
94313b62f8
Hotfix issue/1629 - position broken (#1633)
...
* Bugfix position broken.
For a Field with several FieldValues, with a
value that contained no token at all, the token position
was reinitialized to 0.
As a result, PhraseQueries can show some false positives.
In addition, the computation of the position delta can then
underflow u32, and we end up with a gigantic delta.
We haven't been able to actually explain the bug in 1629, but it
is assumed that in some corner cases these deltas can cause a panic.
Closes #1629
2022-10-20 11:03:55 +09:00
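The underflow described above is easy to reproduce in isolation. The sketch below is not the indexer's code: `assign_positions`, `position_delta`, and the `POSITION_GAP` value are hypothetical, and whitespace splitting stands in for a real tokenizer. It shows a running offset that is never reset by an empty value, and what a delta computation does when positions go backwards.

```rust
/// Hypothetical gap inserted between the values of a multivalued field.
const POSITION_GAP: u32 = 1;

/// Assigns positions across several values of one field. The offset only
/// moves forward; a value with no tokens leaves it unchanged.
fn assign_positions(values: &[&str]) -> Vec<(String, u32)> {
    let mut out = Vec::new();
    let mut offset = 0u32;
    for value in values {
        let mut next_offset = offset;
        for (i, token) in value.split_whitespace().enumerate() {
            let position = offset + i as u32;
            out.push((token.to_string(), position));
            next_offset = position + 1 + POSITION_GAP;
        }
        offset = next_offset;
    }
    out
}

/// Delta between consecutive positions; None signals that the positions
/// went backwards instead of silently wrapping around.
fn position_delta(previous: u32, current: u32) -> Option<u32> {
    current.checked_sub(previous)
}

fn main() {
    // The empty middle value must not reset the position to 0.
    println!("{:?}", assign_positions(&["a b", "", "c"]));
    // What a reset would have caused with wrapping arithmetic.
    println!("{}", 0u32.wrapping_sub(7));
    println!("{:?}", position_delta(7, 0));
}
```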
Pascal Seitz
f2b2628feb
add test for phrase search on multi text field
2022-10-19 16:29:56 +08:00
PSeitz
449f595832
Merge pull request #1628 from quickwit-oss/skip_index_deser
...
faster skipindex deserialization, larger blocksize on sort
2022-10-19 11:05:20 +08:00
PSeitz
c9235df059
Merge pull request #1627 from quickwit-oss/ip_field_range_query
...
add range query handling for ip via term dictionary
2022-10-19 10:53:00 +08:00
Pascal Seitz
a4485f7611
faster skipindex deserialization, larger blocksize on sort
2022-10-18 19:32:23 +08:00
Pascal Seitz
1082ff60f9
add range query handling for ip via term dictionary
...
since IPs are mapped monotonically we can use the term dictionary for range queries
2022-10-18 13:08:27 +08:00
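The monotonic mapping argument can be checked with the standard library alone. In this sketch an address is turned into 16 big-endian bytes (IPv4 goes through the IPv4-mapped IPv6 form), so byte-wise ordering of the keys matches numeric ordering of the addresses. The `ip_to_key` helper and the `BTreeSet` standing in for a term dictionary are assumptions for illustration, not the crate's actual encoding.

```rust
use std::collections::BTreeSet;
use std::net::{IpAddr, Ipv4Addr};

/// Hypothetical key encoding: 16 big-endian bytes per address.
fn ip_to_key(ip: IpAddr) -> [u8; 16] {
    let v6 = match ip {
        IpAddr::V4(v4) => v4.to_ipv6_mapped(),
        IpAddr::V6(v6) => v6,
    };
    // Big-endian bytes sort lexicographically in the same order as the
    // underlying 128-bit integer.
    u128::from(v6).to_be_bytes()
}

/// Range lookup over a sorted set of keys (stand-in for a term dictionary).
fn count_in_range(keys: &BTreeSet<[u8; 16]>, low: IpAddr, high: IpAddr) -> usize {
    keys.range(ip_to_key(low)..=ip_to_key(high)).count()
}

fn main() {
    let ips = [
        Ipv4Addr::new(10, 0, 0, 1),
        Ipv4Addr::new(10, 0, 0, 200),
        Ipv4Addr::new(10, 0, 1, 5),
        Ipv4Addr::new(192, 168, 0, 1),
    ];
    let keys: BTreeSet<[u8; 16]> = ips.iter().map(|ip| ip_to_key(IpAddr::V4(*ip))).collect();
    let low = IpAddr::V4(Ipv4Addr::new(10, 0, 0, 0));
    let high = IpAddr::V4(Ipv4Addr::new(10, 0, 0, 255));
    println!("{}", count_in_range(&keys, low, high));
}
```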
PSeitz
491854155c
Merge pull request #1625 from quickwit-oss/index_ip_field
...
index ip field
2022-10-18 11:18:17 +08:00
Christoph Herzog
96c3d54ac7
fix: Fix power of two computation on 32bit architectures (#1624)
...
The current `compute_previous_power_of_two()` implementation used for
TermHashmap takes and returns `usize`, but actually only works
correctly on 64 bit architectures (aka usize == u64).
On other architectures the leading_zeros computation is run on the wrong
type (must be u64), and leads to overflows.
Fixed by simply computing the leading_zeros based on a u64 value.
2022-10-18 11:55:02 +09:00
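A minimal sketch of the fix described above, assuming the function subtracts `leading_zeros()` from 63 to find the most significant bit. Only the function name comes from the commit message; the body is a reconstruction. Widening to `u64` first makes the bit count independent of the target's `usize` width; on a 32-bit target, calling `leading_zeros()` on the `usize` directly would count over 32 bits and push the shift amount past the type's width.

```rust
/// Largest power of two that is less than or equal to `n`.
/// Reconstruction for illustration; not copied from the repository.
fn compute_previous_power_of_two(n: usize) -> usize {
    assert!(n > 0, "n must be non-zero");
    // Always count leading zeros over 64 bits, whatever usize is.
    let msb = 63 - (n as u64).leading_zeros();
    // msb is at most 31 on 32-bit targets because n fits in 32 bits there.
    1usize << msb
}

fn main() {
    for n in [1usize, 8, 9, 1023] {
        println!("{} -> {}", n, compute_previous_power_of_two(n));
    }
}
```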
Pascal Seitz
6800fdec9d
add indexing for ip field
...
Closes #1595
2022-10-18 10:07:48 +08:00
PSeitz
c9cf9c952a
Merge pull request #1614 from quickwit-oss/remove_superfluous_steps
...
refactor Term
2022-10-17 18:25:31 +08:00