Commit Graph

2208 Commits

Author SHA1 Message Date
Pascal Seitz
791350091c switch num_vals() to u32
fixes #1630
2022-10-20 19:44:28 +08:00
Paul Masurel
483b1d13d4 Added unit test for long tokens (#1635)
* Bugfix on long tokens and multivalue text fields.

Fixes a minor bug for the strong edge case
in which a tokenizer would emit tokens where
the last token does not cover the last position.

More importantly, this adds unit tests.

Closes #1634

* Update src/indexer/segment_writer.rs

Co-authored-by: PSeitz <PSeitz@users.noreply.github.com>

Co-authored-by: PSeitz <PSeitz@users.noreply.github.com>
2022-10-20 15:05:37 +09:00
PSeitz
8de7fa9d95 Merge pull request #1631 from quickwit-oss/high_positions
add test for phrase search on multi text field
2022-10-20 10:26:00 +08:00
Paul Masurel
94313b62f8 Hotfix issue/1629 - position broken (#1633)
* Bugfix position broken.

For Field with several FieldValues, with a
value that contained no token at all, the token position
was reinitialized to 0.

As a result, PhraseQueries can show some false positives.
In addition, after the computation of the position delta, we can
underflow u32, and end up with gigantic delta.

We haven't been able to actually explain the bug in 1629, but it
is assumed that in some corner case these delta can cause a panic.

Closes #1629
2022-10-20 11:03:55 +09:00
Pascal Seitz
f2b2628feb add test for phrase search on multi text field 2022-10-19 16:29:56 +08:00
PSeitz
449f595832 Merge pull request #1628 from quickwit-oss/skip_index_deser
faster skipindex deserialization, larger blocksize on sort
2022-10-19 11:05:20 +08:00
Pascal Seitz
a4485f7611 faster skipindex deserialization, larger blocksize on sort 2022-10-18 19:32:23 +08:00
Pascal Seitz
1082ff60f9 add range query handling for ip via term dictionary
since IPs are mapped monotonically we can use the term dictionary for range queries
2022-10-18 13:08:27 +08:00
PSeitz
491854155c Merge pull request #1625 from quickwit-oss/index_ip_field
index ip field
2022-10-18 11:18:17 +08:00
Christoph Herzog
96c3d54ac7 fix: Fix power of two computation on 32bit architectures (#1624)
The current `compute_previous_power_of_two()` implementation used for
TermHashmap takes and returns `usize` , but actually only works
correclty on 64 bit architectures (aka usize == u64)

On other architectures the leading_zeros computation is run on the wrong
type (must be u64), and leads to overflows.

Fixed simply computing the leading_zeros based on a u64 value.
2022-10-18 11:55:02 +09:00
Pascal Seitz
6800fdec9d add indexing for ip field
Closes #1595
2022-10-18 10:07:48 +08:00
Pascal Seitz
024e53a99c remove truncate 2022-10-17 12:14:35 +08:00
Pascal Seitz
8d75e451bd fix truncate, remove mutable access from term 2022-10-17 12:14:35 +08:00
Pascal Seitz
fcfd76ec55 refactor Term
fixes some issues with Term
Remove duplicate calls to truncate or resize
Replace Magic Number 5 with constant
Enforce minimum size of 5 for metadata
Fix broken truncate docs
use constructor instead new + set calls
normalize constructor stack
replace assert on internal behavior fixes #1585
2022-10-17 12:14:34 +08:00
Pascal Seitz
129f7422f5 remove unused buffer 2022-10-14 20:01:10 +08:00
PSeitz
f39cce2c8b Merge pull request #1622 from quickwit-oss/term_aggregation
add term aggregation clarification
2022-10-14 18:09:18 +08:00
Pascal Seitz
952b048341 add term aggregation clarification 2022-10-14 16:12:19 +08:00
PSeitz
80f9596ec8 Merge pull request #1611 from quickwit-oss/remove_token_stream_alloc
remove tokenstream vec alloc
2022-10-14 15:12:30 +08:00
PSeitz
a602c248fb Merge pull request #1590 from waywardmonkeys/fix-doc-warnings-quickwit
Fix missing doc warnings when enabling feature "quickwit".
2022-10-14 14:09:25 +08:00
PSeitz
4b9d1fe828 Merge pull request #1620 from quickwit-oss/fix_fieldnorms_indexing
Fix missing fieldnorm indexing
2022-10-14 13:41:38 +08:00
Pascal Seitz
63bc390b02 Fix missing fieldnorm indexing
Fixes broken search (no results) with BM25 for u64, i64, f64, bool, bytes and date after deletion and merge.
There were no fieldnorms recorded for those field. After merge InvertedIndexReader::total_num_tokens returns 0 (Sum over the fieldnorms is 0). BM25 does not work when total_num_tokens is 0.
Fixes #1617
2022-10-14 12:44:40 +08:00
Paul Masurel
07393c2fa0 Attempt to fix race condition in test. (#1619)
Close #1550
2022-10-14 10:56:37 +09:00
PSeitz
77a415cbe4 rename NothingRecorder to DocIdRecorder (#1615) 2022-10-13 15:43:40 +09:00
Pascal Seitz
9cb8cfbea8 return Error instead panic in fastfields
fixes #1572
2022-10-11 14:15:22 +08:00
PSeitz
8b69aab0fc avoid prepare_doc allocation (#1610)
avoid prepare_doc allocation, ~10% more thoughput best case
2022-10-11 14:15:55 +09:00
PSeitz
3650d1f36a Merge pull request #1553 from quickwit-oss/ip_field
ip field
2022-10-11 13:09:47 +08:00
Pascal Seitz
2efebdb1bb remove tokenstream vec alloc 2022-10-11 10:30:56 +08:00
François Massot
e443ca63aa Merge pull request #1608 from quickwit-oss/nigel/serialise-bytes-as-b64-#2042
Serialise bytes as base64 strings instead of arrays.
2022-10-10 11:51:23 +02:00
Pascal Seitz
5c9cbee29d handle IpV4 serialization case 2022-10-07 19:52:00 +08:00
Pascal Seitz
b2ca83a93c switch to ipv6, add monotonic_mapping tests 2022-10-07 18:47:55 +08:00
Nigel Andrews
3b189080d4 Use raw string literals in tests 2022-10-07 12:28:25 +02:00
Nigel Andrews
00a6586efe Replaced String::serialize for serializer.serialize_str 2022-10-07 11:55:05 +02:00
PSeitz
534b1d33c3 use ipv6
Co-authored-by: Paul Masurel <paul@quickwit.io>
2022-10-07 16:56:00 +08:00
PSeitz
f465173872 Apply suggestions from code review
Co-authored-by: Paul Masurel <paul@quickwit.io>
2022-10-07 16:55:53 +08:00
Pascal Seitz
96315df20d use idx part only for positions_to_docid 2022-10-07 16:54:04 +08:00
Pascal Seitz
9a1609d364 add test 2022-10-07 16:25:01 +08:00
Pascal Seitz
2864bf7123 use serializer for u128 2022-10-07 16:25:01 +08:00
Pascal Seitz
5171ff611b serialize ip as u128, add test for positions_to_docid 2022-10-07 16:25:01 +08:00
Pascal Seitz
e50e74acf8 remove u128 type 2022-10-07 16:25:01 +08:00
Pascal Seitz
0b86658389 rename ip addr, use buffer 2022-10-07 16:25:01 +08:00
Pascal Seitz
5d6602a8d9 mark null handling TODO 2022-10-07 16:25:01 +08:00
Pascal Seitz
4d29ff4d01 finalize ip addr rename 2022-10-07 16:25:01 +08:00
Pascal Seitz
cdc8e3a8be group montonic mapping and inverse
fix mapping inverse
remove ip indexing
add get_between_vals test
2022-10-07 16:25:01 +08:00
Pascal Seitz
787a37bacf expect instead of unwrap 2022-10-07 16:25:01 +08:00
Pascal Seitz
f5039f1846 remove roaring 2022-10-07 16:25:01 +08:00
Pascal Seitz
eeb1f19093 rename to iter_gen 2022-10-07 16:25:01 +08:00
Pascal Seitz
087beaf328 remove null handling 2022-10-07 16:25:01 +08:00
Pascal Seitz
309449dba3 rename to IpAddr 2022-10-07 16:25:01 +08:00
Pascal Seitz
c8713a01ed use iter api 2022-10-07 16:25:01 +08:00
Pascal Seitz
6113e0408c remove comment 2022-10-07 16:25:01 +08:00