Paul Masurel
727d024a23
Bugfix position broken.
...
For a Field with several FieldValues, if one
value contained no token at all, the token position
was reset to 0.
As a result, PhraseQueries could return false positives.
In addition, computing the position delta could then
underflow u32 and yield a gigantic delta.
We haven't been able to fully explain the bug in #1629, but we
assume that in some corner cases these deltas can cause a panic.
Closes #1629
2022-10-20 10:19:41 +09:00
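The u32 underflow described in this commit can be illustrated with a minimal sketch. The two helper functions below are hypothetical and are not part of tantivy's API; they only assume that a position delta is computed as `current - previous` on `u32` values.

```rust
/// Hypothetical helper (not tantivy's API): delta between two consecutive
/// token positions. Returns `None` when the position went backwards, which
/// is exactly the case where a plain subtraction would underflow `u32`.
fn position_delta(previous: u32, current: u32) -> Option<u32> {
    current.checked_sub(previous)
}

/// What an unchecked subtraction produces when overflow checks are off
/// (the default in release builds): the result wraps around to a huge value.
fn wrapping_position_delta(previous: u32, current: u32) -> u32 {
    current.wrapping_sub(previous)
}
```

If the position is wrongly reset to 0 after a previous position of 5, `wrapping_position_delta(5, 0)` yields a value close to `u32::MAX`, while `position_delta(5, 0)` reports the problem as `None`.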
PSeitz
449f595832
Merge pull request #1628 from quickwit-oss/skip_index_deser
...
faster skipindex deserialization, larger blocksize on sort
2022-10-19 11:05:20 +08:00
PSeitz
c9235df059
Merge pull request #1627 from quickwit-oss/ip_field_range_query
...
add range query handling for ip via term dictionary
2022-10-19 10:53:00 +08:00
Pascal Seitz
a4485f7611
faster skipindex deserialization, larger blocksize on sort
2022-10-18 19:32:23 +08:00
Pascal Seitz
1082ff60f9
add range query handling for ip via term dictionary
...
since IPs are mapped monotonically we can use the term dictionary for range queries
2022-10-18 13:08:27 +08:00
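Why a monotonic mapping makes the term dictionary usable for range queries can be sketched as follows. This is an illustration only: a `BTreeSet` of big-endian byte keys stands in for the sorted term dictionary, and `ip_to_key` and `ips_in_range` are hypothetical names, not tantivy functions.

```rust
use std::collections::BTreeSet;
use std::net::Ipv6Addr;

/// Maps an IPv6 address to 16 big-endian bytes. The mapping is monotonic:
/// if `a <= b` as addresses, then `ip_to_key(a) <= ip_to_key(b)` in
/// lexicographic byte order, which is the order of a sorted dictionary.
fn ip_to_key(ip: Ipv6Addr) -> [u8; 16] {
    u128::from(ip).to_be_bytes()
}

/// Returns all addresses in `dict` within `lower..=upper`.
/// `dict` is a stand-in for a sorted term dictionary (an assumption).
fn ips_in_range(dict: &BTreeSet<[u8; 16]>, lower: Ipv6Addr, upper: Ipv6Addr) -> Vec<Ipv6Addr> {
    if lower > upper {
        // An inverted range matches nothing (and would make `range` panic).
        return Vec::new();
    }
    dict.range(ip_to_key(lower)..=ip_to_key(upper))
        .map(|key| Ipv6Addr::from(u128::from_be_bytes(*key)))
        .collect()
}
```

Because the key order matches the address order, one contiguous scan over the sorted keys returns exactly the addresses in the requested interval.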
PSeitz
491854155c
Merge pull request #1625 from quickwit-oss/index_ip_field
...
index ip field
2022-10-18 11:18:17 +08:00
Christoph Herzog
96c3d54ac7
fix: Fix power of two computation on 32bit architectures (#1624)
...
The current `compute_previous_power_of_two()` implementation used for
TermHashmap takes and returns `usize`, but actually only works
correctly on 64-bit architectures (i.e. usize == u64).
On other architectures the leading_zeros computation is run on the wrong
type (it must be u64), which leads to overflows.
Fixed by simply computing leading_zeros on a u64 value.
2022-10-18 11:55:02 +09:00
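A minimal sketch of the fix described above, assuming the function returns the largest power of two that is less than or equal to its argument. The body is an illustration of the idea (widen to `u64` before calling `leading_zeros`) and is not necessarily identical to the code in the repository.

```rust
/// Largest power of two that is <= `n`.
/// Sketch only: `n` is widened to `u64` before counting leading zeros, so
/// the subtraction from 63 is valid whatever the width of `usize` is.
fn compute_previous_power_of_two(n: usize) -> usize {
    assert!(n > 0, "n must be strictly positive");
    // Index of the most significant set bit, counted over 64 bits.
    let msb = 63u32 - (n as u64).leading_zeros();
    1usize << msb
}
```

Calling `leading_zeros` directly on a 32-bit `usize` returns at most 32, so a formula written for 64 bits would compute a bit index that is 32 too large and overflow the shift.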
Pascal Seitz
6800fdec9d
add indexing for ip field
...
Closes #1595
2022-10-18 10:07:48 +08:00
PSeitz
c9cf9c952a
Merge pull request #1614 from quickwit-oss/remove_superfluous_steps
...
refactor Term
2022-10-17 18:25:31 +08:00
Pascal Seitz
024e53a99c
remove truncate
2022-10-17 12:14:35 +08:00
Pascal Seitz
8d75e451bd
fix truncate, remove mutable access from term
2022-10-17 12:14:35 +08:00
Pascal Seitz
fcfd76ec55
refactor Term
...
fixes some issues with Term
Remove duplicate calls to truncate or resize
Replace Magic Number 5 with constant
Enforce minimum size of 5 for metadata
Fix broken truncate docs
use constructor instead of new + set calls
normalize constructor stack
replace assert on internal behavior fixes #1585
2022-10-17 12:14:34 +08:00
PSeitz
6b7b1cc4fa
Merge pull request #1623 from quickwit-oss/remove_unused_buffer
...
remove unused buffer
2022-10-14 20:36:00 +08:00
Pascal Seitz
129f7422f5
remove unused buffer
2022-10-14 20:01:10 +08:00
PSeitz
f39cce2c8b
Merge pull request #1622 from quickwit-oss/term_aggregation
...
add term aggregation clarification
2022-10-14 18:09:18 +08:00
PSeitz
d2478fac8a
Merge pull request #1621 from quickwit-oss/changelog
...
update CHANGELOG
2022-10-14 18:08:57 +08:00
Pascal Seitz
952b048341
add term aggregation clarification
2022-10-14 16:12:19 +08:00
PSeitz
80f9596ec8
Merge pull request #1611 from quickwit-oss/remove_token_stream_alloc
...
remove tokenstream vec alloc
2022-10-14 15:12:30 +08:00
Pascal Seitz
84f9e77e1d
update CHANGELOG
2022-10-14 15:10:33 +08:00
PSeitz
a602c248fb
Merge pull request #1590 from waywardmonkeys/fix-doc-warnings-quickwit
...
Fix missing doc warnings when enabling feature "quickwit".
2022-10-14 14:09:25 +08:00
PSeitz
4b9d1fe828
Merge pull request #1620 from quickwit-oss/fix_fieldnorms_indexing
...
Fix missing fieldnorm indexing
2022-10-14 13:41:38 +08:00
Pascal Seitz
63bc390b02
Fix missing fieldnorm indexing
...
Fixes broken search (no results) with BM25 for u64, i64, f64, bool, bytes and date after deletion and merge.
There were no fieldnorms recorded for those fields. After a merge, InvertedIndexReader::total_num_tokens returns 0 (the sum over the fieldnorms is 0). BM25 does not work when total_num_tokens is 0.
Fixes #1617
2022-10-14 12:44:40 +08:00
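Why a total token count of 0 breaks BM25 can be seen in a textbook formulation of the score. The sketch below is not tantivy's scoring code: the function names and the constants `K1` and `B` are assumptions chosen for illustration, and the exact way the failure surfaced in tantivy may differ.

```rust
/// Average field length, derived from the total token count (sketch).
fn average_fieldnorm(total_num_tokens: u64, num_docs: u64) -> f32 {
    if num_docs == 0 {
        return 0.0;
    }
    total_num_tokens as f32 / num_docs as f32
}

/// Term-frequency component of textbook BM25 (not tantivy's implementation).
/// The document length is normalized by the average length, so an average
/// of 0 puts a zero in the denominator.
fn bm25_tf_component(term_freq: f32, fieldnorm: f32, avg_fieldnorm: f32) -> f32 {
    const K1: f32 = 1.2; // assumed, commonly used default
    const B: f32 = 0.75; // assumed, commonly used default
    let length_norm = 1.0 - B + B * fieldnorm / avg_fieldnorm;
    term_freq * (K1 + 1.0) / (term_freq + K1 * length_norm)
}
```

With `total_num_tokens == 0` the average is 0, and the component degenerates to NaN (when the document length is also 0) or to 0 (when it is positive), so no document can receive a meaningful score.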
Paul Masurel
07393c2fa0
Attempt to fix race condition in test. (#1619)
...
Close #1550
2022-10-14 10:56:37 +09:00
PSeitz
77a415cbe4
rename NothingRecorder to DocIdRecorder (#1615)
2022-10-13 15:43:40 +09:00
PSeitz
4b4c231bba
Merge pull request #1612 from quickwit-oss/no_panic_please
...
return Error instead of panicking in fastfields
2022-10-11 18:33:00 +08:00
PSeitz
11d3409286
add missing docs for fastfield_codecs crate (#1613)
...
closes #1603
2022-10-11 18:54:24 +09:00
Pascal Seitz
9cb8cfbea8
return Error instead of panicking in fastfields
...
fixes #1572
2022-10-11 14:15:22 +08:00
PSeitz
8b69aab0fc
avoid prepare_doc allocation (#1610)
...
avoid prepare_doc allocation, ~10% more throughput in the best case
2022-10-11 14:15:55 +09:00
PSeitz
3650d1f36a
Merge pull request #1553 from quickwit-oss/ip_field
...
ip field
2022-10-11 13:09:47 +08:00
Pascal Seitz
2efebdb1bb
remove tokenstream vec alloc
2022-10-11 10:30:56 +08:00
François Massot
e443ca63aa
Merge pull request #1608 from quickwit-oss/nigel/serialise-bytes-as-b64-#2042
...
Serialise bytes as base64 strings instead of arrays.
2022-10-10 11:51:23 +02:00
Pascal Seitz
5c9cbee29d
handle IpV4 serialization case
2022-10-07 19:52:00 +08:00
Pascal Seitz
b2ca83a93c
switch to ipv6, add monotonic_mapping tests
2022-10-07 18:47:55 +08:00
Nigel Andrews
3b189080d4
Use raw string literals in tests
2022-10-07 12:28:25 +02:00
Nigel Andrews
00a6586efe
Replaced String::serialize with serializer.serialize_str
2022-10-07 11:55:05 +02:00
Pascal Seitz
b9b913510e
fmt
2022-10-07 16:56:19 +08:00
PSeitz
534b1d33c3
use ipv6
...
Co-authored-by: Paul Masurel <paul@quickwit.io>
2022-10-07 16:56:00 +08:00
PSeitz
f465173872
Apply suggestions from code review
...
Co-authored-by: Paul Masurel <paul@quickwit.io>
2022-10-07 16:55:53 +08:00
Pascal Seitz
96315df20d
use idx part only for positions_to_docid
2022-10-07 16:54:04 +08:00
Pascal Seitz
9a1609d364
add test
2022-10-07 16:25:01 +08:00
Pascal Seitz
39f4e58450
improve comment
2022-10-07 16:25:01 +08:00
Pascal Seitz
a8a36b62cd
enable test
2022-10-07 16:25:01 +08:00
Pascal Seitz
226a49338f
add StrictlyMonotonicFn
2022-10-07 16:25:01 +08:00
Pascal Seitz
2864bf7123
use serializer for u128
2022-10-07 16:25:01 +08:00
Pascal Seitz
5171ff611b
serialize ip as u128, add test for positions_to_docid
2022-10-07 16:25:01 +08:00
Pascal Seitz
e50e74acf8
remove u128 type
2022-10-07 16:25:01 +08:00
Pascal Seitz
0b86658389
rename ip addr, use buffer
2022-10-07 16:25:01 +08:00
Pascal Seitz
5d6602a8d9
mark null handling TODO
2022-10-07 16:25:01 +08:00
Pascal Seitz
4d29ff4d01
finalize ip addr rename
2022-10-07 16:25:01 +08:00
Pascal Seitz
cdc8e3a8be
group monotonic mapping and inverse
...
fix mapping inverse
remove ip indexing
add get_between_vals test
2022-10-07 16:25:01 +08:00