Paul Masurel
f39165e1e7
Moving FileSlice to tantivy-common ( #1729 )
2022-12-21 16:35:11 +09:00
PSeitz
f9171a3981
fix clippy ( #1725 )
...
* fix clippy
* fix clippy fastfield codecs
* fix clippy bitpacker
* fix clippy common
* fix clippy stacker
* fix clippy sstable
* fmt
2022-12-20 07:30:06 +01:00
Paul Masurel
0b40a7fe43
Added an expand_dots option to JsonObjectOptions. ( #1687 )
...
Related with quickwit#2345.
2022-11-21 23:03:00 +09:00
Paul Masurel
2a39289a1b
Handle escaped dot in json path in the QueryParser. ( #1682 )
2022-11-16 07:18:34 +09:00
Pascal Seitz
8641155cbb
remove column from MultiValuedU128FastFieldReader
2022-11-14 18:49:15 +08:00
Pascal Seitz
b7d0dd154a
fmt
2022-11-14 14:49:15 +08:00
Pascal Seitz
e034328a8b
Improve position_to_docid, refactor, add tests
2022-11-14 14:21:53 +08:00
Pascal Seitz
fb9f03118d
switch total_num_val to u32
2022-11-11 17:35:52 +08:00
Paul Masurel
3edf0a2724
Using the manual reload policy in IndexWriter. ( #1667 )
2022-11-09 11:20:41 +01:00
PSeitz
3e9c806890
Merge pull request #1665 from quickwit-oss/fix_num_vals
...
fix num_vals on u128 value index after merge
2022-11-07 21:46:02 +08:00
Pascal Seitz
c69a873dd3
fix num_vals on value index after merge
2022-11-07 21:05:21 +08:00
Pascal Seitz
38ad46e580
fix clippy
2022-11-07 16:09:55 +08:00
Pascal Seitz
6e636c9cea
fix num_vals in multivalue index after merge
2022-11-07 15:00:52 +08:00
PSeitz
5b2cea1b97
Merge pull request #1656 from quickwit-oss/multival_offset_index
...
move multivalue index to own file
2022-11-02 14:03:06 +08:00
PSeitz
0f98d91a39
Merge pull request #1646 from quickwit-oss/no_score_calls
...
No score calls if score is not requested
2022-11-01 20:09:32 +08:00
Pascal Seitz
83325d8f3f
move multivalue index to own file
...
start_doc parameter in positions to docids
2022-11-01 10:36:13 +08:00
Adam Reichold
bbb058d976
Replace FNV by rustc-hash
...
Both constructions have similar goals, but rustc-hash is better suited to
contemporary CPUs as it works one word at a time instead of byte by byte.
2022-10-27 00:35:09 +02:00
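The byte-per-byte versus word-at-a-time distinction in the commit above can be sketched as follows. This is an illustrative sketch only, not the code of the fnv or rustc-hash crates: the function names are hypothetical, the FNV constants are the standard 64-bit FNV-1a parameters, and the rotate/xor/multiply step and its seed are assumed to resemble what rustc-hash did at the time. Zero-padding the last partial word is a simplification made here for brevity.

```rust
// Standard 64-bit FNV-1a parameters.
const FNV_OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

// Multiplicative seed assumed for the word-at-a-time sketch.
const WORD_SEED: u64 = 0x517c_c1b7_2722_0a95;

/// FNV-1a: one XOR and one multiply per input byte.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash = FNV_OFFSET_BASIS;
    for &byte in bytes {
        hash ^= u64::from(byte);
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

/// Hypothetical word-at-a-time hash: one rotate, XOR and multiply per
/// 8-byte word, so roughly 8x fewer mixing steps on long keys.
fn word_hash_64(bytes: &[u8]) -> u64 {
    let mut hash = 0u64;
    for chunk in bytes.chunks(8) {
        // Zero-pad the final partial chunk (simplification: this makes
        // "a" and "a\0" collide, which a real hasher would avoid).
        let mut buf = [0u8; 8];
        buf[..chunk.len()].copy_from_slice(chunk);
        let word = u64::from_le_bytes(buf);
        hash = (hash.rotate_left(5) ^ word).wrapping_mul(WORD_SEED);
    }
    hash
}

fn main() {
    let key = b"body.title.raw";
    println!("fnv1a_64     = {:#018x}", fnv1a_64(key));
    println!("word_hash_64 = {:#018x}", word_hash_64(key));
}
```

For a 14-byte key such as the one in `main`, the FNV loop runs 14 mixing steps while the word-based loop runs 2.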
Pascal Seitz
dfab201191
for_each_docset to iterate without score
2022-10-26 17:25:05 +08:00
Pascal Seitz
af839753e0
No score calls if score is not requested
2022-10-26 12:18:35 +08:00
Pascal Seitz
e772d3170d
switch get_val() to u32
...
Fixes #1638
2022-10-24 19:05:57 +08:00
Pascal Seitz
791350091c
switch num_vals() to u32
...
fixes #1630
2022-10-20 19:44:28 +08:00
Paul Masurel
483b1d13d4
Added unit test for long tokens ( #1635 )
...
* Bugfix on long tokens and multivalue text fields.
Fixes a minor bug in the rare edge case
in which a tokenizer emits tokens where
the last token does not cover the last position.
More importantly, this adds unit tests.
Closes #1634
* Update src/indexer/segment_writer.rs
Co-authored-by: PSeitz <PSeitz@users.noreply.github.com>
2022-10-20 15:05:37 +09:00
PSeitz
8de7fa9d95
Merge pull request #1631 from quickwit-oss/high_positions
...
add test for phrase search on multi text field
2022-10-20 10:26:00 +08:00
Paul Masurel
94313b62f8
Hotfix issue/1629 - position broken ( #1633 )
...
* Bugfix position broken.
For a Field with several FieldValues, one of which
contained no token at all, the token position
was reset to 0.
As a result, PhraseQueries could show some false positives.
In addition, the computation of the position delta could then
underflow u32 and end up with a gigantic delta.
We haven't been able to actually explain the bug in 1629, but it
is assumed that in some corner cases these deltas can cause a panic.
Closes #1629
2022-10-20 11:03:55 +09:00
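The u32 underflow mentioned in the commit above can be illustrated in isolation. This is a minimal sketch with hypothetical helper names, not the code from segment_writer.rs: it only shows what happens when a position that was reset to 0 is subtracted from a larger previous position.

```rust
/// Hypothetical helper: delta between two token positions using
/// wrapping arithmetic, which is what an unchecked `current - prev`
/// effectively does in a release build.
fn position_delta_wrapping(prev: u32, current: u32) -> u32 {
    current.wrapping_sub(prev)
}

/// Hypothetical helper: the same delta, but reporting the
/// "position went backwards" case explicitly instead of wrapping.
fn position_delta_checked(prev: u32, current: u32) -> Option<u32> {
    current.checked_sub(prev)
}

fn main() {
    // The previous value ended at position 7, then the position was
    // wrongly reset to 0 for the next value.
    let prev = 7u32;
    let current = 0u32;

    // Wraps around to a value close to u32::MAX.
    println!("wrapping delta: {}", position_delta_wrapping(prev, current));

    // The checked variant surfaces the problem as None.
    println!("checked delta:  {:?}", position_delta_checked(prev, current));
}
```

In a debug build, a plain `current - prev` would panic on this input rather than wrap, since Rust enables overflow checks by default in that profile.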
Pascal Seitz
f2b2628feb
add test for phrase search on multi text field
2022-10-19 16:29:56 +08:00
Pascal Seitz
a4485f7611
faster skipindex deserialization, larger blocksize on sort
2022-10-18 19:32:23 +08:00
Pascal Seitz
6800fdec9d
add indexing for ip field
...
Closes #1595
2022-10-18 10:07:48 +08:00
Pascal Seitz
8d75e451bd
fix truncate, remove mutable access from term
2022-10-17 12:14:35 +08:00
Pascal Seitz
fcfd76ec55
refactor Term
...
fixes some issues with Term
Remove duplicate calls to truncate or resize
Replace Magic Number 5 with constant
Enforce minimum size of 5 for metadata
Fix broken truncate docs
use constructor instead of new + set calls
normalize constructor stack
replace assert on internal behavior fixes #1585
2022-10-17 12:14:34 +08:00
PSeitz
80f9596ec8
Merge pull request #1611 from quickwit-oss/remove_token_stream_alloc
...
remove tokenstream vec alloc
2022-10-14 15:12:30 +08:00
Pascal Seitz
63bc390b02
Fix missing fieldnorm indexing
...
Fixes broken search (no results) with BM25 for u64, i64, f64, bool, bytes and date fields after deletion and merge.
No fieldnorms were recorded for those fields. After a merge, InvertedIndexReader::total_num_tokens returns 0 (the sum over the fieldnorms is 0), and BM25 does not work when total_num_tokens is 0.
Fixes #1617
2022-10-14 12:44:40 +08:00
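Why a total token count of 0 is a degenerate input for BM25 can be seen from the textbook term-frequency formula. The sketch below is an assumption-laden illustration, not tantivy's Bm25Weight: the function name, the parameter values k1 = 1.2 and b = 0.75, and the way the average length is derived from the token total are all chosen here for the example.

```rust
const K1: f32 = 1.2;
const B: f32 = 0.75;

/// Textbook BM25 term-frequency component (hypothetical helper).
/// `doc_len` is the field length of the document and `avg_len` the
/// average field length over all documents.
fn bm25_tf_component(term_freq: f32, doc_len: f32, avg_len: f32) -> f32 {
    let length_norm = 1.0 - B + B * (doc_len / avg_len);
    term_freq * (K1 + 1.0) / (term_freq + K1 * length_norm)
}

fn main() {
    // Healthy case: 10 tokens over 5 documents, document of average length.
    let avg_len = 10.0_f32 / 5.0;
    println!("normal:     {}", bm25_tf_component(1.0, 2.0, avg_len));

    // Degenerate case: no fieldnorms recorded, so the token total and
    // every document length are 0. The ratio 0.0 / 0.0 is NaN, and the
    // NaN propagates into the score.
    let avg_len = 0.0_f32 / 5.0;
    println!("degenerate: {}", bm25_tf_component(1.0, 0.0, avg_len));
}
```

Whether a real scorer yields NaN, zero, or filters such documents out depends on its implementation; the point of the sketch is only that the length normalization has no meaningful value when the average length is 0.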
Pascal Seitz
9cb8cfbea8
return Error instead of panic in fastfields
...
fixes #1572
2022-10-11 14:15:22 +08:00
PSeitz
8b69aab0fc
avoid prepare_doc allocation ( #1610 )
...
avoid prepare_doc allocation, ~10% more throughput in the best case
2022-10-11 14:15:55 +09:00
Pascal Seitz
2efebdb1bb
remove tokenstream vec alloc
2022-10-11 10:30:56 +08:00
Pascal Seitz
b2ca83a93c
switch to ipv6, add monotonic_mapping tests
2022-10-07 18:47:55 +08:00
Pascal Seitz
2864bf7123
use serializer for u128
2022-10-07 16:25:01 +08:00
Pascal Seitz
0b86658389
rename ip addr, use buffer
2022-10-07 16:25:01 +08:00
Pascal Seitz
5d6602a8d9
mark null handling TODO
2022-10-07 16:25:01 +08:00
Pascal Seitz
4d29ff4d01
finalize ip addr rename
2022-10-07 16:25:01 +08:00
Pascal Seitz
cdc8e3a8be
group monotonic mapping and inverse
...
fix mapping inverse
remove ip indexing
add get_between_vals test
2022-10-07 16:25:01 +08:00
Pascal Seitz
eeb1f19093
rename to iter_gen
2022-10-07 16:25:01 +08:00
Pascal Seitz
309449dba3
rename to IpAddr
2022-10-07 16:25:01 +08:00
Pascal Seitz
c8713a01ed
use iter api
2022-10-07 16:25:01 +08:00
Pascal Seitz
400a20b7af
add ip field
...
add u128 multivalue reader and writer
add ip to schema
add ip writers, handle merge
2022-10-07 16:25:01 +08:00
Pascal Seitz
d742275048
renames
2022-10-05 19:16:49 +08:00
Pascal Seitz
8b42c4c126
disable linear codec for multivalue value index
...
don't materialize index column on merge
use simpler chain() variant
2022-10-05 19:09:17 +08:00
PSeitz
7905965800
Merge pull request #1594 from quickwit-oss/flat_map_with_buffer
...
Removing alloc on all .next() in MultiValueColumn
2022-10-05 18:34:15 +08:00
Pascal Seitz
f60a551890
add flat_map_with_buffer to Iterator trait
2022-10-05 17:44:26 +08:00
Paul Masurel
7baa6e3ec5
Removing alloc on all .next() in MultiValueColumn
2022-10-05 17:12:06 +09:00
Bruce Mitchener
b3bf9a5716
Documentation improvements.
2022-10-05 14:18:10 +07:00