Commit Graph

658 Commits

Author SHA1 Message Date
Paul Masurel
f39165e1e7 Moving FileSlice to tantivy-common (#1729) 2022-12-21 16:35:11 +09:00
PSeitz
f9171a3981 fix clippy (#1725)
* fix clippy

* fix clippy fastfield codecs

* fix clippy bitpacker

* fix clippy common

* fix clippy stacker

* fix clippy sstable

* fmt
2022-12-20 07:30:06 +01:00
Paul Masurel
0b40a7fe43 Added a expand_dots JsonObjectOptions. (#1687)
Related with quickwit#2345.
2022-11-21 23:03:00 +09:00
Paul Masurel
2a39289a1b Handle escaped dot in json path in the QueryParser. (#1682) 2022-11-16 07:18:34 +09:00
Pascal Seitz
8641155cbb remove column from MultiValuedU128FastFieldReader 2022-11-14 18:49:15 +08:00
Pascal Seitz
b7d0dd154a fmt 2022-11-14 14:49:15 +08:00
Pascal Seitz
e034328a8b Improve position_to_docid, refactor, add tests 2022-11-14 14:21:53 +08:00
Pascal Seitz
fb9f03118d switch total_num_val to u32 2022-11-11 17:35:52 +08:00
Paul Masurel
3edf0a2724 Using the manual reload policy in IndexWriter. (#1667) 2022-11-09 11:20:41 +01:00
PSeitz
3e9c806890 Merge pull request #1665 from quickwit-oss/fix_num_vals
fix num_vals on u128 value index after merge
2022-11-07 21:46:02 +08:00
Pascal Seitz
c69a873dd3 fix num_vals on value index after merge 2022-11-07 21:05:21 +08:00
Pascal Seitz
38ad46e580 fix clippy 2022-11-07 16:09:55 +08:00
Pascal Seitz
6e636c9cea fix num_vals in multivalue index after merge 2022-11-07 15:00:52 +08:00
PSeitz
5b2cea1b97 Merge pull request #1656 from quickwit-oss/multival_offset_index
move multivalue index to own file
2022-11-02 14:03:06 +08:00
PSeitz
0f98d91a39 Merge pull request #1646 from quickwit-oss/no_score_calls
No score calls if score is not requested
2022-11-01 20:09:32 +08:00
Pascal Seitz
83325d8f3f move multivalue index to own file
start_doc parameter in positions to docids
2022-11-01 10:36:13 +08:00
Adam Reichold
bbb058d976 Replace FNV by rustc-hash
Both construction have similar goals but rustc-hash ist better suited for
contemporary CPU as it works one word at a time instead of byte per byte.
2022-10-27 00:35:09 +02:00
Pascal Seitz
dfab201191 for_each_docset to iterate without score 2022-10-26 17:25:05 +08:00
Pascal Seitz
af839753e0 No score calls if score is not requested 2022-10-26 12:18:35 +08:00
Pascal Seitz
e772d3170d switch get_val() to u32
Fixes #1638
2022-10-24 19:05:57 +08:00
Pascal Seitz
791350091c switch num_vals() to u32
fixes #1630
2022-10-20 19:44:28 +08:00
Paul Masurel
483b1d13d4 Added unit test for long tokens (#1635)
* Bugfix on long tokens and multivalue text fields.

Fixes a minor bug for the strong edge case
in which a tokenizer would emit tokens where
the last token does not cover the last position.

More importantly, this adds unit tests.

Closes #1634

* Update src/indexer/segment_writer.rs

Co-authored-by: PSeitz <PSeitz@users.noreply.github.com>

Co-authored-by: PSeitz <PSeitz@users.noreply.github.com>
2022-10-20 15:05:37 +09:00
PSeitz
8de7fa9d95 Merge pull request #1631 from quickwit-oss/high_positions
add test for phrase search on multi text field
2022-10-20 10:26:00 +08:00
Paul Masurel
94313b62f8 Hotfix issue/1629 - position broken (#1633)
* Bugfix position broken.

For Field with several FieldValues, with a
value that contained no token at all, the token position
was reinitialized to 0.

As a result, PhraseQueries can show some false positives.
In addition, after the computation of the position delta, we can
underflow u32, and end up with gigantic delta.

We haven't been able to actually explain the bug in 1629, but it
is assumed that in some corner case these delta can cause a panic.

Closes #1629
2022-10-20 11:03:55 +09:00
Pascal Seitz
f2b2628feb add test for phrase search on multi text field 2022-10-19 16:29:56 +08:00
Pascal Seitz
a4485f7611 faster skipindex deserialization, larger blocksize on sort 2022-10-18 19:32:23 +08:00
Pascal Seitz
6800fdec9d add indexing for ip field
Closes #1595
2022-10-18 10:07:48 +08:00
Pascal Seitz
8d75e451bd fix truncate, remove mutable access from term 2022-10-17 12:14:35 +08:00
Pascal Seitz
fcfd76ec55 refactor Term
fixes some issues with Term
Remove duplicate calls to truncate or resize
Replace Magic Number 5 with constant
Enforce minimum size of 5 for metadata
Fix broken truncate docs
use constructor instead new + set calls
normalize constructor stack
replace assert on internal behavior fixes #1585
2022-10-17 12:14:34 +08:00
PSeitz
80f9596ec8 Merge pull request #1611 from quickwit-oss/remove_token_stream_alloc
remove tokenstream vec alloc
2022-10-14 15:12:30 +08:00
Pascal Seitz
63bc390b02 Fix missing fieldnorm indexing
Fixes broken search (no results) with BM25 for u64, i64, f64, bool, bytes and date after deletion and merge.
There were no fieldnorms recorded for those field. After merge InvertedIndexReader::total_num_tokens returns 0 (Sum over the fieldnorms is 0). BM25 does not work when total_num_tokens is 0.
Fixes #1617
2022-10-14 12:44:40 +08:00
Pascal Seitz
9cb8cfbea8 return Error instead panic in fastfields
fixes #1572
2022-10-11 14:15:22 +08:00
PSeitz
8b69aab0fc avoid prepare_doc allocation (#1610)
avoid prepare_doc allocation, ~10% more thoughput best case
2022-10-11 14:15:55 +09:00
Pascal Seitz
2efebdb1bb remove tokenstream vec alloc 2022-10-11 10:30:56 +08:00
Pascal Seitz
b2ca83a93c switch to ipv6, add monotonic_mapping tests 2022-10-07 18:47:55 +08:00
Pascal Seitz
2864bf7123 use serializer for u128 2022-10-07 16:25:01 +08:00
Pascal Seitz
0b86658389 rename ip addr, use buffer 2022-10-07 16:25:01 +08:00
Pascal Seitz
5d6602a8d9 mark null handling TODO 2022-10-07 16:25:01 +08:00
Pascal Seitz
4d29ff4d01 finalize ip addr rename 2022-10-07 16:25:01 +08:00
Pascal Seitz
cdc8e3a8be group montonic mapping and inverse
fix mapping inverse
remove ip indexing
add get_between_vals test
2022-10-07 16:25:01 +08:00
Pascal Seitz
eeb1f19093 rename to iter_gen 2022-10-07 16:25:01 +08:00
Pascal Seitz
309449dba3 rename to IpAddr 2022-10-07 16:25:01 +08:00
Pascal Seitz
c8713a01ed use iter api 2022-10-07 16:25:01 +08:00
Pascal Seitz
400a20b7af add ip field
add u128 multivalue reader and writer
add ip to schema
add ip writers, handle merge
2022-10-07 16:25:01 +08:00
Pascal Seitz
d742275048 renames 2022-10-05 19:16:49 +08:00
Pascal Seitz
8b42c4c126 disable linear codec for multivalue value index
don't materialize index column on merge
use simpler chain() variant
2022-10-05 19:09:17 +08:00
PSeitz
7905965800 Merge pull request #1594 from quickwit-oss/flat_map_with_buffer
Removing alloc on all .next() in MultiValueColumn
2022-10-05 18:34:15 +08:00
Pascal Seitz
f60a551890 add flat_map_with_buffer to Iterator trait 2022-10-05 17:44:26 +08:00
Paul Masurel
7baa6e3ec5 Removing alloc on all .next() in MultiValueColumn 2022-10-05 17:12:06 +09:00
Bruce Mitchener
b3bf9a5716 Documentation improvements. 2022-10-05 14:18:10 +07:00