Paul Masurel
ffe4446d90
Minor lint comments (#1166)
2021-10-06 11:27:48 +09:00
Paul Masurel
0855649986
Leaning more on the alive (vs delete) semantics. (#1164)
2021-10-05 18:53:29 +09:00
Pascal Seitz
aa0396fe27
fix variable names
2021-10-01 13:48:51 +08:00
Pascal Seitz
8d8315f8d0
prealloc vec in postinglist
2021-09-29 09:02:38 +08:00
Pascal Seitz
078c0a2e2e
reserve vec
2021-09-29 08:45:04 +08:00
Pascal Seitz
f21e8dd875
use only segment ordinal in docidmapping
2021-09-29 08:44:56 +08:00
Tomoko Uchida
74e36c7e97
Add unit tests for tokenizers and filters (#1156)
...
* add unit test for SimpleTokenizer
* add unit tests for tokenizers and filters.
2021-09-27 10:22:01 +09:00
PSeitz
0ce49c9dd4
use lz4_flex 0.9.0 (#1160)
2021-09-27 10:12:20 +09:00
PSeitz
fe8e58e078
Merge pull request #1154 from PSeitz/delete_bitset
...
add DeleteBitSet iterator
2021-09-24 09:37:39 +02:00
Pascal Seitz
22bcc83d10
fix padding in initialization
2021-09-24 14:43:04 +08:00
Pascal Seitz
5ee5037934
create and use ReadSerializedBitSet
2021-09-24 12:53:33 +08:00
Pascal Seitz
c217bfed1e
cargo fmt
2021-09-23 21:02:19 +08:00
Pascal Seitz
c27ccd3e24
improve naming
2021-09-23 21:02:09 +08:00
Paul Masurel
367f5da782
Fixed comment to the index accessor
2021-09-23 21:53:48 +09:00
Mestery
b256df6599
add index accessor for index writer (#1159)
...
* add index accessor for index writer
* Update src/indexer/index_writer.rs
Co-authored-by: Paul Masurel <paul@quickwit.io>
2021-09-23 21:49:20 +09:00
Pascal Seitz
d7a6a409a1
renames
2021-09-23 20:33:11 +08:00
Pascal Seitz
a1f5cead96
AliveBitSet instead of DeleteBitSet
2021-09-23 20:03:57 +08:00
Pascal Seitz
4583fa270b
fixes
2021-09-23 10:39:53 +08:00
Pascal Seitz
93cbd52bf0
move code to bitset, add inline, add benchmark
2021-09-18 17:35:22 +08:00
Pascal Seitz
c22177a005
add iterator
2021-09-17 15:29:27 +08:00
Pascal Seitz
4da71273e1
add de/serialization for bitset
...
remove len footgun
2021-09-17 10:28:12 +08:00
Pascal Seitz
4ae1d87632
add DeleteBitSet iterator
2021-09-15 23:10:04 +08:00
PSeitz
3bc177e69d
fix #1151 (#1152)
...
* fix #1151
Fixes an off-by-one error in the stats for the index fast field in the multi value fast field.
When retrieving the data range for a docid, `get(docid)..get(docid + 1)` is requested. On creation
the num_vals statistic was set to docid instead of docid + 1. In the multivaluelinearinterpol fast
field the last value was therefore not serialized (and would return 0 instead in most cases).
So for the last document, `get(lastdoc)..get(lastdoc + 1)` would return the invalid range `value..0`.
This PR adds a proptest to cover this scenario. A combination of a large number of values (multilinear
interpolation is only active for more than 5_000 values) and a merge is required.
2021-09-10 23:00:37 +09:00
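The off-by-one described in #1152 can be illustrated with a small, self-contained sketch. The `OffsetIndex` type and its methods below are hypothetical names made up for this illustration and are not tantivy's API; the only point is that an offset index for N documents needs N + 1 entries so that the range of the last document has a valid end.

```rust
use std::ops::Range;

/// Hypothetical stand-in for the offset index of a multi-value fast field.
/// `offsets[doc]..offsets[doc + 1]` is the range of values owned by `doc`.
struct OffsetIndex {
    offsets: Vec<u64>,
}

impl OffsetIndex {
    /// Builds the index from the number of values held by each document.
    fn from_counts(counts: &[u64]) -> Self {
        let mut offsets = Vec::with_capacity(counts.len() + 1);
        let mut total = 0u64;
        offsets.push(total);
        for &count in counts {
            total += count;
            offsets.push(total);
        }
        OffsetIndex { offsets }
    }

    /// Number of stored offsets: one more than the number of documents.
    /// Recording only `num_docs` here would drop the final offset.
    fn num_vals(&self) -> usize {
        self.offsets.len()
    }

    /// Range of value positions for `doc`.
    fn range(&self, doc: usize) -> Range<u64> {
        self.offsets[doc]..self.offsets[doc + 1]
    }
}

fn main() {
    // Three documents holding 2, 0 and 3 values respectively.
    let index = OffsetIndex::from_counts(&[2, 0, 3]);
    assert_eq!(index.num_vals(), 4);
    assert_eq!(index.range(0), 0..2);
    assert_eq!(index.range(1), 2..2);
    // The last document needs the final offset to get a valid end bound.
    assert_eq!(index.range(2), 2..5);
}
```

If the last offset were missing and read back as 0, the range for the last document would become `2..0`, which matches the invalid `value..0` shape mentioned in the commit message.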
Kanji Yomoda
9d87b89718
Fix incorrect comment for Index::create_in_dir (#1148)
...
* Fix incorrect comment for Index::create_in_dir
2021-09-03 10:37:16 +09:00
Tomoko Uchida
dd81e38e53
Add WhitespaceTokenizer (#1147)
...
* Add WhitespaceTokenizer.
2021-08-29 18:20:49 +09:00
sigaloid
096ce7488e
Resolve some clippys, format (#1144)
...
* cargo +nightly clippy --fix -Z unstable-options
2021-08-26 08:46:00 +09:00
Pascal Seitz
e0b83eb291
cargo fmt
2021-08-21 18:52:10 +01:00
PSeitz
13401f46ea
add wildcard mention
2021-08-21 18:10:33 +01:00
Pascal Seitz
62052bcc2d
add missing test function
...
closes #1139
2021-08-20 07:26:22 +01:00
Pascal Seitz
3265f7bec3
dissolve common module
2021-08-19 23:26:34 +01:00
Pascal Seitz
ee0881712a
move bitset to common crate, move composite file to directory
2021-08-19 17:45:09 +01:00
Paul Masurel
750f6e6479
Removed obsolete unit test (#1138)
2021-08-19 10:07:49 +09:00
Evance Soumaoro
5b475e6603
Checksum validation using active files (#1130)
...
* checksum validation now uses segment files, not managed files
2021-08-19 10:03:20 +09:00
Pascal Seitz
dc141cdb29
more docs detail
...
remove code duplicate
2021-08-13 17:40:13 +01:00
Pascal Seitz
f379a80233
test doc_freq and term_freq in sorted index
2021-08-03 11:38:05 +01:00
PSeitz
4a320fd1ff
fix delta position in merge and index sorting (#1132)
...
fixes #1125
2021-08-03 18:06:36 +09:00
Pascal Seitz
605e8603dc
add positions to long running test
2021-08-02 15:29:49 +01:00
Pascal Seitz
70f160b329
add long running test in ci
2021-08-02 11:35:39 +01:00
PSeitz
fdc512391b
Merge pull request #1128 from tantivy-search/merge_overflow
...
add sort to functional test, add env for iterations
2021-08-02 10:29:16 +01:00
Pascal Seitz
108714c934
add sort to functional test, add env for iterations
2021-08-02 10:11:17 +01:00
Paul Masurel
44e8cf98a5
Cargo fmt
2021-07-30 15:30:01 +09:00
Paul Masurel
f0ee69d9e9
Remove the complicated block search logic for a simpler branchless (#1124)
...
binary search
The code is simpler and faster.
Before
test postings::bench::bench_segment_intersection ... bench: 2,093,697 ns/iter (+/- 115,509)
test postings::bench::bench_skip_next_p01 ... bench: 58,585 ns/iter (+/- 796)
test postings::bench::bench_skip_next_p1 ... bench: 160,872 ns/iter (+/- 5,164)
test postings::bench::bench_skip_next_p10 ... bench: 615,229 ns/iter (+/- 25,108)
test postings::bench::bench_skip_next_p90 ... bench: 1,120,509 ns/iter (+/- 22,271)
After
test postings::bench::bench_segment_intersection ... bench: 1,747,726 ns/iter (+/- 52,867)
test postings::bench::bench_skip_next_p01 ... bench: 55,205 ns/iter (+/- 714)
test postings::bench::bench_skip_next_p1 ... bench: 131,433 ns/iter (+/- 2,814)
test postings::bench::bench_skip_next_p10 ... bench: 478,830 ns/iter (+/- 12,794)
test postings::bench::bench_skip_next_p90 ... bench: 931,082 ns/iter (+/- 31,468)
2021-07-30 14:38:42 +09:00
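For readers unfamiliar with the technique named in #1124, the following is a generic sketch of a branchless binary search over a sorted slice. It is not the code from the commit: the function name `branchless_lower_bound` is made up here, and the actual implementation may use a fixed block size and a different loop shape.

```rust
/// Returns the index of the first element that is `>= target`,
/// or `arr.len()` if there is none. `arr` must be sorted ascending.
fn branchless_lower_bound(arr: &[u32], target: u32) -> usize {
    if arr.is_empty() {
        return 0;
    }
    let mut base = 0usize;
    let mut size = arr.len();
    // The number of iterations depends only on `arr.len()`, not on the data,
    // so the only data-dependent step is the select below.
    while size > 1 {
        let half = size / 2;
        let mid = base + half;
        // A plain select between two values; compilers can often lower this
        // to a conditional move rather than a jump (not guaranteed).
        base = if arr[mid] < target { mid } else { base };
        size -= half;
    }
    // `base` now points at the last candidate; step past it if it is too small.
    base + (arr[base] < target) as usize
}

fn main() {
    let docs = [1u32, 3, 5, 7];
    assert_eq!(branchless_lower_bound(&docs, 0), 0);
    assert_eq!(branchless_lower_bound(&docs, 5), 2);
    assert_eq!(branchless_lower_bound(&docs, 6), 3);
    assert_eq!(branchless_lower_bound(&docs, 8), 4);
}
```

The design trade-off is that the loop always runs the full number of halving steps, in exchange for avoiding hard-to-predict branches on the comparison result.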
Evance Soumaoro
b8a10c8406
switched to memmap2-rs (#1120)
2021-07-27 18:40:41 +09:00
François Massot
1db76dd9cf
Merge pull request #1113 from shikhar/patch-1
...
stale comments in segment_reader.rs
2021-07-20 23:02:20 +02:00
Shikhar Bhushan
b361315a67
FilterCollector doc fix
...
Other types supported since https://github.com/tantivy-search/tantivy/pull/953/files
2021-07-15 22:55:47 -04:00
Shikhar Bhushan
4e3771bffc
stale comments in segment_reader.rs
2021-07-15 22:47:32 -04:00
PSeitz
8176b0335a
Merge pull request #1108 from PSeitz/pwnedbytes
...
move ownedbytes to own crate
2021-07-05 16:07:56 +02:00
François Massot
f4b2e71800
Handle field names with any characters with a known set of special (#1109)
...
* Handle field names with any characters with a known set of special characters and an escape one
* Update field name validation rule to check only if it has at least one character and does not start with `-`
Closes #1087.
2021-07-05 22:31:36 +09:00
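The relaxed validation rule stated in #1109 (at least one character, and no leading `-`) can be written as a one-line check. The function name `is_valid_field_name` is an assumption used only for this sketch and may not match the name or signature in the codebase.

```rust
/// Sketch of the rule from the commit message: a field name is accepted
/// if it is non-empty and does not start with `-`.
/// Hypothetical helper, not necessarily the function used in the project.
fn is_valid_field_name(name: &str) -> bool {
    !name.is_empty() && !name.starts_with('-')
}

fn main() {
    assert!(is_valid_field_name("title"));
    // Any other characters are allowed, including spaces and dots.
    assert!(is_valid_field_name("my field.name"));
    assert!(!is_valid_field_name(""));
    assert!(!is_valid_field_name("-title"));
}
```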
PSeitz
c431cfcf12
extend proptests, fix race condition (#1107)
...
* extend proptests, fix race condition
* cargo fmt
2021-07-05 18:28:56 +09:00
Pascal Seitz
9b662e6d03
move ownedbytes to own crate
...
fixes #1106
2021-07-02 16:51:59 +02:00