trinity-1686a
bcff3eb2d2
try with custom Cow<str>
2023-01-11 16:02:52 +01:00
trinity-1686a
85f2588875
implement add_borrowed_values on Document
2022-12-23 16:16:22 +01:00
trinity-1686a
db6cf65d53
make Document support Yoked inner values
2022-12-22 17:52:53 +01:00
trinity-1686a
654aa7f42c
allow Value to borrow
2022-12-22 15:43:13 +01:00
François Massot
951a898633
Update bench.
2022-10-30 14:12:07 +01:00
François Massot
003722d831
Add bench to reproduce performance drop on array of texts.
2022-10-29 02:54:07 +02:00
PSeitz
4e46f4f8c4
Merge pull request #1649 from adamreichold/split-compound-words
...
RFC: Add dictionary-based SplitCompoundWords token filter.
2022-10-27 17:12:48 +08:00
PSeitz
6647362464
Merge pull request #1648 from adamreichold/stemmer-todo-alloc
...
Avoid unconditional allocation in StemmerTokenStream.
2022-10-27 16:50:41 +08:00
PSeitz
7a80851e36
Merge pull request #1645 from quickwit-oss/ip_field_range_query
...
add ip range query benchmark, add seek behaviour
2022-10-27 16:13:52 +08:00
Adam Reichold
cd952429d2
Add dictionary-based SplitCompoundWords token filter.
2022-10-27 08:30:33 +02:00
PSeitz
d777c964da
Merge pull request #1650 from adamreichold/fnv-rustc-hash
...
Replace FNV by rustc-hash
2022-10-27 12:11:26 +08:00
Adam Reichold
bbb058d976
Replace FNV by rustc-hash
...
Both constructions have similar goals, but rustc-hash is better suited to
contemporary CPUs, as it works one word at a time instead of byte by byte.
2022-10-27 00:35:09 +02:00
Adam Reichold
5f7d027a52
Avoid unconditional allocation in StemmerTokenStream.
...
This fixes the TODO in two ways: If the stemmer already yields an owned string,
it is used directly as the new text of the token. Otherwise, the stemmed text is
copied into a temporary buffer (just as before), which is then swapped into the
token so that the token's existing buffer is reused.
2022-10-26 18:11:15 +02:00
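A minimal sketch of the two paths described in 5f7d027a52, assuming the stemmer returns a Cow<str>. The names toy_stem and stem_in_place are hypothetical and the stemming rule is a toy; this illustrates the buffer handling only and is not the StemmerTokenStream source.

```rust
use std::borrow::Cow;
use std::mem;

/// Hypothetical toy stemmer: "ponies" -> "pony" needs a new string (owned),
/// while "cats" -> "cat" is a sub-slice of the input (borrowed).
fn toy_stem(word: &str) -> Cow<'_, str> {
    if let Some(stem) = word.strip_suffix("ies") {
        Cow::Owned(format!("{}y", stem))
    } else {
        Cow::Borrowed(word.strip_suffix('s').unwrap_or(word))
    }
}

/// Replace `text` by its stem, allocating only if the stemmer itself did.
fn stem_in_place(text: &mut String, buffer: &mut String) {
    match toy_stem(text.as_str()) {
        // Owned result: adopt the stemmer's allocation directly.
        Cow::Owned(stemmed) => *text = stemmed,
        // Borrowed result: copy into the scratch buffer, then swap so the
        // token's old allocation becomes the scratch buffer for the next token.
        Cow::Borrowed(stemmed) => {
            buffer.clear();
            buffer.push_str(stemmed);
            mem::swap(text, buffer);
        }
    }
}
```

After the swap, the scratch buffer holds the previous token text; it is cleared before its next use, so only its capacity is carried over.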
PSeitz
0c2bd36fe3
Panic on duplicate field names (#1647)
...
fixes #1601
2022-10-26 16:17:33 +09:00
Pascal Seitz
fec2b63571
improve bench by adding more blanks in compact space
2022-10-25 22:09:01 +08:00
Pascal Seitz
6213ea476a
pass positions parameter
2022-10-25 17:44:51 +08:00
Pascal Seitz
5e159c26bf
add ip range query benchmark, add seek behaviour
2022-10-25 15:57:19 +08:00
PSeitz
a5e59ab598
Merge pull request #1644 from quickwit-oss/get_val_u32
...
switch get_val() to u32
2022-10-24 19:30:03 +08:00
Pascal Seitz
e772d3170d
switch get_val() to u32
...
Fixes #1638
2022-10-24 19:05:57 +08:00
PSeitz
8c2ba7bd55
Merge pull request #1637 from quickwit-oss/ip_field_range_query
...
add range query via ip fast field
2022-10-24 18:10:47 +08:00
Pascal Seitz
02328b0151
fix proptest
2022-10-24 17:46:06 +08:00
Pascal Seitz
7cc775256c
add comments, rename
2022-10-24 17:08:37 +08:00
Pascal Seitz
07b40f8b8b
add proptest
2022-10-24 16:52:55 +08:00
PSeitz
9b6b6be5b9
Apply suggestions from code review
...
Co-authored-by: Paul Masurel <paul@quickwit.io>
2022-10-24 16:00:38 +08:00
Pascal Seitz
6bb73a527f
add range query via ip fast field
2022-10-24 16:00:38 +08:00
PSeitz
03885d0f3c
Merge pull request #1643 from quickwit-oss/range_query_parser
...
allow more characters in range query
2022-10-24 15:09:47 +08:00
Pascal Seitz
f2e5135870
allow more characters in range query
...
closes #1642
2022-10-21 18:05:15 +08:00
Paul Masurel
c24157f28b
Bumping version format. (#1640)
...
The docstore format has changed in a non-compatible manner.
2022-10-21 15:35:35 +09:00
PSeitz
873382cdcb
Merge pull request #1639 from quickwit-oss/num_vals_u32
...
switch num_vals() to u32
2022-10-21 12:36:50 +08:00
Pascal Seitz
791350091c
switch num_vals() to u32
...
fixes #1630
2022-10-20 19:44:28 +08:00
Paul Masurel
483b1d13d4
Added unit test for long tokens (#1635)
...
* Bugfix on long tokens and multivalue text fields.
Fixes a minor bug for the strong edge case
in which a tokenizer would emit tokens where
the last token does not cover the last position.
More importantly, this adds unit tests.
Closes #1634
* Update src/indexer/segment_writer.rs
Co-authored-by: PSeitz <PSeitz@users.noreply.github.com>
2022-10-20 15:05:37 +09:00
PSeitz
8de7fa9d95
Merge pull request #1631 from quickwit-oss/high_positions
...
add test for phrase search on multi text field
2022-10-20 10:26:00 +08:00
Paul Masurel
94313b62f8
Hotfix issue/1629 - position broken (#1633)
...
* Bugfix position broken.
For a Field with several FieldValues, with a
value that contained no token at all, the token position
was reinitialized to 0.
As a result, PhraseQueries can show some false positives.
In addition, the computation of the position delta can
underflow u32, and we end up with a gigantic delta.
We haven't been able to actually explain the bug in 1629, but it
is assumed that in some corner cases these deltas can cause a panic.
Closes #1629
2022-10-20 11:03:55 +09:00
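To illustrate the underflow mentioned in 94313b62f8: if a position counter is reset to 0 in the middle of a field, the next delta is computed from a smaller position minus a larger one. The sketch below uses a hypothetical helper, position_deltas, and is not the indexer's code; it only shows how a u32 subtraction turns into a gigantic value when it wraps and how a checked subtraction surfaces the problem instead.

```rust
/// Hypothetical helper: delta-encode token positions, returning None as soon
/// as a position is smaller than its predecessor (the subtraction would
/// underflow u32).
fn position_deltas(positions: &[u32]) -> Option<Vec<u32>> {
    let mut previous = 0u32;
    let mut deltas = Vec::with_capacity(positions.len());
    for &position in positions {
        // checked_sub yields None instead of wrapping or panicking.
        deltas.push(position.checked_sub(previous)?);
        previous = position;
    }
    Some(deltas)
}

/// What a wrapping subtraction produces for a position that went backwards.
fn wrapped_delta(previous: u32, position: u32) -> u32 {
    position.wrapping_sub(previous)
}
```

For example, a position reset from 5 back to 0 gives a wrapped delta of 4294967291 rather than a small number.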
Pascal Seitz
f2b2628feb
add test for phrase search on multi text field
2022-10-19 16:29:56 +08:00
PSeitz
449f595832
Merge pull request #1628 from quickwit-oss/skip_index_deser
...
faster skipindex deserialization, larger blocksize on sort
2022-10-19 11:05:20 +08:00
PSeitz
c9235df059
Merge pull request #1627 from quickwit-oss/ip_field_range_query
...
add range query handling for ip via term dictionary
2022-10-19 10:53:00 +08:00
Pascal Seitz
a4485f7611
faster skipindex deserialization, larger blocksize on sort
2022-10-18 19:32:23 +08:00
Pascal Seitz
1082ff60f9
add range query handling for ip via term dictionary
...
Since IPs are mapped monotonically, we can use the term dictionary for range queries.
2022-10-18 13:08:27 +08:00
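A sketch of why a monotonic mapping is enough for 1082ff60f9, assuming addresses are normalized to IPv6 and stored as 16 big-endian bytes. The helpers ip_to_key and in_range are hypothetical, and the actual term encoding in the codebase may differ. With such a key, byte-wise lexicographic order equals numeric address order, so a range over addresses becomes a range over sorted keys.

```rust
use std::net::IpAddr;

/// Hypothetical key: IPv4 is mapped into IPv6 (::ffff:a.b.c.d), then the
/// 16 octets are taken in network (big-endian) order.
fn ip_to_key(ip: IpAddr) -> [u8; 16] {
    let v6 = match ip {
        IpAddr::V4(v4) => v4.to_ipv6_mapped(),
        IpAddr::V6(v6) => v6,
    };
    v6.octets()
}

/// Inclusive range check done purely on the byte keys, as a sorted
/// dictionary lookup would do.
fn in_range(ip: IpAddr, low: IpAddr, high: IpAddr) -> bool {
    let key = ip_to_key(ip);
    ip_to_key(low) <= key && key <= ip_to_key(high)
}
```

Because arrays of bytes compare lexicographically, no address parsing is needed at query time once the bounds have been converted to keys.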
PSeitz
491854155c
Merge pull request #1625 from quickwit-oss/index_ip_field
...
index ip field
2022-10-18 11:18:17 +08:00
Christoph Herzog
96c3d54ac7
fix: Fix power of two computation on 32bit architectures (#1624)
...
The current `compute_previous_power_of_two()` implementation used for
TermHashmap takes and returns `usize`, but actually only works
correctly on 64 bit architectures (aka usize == u64).
On other architectures the leading_zeros computation is run on the wrong
type (must be u64), and leads to overflows.
Fixed by simply computing the leading_zeros based on a u64 value.
2022-10-18 11:55:02 +09:00
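A minimal sketch of the fix described in 96c3d54ac7, assuming the function returns the largest power of two not exceeding its argument. The body is an illustration written for this log, not the code from the commit.

```rust
/// Largest power of two that is <= `n`. Panics if `n` is 0.
fn compute_previous_power_of_two(n: usize) -> usize {
    assert!(n > 0);
    // Widen to u64 before counting leading zeros, so the "63 -" below is
    // correct regardless of the width of usize. Counting on a 32-bit usize
    // would yield a bit index that is too large by 32, and the shift
    // would overflow.
    let most_significant_bit = 63u32 - (n as u64).leading_zeros();
    1usize << most_significant_bit
}
```

On a 32-bit target `n` is below 2^32, so the computed bit index is at most 31 and the shift stays in range.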
Pascal Seitz
6800fdec9d
add indexing for ip field
...
Closes #1595
2022-10-18 10:07:48 +08:00
PSeitz
c9cf9c952a
Merge pull request #1614 from quickwit-oss/remove_superfluous_steps
...
refactor Term
2022-10-17 18:25:31 +08:00
Pascal Seitz
024e53a99c
remove truncate
2022-10-17 12:14:35 +08:00
Pascal Seitz
8d75e451bd
fix truncate, remove mutable access from term
2022-10-17 12:14:35 +08:00
Pascal Seitz
fcfd76ec55
refactor Term
...
fixes some issues with Term
Remove duplicate calls to truncate or resize
Replace Magic Number 5 with constant
Enforce minimum size of 5 for metadata
Fix broken truncate docs
use constructor instead of new + set calls
normalize constructor stack
replace assert on internal behavior fixes #1585
2022-10-17 12:14:34 +08:00
PSeitz
6b7b1cc4fa
Merge pull request #1623 from quickwit-oss/remove_unused_buffer
...
remove unused buffer
2022-10-14 20:36:00 +08:00
Pascal Seitz
129f7422f5
remove unused buffer
2022-10-14 20:01:10 +08:00
PSeitz
f39cce2c8b
Merge pull request #1622 from quickwit-oss/term_aggregation
...
add term aggregation clarification
2022-10-14 18:09:18 +08:00
PSeitz
d2478fac8a
Merge pull request #1621 from quickwit-oss/changelog
...
update CHANGELOG
2022-10-14 18:08:57 +08:00
Pascal Seitz
952b048341
add term aggregation clarification
2022-10-14 16:12:19 +08:00