Paul Masurel
bb48c3e488
Refactoring to prepare for the addition of dynamic fast field ( #1730 )
...
* Refactoring to prepare for the addition of dynamic fast field
- Exposing insert_key / insert_value
- Renamed SSTable::{Reader/Writer} -> SSTable::{ValueReader/ValueWriter}
- Added a generic Dictionary object in the sstable crate
- Removing the TermDictionary wrapper from tantivy, relying directly on
an alias of the generic Dictionary object.
- dropped the use of byteorder in sstable.
- Stopped scanning / reading the entire dictionary when streaming a range.
* Added a benchmark for streaming sstable ranges.
* CR comments.
Rename deserialize_u64 -> deserialize_vint_u64
* Removed needless allocation, split serialize into serialize and clear.
2022-12-22 12:25:46 +09:00
Paul Masurel
f39165e1e7
Moving FileSlice to tantivy-common ( #1729 )
2022-12-21 16:35:11 +09:00
Paul Masurel
32cb1d22da
Removed AsyncIoResult. ( #1728 )
2022-12-21 16:01:17 +09:00
Paul Masurel
4a6bf50e78
Clippy
2022-12-21 15:43:34 +09:00
PSeitz
f9171a3981
fix clippy ( #1725 )
...
* fix clippy
* fix clippy fastfield codecs
* fix clippy bitpacker
* fix clippy common
* fix clippy stacker
* fix clippy sstable
* fmt
2022-12-20 07:30:06 +01:00
Paul Masurel
f6e87a5319
Cargo fmt
2022-12-13 12:30:40 +09:00
Paul Masurel
f9971e15fe
Fixing unit test with sstable test.
2022-12-13 12:22:44 +09:00
Paul Masurel
136a8f4124
Isolating sstable and stacker in independent crates. ( #1718 )
...
Both crates will be used in the new (optional + dynamic) fastfield work.
2022-12-13 11:44:17 +09:00
PSeitz
2c50b02eb3
Fix max bucket limit in histogram ( #1703 )
...
* Fix max bucket limit in histogram
The max bucket limit in histogram was broken because some code introduced temporary filtering of buckets, which then resulted in an incorrect increment of the bucket count.
The provided solution covers more scenarios, but there are still some scenarios unhandled (See #1702 ).
* Apply suggestions from code review
Co-authored-by: Paul Masurel <paul@quickwit.io>
2022-12-12 04:40:15 +01:00
boraarslan
495824361a
Move split_full_path to Schema ( #1692 )
2022-11-29 20:56:13 +09:00
PSeitz
1119e59eae
prepare fastfield format for null index ( #1691 )
...
* prepare fastfield format for null index
* add format version for fastfield
* Update fastfield_codecs/src/compact_space/mod.rs
* switch to variable size footer
* serialize delta of end
2022-11-28 17:15:24 +09:00
PSeitz
ee1f2c1f28
add aggregation support for date type ( #1693 )
...
* add aggregation support for date type
fixes #1332
* serialize key_as_string as rfc3339 in date histogram
* update docs
* enable date for range aggregation
2022-11-28 09:12:08 +09:00
PSeitz
0281b22b77
update create_in_ram docs ( #1695 )
2022-11-24 17:30:09 +01:00
Paul Masurel
0b40a7fe43
Added an expand_dots option to JsonObjectOptions. ( #1687 )
...
Related with quickwit#2345.
2022-11-21 23:03:00 +09:00
trinity-1686a
e758080465
add support for TermSetQuery in query parser ( #1683 )
2022-11-17 16:49:49 +01:00
Paul Masurel
2a39289a1b
Handle escaped dot in json path in the QueryParser. ( #1682 )
2022-11-16 07:18:34 +09:00
Adam Reichold
ca6231170e
Make the built-in stop word lists selectable via the Language enum already used by the Stemmer filter. ( #1671 )
2022-11-15 17:40:25 +09:00
Pascal Seitz
8641155cbb
remove column from MultiValuedU128FastFieldReader
2022-11-14 18:49:15 +08:00
Pascal Seitz
b7d0dd154a
fmt
2022-11-14 14:49:15 +08:00
PSeitz
ce10fab20f
Apply suggestions from code review
...
Co-authored-by: Paul Masurel <paul@quickwit.io>
2022-11-14 14:21:53 +08:00
Pascal Seitz
e034328a8b
Improve position_to_docid, refactor, add tests
2022-11-14 14:21:53 +08:00
Pascal Seitz
f811d1616b
add support for ip range query on multivalue fastfields
2022-11-14 14:21:52 +08:00
PSeitz
c665b16ff0
Merge pull request #1672 from quickwit-oss/allow_range_without_indexed
...
Allow range query on fastfield without INDEXED
2022-11-14 12:45:12 +08:00
PSeitz
3b5f810051
Merge pull request #1677 from quickwit-oss/switch_to_u32
...
switch total_num_val to u32
2022-11-14 12:01:40 +08:00
trinity-1686a
5765c261aa
allow warming up of the full posting list ( #1673 )
...
* allow warming up of the full posting list
* cargo fmt
2022-11-14 10:27:56 +09:00
Pascal Seitz
fb9f03118d
switch total_num_val to u32
2022-11-11 17:35:52 +08:00
Pascal Seitz
9e8a0c2cca
Allow range query on fastfield without INDEXED
2022-11-10 15:56:08 +08:00
Paul Masurel
3edf0a2724
Using the manual reload policy in IndexWriter. ( #1667 )
2022-11-09 11:20:41 +01:00
Adam Reichold
a4b759d2fe
Include stop word lists from Lucene and the Snowball project ( #1666 )
2022-11-09 16:57:35 +09:00
PSeitz
3e9c806890
Merge pull request #1665 from quickwit-oss/fix_num_vals
...
fix num_vals on u128 value index after merge
2022-11-07 21:46:02 +08:00
Pascal Seitz
c69a873dd3
fix num_vals on value index after merge
2022-11-07 21:05:21 +08:00
Pascal Seitz
38ad46e580
fix clippy
2022-11-07 16:09:55 +08:00
Pascal Seitz
6e636c9cea
fix num_vals in multivalue index after merge
2022-11-07 15:00:52 +08:00
PSeitz
509a265659
add docstore version ( #1652 )
...
* add docstore version
closes #1589
* assert for docstore version
2022-11-04 10:19:16 +09:00
PSeitz
5b2cea1b97
Merge pull request #1656 from quickwit-oss/multival_offset_index
...
move multivalue index to own file
2022-11-02 14:03:06 +08:00
PSeitz
0f98d91a39
Merge pull request #1646 from quickwit-oss/no_score_calls
...
No score calls if score is not requested
2022-11-01 20:09:32 +08:00
PSeitz
2af6b01c17
Update src/query/boolean_query/boolean_weight.rs
...
Co-authored-by: Paul Masurel <paul@quickwit.io>
2022-11-01 16:13:00 +08:00
Adam Reichold
c32ab66bbd
Small improvements to StopWordFilter ( #1657 )
...
* Do not copy the whole set of stop words for each stream
* Make construction of StopWordFilter more flexible.
2022-11-01 16:47:34 +09:00
PSeitz
3f3a6f9990
Merge pull request #1653 from quickwit-oss/faster_hash
...
switch to fx hashmap
2022-11-01 14:53:18 +08:00
Pascal Seitz
83325d8f3f
move multivalue index to own file
...
start_doc parameter in positions to docids
2022-11-01 10:36:13 +08:00
PSeitz
4e46f4f8c4
Merge pull request #1649 from adamreichold/split-compound-words
...
RFC: Add dictionary-based SplitCompoundWords token filter.
2022-10-27 17:12:48 +08:00
Pascal Seitz
43df356010
rename to docset
2022-10-27 16:53:38 +08:00
PSeitz
6647362464
Merge pull request #1648 from adamreichold/stemmer-todo-alloc
...
Avoid unconditional allocation in StemmerTokenStream.
2022-10-27 16:50:41 +08:00
Pascal Seitz
279b1b28d3
switch to fx hashmap
2022-10-27 16:19:59 +08:00
PSeitz
7a80851e36
Merge pull request #1645 from quickwit-oss/ip_field_range_query
...
add ip range query benchmark, add seek behaviour
2022-10-27 16:13:52 +08:00
Adam Reichold
cd952429d2
Add dictionary-based SplitCompoundWords token filter.
2022-10-27 08:30:33 +02:00
Adam Reichold
bbb058d976
Replace FNV by rustc-hash
...
Both constructions have similar goals, but rustc-hash is better suited to
contemporary CPUs, as it works one word at a time instead of byte by byte.
2022-10-27 00:35:09 +02:00
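Note on bbb058d976: the "one word at a time instead of byte by byte" remark can be illustrated with a minimal sketch. The code below is not the code of the fnv or rustc-hash crates. `fnv1a` follows the published 64-bit FNV-1a recurrence, while `fx_style` and `fx_step` are hypothetical names for a simplified rotate/xor/multiply loop in the spirit of rustc-hash, with a multiplier assumed to match the 64-bit constant used by rustc-hash 1.x. The zero-padding of the final partial word is a simplification made here for brevity.

```rust
// 64-bit FNV-1a parameters (published constants).
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;
// Multiplier assumed to match rustc-hash 1.x on 64-bit targets; illustrative only.
const FX_SEED: u64 = 0x517c_c1b7_2722_0a95;

/// FNV-1a: one xor and one multiply per input *byte*.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash = FNV_OFFSET;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

/// One mixing step of the Fx-style hash: rotate, xor in a whole word, multiply.
fn fx_step(hash: u64, word: u64) -> u64 {
    (hash.rotate_left(5) ^ word).wrapping_mul(FX_SEED)
}

/// Fx-style hash (hypothetical helper): one mixing step per 8-byte *word*.
fn fx_style(bytes: &[u8]) -> u64 {
    let mut hash = 0u64;
    let mut chunks = bytes.chunks_exact(8);
    for chunk in &mut chunks {
        let mut word = [0u8; 8];
        word.copy_from_slice(chunk);
        hash = fx_step(hash, u64::from_le_bytes(word));
    }
    // Simplification: zero-pad the trailing partial word, so inputs that
    // differ only in trailing zero bytes collide in this sketch.
    let rest = chunks.remainder();
    if !rest.is_empty() {
        let mut padded = [0u8; 8];
        padded[..rest.len()].copy_from_slice(rest);
        hash = fx_step(hash, u64::from_le_bytes(padded));
    }
    hash
}
```

For a 16-byte key, `fnv1a` performs 16 multiply steps whereas `fx_style` performs 2, which is the trade-off the commit message points at.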
Adam Reichold
5f7d027a52
Avoid unconditional allocation in StemmerTokenStream.
...
This fixes the TODO in two ways: If the stemmer already yields an owned string,
it is used directly as the new text of the token. Otherwise, the stemmed text is
copied into a temporary buffer (just as before), which is then swapped into the
token so that the token's existing buffer is reused.
2022-10-26 18:11:15 +02:00
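Note on 5f7d027a52: the two cases described in the message map naturally onto the two variants of `std::borrow::Cow`. The sketch below only uses the standard library. `toy_stem` is a hypothetical stand-in for a real stemmer, and `apply_stem` is an illustrative helper name, not a tantivy function.

```rust
use std::borrow::Cow;
use std::mem;

/// Hypothetical stand-in for a stemmer: returns an owned string when it has
/// to build new text (lowercasing), and a borrowed slice otherwise.
fn toy_stem(word: &str) -> Cow<'_, str> {
    if word.chars().any(|c| c.is_uppercase()) {
        Cow::Owned(word.to_lowercase())
    } else {
        Cow::Borrowed(word.strip_suffix("ing").unwrap_or(word))
    }
}

/// Replaces `text` with its stemmed form without an unconditional allocation.
/// `buffer` is scratch space that is kept around between calls.
fn apply_stem(text: &mut String, buffer: &mut String) {
    match toy_stem(text.as_str()) {
        // The stemmer already allocated: use its string directly.
        Cow::Owned(stemmed) => *text = stemmed,
        // The stemmer borrowed from `text`: copy into the scratch buffer,
        // then swap so both allocations stay available for reuse.
        Cow::Borrowed(stemmed) => {
            buffer.clear();
            buffer.push_str(stemmed);
            mem::swap(text, buffer);
        }
    }
}
```

After the swap, `buffer` holds the token's previous allocation, so the next call can reuse its capacity instead of allocating again.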
Pascal Seitz
dfab201191
for_each_docset to iterate without score
2022-10-26 17:25:05 +08:00
PSeitz
0c2bd36fe3
Panic on duplicate field names ( #1647 )
...
fixes #1601
2022-10-26 16:17:33 +09:00