Commit Graph

2137 Commits

Author SHA1 Message Date
Pascal Seitz
49baa15f0f start migrate Field to &str
start migrate Field to &str in preparation of columnar
return Result for get_field
2023-01-17 13:34:21 +09:00
Adrien Guillo
0caaf13a90 Remove standard deviation from stats aggregation 2023-01-16 22:58:23 -05:00
Adrien Guillo
f2dad194ea Add count, min, max, and sum aggregations 2023-01-16 12:22:20 -05:00
PSeitz
6ca9a477f3 reuse stats for average (#1785)
* reuse stats for average

* fix count type
2023-01-13 23:32:27 +08:00
Shikhar Bhushan
2650111b76 EnableScoring::Disabled - optional Searcher (#1780) 2023-01-12 09:26:50 -05:00
PSeitz
1176555eff handle user input on get_docid_for_value_range (#1760)
* handle user input on get_docid_for_value_range

fixes #1757

* pass range as parameter
2023-01-12 14:20:16 +01:00
Adrien Guillo
e17996f2fd Allow range queries via fast fields on non-indexed fields 2023-01-11 09:56:13 -05:00
Adrien Guillo
14222a47a3 Fix typo (#1776) 2023-01-11 00:49:13 +09:00
Adam Reichold
8312c882a5 More cosmetic fixes for upcoming Clippy lints. (#1771) 2023-01-10 10:32:45 +01:00
Paul Masurel
7a8fce0ae7 Minor mini fixes 2023-01-10 14:15:30 +09:00
Michael Kleen
196e42f33e Add regex tokenizer (#1759)
This adds a regex tokenizer which tokenizes the text by using a
regex pattern to split.

Co-authored-by: Michael Kleen <mkleen@gmailw.com>
2023-01-10 13:38:37 +09:00
Adam Reichold
82a183bc2d Bump dependency on lru from version 0.7.5 to version 0.9.0. (#1755) 2023-01-10 13:35:37 +09:00
PSeitz
7c6cc818ae enable range query on fast field for u64 compatible types (#1762)
* enable range query on fast field for u64 compatible types

* rename, update benches
2023-01-10 04:08:26 +01:00
PSeitz
514d23a20c move tokenizer API to separate crate (#1767)
closes #1766

Finding tantivy tokenizers is currently a frustrating experience, since
they need to be updated for each tantivy version. That's unnecessary since
the API is rather stable anyway.
2023-01-09 06:37:38 +01:00
Adam Reichold
1afa5bf3db Make construction of LevenshteinAutomatonBuilder for FuzzyTermQuery instances lazy. (#1756) 2023-01-06 12:44:49 +09:00
PSeitz
07a51eb7c8 refactor multivalue fastfield, refactor range query (#1749)
Introduce MakeZero trait, remove make_zero from FastValue
Merge two multivalue fastfield implementations into one
prepare range query on fastfield for different types
2023-01-05 12:09:50 +01:00
Adam Reichold
2080c370c2 Enable usage of FuzzyTermQuery for specific fields via QueryParser (#1750)
* Make nightly Clippy mostly happy.

* Document how to produce TermSetQuery queries using QueryParser.

* Enable construction of queries using FuzzyTermQuery via the QueryParser

* Use FxHashMap instead of HashMap in the QueryParser as these hash tables are not exposed to DoS attacks.

* Use a struct instead of a tuple to improve readability.
2023-01-04 18:11:27 +09:00
Hasnain Lakhani
f4804ce2f5 Adjust spelling of "returns" in docs for DisjunctionMaxQuery (#1733) 2022-12-22 14:04:07 +09:00
Paul Masurel
bb48c3e488 Refactoring to prepare for the addition of dynamic fast field (#1730)
* Refactoring to prepare for the addition of dynamic fast field

- Exposing insert_key / insert_value
- Renamed SSTable::{Reader/Writer}-> SSTable::{ValueReader/ValueWriter}
- Added a generic Dictionary object in the sstable crate
- Removing the TermDictionary wrapper from tantivy, relying directly on
  an alias of the generic Dictionary object.
- dropped the use of byteorder in sstable.
- Stopped scanning / reading the entire dictionary when streaming a range.

* Added a benchmark for streaming sstable ranges.

* CR comments.

Rename deserialize_u64 -> deserialize_vint_u64

* Removed needless allocation, split serialize into serialize and clear.
2022-12-22 12:25:46 +09:00
Paul Masurel
f39165e1e7 Moving FileSlice to tantivy-common (#1729) 2022-12-21 16:35:11 +09:00
Paul Masurel
32cb1d22da Removed AsyncIoResult. (#1728) 2022-12-21 16:01:17 +09:00
Paul Masurel
4a6bf50e78 Clippy 2022-12-21 15:43:34 +09:00
PSeitz
f9171a3981 fix clippy (#1725)
* fix clippy

* fix clippy fastfield codecs

* fix clippy bitpacker

* fix clippy common

* fix clippy stacker

* fix clippy sstable

* fmt
2022-12-20 07:30:06 +01:00
Paul Masurel
f6e87a5319 Cargo fmt 2022-12-13 12:30:40 +09:00
Paul Masurel
f9971e15fe Fixing unit test with sstable test. 2022-12-13 12:22:44 +09:00
Paul Masurel
136a8f4124 Isolating sstable and stacker in independent crates. (#1718)
Both crates will be used in the new (optional + dynamic) fastfield work.
2022-12-13 11:44:17 +09:00
PSeitz
2c50b02eb3 Fix max bucket limit in histogram (#1703)
* Fix max bucket limit in histogram

The max bucket limit in histogram was broken, since some code introduced temporary filtering of buckets, which then resulted in an incorrect increment of the bucket count.
The provided solution covers more scenarios, but some scenarios remain unhandled (see #1702).

* Apply suggestions from code review

Co-authored-by: Paul Masurel <paul@quickwit.io>
2022-12-12 04:40:15 +01:00
boraarslan
495824361a Move split_full_path to Schema (#1692) 2022-11-29 20:56:13 +09:00
PSeitz
1119e59eae prepare fastfield format for null index (#1691)
* prepare fastfield format for null index
* add format version for fastfield
* Update fastfield_codecs/src/compact_space/mod.rs
* switch to variable size footer
* serialize delta of end
2022-11-28 17:15:24 +09:00
PSeitz
ee1f2c1f28 add aggregation support for date type (#1693)
* add aggregation support for date type
fixes #1332

* serialize key_as_string as rfc3339 in date histogram
* update docs
* enable date for range aggregation
2022-11-28 09:12:08 +09:00
PSeitz
0281b22b77 update create_in_ram docs (#1695) 2022-11-24 17:30:09 +01:00
Paul Masurel
0b40a7fe43 Added an expand_dots option to JsonObjectOptions. (#1687)
Related to quickwit#2345.
2022-11-21 23:03:00 +09:00
trinity-1686a
e758080465 add support for TermSetQuery in query parser (#1683) 2022-11-17 16:49:49 +01:00
Paul Masurel
2a39289a1b Handle escaped dot in json path in the QueryParser. (#1682) 2022-11-16 07:18:34 +09:00
Adam Reichold
ca6231170e Make the built-in stop word lists selectable via the Language enum already used by the Stemmer filter. (#1671) 2022-11-15 17:40:25 +09:00
Pascal Seitz
8641155cbb remove column from MultiValuedU128FastFieldReader 2022-11-14 18:49:15 +08:00
Pascal Seitz
b7d0dd154a fmt 2022-11-14 14:49:15 +08:00
PSeitz
ce10fab20f Apply suggestions from code review
Co-authored-by: Paul Masurel <paul@quickwit.io>
2022-11-14 14:21:53 +08:00
Pascal Seitz
e034328a8b Improve position_to_docid, refactor, add tests 2022-11-14 14:21:53 +08:00
Pascal Seitz
f811d1616b add support for ip range query on multivalue fastfields 2022-11-14 14:21:52 +08:00
PSeitz
c665b16ff0 Merge pull request #1672 from quickwit-oss/allow_range_without_indexed
Allow range query on fastfield without INDEXED
2022-11-14 12:45:12 +08:00
PSeitz
3b5f810051 Merge pull request #1677 from quickwit-oss/switch_to_u32
switch total_num_val to u32
2022-11-14 12:01:40 +08:00
trinity-1686a
5765c261aa allow warming up of the full posting list (#1673)
* allow warming up of the full posting list

* cargo fmt
2022-11-14 10:27:56 +09:00
Pascal Seitz
fb9f03118d switch total_num_val to u32 2022-11-11 17:35:52 +08:00
Pascal Seitz
9e8a0c2cca Allow range query on fastfield without INDEXED 2022-11-10 15:56:08 +08:00
Paul Masurel
3edf0a2724 Using the manual reload policy in IndexWriter. (#1667) 2022-11-09 11:20:41 +01:00
Adam Reichold
a4b759d2fe Include stop word lists from Lucene and the Snowball project (#1666) 2022-11-09 16:57:35 +09:00
PSeitz
3e9c806890 Merge pull request #1665 from quickwit-oss/fix_num_vals
fix num_vals on u128 value index after merge
2022-11-07 21:46:02 +08:00
Pascal Seitz
c69a873dd3 fix num_vals on value index after merge 2022-11-07 21:05:21 +08:00
Pascal Seitz
38ad46e580 fix clippy 2022-11-07 16:09:55 +08:00