Commit Graph

2807 Commits

Author SHA1 Message Date
Pascal Seitz
e07f1970ea fix count type 2023-01-13 20:10:23 +08:00
Pascal Seitz
78273bfb0d reuse stats for average 2023-01-13 17:43:25 +08:00
Shikhar Bhushan
2650111b76 EnableScoring::Disabled - optional Searcher (#1780) 2023-01-12 09:26:50 -05:00
PSeitz
1176555eff handle user input on get_docid_for_value_range (#1760)
* handle user input on get_docid_for_value_range

fixes #1757

* pass range as parameter
2023-01-12 14:20:16 +01:00
Adrien Guillo
f8d111a75e Merge pull request #1777 from quickwit-oss/guilload/ff-range-query-on-not-indexed-fields
Allow range queries via fast fields on non-indexed fields
2023-01-11 10:14:32 -05:00
Adrien Guillo
e17996f2fd Allow range queries via fast fields on non-indexed fields 2023-01-11 09:56:13 -05:00
Adrien Guillo
f3621c0487 Add license to tokenizer-api crate (#1778) 2023-01-11 05:26:41 +01:00
Adrien Guillo
14222a47a3 Fix typo (#1776) 2023-01-11 00:49:13 +09:00
Adam Reichold
8312c882a5 More cosmetic fixes for upcoming Clippy lints. (#1771) 2023-01-10 10:32:45 +01:00
Paul Masurel
7a8fce0ae7 Minor mini fixes 2023-01-10 14:15:30 +09:00
Michael Kleen
196e42f33e Add regex tokenizer (#1759)
This adds a regex tokenizer which tokenizes the text by using a
regex pattern to split.

Co-authored-by: Michael Kleen <mkleen@gmailw.com>
2023-01-10 13:38:37 +09:00
Adam Reichold
82a183bc2d Bump dependency on lru to from version 0.7.5 to version 0.9.0. (#1755) 2023-01-10 13:35:37 +09:00
dependabot[bot]
3090d49615 Update base64 requirement from 0.20.0 to 0.21.0 (#1769)
Updates the requirements on [base64](https://github.com/marshallpierce/rust-base64) to permit the latest version.
- [Release notes](https://github.com/marshallpierce/rust-base64/releases)
- [Changelog](https://github.com/marshallpierce/rust-base64/blob/master/RELEASE-NOTES.md)
- [Commits](https://github.com/marshallpierce/rust-base64/compare/v0.20.0...v0.21.0)

---
updated-dependencies:
- dependency-name: base64
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-01-10 13:35:05 +09:00
PSeitz
7c6cc818ae enable range query on fast field for u64 compatible types (#1762)
* enable range query on fast field for u64 compatible types

* rename, update benches
2023-01-10 04:08:26 +01:00
PSeitz
514d23a20c move tokenizer API to seperate crate (#1767)
closes #1766

Finding tantivy tokenizers is a frustrating experience currently, since
they need be updated for each tantivy version. That's unnecessary since
the API is rather stable anyway.
2023-01-09 06:37:38 +01:00
Paul Masurel
4f9efe654c Support for columnar (#1734)
* Added support for dynamic fast field.

See README for more information.

* Apply suggestions from code review

Co-authored-by: PSeitz <PSeitz@users.noreply.github.com>
2023-01-07 17:37:00 +09:00
Adam Reichold
1afa5bf3db Make construction of LevenshteinAutomatonBuilder for FuzzyTermQuery instances lazy. (#1756) 2023-01-06 12:44:49 +09:00
PSeitz
07a51eb7c8 refactor multivalue fastfield, refactor range query (#1749)
Introduce MakeZero trait, remove make_zero from FastValue
Merge two multivalue fastfield implementations into one
prepare range query on fastfield for different types
2023-01-05 12:09:50 +01:00
Adam Reichold
2080c370c2 Enable usage of FuzzyTermQuery for specific fields via QueryParser (#1750)
* Make nightly Clippy mostly happy.

* Document how to produce TermSetQuery queries using QueryParser.

* Enable construction of queries using FuzzyTermQuery via the QueryParser

* Use FxHashMap instead of HashMap in the QueryParser as these hash tables are not exposed to DoS attacks.

* Use a struct instead of a tuple to improve readability.
2023-01-04 18:11:27 +09:00
Daw-Chih Liou
b22f96624e doc: update comments in the faceted search example (#1737)
* doc: update comments in the faceted search example

* chore: format
2023-01-02 11:07:30 +01:00
pinkforest(she/her)
b78dc5e313 Bump prettytables (#1746) 2022-12-31 15:01:39 +01:00
Paul Masurel
3f915925af Fixing unit tests 2022-12-27 12:02:16 +09:00
Paul Masurel
9c5fef5af7 Fixing sstable proptest (#1743) 2022-12-26 16:29:33 +09:00
Paul Masurel
9948a84ebe Simplifies the count_ones definition. (#1742) 2022-12-26 16:08:01 +09:00
PSeitz
45156fd869 use group_by in translate_codec_idx_to_original_id (#1736) 2022-12-26 06:13:29 +01:00
Paul Masurel
bc959006fa Ooops. Removing ordered_floats. 2022-12-22 19:50:34 +09:00
Paul Masurel
7385a8f80c Supporting PartialCmp in VectorColumn. (#1735)
* Supporting PartialCmp in VectorColumn.
* Apply suggestions from code review

Co-authored-by: PSeitz <PSeitz@users.noreply.github.com>
2022-12-22 17:47:25 +09:00
Paul Masurel
13b89cba17 Adding inlines. 2022-12-22 14:29:41 +09:00
Hasnain Lakhani
f4804ce2f5 Adjust spelling of "returns" in docs for DisjunctionMaxQuery (#1733) 2022-12-22 14:04:07 +09:00
Paul Masurel
2a6d1eaf78 Added missing license. 2022-12-22 12:47:43 +09:00
Paul Masurel
540a9972bd Support for NotNaN in fast fields 2022-12-22 12:28:25 +09:00
Paul Masurel
bb48c3e488 Refactoring to prepare for the addition of dynamic fast field (#1730)
* Refactoring to prepare for the addition of dynamic fast field

- Exposing insert_key / insert_value
- Renamed SSTable::{Reader/Writer}-> SSTable::{ValueReader/ValueWriter}
- Added a generic Dictionary object in the sstable crate
- Removing the TermDictionary wrapper from tantivy, relying directly on
  an alias of the generic Dictionary object.
- dropped the use of byteorder in sstable.
- Stopped scanning / reading the entire dictionary when streaming a range.

* Added a benchmark for streaming sstable ranges.

* CR comments.

Rename deserialize_u64 -> deserialize_vint_u64

* Removed needless allocation, split serialize into serialize and clear.
2022-12-22 12:25:46 +09:00
Paul Masurel
3339a3ec05 Removed feature(quickwit) in tantivy-common. 2022-12-22 10:19:57 +09:00
Paul Masurel
f39165e1e7 Moving FileSlice to tantivy-common (#1729) 2022-12-21 16:35:11 +09:00
Paul Masurel
32cb1d22da Removed AsyncIoResult. (#1728) 2022-12-21 16:01:17 +09:00
Paul Masurel
4a6bf50e78 Clippy 2022-12-21 15:43:34 +09:00
PSeitz
2ac1cc2fc0 add sparse codec (#1723)
* add sparse codec

* Apply suggestions from code review

Co-authored-by: Paul Masurel <paul@quickwit.io>

* Apply suggestions from code review

Co-authored-by: Paul Masurel <paul@quickwit.io>

* Apply suggestions from code review

Co-authored-by: Paul Masurel <paul@quickwit.io>

* add the -1 u16 fix for metadata num_vals

* add dense block encoding to sparse codec

* add comment, refactor u16 reading

Co-authored-by: Paul Masurel <paul@quickwit.io>
2022-12-20 15:30:33 +01:00
PSeitz
f9171a3981 fix clippy (#1725)
* fix clippy

* fix clippy fastfield codecs

* fix clippy bitpacker

* fix clippy common

* fix clippy stacker

* fix clippy sstable

* fmt
2022-12-20 07:30:06 +01:00
PSeitz
a2cf6a79b4 Sparse dense index (#1716)
* add dense codec

* benchmark fix and important optimisation

* move code to DenseIndexBlock

improve benchmark

* Apply suggestions from code review

Co-authored-by: Paul Masurel <paul@quickwit.io>

* Apply suggestions from code review

Co-authored-by: Paul Masurel <paul@quickwit.io>

* extend benchmarks

* Apply suggestions from code review

Co-authored-by: Paul Masurel <paul@quickwit.io>

Co-authored-by: Paul Masurel <paul@quickwit.io>
2022-12-13 07:50:09 +01:00
Paul Masurel
f6e87a5319 Cargo fmt 2022-12-13 12:30:40 +09:00
Paul Masurel
f9971e15fe Fixing unit test with sstable test. 2022-12-13 12:22:44 +09:00
PSeitz
3cdc8e7472 pass index info to serialize (#1719) 2022-12-13 04:20:31 +01:00
dependabot[bot]
fbb0f8b55d Update base64 requirement from 0.13.0 to 0.20.0 (#1720)
Updates the requirements on [base64](https://github.com/marshallpierce/rust-base64) to permit the latest version.
- [Release notes](https://github.com/marshallpierce/rust-base64/releases)
- [Changelog](https://github.com/marshallpierce/rust-base64/blob/master/RELEASE-NOTES.md)
- [Commits](https://github.com/marshallpierce/rust-base64/compare/v0.13.0...v0.20.0)

---
updated-dependencies:
- dependency-name: base64
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-12-13 11:46:23 +09:00
Paul Masurel
136a8f4124 Isolating sstable and stacker in independant crates. (#1718)
Both crate will be used in the new (optional + dynamic) fastfield work.
2022-12-13 11:44:17 +09:00
PSeitz
5d4535de83 Changelog fix (#1717) 2022-12-12 14:28:42 +09:00
PSeitz
2c50b02eb3 Fix max bucket limit in histogram (#1703)
* Fix max bucket limit in histogram

The max bucket limit in histogram was broken, since some code introduced temporary filtering of buckets, which then resulted into an incorrect increment on the bucket count.
The provided solution covers more scenarios, but there are still some scenarios unhandled (See #1702).

* Apply suggestions from code review

Co-authored-by: Paul Masurel <paul@quickwit.io>

Co-authored-by: Paul Masurel <paul@quickwit.io>
0.19
2022-12-12 04:40:15 +01:00
PSeitz
509adab79d Bump version (#1715)
* group workspace deps

* update cargo.toml

* revert tant version

* chore: Release
2022-12-12 04:39:43 +01:00
PSeitz
96c93a6ba3 Merge pull request #1700 from quickwit-oss/PSeitz-patch-1
Update CHANGELOG.md
2022-12-02 16:31:11 +01:00
boraarslan
495824361a Move split_full_path to Schema (#1692) 2022-11-29 20:56:13 +09:00
PSeitz
485a8f507e Update CHANGELOG.md 2022-11-28 15:41:31 +01:00