tantivy

mirror of https://github.com/quickwit-oss/tantivy.git synced 2026-05-25 12:40:41 +00:00

Author	SHA1	Message	Date
Yukun Guo	dfe4e95fde	Make index compatible with virtual drives on Windows (#1843 ) * Make index compatible with virtual drives on Windows * Get rid of normpath	2023-02-14 16:41:48 +09:00
Paul Masurel	60cc2644d6	Fixing test_fail_on_flush_segment_but_one_worker_remains (#1869 ) The new fast field code, based on columnar, had a larger minimum memory footprint, causing the first document to trigger a flush of the segment in this unit test. This PR prevents the allocation of a large capacity for the different hashmap tables used in the columnar writer. Closes #1859	2023-02-14 16:09:42 +09:00
Paul Masurel	10bccac61b	Bugfix in parse_into_milliseconds (#1867 )	2023-02-14 15:06:40 +09:00
PSeitz	1cfb9ce59a	improve range query performance (#1864 ) fix RowId vs DocId naming fixes #1863	2023-02-14 13:25:39 +09:00
trinity-1686a	539ff08a79	move DateTime to tantivy_common (#1861 ) * move DateTime to tantivy_common * resolve imports of columnar::DateTime as import of common::DateTime	2023-02-11 17:03:06 +01:00
PSeitz	dab93df94e	fix benchmarks (#1862 )	2023-02-11 15:44:47 +09:00
PSeitz	cbcafae04c	fix: doc store for files larger 4GB (#1856 ) Fixes an issue in the skip list deserialization, which deserialized the byte start offset incorrectly as u32. `get_doc` will fail for any docs that live in a block with start offset larger than u32::MAX (~4GB). Causes index corruption, if a segment with a doc store larger 4GB is merged. tantivy version 0.19 is affected	2023-02-10 14:29:43 +01:00
PSeitz	36c6138e7f	fix: auto downgrade index record option, instead of vint error (#1857 ) Prev: thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: IoError(Custom { kind: InvalidData, error: "Reach end of buffer while reading VInt" })', src/main.rs:46:14 Now: Automatic downgrade to next available level	2023-02-10 13:45:23 +01:00
PSeitz	7a9befd18d	fix sort order test for term aggregation (#1858 ) fix sort order test for term aggregation fix invalid request test	2023-02-10 10:26:58 +01:00
PSeitz	03345f0aa2	fmt code, update lz4_flex (#1838 ) formatting on nightly changed	2023-02-10 01:42:32 +09:00
Paul Masurel	b7bfa20e38	Fixed test performance.	2023-02-09 17:39:55 +01:00
trinity-1686a	1390834ae8	make Term::as_slice public (#1846 )	2023-02-09 15:37:07 +01:00
trinity-1686a	3ac973bea4	fix invalid endianness in documentation (#1845 ) * fix doc about term endianness * rustfmt	2023-02-09 15:36:38 +01:00
Paul Masurel	405e2cf4d9	Merge with main	2023-02-09 14:28:57 +01:00
Paul Masurel	bd5eea9852	Integrated columnar work.	2023-02-09 13:14:31 +01:00
PSeitz	0f20787917	fix doc store cache docs (#1821 ) * fix doc store cache docs addresses an issue reported in #1820 * rename doc_store_cache_size	2023-01-23 07:06:49 +01:00
Paul Masurel	08919a2900	Improvement on the scalar / random bitpacker code. (#1781 ) * Improvement on the scalar / random bitpacker code. Added proptesting Added simple benchmark Added assert and comments on the very non trivial hidden contract Remove the need for an extra padding. The last point introduces a small performance regression (~10%). * Fixing unit tests	2023-01-19 18:09:13 +09:00
Lonre Wang	8ba333f1b4	Typo fix (#1803 ) * Update text_options.rs * Update src/schema/text_options.rs Co-authored-by: Paul Masurel <paul@quickwit.io>	2023-01-19 17:56:05 +09:00
PSeitz	a2ca12995e	update aggregation docs (#1807 )	2023-01-19 09:52:47 +01:00
Paul Masurel	5180b612ef	Removing the demuxer code (#1799 )	2023-01-18 16:12:35 +09:00
PSeitz	f687b3a5aa	start migrate Field to &str (#1772 ) start migrate Field to &str in preparation of columnar return Result for get_field	2023-01-18 16:12:07 +09:00
Adrien Guillo	c51d9f9f83	Fix some Clippy warnings	2023-01-17 10:17:51 -05:00
Adrien Guillo	0caaf13a90	Remove standard deviation from stats aggregation	2023-01-16 22:58:23 -05:00
Adrien Guillo	f2dad194ea	Add count, min, max, and sum aggregations	2023-01-16 12:22:20 -05:00
PSeitz	6ca9a477f3	reuse stats for average (#1785 ) * reuse stats for average * fix count type	2023-01-13 23:32:27 +08:00
Shikhar Bhushan	2650111b76	EnableScoring::Disabled - optional Searcher (#1780 )	2023-01-12 09:26:50 -05:00
PSeitz	1176555eff	handle user input on get_docid_for_value_range (#1760 ) * handle user input on get_docid_for_value_range fixes #1757 * pass range as parameter	2023-01-12 14:20:16 +01:00
Adrien Guillo	e17996f2fd	Allow range queries via fast fields on non-indexed fields	2023-01-11 09:56:13 -05:00
Adrien Guillo	14222a47a3	Fix typo (#1776 )	2023-01-11 00:49:13 +09:00
Adam Reichold	8312c882a5	More cosmetic fixes for upcoming Clippy lints. (#1771 )	2023-01-10 10:32:45 +01:00
Paul Masurel	7a8fce0ae7	Minor mini fixes	2023-01-10 14:15:30 +09:00
Michael Kleen	196e42f33e	Add regex tokenizer (#1759 ) This adds a regex tokenizer which tokenizes the text by using a regex pattern to split. Co-authored-by: Michael Kleen <mkleen@gmailw.com>	2023-01-10 13:38:37 +09:00
Adam Reichold	82a183bc2d	Bump dependency on lru from version 0.7.5 to version 0.9.0. (#1755 )	2023-01-10 13:35:37 +09:00
PSeitz	7c6cc818ae	enable range query on fast field for u64 compatible types (#1762 ) * enable range query on fast field for u64 compatible types * rename, update benches	2023-01-10 04:08:26 +01:00
PSeitz	514d23a20c	move tokenizer API to separate crate (#1767 ) closes #1766 Finding tantivy tokenizers is a frustrating experience currently, since they need to be updated for each tantivy version. That's unnecessary since the API is rather stable anyway.	2023-01-09 06:37:38 +01:00
Adam Reichold	1afa5bf3db	Make construction of LevenshteinAutomatonBuilder for FuzzyTermQuery instances lazy. (#1756 )	2023-01-06 12:44:49 +09:00
PSeitz	07a51eb7c8	refactor multivalue fastfield, refactor range query (#1749 ) Introduce MakeZero trait, remove make_zero from FastValue Merge two multivalue fastfield implementations into one prepare range query on fastfield for different types	2023-01-05 12:09:50 +01:00
Adam Reichold	2080c370c2	Enable usage of FuzzyTermQuery for specific fields via QueryParser (#1750 ) * Make nightly Clippy mostly happy. * Document how to produce TermSetQuery queries using QueryParser. * Enable construction of queries using FuzzyTermQuery via the QueryParser * Use FxHashMap instead of HashMap in the QueryParser as these hash tables are not exposed to DoS attacks. * Use a struct instead of a tuple to improve readability.	2023-01-04 18:11:27 +09:00
Hasnain Lakhani	f4804ce2f5	Adjust spelling of "returns" in docs for DisjunctionMaxQuery (#1733 )	2022-12-22 14:04:07 +09:00
Paul Masurel	bb48c3e488	Refactoring to prepare for the addition of dynamic fast field (#1730 ) * Refactoring to prepare for the addition of dynamic fast field - Exposing insert_key / insert_value - Renamed SSTable::{Reader/Writer}-> SSTable::{ValueReader/ValueWriter} - Added a generic Dictionary object in the sstable crate - Removing the TermDictionary wrapper from tantivy, relying directly on an alias of the generic Dictionary object. - dropped the use of byteorder in sstable. - Stopped scanning / reading the entire dictionary when streaming a range. * Added a benchmark for streaming sstable ranges. * CR comments. Rename deserialize_u64 -> deserialize_vint_u64 * Removed needless allocation, split serialize into serialize and clear.	2022-12-22 12:25:46 +09:00
Paul Masurel	f39165e1e7	Moving FileSlice to tantivy-common (#1729 )	2022-12-21 16:35:11 +09:00
Paul Masurel	32cb1d22da	Removed AsyncIoResult. (#1728 )	2022-12-21 16:01:17 +09:00
Paul Masurel	4a6bf50e78	Clippy	2022-12-21 15:43:34 +09:00
PSeitz	f9171a3981	fix clippy (#1725 ) * fix clippy * fix clippy fastfield codecs * fix clippy bitpacker * fix clippy common * fix clippy stacker * fix clippy sstable * fmt	2022-12-20 07:30:06 +01:00
Paul Masurel	f6e87a5319	Cargo fmt	2022-12-13 12:30:40 +09:00
Paul Masurel	f9971e15fe	Fixing unit test with sstable test.	2022-12-13 12:22:44 +09:00
Paul Masurel	136a8f4124	Isolating sstable and stacker in independent crates. (#1718 ) Both crates will be used in the new (optional + dynamic) fastfield work.	2022-12-13 11:44:17 +09:00
PSeitz	2c50b02eb3	Fix max bucket limit in histogram (#1703 ) * Fix max bucket limit in histogram The max bucket limit in histogram was broken, since some code introduced temporary filtering of buckets, which then resulted into an incorrect increment on the bucket count. The provided solution covers more scenarios, but there are still some scenarios unhandled (See #1702). * Apply suggestions from code review Co-authored-by: Paul Masurel <paul@quickwit.io> Co-authored-by: Paul Masurel <paul@quickwit.io>	2022-12-12 04:40:15 +01:00
boraarslan	495824361a	Move `split_full_path` to `Schema` (#1692 )	2022-11-29 20:56:13 +09:00
PSeitz	1119e59eae	prepare fastfield format for null index (#1691 ) * prepare fastfield format for null index * add format version for fastfield * Update fastfield_codecs/src/compact_space/mod.rs * switch to variable size footer * serialize delta of end	2022-11-28 17:15:24 +09:00

2208 Commits