tantivy

mirror of https://github.com/quickwit-oss/tantivy.git synced 2026-06-02 08:30:41 +00:00

Author	SHA1	Message	Date
Paul Masurel	c838aa808b	Removed the extra nesting in unit test file (#1907 )	2023-02-27 12:17:52 +09:00
Paul Masurel	06850719dc	Renaming .values(DocId) to .values_for_doc(DocId) (#1906 )	2023-02-27 12:15:13 +09:00
PSeitz	5f23bb7e65	switch to sparse collection for histogram (#1898 ) * switch to sparse collection for histogram Replaces histogram vec collection with a hashmap. This approach works much better for sparse data and enables use cases like drill downs (filter + small interval). It is slower for dense cases (1.3x-2x slower). This can be alleviated with a specialized hashmap in the future. closes #1704 closes #1370 * refactor, clippy * fix bucket_pos overflow issue	2023-02-23 07:02:58 +01:00
trinity-1686a	533ad99cd5	add PhrasePrefixQuery (#1842 ) * add PhrasePrefixQuery	2023-02-22 11:18:33 +01:00
PSeitz	c7278b3258	remove schema in aggs (#1888 ) * switch to ColumnType, move tests * remove Schema dependency in agg	2023-02-22 04:50:28 +01:00
Paul Masurel	6b403e3281	Re-export of columnar	2023-02-22 11:23:54 +09:00
Paul Masurel	789cc8703e	Adding unit test testing docfreq after merge (#1895 )	2023-02-22 11:05:34 +09:00
Paul Masurel	e5098d9fe8	Moving test around reenabling tests that were disabled. (#1894 )	2023-02-22 10:31:52 +09:00
Paul Masurel	f537334e4f	Adding a write schema to columnar's merge operations. (#1884 ) * Adding a write schema to columnar's merge operations. * Added unit test checking min/max when columns are empty. * CR comment * Rename to value_type_to_column_type	2023-02-21 18:25:16 +09:00
Paul Masurel	e2aa5af075	Clippy warnings fixes (#1885 )	2023-02-20 19:04:13 +09:00
PSeitz	74bf60b4f7	implement SegmentAggregationCollector on bucket aggs (#1878 )	2023-02-17 12:53:29 +01:00
PSeitz	111f25a8f7	clippy (#1879 ) * fix clippy * fix clippy * fmt	2023-02-17 11:34:21 +01:00
PSeitz	019db10e8e	refactor aggregations (#1875 ) * add specialized version for full cardinality Pre Columnar test aggregation::tests::bench::bench_aggregation_average_u64 ... bench: 6,681,850 ns/iter (+/- 1,217,385) test aggregation::tests::bench::bench_aggregation_average_u64_and_f64 ... bench: 10,576,327 ns/iter (+/- 494,380) Current test aggregation::tests::bench::bench_aggregation_average_u64 ... bench: 11,562,084 ns/iter (+/- 3,678,682) test aggregation::tests::bench::bench_aggregation_average_u64_and_f64 ... bench: 18,925,790 ns/iter (+/- 17,616,771) Post Change test aggregation::tests::bench::bench_aggregation_average_u64 ... bench: 9,123,811 ns/iter (+/- 399,720) test aggregation::tests::bench::bench_aggregation_average_u64_and_f64 ... bench: 13,111,825 ns/iter (+/- 273,547) * refactor aggregation collection * add buffering collector	2023-02-16 13:15:16 +01:00
Paul Masurel	7423f99719	Issue/columnar for json (#1876 ) Adding support for JSON fast field.	2023-02-16 20:38:32 +09:00
Alex Cole	f2f38c43ce	Make BM25 scoring more flexible (#1855 ) * Introduce Bm25StatisticsProvider to inject statistics * fix formatting I accidentally changed	2023-02-16 19:14:12 +09:00
PSeitz	347614c841	test error for avg agg on ip field (#1873 ) closes #1835	2023-02-14 23:22:56 +08:00
Paul Masurel	097fd6138d	Fix clippy comments (#1872 )	2023-02-14 23:12:45 +09:00
PSeitz	01e5a22759	switch to new ff api (#1868 )	2023-02-14 15:57:32 +08:00
Yukun Guo	dfe4e95fde	Make index compatible with virtual drives on Windows (#1843 ) * Make index compatible with virtual drives on Windows * Get rid of normpath	2023-02-14 16:41:48 +09:00
Paul Masurel	60cc2644d6	Fixing test_fail_on_flush_segment_but_one_worker_remains (#1869 ) The new fast field code, based on columnar, had a larger minimum memory footprint, causing the first document to trigger a flush of the segment in this unit test. This PR prevents the allocation of a large capacity for the different hashmap tables used in the columnar writer. Closes #1859	2023-02-14 16:09:42 +09:00
Paul Masurel	10bccac61b	Bugfix in parse_into_milliseconds (#1867 )	2023-02-14 15:06:40 +09:00
PSeitz	1cfb9ce59a	improve range query performance (#1864 ) fix RowId vs DocId naming fixes #1863	2023-02-14 13:25:39 +09:00
trinity-1686a	539ff08a79	move DateTime to tantivy_common (#1861 ) * move DateTime to tantivy_common * resolve imports of columnar::DateTime as import of common::DateTime	2023-02-11 17:03:06 +01:00
PSeitz	dab93df94e	fix benchmarks (#1862 )	2023-02-11 15:44:47 +09:00
PSeitz	cbcafae04c	fix: doc store for files larger than 4GB (#1856 ) Fixes an issue in the skip list deserialization, which incorrectly deserialized the byte start offset as u32. `get_doc` will fail for any doc that lives in a block with a start offset larger than u32::MAX (~4GB). Causes index corruption if a segment with a doc store larger than 4GB is merged. tantivy version 0.19 is affected	2023-02-10 14:29:43 +01:00
PSeitz	36c6138e7f	fix: auto downgrade index record option, instead of vint error (#1857 ) Prev: thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: IoError(Custom { kind: InvalidData, error: "Reach end of buffer while reading VInt" })', src/main.rs:46:14 Now: Automatic downgrade to next available level	2023-02-10 13:45:23 +01:00
PSeitz	7a9befd18d	fix sort order test for term aggregation (#1858 ) fix sort order test for term aggregation fix invalid request test	2023-02-10 10:26:58 +01:00
PSeitz	03345f0aa2	fmt code, update lz4_flex (#1838 ) formatting on nightly changed	2023-02-10 01:42:32 +09:00
Paul Masurel	b7bfa20e38	Fixed test performance.	2023-02-09 17:39:55 +01:00
trinity-1686a	1390834ae8	make Term::as_slice public (#1846 )	2023-02-09 15:37:07 +01:00
trinity-1686a	3ac973bea4	fix invalid endianness in documentation (#1845 ) * fix doc about term endianness * rustfmt	2023-02-09 15:36:38 +01:00
Paul Masurel	405e2cf4d9	Merge with main	2023-02-09 14:28:57 +01:00
Paul Masurel	bd5eea9852	Integrated columnar work.	2023-02-09 13:14:31 +01:00
PSeitz	0f20787917	fix doc store cache docs (#1821 ) * fix doc store cache docs addresses an issue reported in #1820 * rename doc_store_cache_size	2023-01-23 07:06:49 +01:00
Paul Masurel	08919a2900	Improvement on the scalar / random bitpacker code. (#1781 ) * Improvement on the scalar / random bitpacker code. Added proptesting Added simple benchmark Added assert and comments on the very non trivial hidden contract Remove the need for an extra padding. The last point introduces a small performance regression (~10%). * Fixing unit tests	2023-01-19 18:09:13 +09:00
Lonre Wang	8ba333f1b4	Typo fix (#1803 ) * Update text_options.rs * Update src/schema/text_options.rs Co-authored-by: Paul Masurel <paul@quickwit.io>	2023-01-19 17:56:05 +09:00
PSeitz	a2ca12995e	update aggregation docs (#1807 )	2023-01-19 09:52:47 +01:00
Paul Masurel	5180b612ef	Removing the demuxer code (#1799 )	2023-01-18 16:12:35 +09:00
PSeitz	f687b3a5aa	start migrate Field to &str (#1772 ) start migrate Field to &str in preparation of columnar return Result for get_field	2023-01-18 16:12:07 +09:00
Adrien Guillo	c51d9f9f83	Fix some Clippy warnings	2023-01-17 10:17:51 -05:00
Adrien Guillo	0caaf13a90	Remove standard deviation from stats aggregation	2023-01-16 22:58:23 -05:00
Adrien Guillo	f2dad194ea	Add count, min, max, and sum aggregations	2023-01-16 12:22:20 -05:00
PSeitz	6ca9a477f3	reuse stats for average (#1785 ) * reuse stats for average * fix count type	2023-01-13 23:32:27 +08:00
Shikhar Bhushan	2650111b76	EnableScoring::Disabled - optional Searcher (#1780 )	2023-01-12 09:26:50 -05:00
PSeitz	1176555eff	handle user input on get_docid_for_value_range (#1760 ) * handle user input on get_docid_for_value_range fixes #1757 * pass range as parameter	2023-01-12 14:20:16 +01:00
Adrien Guillo	e17996f2fd	Allow range queries via fast fields on non-indexed fields	2023-01-11 09:56:13 -05:00
Adrien Guillo	14222a47a3	Fix typo (#1776 )	2023-01-11 00:49:13 +09:00
Adam Reichold	8312c882a5	More cosmetic fixes for upcoming Clippy lints. (#1771 )	2023-01-10 10:32:45 +01:00
Paul Masurel	7a8fce0ae7	Minor mini fixes	2023-01-10 14:15:30 +09:00
Michael Kleen	196e42f33e	Add regex tokenizer (#1759 ) This adds a regex tokenizer which tokenizes the text by using a regex pattern to split. Co-authored-by: Michael Kleen <mkleen@gmailw.com>	2023-01-10 13:38:37 +09:00

2176 Commits