tantivy

mirror of https://github.com/quickwit-oss/tantivy.git synced 2025-12-26 12:09:57 +00:00

Author	SHA1	Message	Date
Pascal Seitz	806a1e1b1e	clarify tokenizer docs	2023-04-03 22:59:38 +08:00
PSeitz	5c4ea6a708	tokenizer option on text fastfield (#1945 ) * tokenizer option on text fastfield allow to set tokenizer option on text fastfield (fixes #1901) handle PreTokenized strings in fast field * change visibility * remove custom de/serialization	2023-03-31 10:03:38 +02:00
PSeitz	4cf93dab7d	fix build (#1973 )	2023-03-31 13:54:03 +09:00
PSeitz	5c380b76e7	Better mixed types support in aggs and fix serialization issue (#1971 ) * Better mixed types support in aggs and fix serialization issue - Improve support for mixed types in JSON field aggregations (pick the right field, #1913) - Resolve the issue with JSON serialization for numeric keys (fixes #1967) - Add JSON round-trip test for term buckets - Remove `u64_lenient`, as this is a footgun without the type - move aggregation benchmarks * remove shadowing	2023-03-31 05:52:11 +02:00
PSeitz	571735c5f7	Fix index sort by on optional/multicolumn (#1972 ) Fix index sort by on optional/multicolumn add optional columns to proptest extend proptests for sort add columnar sort tests	2023-03-31 04:24:11 +02:00
zhouhui	8e92f960d3	Fix comment: change max_merge_size to max_docs_before_merge. (#1970 )	2023-03-28 22:49:00 +09:00
Paul Masurel	2b6a4da640	Exposing empty column builder. (#1959 )	2023-03-24 16:34:41 +09:00
PSeitz	d6a95381ee	add memory check for term agg (#1957 )	2023-03-24 06:47:45 +01:00
PSeitz	da2804644f	fetch blocks of vals in aggregation for all cardinality (#1950 ) * fetch blocks of vals in aggregation for all cardinality * move caching in common accessor	2023-03-23 08:41:11 +01:00
trinity-1686a	482b4155e8	fix bug with new sstable index format (#1953 )	2023-03-22 10:22:36 +01:00
Till Wegmüller	1a35f6573d	Switch fs2 to fs4 as it is now unmaintained and does not support illumos (#1944 ) Signed-off-by: Till Wegmueller <toasterson@gmail.com>	2023-03-22 13:48:49 +09:00
trinity-1686a	e5e50603a8	new sstable format (#1943 ) * document a new sstable format * add support for changing target block size * use new format for sstable index * handle sstable version error * use very small blocks for proptests * add a footer structure	2023-03-21 15:03:52 +01:00
PSeitz	8f7f1d6be4	add Display for ByteCount (#1949 ) * add Display for ByteCount * export missing AggregationLimits	2023-03-21 08:02:35 +01:00
PSeitz	6a7a1106d6	work in batches of docs (#1937 ) * work in batches of docs * add fill_buffer test	2023-03-21 06:57:44 +01:00
PSeitz	9e2faecf5b	add memory limit for aggregations (#1942 ) * add memory limit for aggregations introduce AggregationLimits to set memory consumption limit and bucket limits memory limit is checked during aggregation, bucket limit is checked before returning the aggregation request. * Apply suggestions from code review Co-authored-by: Paul Masurel <paul@quickwit.io> * add ByteCount with human readable format --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2023-03-16 06:21:07 +01:00
PSeitz	b6703f1b3c	fix validation in date histogram (#1936 ) fix validation in date histogram for parameters interval and date_interval	2023-03-15 06:10:43 +01:00
PSeitz	2fb3740cb0	handle missing column for aggs (#1920 ) * handle missing column for aggs add empty column fallback for missing column in aggs. Fix sort for term agg on sub-agg with missing value (null is smallest) * add error when field is not fast	2023-03-15 06:09:59 +01:00
PSeitz	8459efa32c	split term collection count and sub_agg (#1921 ) use unrolled ColumnValues::get_vals	2023-03-13 04:37:41 +01:00
PSeitz	61cfd8dc57	fix clippy (#1927 )	2023-03-13 03:12:02 +01:00
trinity-1686a	064518156f	refactor tokenization pipeline to use GATs (#1924 ) * refactor tokenization pipeline to use GATs * fix doctests * fix clippy lints * remove commented code	2023-03-09 09:39:37 +01:00
Paul Masurel	364e321415	Clippy fix (#1926 )	2023-03-06 10:37:17 +09:00
PSeitz	ca20bfa776	add date_histogram (#1900 ) * add date_histogram * add return result	2023-03-02 05:17:35 +01:00
PSeitz	faa706d804	add coerce option for text and numbers types (#1904 ) * add coerce option for text and numbers types allow to coerce the field type when indexing if the type does not match * Apply suggestions from code review Co-authored-by: Paul Masurel <paul@quickwit.io> * add tests,add COERCE flag, include bool in coercion --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2023-03-01 11:36:59 +01:00
PSeitz	850a0d7ae2	add agg benchmark for optional and multi value (#1916 ) closes #1870	2023-03-01 17:01:52 +09:00
Paul Masurel	7fae4d98d7	Adapting for quickwit2 (#1912 ) * Adapting tantivy to make it possible to be plugged to quickwit. * Apply suggestions from code review Co-authored-by: PSeitz <PSeitz@users.noreply.github.com> * Added unit test --------- Co-authored-by: PSeitz <PSeitz@users.noreply.github.com>	2023-03-01 16:27:46 +09:00
PSeitz	bc36458334	move buffer in front of dynamic dispatch (#1915 ) dynamic dispatch seems to be really expensive, move the buffer in front of the dynamic dispatch, to reduce the number of calls into the dynamic dispatched collector.	2023-02-28 13:07:50 +08:00
trinity-1686a	8a71e00da3	allow limiting the number of matched term in range query (#1899 )	2023-02-27 10:44:08 +01:00
PSeitz	e510f699c8	feat: add support for u64,i64,f64 fields in term aggregation (#1883 ) * feat: add support for u64,i64,f64 fields in term aggregation * hash enum values * fix build * Apply suggestions from code review Co-authored-by: Paul Masurel <paul@quickwit.io> --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2023-02-27 15:04:41 +08:00
Paul Masurel	d25fc155b2	Making some of the column/termdict operations async-friendly (#1902 )	2023-02-27 15:34:47 +09:00
Paul Masurel	8ea97e7d6b	Minor refactoring preparing for getting columnar integrated in quickwit. (#1911 )	2023-02-27 14:23:30 +09:00
Paul Masurel	66ff53b0f4	Various minor code cleanup (#1909 )	2023-02-27 13:48:34 +09:00
Paul Masurel	d002698008	Re-export of query grammar. (#1908 )	2023-02-27 12:26:34 +09:00
Paul Masurel	c838aa808b	Removed the extra nesting in unit test file (#1907 )	2023-02-27 12:17:52 +09:00
Paul Masurel	06850719dc	Renaming .values(DocId) to .values_for_doc(DocId) (#1906 )	2023-02-27 12:15:13 +09:00
PSeitz	5f23bb7e65	switch to sparse collection for histogram (#1898 ) * switch to sparse collection for histogram Replaces histogram vec collection with a hashmap. This approach works much better for sparse data and enables use cases like drill downs (filter + small interval). It is slower for dense cases (1.3x-2x slower). This can be alleviated with a specialized hashmap in the future. closes #1704 closes #1370 * refactor, clippy * fix bucket_pos overflow issue	2023-02-23 07:02:58 +01:00
trinity-1686a	533ad99cd5	add PhrasePrefixQuery (#1842 ) * add PhrasePrefixQuery	2023-02-22 11:18:33 +01:00
PSeitz	c7278b3258	remove schema in aggs (#1888 ) * switch to ColumnType, move tests * remove Schema dependency in agg	2023-02-22 04:50:28 +01:00
Paul Masurel	6b403e3281	Re-export of columnar	2023-02-22 11:23:54 +09:00
Paul Masurel	789cc8703e	Adding unit test testing docfreq after merge (#1895 )	2023-02-22 11:05:34 +09:00
Paul Masurel	e5098d9fe8	Moving test around reenabling tests that were disabled. (#1894 )	2023-02-22 10:31:52 +09:00
Paul Masurel	f537334e4f	Adding a write schema to columnar's merge operations. (#1884 ) * Adding a write schema to columnar's merge operations. * Added unit test checking min/max when columns are empty. * CR comment * Rename to value_type_to_column_type	2023-02-21 18:25:16 +09:00
Paul Masurel	e2aa5af075	Clippy warnings fixes (#1885 )	2023-02-20 19:04:13 +09:00
PSeitz	74bf60b4f7	implement SegmentAggregationCollector on bucket aggs (#1878 )	2023-02-17 12:53:29 +01:00
PSeitz	111f25a8f7	clippy (#1879 ) * fix clippy * fix clippy * fmt	2023-02-17 11:34:21 +01:00
PSeitz	019db10e8e	refactor aggregations (#1875 ) * add specialized version for full cardinality Pre Columnar test aggregation::tests::bench::bench_aggregation_average_u64 ... bench: 6,681,850 ns/iter (+/- 1,217,385) test aggregation::tests::bench::bench_aggregation_average_u64_and_f64 ... bench: 10,576,327 ns/iter (+/- 494,380) Current test aggregation::tests::bench::bench_aggregation_average_u64 ... bench: 11,562,084 ns/iter (+/- 3,678,682) test aggregation::tests::bench::bench_aggregation_average_u64_and_f64 ... bench: 18,925,790 ns/iter (+/- 17,616,771) Post Change test aggregation::tests::bench::bench_aggregation_average_u64 ... bench: 9,123,811 ns/iter (+/- 399,720) test aggregation::tests::bench::bench_aggregation_average_u64_and_f64 ... bench: 13,111,825 ns/iter (+/- 273,547) * refactor aggregation collection * add buffering collector	2023-02-16 13:15:16 +01:00
Paul Masurel	7423f99719	Issue/columnar for json (#1876 ) Adding support for JSON fast field.	2023-02-16 20:38:32 +09:00
Alex Cole	f2f38c43ce	Make BM25 scoring more flexible (#1855 ) * Introduce Bm25StatisticsProvider to inject statistics * fix formatting I accidentally changed	2023-02-16 19:14:12 +09:00
PSeitz	347614c841	test error for avg agg on ip field (#1873 ) closes #1835	2023-02-14 23:22:56 +08:00
Paul Masurel	097fd6138d	Fix clippy comments (#1872 )	2023-02-14 23:12:45 +09:00
PSeitz	01e5a22759	switch to new ff api (#1868 )	2023-02-14 15:57:32 +08:00

2208 Commits