tantivy

mirror of https://github.com/quickwit-oss/tantivy.git synced 2026-01-04 08:12:54 +00:00

Author	SHA1	Message	Date
RT_Enzyme	ff3d3313c4	fix BooleanQuery document (#1999 ) * fix BooleanQuery document * Update src/query/boolean_query/boolean_query.rs --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2023-04-20 11:37:20 +02:00
Paul Masurel	fbda511a1a	Making more things public for quickwit. (#2005 )	2023-04-20 11:37:45 +09:00
Adam Reichold	c1defdda05	Bump aho-corasick dependency to version 1.0 and adjust to API changes (#2002 ) * Drop additional Arc-layer as the automaton itself is now cheap-to-clone. * Drop state ID type parameter as it is not exposed by the library any more.	2023-04-18 07:34:30 +02:00
PSeitz	e522163a1c	use json in agg tests (#1998 ) * switch to JSON in tests, add flat aggregation types * use method * clippy * remove commented file	2023-04-17 14:08:48 +02:00
PSeitz	e83abbfe4a	perf: faster term hash map (#1940 ) * add term hashmap benchmark * refactor arena hashmap add inlines remove occupied array and use table_entry.is_empty instead (saves 4 bytes per entry) reduce saturation threshold from 1/3 to 1/2 to reduce memory use u32 for UnorderedId (we have the 4billion limit anyways on the Columnar stuff) fix naming LinearProbing remove byteorder dependency memory consumption went down from 2Gb to 1.8GB on indexing wikipedia dataset in tantivy * Update stacker/src/arena_hashmap.rs Co-authored-by: Paul Masurel <paul@quickwit.io> --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2023-04-17 09:07:33 +02:00
trinity-1686a	780e26331d	sstable compression (#1946 ) * compress sstable with zstd * add some details to sstable readme * compress only block which benefit from it * multiple changes to sstable make compression optional use OwnedBytes instead of impl Read in sstable, required for next point use zstd bulk api, which is much faster on small records * cleanup and use bulk api for compression * use dedicated byte for compression * switch block len and compression flag * change default zstd level in sstable	2023-04-14 16:25:50 +02:00
trinity-1686a	0286ecea09	re-export a few sstable functions on dictionary (#1996 ) * re-export a few sstable functions on dictionary * Update documentation Co-authored-by: François Massot <francois.massot@gmail.com> --------- Co-authored-by: François Massot <francois.massot@gmail.com>	2023-04-14 11:13:48 +02:00
PSeitz	b0ef9a6252	use crates.io dependency (#1990 )	2023-04-14 09:35:20 +08:00
François Massot	36138c493b	Merge pull request #1994 from quickwit-oss/fmassot/expose-simple-token-stream Expose `SimpleTokenStream` to use it in quickwit for the multilanguage tokenizer	2023-04-13 18:55:02 +02:00
François Massot	64bce340b2	Expose to use it in quickwit.	2023-04-13 18:28:53 +02:00
trinity-1686a	205e8a0a92	encode dictionary type in fst footer (#1968 ) * encode additional footer for dictionary kind in fst	2023-04-12 09:43:01 +02:00
Paul Masurel	4b01cc4c49	Made BooleanWeight and BoostWeight public (#1991 )	2023-04-12 10:26:30 +09:00
PSeitz	0ed13eeea8	add sparse to agg benchmark (#1986 ) * add sparse to agg benchmark * Update src/aggregation/agg_bench.rs Co-authored-by: Paul Masurel <paul@quickwit.io> --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2023-04-11 08:13:32 +02:00
Tony-X	91a38058fe	Fix typo in README.md (#1989 )	2023-04-11 12:07:20 +09:00
PSeitz	41af70799d	add percentiles aggregations (#1984 ) * add percentiles aggregations add percentiles aggregation fix disabled agg benchmark * Update src/aggregation/metric/percentiles.rs Co-authored-by: Paul Masurel <paul@quickwit.io> * Apply suggestions from code review Co-authored-by: Paul Masurel <paul@quickwit.io> * fix import * fix import --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2023-04-07 07:18:28 +02:00
Paul Masurel	f853bf204b	Align the numerical type priority order with columnar. (#1978 ) Closes #1956	2023-04-07 10:07:54 +09:00
Tony-X	11ae48d3bc	Update benchmarks section in README.md to link to the bench repo (#1985 ) * Update benchmarks section in README.md to link to the bench repo * Apply suggestions from code review --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2023-04-07 10:07:06 +09:00
Paul Masurel	5eb12173d6	Proptest merge columnar (#1976 ) * Added proptest on columnar merge with a shuffle Made column serialization more explicit. Bugfix when a bytes column is missing, and with a shuffle. Improved the cardinality detection logic / column detection. * Code review * CR comments * Following CR	2023-04-04 11:28:42 +09:00
PSeitz	5c4ea6a708	tokenizer option on text fastfield (#1945 ) * tokenizer option on text fastfield allow to set tokenizer option on text fastfield (fixes #1901) handle PreTokenized strings in fast field * change visibility * remove custom de/serialization	2023-03-31 10:03:38 +02:00
PSeitz	4cf93dab7d	fix build (#1973 )	2023-03-31 13:54:03 +09:00
PSeitz	5c380b76e7	Better mixed types support in aggs and fix serialization issue (#1971 ) * Better mixed types support in aggs and fix serialization issue - Improve support for mixed types in JSON field aggregations (pick the right field, #1913) - Resolve the issue with JSON serialization for numeric keys (fixes #1967) - Add JSON round-trip test for term buckets - Remove `u64_lenient`, as this is a footgun without the type - move aggregation benchmarks * remove shadowing	2023-03-31 05:52:11 +02:00
PSeitz	571735c5f7	Fix index sort by on optional/multicolumn (#1972 ) Fix index sort by on optional/multicolumn add optional columns to proptest extend proptests for sort add columnar sort tests	2023-03-31 04:24:11 +02:00
zhouhui	8e92f960d3	Fix comment: change max_merge_size to max_docs_before_merge. (#1970 )	2023-03-28 22:49:00 +09:00
Paul Masurel	057211c3d8	Fixing build on arm (#1966 )	2023-03-27 22:42:57 +09:00
Paul Masurel	059fc767ea	Added ::MIN ::MAX DateTime. (#1965 )	2023-03-27 15:32:53 +09:00
Paul Masurel	694a056255	Faster range (#1954 ) * Faster range queries This PR does several changes - ip compact space now uses u32 - the bitunpacker now gets a get_batch function - we push down range filtering, removing GCD / shift in the bitpacking codec. - we rely on AVX2 routine to do the filtering. * Apply suggestions from code review * Apply suggestions from code review * CR comments	2023-03-27 14:56:32 +09:00
Paul Masurel	2955e34452	Added proptests for building/merging columnar. (#1963 )	2023-03-27 14:56:02 +09:00
Paul Masurel	821208480b	Adding Debug/Display impl. Refining the ColumnIndex::get_cardinality	2023-03-26 14:40:37 +09:00
Paul Masurel	a2e3c2ed5b	Renaming Column::idx -> Column::index (#1961 ) There was some variable name ghosting happening.	2023-03-26 13:58:50 +09:00
PSeitz	835f228bfa	fix cardinality when merging empty columns (#1960 ) fixes #1958	2023-03-25 15:58:15 +09:00
Paul Masurel	2b6a4da640	Exposing empty column builder. (#1959 )	2023-03-24 16:34:41 +09:00
PSeitz	d6a95381ee	add memory check for term agg (#1957 )	2023-03-24 06:47:45 +01:00
PSeitz	da2804644f	fetch blocks of vals in aggregation for all cardinality (#1950 ) * fetch blocks of vals in aggregation for all cardinality * move caching in common accessor	2023-03-23 08:41:11 +01:00
PSeitz	5504cfd012	remove IterColumn (#1955 ) fixes #1658	2023-03-23 06:43:17 +01:00
trinity-1686a	482b4155e8	fix bug with new sstable index format (#1953 )	2023-03-22 10:22:36 +01:00
Till Wegmüller	1a35f6573d	Switch fs2 to fs4 as it is now unmaintained and does not support illumos (#1944 ) Signed-off-by: Till Wegmueller <toasterson@gmail.com>	2023-03-22 13:48:49 +09:00
trinity-1686a	e5e50603a8	new sstable format (#1943 ) * document a new sstable format * add support for changing target block size * use new format for sstable index * handle sstable version error * use very small blocks for proptests * add a footer structure	2023-03-21 15:03:52 +01:00
PSeitz	8f7f1d6be4	add Display for ByteCount (#1949 ) * add Display for ByteCount * export missing AggregationLimits	2023-03-21 08:02:35 +01:00
PSeitz	6a7a1106d6	work in batches of docs (#1937 ) * work in batches of docs * add fill_buffer test	2023-03-21 06:57:44 +01:00
PSeitz	9e2faecf5b	add memory limit for aggregations (#1942 ) * add memory limit for aggregations introduce AggregationLimits to set memory consumption limit and bucket limits memory limit is checked during aggregation, bucket limit is checked before returning the aggregation request. * Apply suggestions from code review Co-authored-by: Paul Masurel <paul@quickwit.io> * add ByteCount with human readable format --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2023-03-16 06:21:07 +01:00
PSeitz	b6703f1b3c	fix validation in date histogram (#1936 ) fix validation in date histogram for parameters interval and date_interval	2023-03-15 06:10:43 +01:00
PSeitz	2fb3740cb0	handle missing column for aggs (#1920 ) * handle missing column for aggs add empty column fallback for missing column in aggs. Fix sort for term agg on sub-agg with missing value (null is smallest) * add error when field is not fast	2023-03-15 06:09:59 +01:00
PSeitz	8459efa32c	split term collection count and sub_agg (#1921 ) use unrolled ColumnValues::get_vals	2023-03-13 04:37:41 +01:00
PSeitz	61cfd8dc57	fix clippy (#1927 )	2023-03-13 03:12:02 +01:00
trinity-1686a	064518156f	refactor tokenization pipeline to use GATs (#1924 ) * refactor tokenization pipeline to use GATs * fix doctests * fix clippy lints * remove commented code	2023-03-09 09:39:37 +01:00
PSeitz	a42a96f470	fix panic in dict column merge (#1930 ) * fix panic in dict column merge * Bugfix and added unit test --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2023-03-08 22:04:37 +09:00
trinity-1686a	fcf5a25d93	use DeltaReader directly to implement Dictionary::ord_to_term (#1928 )	2023-03-08 11:15:56 +09:00
dependabot[bot]	c0a5b28fd3	Update lru requirement from 0.9.0 to 0.10.0 (#1932 ) Updates the requirements on [lru](https://github.com/jeromefroe/lru-rs) to permit the latest version. - [Release notes](https://github.com/jeromefroe/lru-rs/releases) - [Changelog](https://github.com/jeromefroe/lru-rs/blob/master/CHANGELOG.md) - [Commits](https://github.com/jeromefroe/lru-rs/compare/0.9.0...0.10.0) --- updated-dependencies: - dependency-name: lru dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-03-07 15:09:02 +09:00
trinity-1686a	a4f7ca8309	use DeltaReader directly to implement Dictionary::term_ord (#1925 ) * use DeltaReader directly to implement Dictionary::term_ord * add some additional test case for Dictionary::term_ord	2023-03-06 09:45:22 +01:00
Paul Masurel	364e321415	Clippy fix (#1926 )	2023-03-06 10:37:17 +09:00


2943 Commits