tantivy

mirror of https://github.com/quickwit-oss/tantivy.git synced 2026-05-18 17:20:41 +00:00

Author	SHA1	Message	Date
PSeitz	e3eacb4388	release tantivy (#2083 ) * prerelease * chore: Release	2023-06-09 10:47:46 +02:00
PSeitz	6239697a02	switch to ms in histogram for date type (#2045 ) * switch to ms in histogram for date type switch to ms in histogram, by adding a normalization step that converts to nanoseconds precision when creating the collector. closes #2028 related to #2026 * add missing unit long variants * use single thread to avoid handling test case * fix docs * revert CI * cleanup * improve docs * Update src/aggregation/bucket/histogram/histogram.rs Co-authored-by: Paul Masurel <paul@quickwit.io> --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2023-05-19 08:15:44 +02:00
Yuri Astrakhan	74275b76a6	Inline format arguments where makes sense (#2038) Applied these commands to the code, making it a bit shorter and slightly more readable: `cargo +nightly clippy --all-features --benches --tests --workspace --fix -- -A clippy::all -W clippy::uninlined_format_args` then `cargo +nightly fmt --all`	2023-05-10 18:03:59 +09:00
tottoto	73452284ae	Remove unused crates from dependencies (#2018 ) * Remove unused crates from dependencies * Revert rand to columnar * Revert criterion to stacker	2023-05-02 12:34:20 +02:00
PSeitz	ba309e18a1	switch to nanosecond precision (#2016 )	2023-05-01 03:32:20 +02:00
trinity-1686a	780e26331d	sstable compression (#1946) * compress sstable with zstd * add some details to sstable readme * compress only blocks which benefit from it * multiple changes to sstable: make compression optional; use OwnedBytes instead of impl Read in sstable, required for next point; use zstd bulk api, which is much faster on small records * cleanup and use bulk api for compression * use dedicated byte for compression * switch block len and compression flag * change default zstd level in sstable	2023-04-14 16:25:50 +02:00
trinity-1686a	205e8a0a92	encode dictionary type in fst footer (#1968 ) * encode additional footer for dictionary kind in fst	2023-04-12 09:43:01 +02:00
Paul Masurel	5eb12173d6	Proptest merge columnar (#1976 ) * Added proptest on columnar merge with a shuffle Made column serialization more explicit. Bugfix when a bytes column is missing, and with a shuffle. Improved the cardinality detection logic / column detection. * Code review * CR comments * Following CR	2023-04-04 11:28:42 +09:00
PSeitz	571735c5f7	Fix index sort by on optional/multicolumn (#1972 ) Fix index sort by on optional/multicolumn add optional columns to proptest extend proptests for sort add columnar sort tests	2023-03-31 04:24:11 +02:00
Paul Masurel	694a056255	Faster range (#1954 ) * Faster range queries This PR does several changes - ip compact space now uses u32 - the bitunpacker now gets a get_batch function - we push down range filtering, removing GCD / shift in the bitpacking codec. - we rely on AVX2 routine to do the filtering. * Apply suggestions from code review * Apply suggestions from code review * CR comments	2023-03-27 14:56:32 +09:00
Paul Masurel	2955e34452	Added proptests for building/merging columnar. (#1963 )	2023-03-27 14:56:02 +09:00
Paul Masurel	821208480b	Adding Debug/Display impl. Refining the ColumnIndex::get_cardinality	2023-03-26 14:40:37 +09:00
Paul Masurel	a2e3c2ed5b	Renaming Column::idx -> Column::index (#1961 ) There was some variable name ghosting happening.	2023-03-26 13:58:50 +09:00
PSeitz	835f228bfa	fix cardinality when merging empty columns (#1960 ) fixes #1958	2023-03-25 15:58:15 +09:00
Paul Masurel	2b6a4da640	Exposing empty column builder. (#1959 )	2023-03-24 16:34:41 +09:00
PSeitz	da2804644f	fetch blocks of vals in aggregation for all cardinality (#1950 ) * fetch blocks of vals in aggregation for all cardinality * move caching in common accessor	2023-03-23 08:41:11 +01:00
PSeitz	5504cfd012	remove IterColumn (#1955 ) fixes #1658	2023-03-23 06:43:17 +01:00
trinity-1686a	482b4155e8	fix bug with new sstable index format (#1953 )	2023-03-22 10:22:36 +01:00
trinity-1686a	e5e50603a8	new sstable format (#1943) * document a new sstable format * add support for changing target block size * use new format for sstable index * handle sstable version error * use very small blocks for proptests * add a footer structure	2023-03-21 15:03:52 +01:00
PSeitz	6a7a1106d6	work in batches of docs (#1937 ) * work in batches of docs * add fill_buffer test	2023-03-21 06:57:44 +01:00
PSeitz	9e2faecf5b	add memory limit for aggregations (#1942 ) * add memory limit for aggregations introduce AggregationLimits to set memory consumption limit and bucket limits memory limit is checked during aggregation, bucket limit is checked before returning the aggregation request. * Apply suggestions from code review Co-authored-by: Paul Masurel <paul@quickwit.io> * add ByteCount with human readable format --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2023-03-16 06:21:07 +01:00
PSeitz	8459efa32c	split term collection count and sub_agg (#1921 ) use unrolled ColumnValues::get_vals	2023-03-13 04:37:41 +01:00
PSeitz	a42a96f470	fix panic in dict column merge (#1930 ) * fix panic in dict column merge * Bugfix and added unit test --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2023-03-08 22:04:37 +09:00
Paul Masurel	364e321415	Clippy fix (#1926 )	2023-03-06 10:37:17 +09:00
PSeitz	bc36458334	move buffer in front of dynamic dispatch (#1915 ) dynamic dispatch seems to be really expensive, move the buffer in front of the dynamic dispatch, to reduce the number of calls into the dynamic dispatched collector.	2023-02-28 13:07:50 +08:00
Paul Masurel	d25fc155b2	Making some of the column/termdict operations async-friendly (#1902 )	2023-02-27 15:34:47 +09:00
Paul Masurel	0a726a0897	Added Empty ColumnIndex (#1910 )	2023-02-27 13:59:22 +09:00
Paul Masurel	06850719dc	Renaming .values(DocId) to .values_for_doc(DocId) (#1906 )	2023-02-27 12:15:13 +09:00
PSeitz	c7278b3258	remove schema in aggs (#1888 ) * switch to ColumnType, move tests * remove Schema dependency in agg	2023-02-22 04:50:28 +01:00
Paul Masurel	f537334e4f	Adding a write schema to columnar's merge operations. (#1884 ) * Adding a write schema to columnar's merge operations. * Added unit test checking min/max when columns are empty. * CR comment * Rename to value_type_to_column_type	2023-02-21 18:25:16 +09:00
Paul Masurel	02bebf4ff5	Cargo fmt	2023-02-20 09:40:04 +09:00
Paul Masurel	0274c982d5	Refactoring. (#1881) `ColumnValues`, wrongly located in column_values/column.rs for historical reasons, moves to column_values/mod.rs. u128 stuff gets its own directory, like u64 stuff.	2023-02-17 21:57:14 +09:00
PSeitz	74bf60b4f7	implement SegmentAggregationCollector on bucket aggs (#1878 )	2023-02-17 12:53:29 +01:00
PSeitz	111f25a8f7	clippy (#1879 ) * fix clippy * fix clippy * fmt	2023-02-17 11:34:21 +01:00
PSeitz	71f43ace1d	fix dynamic dispatch regression for range queries (#1871 )	2023-02-14 16:56:40 +01:00
Paul Masurel	097fd6138d	Fix clippy comments (#1872 )	2023-02-14 23:12:45 +09:00
Paul Masurel	60cc2644d6	Fixing test_fail_on_flush_segment_but_one_worker_remains (#1869) The new fast field code, based on columnar, had a larger minimum memory footprint, causing the first document to trigger a flush of the segment in this unit test. This PR prevents the allocation of a large capacity for the different hashmap tables used in the columnar writer. Closes #1859	2023-02-14 16:09:42 +09:00
PSeitz	1cfb9ce59a	improve range query performance (#1864 ) fix RowId vs DocId naming fixes #1863	2023-02-14 13:25:39 +09:00
trinity-1686a	539ff08a79	move DateTime to tantivy_common (#1861 ) * move DateTime to tantivy_common * resolve imports of columnar::DateTime as import of common::DateTime	2023-02-11 17:03:06 +01:00
PSeitz	dab93df94e	fix benchmarks (#1862 )	2023-02-11 15:44:47 +09:00
Paul Masurel	62c811df2b	Added a columnar cli	2023-02-09 19:02:16 +01:00
Paul Masurel	b7bfa20e38	Fixed test performance.	2023-02-09 17:39:55 +01:00
Paul Masurel	db8583db75	Fixing unit test	2023-02-09 16:53:05 +01:00
Paul Masurel	405e2cf4d9	Merge with main	2023-02-09 14:28:57 +01:00
Paul Masurel	b63c6c27bc	adding change from main	2023-02-09 14:18:46 +01:00
Paul Masurel	bd5eea9852	Integrated columnar work.	2023-02-09 13:14:31 +01:00
Paul Masurel	2874554ee4	Removed the sorting logic that forced column type to be sorted like (#1816 ) * Removed the sorting logic that forced column type to be sorted like ColumnTypes. * add comments Co-authored-by: PSeitz <PSeitz@users.noreply.github.com>	2023-01-20 12:43:28 +01:00
PSeitz	cbc70a9eae	Cargo.toml cleanup (#1817 )	2023-01-20 12:30:35 +01:00
PSeitz	226d0f88bc	add columnar to workspace (#1808 )	2023-01-20 11:47:10 +01:00
Paul Masurel	9548570e88	Fixing broken test build	2023-01-20 18:18:32 +09:00
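
For readers unfamiliar with the change in 74275b76a6 (#2038), the sketch below shows what inlining format arguments looks like in plain Rust. The function names and the sample values are hypothetical and are not taken from the tantivy code base; only the `format!` macro and its captured-identifier syntax are standard Rust.

```rust
/// Older style: positional arguments listed after the format string.
/// (Hypothetical helper, for illustration only.)
fn summary_positional(author: &str, pr: u32) -> String {
    format!("{} merged #{}", author, pr)
}

/// Inlined style: the identifiers are captured directly in the format string.
/// This is the form the `clippy::uninlined_format_args` lint suggests.
fn summary_inlined(author: &str, pr: u32) -> String {
    format!("{author} merged #{pr}")
}

fn main() {
    let positional = summary_positional("tottoto", 2018);
    let inlined = summary_inlined("tottoto", 2018);
    // Both styles produce exactly the same string.
    assert_eq!(positional, inlined);
    println!("{inlined}");
}
```

Only bare identifiers can be captured this way; expressions such as field accesses or method calls still have to be passed as explicit arguments.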

64 Commits
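
The message of bc36458334 (#1915) describes moving a buffer in front of a dynamically dispatched collector so that fewer virtual calls are made. The following is a minimal sketch of that general idea, not tantivy's implementation: the `Collector` trait, `CountingCollector`, `BufferedCollector`, `run`, and the buffer size of 64 are all assumptions made for illustration.

```rust
/// Hypothetical trait standing in for a dynamically dispatched collector.
trait Collector {
    fn collect_block(&mut self, docs: &[u32]);
}

/// Toy collector that records how many docs it saw and how often it was called.
struct CountingCollector {
    docs_seen: usize,
    calls: usize,
}

impl Collector for CountingCollector {
    fn collect_block(&mut self, docs: &[u32]) {
        self.docs_seen += docs.len();
        self.calls += 1;
    }
}

/// Assumed buffer size; chosen arbitrarily for this sketch.
const BUFFER_LEN: usize = 64;

/// Statically dispatched wrapper that batches docs before the `dyn` call.
struct BufferedCollector<'a> {
    inner: &'a mut dyn Collector,
    buffer: [u32; BUFFER_LEN],
    len: usize,
}

impl<'a> BufferedCollector<'a> {
    fn new(inner: &'a mut dyn Collector) -> Self {
        BufferedCollector { inner, buffer: [0; BUFFER_LEN], len: 0 }
    }

    /// Cheap per-document call: no dynamic dispatch unless the buffer is full.
    #[inline]
    fn collect(&mut self, doc: u32) {
        self.buffer[self.len] = doc;
        self.len += 1;
        if self.len == BUFFER_LEN {
            self.flush();
        }
    }

    /// Hands the buffered docs to the inner collector in one virtual call.
    fn flush(&mut self) {
        if self.len > 0 {
            self.inner.collect_block(&self.buffer[..self.len]);
            self.len = 0;
        }
    }
}

/// Feeds `num_docs` doc ids through the buffer; returns (docs seen, dyn calls).
fn run(num_docs: u32) -> (usize, usize) {
    let mut counting = CountingCollector { docs_seen: 0, calls: 0 };
    {
        let mut buffered = BufferedCollector::new(&mut counting);
        for doc in 0..num_docs {
            buffered.collect(doc);
        }
        buffered.flush(); // do not lose the final partial block
    }
    (counting.docs_seen, counting.calls)
}

fn main() {
    let (docs_seen, calls) = run(130);
    println!("{docs_seen} docs collected with {calls} dynamic calls");
}
```

With 130 documents and a 64-entry buffer, the inner collector is invoked three times (two full blocks and one partial block) instead of once per document, which is the kind of saving the commit message alludes to.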