tantivy

mirror of https://github.com/quickwit-oss/tantivy.git synced 2026-05-14 15:20:43 +00:00

Author	SHA1	Message	Date
Paul Masurel	77505c3d03	Making stemming optional. (#2791 ) Fixed code and CI to run on no default features. Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>	2026-01-02 12:40:42 +01:00
PSeitz	945af922d1	clippy (#2661 ) * clippy * use readable version --------- Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>	2025-07-02 11:25:03 +02:00
PSeitz	21d057059e	clippy (#2527 ) * clippy * clippy * clippy * clippy * convert allow to expect and remove unused * cargo fmt * cleanup * export sample * clippy	2024-10-22 09:26:54 +08:00
PSeitz	2f2db16ec1	store DateTime as nanoseconds in doc store (#2486 ) * store DateTime as nanoseconds in doc store The doc store DateTime was truncated to microseconds previously. This removes this truncation, while still keeping backwards compatibility. This is done by adding the trait `ConfigurableBinarySerializable`, which works like `BinarySerializable`, but with a config that allows de/serialize as different date time precision currently. bump version format to 7. add compat test to check the date time truncation. * remove configurable binary serialize, add enum for doc store version * test doc store version ord	2024-10-18 10:50:20 +08:00
Bruce Mitchener	c17e513377	Reduce typo count. (#2510 )	2024-10-10 09:55:37 +08:00
trinity-1686a	85395d942a	fix clippy lints from 1.80-1.81 (#2488 ) * fix some clippy lints * fix clippy::doc_lazy_continuation * fix some lints for 1.82	2024-09-05 14:33:05 +02:00
PSeitz	2e3641c2ae	return CompactDocValue instead of trait (#2410 ) The CompactDocValue is easier to handle than the trait in some cases like comparison and conversion	2024-05-27 07:33:50 +02:00
PSeitz	e1679f3fb9	compact doc (#2402 ) * compact doc * add any value type * pass references when building CompactDoc * remove OwnedValue from API * clippy * clippy * fail on large documents * fmt * cleanup * cleanup * implement Value for different types fix serde_json date Value implementation * fmt * cleanup * fmt * cleanup * store positions instead of pos+len * remove nodes array * remove mediumvec * cleanup * infallible serialize into vec * remove positions indirection * remove 24MB limitation in document use u32 for Addr Remove the 3 byte addressing limitation and use VInt instead * cleanup * extend test * cleanup, add comments * rename, remove pub	2024-05-21 10:16:08 +02:00
trinity-1686a	8cd7ddc535	run block decompression from executor (#2386 ) * run block decompression from executor * add a wrapper with is_closed to oneshot channel * add cancelation test to Executor::spawn_blocking	2024-05-08 12:22:44 +02:00
Adam Reichold	b493743f8d	Fix trait bound of StoreReader::iter (#2360 ) * Fix trait bound of StoreReader::iter Similar to `StoreReader::get`, `StoreReader::iter` should only require `DocumentDeserialize` and not `Document`. * Mark the iterator returned by SegmentReader::doc_ids_alive as Send so it can be used in impls of Stream/AsyncIterator.	2024-04-15 15:50:02 +02:00
PSeitz	74940e9345	clippy (#2349 ) * fix clippy * fix clippy * fix duplicate imports	2024-04-09 07:54:44 +02:00
Adam Reichold	72002e8a89	Make test builds Clippy clean. (#2277 )	2024-01-31 02:47:06 +01:00
PSeitz	182f58cea6	remove Document: DocumentDeserialize dependency (#2211 ) * remove Document: DocumentDeserialize dependency The dependency requires users to implement an API they may not use. * remove unnecessary Document bounds	2023-10-13 07:59:54 +02:00
PSeitz	03a1f40767	rename DocValue to Value (#2197 ) rename DocValue to Value to avoid confusion with lucene DocValues rename Value to OwnedValue	2023-10-02 17:03:00 +02:00
Harrison Burt	1c7c6fd591	POC: Tantivy documents as a trait (#2071 ) * fix windows build (#1) * Fix windows build * Add doc traits * Add field value iter * Add value and serialization * Adjust order * Fix bug * Correct type * Fix generic bugs * Reformat code * Add generic to index writer which I forgot about * Fix missing generics on single segment writer * Add missing type export * Add default methods for convenience * Cleanup * Fix more-like-this query to use standard types * Update API and fix tests * Add doc traits * Add field value iter * Add value and serialization * Adjust order * Fix bug * Correct type * Rebase main and fix conflicts * Reformat code * Merge upstream * Fix missing generics on single segment writer * Add missing type export * Add default methods for convenience * Cleanup * Fix more-like-this query to use standard types * Update API and fix tests * Add tokenizer improvements from previous commits * Add tokenizer improvements from previous commits * Reformat * Fix unit tests * Fix unit tests * Use enum in changes * Stage changes * Add new deserializer logic * Add serializer integration * Add document deserializer * Implement new (de)serialization api for existing types * Fix bugs and type errors * Add helper implementations * Fix errors * Reformat code * Add unit tests and some code organisation for serialization * Add unit tests to deserializer * Add some small docs * Add support for deserializing serde values * Reformat * Fix typo * Fix typo * Change repr of facet * Remove unused trait methods * Add child value type * Resolve comments * Fix build * Fix more build errors * Fix more build errors * Fix the tests I missed * Fix examples * fix numerical order, serialize PreTok Str * fix coverage * rename Document to TantivyDocument, rename DocumentAccess to Document add Binary prefix to binary de/serialization * fix coverage --------- Co-authored-by: Pascal Seitz <pascal.seitz@gmail.com>	2023-10-02 10:01:16 +02:00
Adam Reichold	c805f08ca7	Fix a few more upcoming Clippy lints (#2133 )	2023-07-24 17:07:57 +09:00
Adam Reichold	820f126075	Remove support for Brotli and Snappy compression (#2123 ) LZ4 provides fast and simple compression whereas Zstd is exceptionally flexible so that the additional support for Brotli and Snappy does not really add any distinct functionality on top of those two algorithms. Removing them reduces our maintenance burden and reduces the number of choices users have to make when setting up their project based on Tantivy.	2023-07-14 16:54:59 +09:00
Adam Reichold	7e6c4a1856	Include only built-in compression algorithms as enum variants (#2121 ) * Include only built-in compression algorithms as enum variants This enables compile-time errors when a compression algorithm is requested which is not actually enabled for the current Cargo project. The cost is that indexes using other compression algorithms cannot even be loaded (even though they are not fully accessible in any case). As a drive-by, this also fixes `--no-default-features` on `cfg(unix)`. * Provide more instructive error messages for unsupported, but not unknown compression variants.	2023-07-14 11:02:49 +09:00
PSeitz	040554f2f9	Update to lz4_flex 0.11 (#2106 )	2023-06-29 14:16:00 +08:00
Yuri Astrakhan	74275b76a6	Inline format arguments where makes sense (#2038 ) Applied this command to the code, making it a bit shorter and slightly more readable. ``` cargo +nightly clippy --all-features --benches --tests --workspace --fix -- -A clippy::all -W clippy::uninlined_format_args cargo +nightly fmt --all ```	2023-05-10 18:03:59 +09:00
PSeitz	9e2faecf5b	add memory limit for aggregations (#1942 ) * add memory limit for aggregations introduce AggregationLimits to set memory consumption limit and bucket limits memory limit is checked during aggregation, bucket limit is checked before returning the aggregation request. * Apply suggestions from code review Co-authored-by: Paul Masurel <paul@quickwit.io> * add ByteCount with human readable format --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2023-03-16 06:21:07 +01:00
PSeitz	cbcafae04c	fix: doc store for files larger 4GB (#1856 ) Fixes an issue in the skip list deserialization, which deserialized the byte start offset incorrectly as u32. `get_doc` will fail for any docs that live in a block with start offset larger than u32::MAX (~4GB). Causes index corruption, if a segment with a doc store larger 4GB is merged. tantivy version 0.19 is affected	2023-02-10 14:29:43 +01:00
Paul Masurel	405e2cf4d9	Merge with main	2023-02-09 14:28:57 +01:00
Paul Masurel	bd5eea9852	Integrated columnar work.	2023-02-09 13:14:31 +01:00
PSeitz	0f20787917	fix doc store cache docs (#1821 ) * fix doc store cache docs addresses an issue reported in #1820 * rename doc_store_cache_size	2023-01-23 07:06:49 +01:00
Adam Reichold	82a183bc2d	Bump dependency on lru to from version 0.7.5 to version 0.9.0. (#1755 )	2023-01-10 13:35:37 +09:00
Paul Masurel	f39165e1e7	Moving FileSlice to tantivy-common (#1729 )	2022-12-21 16:35:11 +09:00
Paul Masurel	32cb1d22da	Removed AsyncIoResult. (#1728 )	2022-12-21 16:01:17 +09:00
PSeitz	f9171a3981	fix clippy (#1725 ) * fix clippy * fix clippy fastfield codecs * fix clippy bitpacker * fix clippy common * fix clippy stacker * fix clippy sstable * fmt	2022-12-20 07:30:06 +01:00
PSeitz	509a265659	add docstore version (#1652 ) * add docstore version closes #1589 * assert for docstore version	2022-11-04 10:19:16 +09:00
Pascal Seitz	a4485f7611	faster skipindex deserialization, larger blocksize on sort	2022-10-18 19:32:23 +08:00
Pascal Seitz	129f7422f5	remove unused buffer	2022-10-14 20:01:10 +08:00
PSeitz	8b69aab0fc	avoid prepare_doc allocation (#1610 ) avoid prepare_doc allocation, ~10% more thoughput best case	2022-10-11 14:15:55 +09:00
Bruce Mitchener	b3bf9a5716	Documentation improvements.	2022-10-05 14:18:10 +07:00
trinity-1686a	5945dbf0bd	change format for store to make it faster with small documents (#1569 ) * use new format for docstore blocks * move index to end of block it makes writing the block faster due to one less memcopy	2022-10-04 09:58:55 +02:00
PSeitz	fadd784a25	log improvements (#1564 )	2022-09-30 09:39:26 +09:00
Bruce Mitchener	cf02e32578	Improvements to doc linking, grammar, etc.	2022-09-19 18:10:22 +07:00
Bruce Mitchener	6a88ac3fe3	Documentation improvements. Fix some linking, some grammar, some typos, etc.	2022-09-18 18:05:37 +07:00
Paul Masurel	817225edfb	Allow for a same-thread doc compressor. (#1510 ) In addition, it isolates the doc compressor logic, better reports io::Result. In the case of the same-thread doc compressor, the blocks are also not copied.	2022-09-13 15:32:48 +09:00
Paul Masurel	8e775b6c3d	Refactoring dyn Column (#1502 )	2022-09-02 17:26:30 +09:00
Kian-Meng Ang	014b1adc3e	cargo +nightly fmt	2022-08-17 22:33:44 +08:00
Kian-Meng Ang	84295d5b35	cargo fmt	2022-08-15 21:07:01 +08:00
Kian-Meng Ang	625bcb4877	Fix typos and markdowns Found via these commands: codespell -L crate,ser,panting,beauti,hart,ue,atleast,childs,ond,pris,hel,mot markdownlint .md doc/src/.md --disable MD013 MD025 MD033 MD001 MD024 MD036 MD041 MD003	2022-08-13 18:25:47 +08:00
Pascal Seitz	5750224d4c	set docstore cache size at construction	2022-07-04 14:27:55 +08:00
Pascal Seitz	9db2f0e82b	expose doc store cache size expose lru doc store cache size optimize doc store cache size	2022-07-04 13:54:41 +08:00
PSeitz	9baefbe2ab	Update src/store/writer.rs Co-authored-by: Paul Masurel <paul@quickwit.io>	2022-06-23 15:34:21 +08:00
PSeitz	ad76d11008	Update src/store/writer.rs Co-authored-by: Paul Masurel <paul@quickwit.io>	2022-06-23 15:34:21 +08:00
PSeitz	c3220bece0	Update src/store/writer.rs Co-authored-by: Paul Masurel <paul@quickwit.io>	2022-06-23 15:34:21 +08:00
PSeitz	2b713f0977	Update src/store/writer.rs Co-authored-by: Paul Masurel <paul@quickwit.io>	2022-06-23 15:34:21 +08:00
Pascal Seitz	0bc6b4a117	renames and refactoring	2022-06-23 15:34:21 +08:00

1 2 3 4 5

203 Commits