tantivy

mirror of https://github.com/quickwit-oss/tantivy.git synced 2026-01-13 12:32:55 +00:00

Author	SHA1	Message	Date
PSeitz	182f58cea6	remove Document: DocumentDeserialize dependency (#2211) * remove Document: DocumentDeserialize dependency The dependency requires users to implement an API they may not use. * remove unnecessary Document bounds	2023-10-13 07:59:54 +02:00
PSeitz	03a1f40767	rename DocValue to Value (#2197) rename DocValue to Value to avoid confusion with lucene DocValues rename Value to OwnedValue	2023-10-02 17:03:00 +02:00
Harrison Burt	1c7c6fd591	POC: Tantivy documents as a trait (#2071) * fix windows build (#1) * Fix windows build * Add doc traits * Add field value iter * Add value and serialization * Adjust order * Fix bug * Correct type * Fix generic bugs * Reformat code * Add generic to index writer which I forgot about * Fix missing generics on single segment writer * Add missing type export * Add default methods for convenience * Cleanup * Fix more-like-this query to use standard types * Update API and fix tests * Add doc traits * Add field value iter * Add value and serialization * Adjust order * Fix bug * Correct type * Rebase main and fix conflicts * Reformat code * Merge upstream * Fix missing generics on single segment writer * Add missing type export * Add default methods for convenience * Cleanup * Fix more-like-this query to use standard types * Update API and fix tests * Add tokenizer improvements from previous commits * Add tokenizer improvements from previous commits * Reformat * Fix unit tests * Fix unit tests * Use enum in changes * Stage changes * Add new deserializer logic * Add serializer integration * Add document deserializer * Implement new (de)serialization api for existing types * Fix bugs and type errors * Add helper implementations * Fix errors * Reformat code * Add unit tests and some code organisation for serialization * Add unit tests to deserializer * Add some small docs * Add support for deserializing serde values * Reformat * Fix typo * Fix typo * Change repr of facet * Remove unused trait methods * Add child value type * Resolve comments * Fix build * Fix more build errors * Fix more build errors * Fix the tests I missed * Fix examples * fix numerical order, serialize PreTok Str * fix coverage * rename Document to TantivyDocument, rename DocumentAccess to Document add Binary prefix to binary de/serialization * fix coverage --------- Co-authored-by: Pascal Seitz <pascal.seitz@gmail.com>	2023-10-02 10:01:16 +02:00
Adam Reichold	c805f08ca7	Fix a few more upcoming Clippy lints (#2133)	2023-07-24 17:07:57 +09:00
Adam Reichold	820f126075	Remove support for Brotli and Snappy compression (#2123) LZ4 provides fast and simple compression whereas Zstd is exceptionally flexible so that the additional support for Brotli and Snappy does not really add any distinct functionality on top of those two algorithms. Removing them reduces our maintenance burden and reduces the number of choices users have to make when setting up their project based on Tantivy.	2023-07-14 16:54:59 +09:00
Adam Reichold	7e6c4a1856	Include only built-in compression algorithms as enum variants (#2121) * Include only built-in compression algorithms as enum variants This enables compile-time errors when a compression algorithm is requested which is not actually enabled for the current Cargo project. The cost is that indexes using other compression algorithms cannot even be loaded (even though they are not fully accessible in any case). As a drive-by, this also fixes `--no-default-features` on `cfg(unix)`. * Provide more instructive error messages for unsupported, but not unknown compression variants.	2023-07-14 11:02:49 +09:00
PSeitz	040554f2f9	Update to lz4_flex 0.11 (#2106)	2023-06-29 14:16:00 +08:00
Yuri Astrakhan	74275b76a6	Inline format arguments where makes sense (#2038) Applied this command to the code, making it a bit shorter and slightly more readable. ``` cargo +nightly clippy --all-features --benches --tests --workspace --fix -- -A clippy::all -W clippy::uninlined_format_args cargo +nightly fmt --all ```	2023-05-10 18:03:59 +09:00
PSeitz	9e2faecf5b	add memory limit for aggregations (#1942) * add memory limit for aggregations introduce AggregationLimits to set memory consumption limit and bucket limits memory limit is checked during aggregation, bucket limit is checked before returning the aggregation request. * Apply suggestions from code review Co-authored-by: Paul Masurel <paul@quickwit.io> * add ByteCount with human readable format --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2023-03-16 06:21:07 +01:00
PSeitz	cbcafae04c	fix: doc store for files larger than 4GB (#1856) Fixes an issue in the skip list deserialization, which deserialized the byte start offset incorrectly as u32. `get_doc` will fail for any docs that live in a block with a start offset larger than u32::MAX (~4GB). Causes index corruption if a segment with a doc store larger than 4GB is merged. tantivy version 0.19 is affected.	2023-02-10 14:29:43 +01:00
Paul Masurel	405e2cf4d9	Merge with main	2023-02-09 14:28:57 +01:00
Paul Masurel	bd5eea9852	Integrated columnar work.	2023-02-09 13:14:31 +01:00
PSeitz	0f20787917	fix doc store cache docs (#1821) * fix doc store cache docs addresses an issue reported in #1820 * rename doc_store_cache_size	2023-01-23 07:06:49 +01:00
Adam Reichold	82a183bc2d	Bump dependency on lru from version 0.7.5 to version 0.9.0. (#1755)	2023-01-10 13:35:37 +09:00
Paul Masurel	f39165e1e7	Moving FileSlice to tantivy-common (#1729)	2022-12-21 16:35:11 +09:00
Paul Masurel	32cb1d22da	Removed AsyncIoResult. (#1728)	2022-12-21 16:01:17 +09:00
PSeitz	f9171a3981	fix clippy (#1725) * fix clippy * fix clippy fastfield codecs * fix clippy bitpacker * fix clippy common * fix clippy stacker * fix clippy sstable * fmt	2022-12-20 07:30:06 +01:00
PSeitz	509a265659	add docstore version (#1652) * add docstore version closes #1589 * assert for docstore version	2022-11-04 10:19:16 +09:00
Pascal Seitz	a4485f7611	faster skipindex deserialization, larger blocksize on sort	2022-10-18 19:32:23 +08:00
Pascal Seitz	129f7422f5	remove unused buffer	2022-10-14 20:01:10 +08:00
PSeitz	8b69aab0fc	avoid prepare_doc allocation (#1610) avoid prepare_doc allocation, ~10% more throughput best case	2022-10-11 14:15:55 +09:00
Bruce Mitchener	b3bf9a5716	Documentation improvements.	2022-10-05 14:18:10 +07:00
trinity-1686a	5945dbf0bd	change format for store to make it faster with small documents (#1569) * use new format for docstore blocks * move index to end of block it makes writing the block faster due to one less memcopy	2022-10-04 09:58:55 +02:00
PSeitz	fadd784a25	log improvements (#1564)	2022-09-30 09:39:26 +09:00
Bruce Mitchener	cf02e32578	Improvements to doc linking, grammar, etc.	2022-09-19 18:10:22 +07:00
Bruce Mitchener	6a88ac3fe3	Documentation improvements. Fix some linking, some grammar, some typos, etc.	2022-09-18 18:05:37 +07:00
Paul Masurel	817225edfb	Allow for a same-thread doc compressor. (#1510) In addition, it isolates the doc compressor logic, better reports io::Result. In the case of the same-thread doc compressor, the blocks are also not copied.	2022-09-13 15:32:48 +09:00
Paul Masurel	8e775b6c3d	Refactoring dyn Column (#1502)	2022-09-02 17:26:30 +09:00
Kian-Meng Ang	014b1adc3e	cargo +nightly fmt	2022-08-17 22:33:44 +08:00
Kian-Meng Ang	84295d5b35	cargo fmt	2022-08-15 21:07:01 +08:00
Kian-Meng Ang	625bcb4877	Fix typos and markdowns Found via these commands: codespell -L crate,ser,panting,beauti,hart,ue,atleast,childs,ond,pris,hel,mot markdownlint .md doc/src/.md --disable MD013 MD025 MD033 MD001 MD024 MD036 MD041 MD003	2022-08-13 18:25:47 +08:00
Pascal Seitz	5750224d4c	set docstore cache size at construction	2022-07-04 14:27:55 +08:00
Pascal Seitz	9db2f0e82b	expose doc store cache size expose lru doc store cache size optimize doc store cache size	2022-07-04 13:54:41 +08:00
PSeitz	9baefbe2ab	Update src/store/writer.rs Co-authored-by: Paul Masurel <paul@quickwit.io>	2022-06-23 15:34:21 +08:00
PSeitz	ad76d11008	Update src/store/writer.rs Co-authored-by: Paul Masurel <paul@quickwit.io>	2022-06-23 15:34:21 +08:00
PSeitz	c3220bece0	Update src/store/writer.rs Co-authored-by: Paul Masurel <paul@quickwit.io>	2022-06-23 15:34:21 +08:00
PSeitz	2b713f0977	Update src/store/writer.rs Co-authored-by: Paul Masurel <paul@quickwit.io>	2022-06-23 15:34:21 +08:00
Pascal Seitz	0bc6b4a117	renames and refactoring	2022-06-23 15:34:21 +08:00
PSeitz	79e42d4a6d	Update src/store/writer.rs Co-authored-by: Paul Masurel <paul@quickwit.io>	2022-06-23 15:34:21 +08:00
PSeitz	0135fbc4c8	Update src/store/writer.rs Co-authored-by: Paul Masurel <paul@quickwit.io>	2022-06-23 15:34:21 +08:00
PSeitz	449594f67a	Update src/store/writer.rs Co-authored-by: Paul Masurel <paul@quickwit.io>	2022-06-23 15:34:21 +08:00
Pascal Seitz	8b6647e908	move writer to compressor thread	2022-06-23 15:34:21 +08:00
PSeitz	efabcbcdf5	Update src/store/writer.rs Co-authored-by: Paul Masurel <paul@quickwit.io>	2022-06-23 15:34:21 +08:00
Pascal Seitz	7bf5962554	merge match, explicit type	2022-06-23 15:34:21 +08:00
Pascal Seitz	4c7dedef29	use separate thread to compress block store Use a separate thread to compress the block store for increased indexing performance. This allows using slower compressors with a higher compression ratio, with little or no performance impact (given enough cores). A separate thread is spawned to compress the docstore, which handles single blocks and stacking from other docstores. The spawned compressor thread does not write; instead it sends back the compressed data. This is done to avoid multithreaded writes to the same file.	2022-06-23 15:34:21 +08:00
Antoine G	11e4225f23	doc fix (#1391) Documentation fix.	2022-06-21 15:53:33 +09:00
Pascal Seitz	4d9d2b6db0	split into compressor/decompressor use custom de/serializer for compressor accept parameters like zstd(compression_level=5) as compressor	2022-06-02 23:29:24 +08:00
Pascal Seitz	ed868f93a3	enable setting compression level	2022-06-02 16:47:29 +08:00
Pascal Seitz	314ae43a45	fix fmt	2022-06-02 14:54:23 +08:00
Pascal Seitz	fce91b2f3a	vec without capacity	2022-06-02 13:50:18 +08:00

191 Commits