tantivy

mirror of https://github.com/quickwit-oss/tantivy.git synced 2026-05-31 23:50:41 +00:00

Author	SHA1	Message	Date
Paul Masurel	eca6628b3c	Minor refactoring (#1266)	2022-01-28 15:55:55 +09:00
PSeitz	352e0cc58d	Add demux operation (#1150) * add merge for DeleteBitSet, allow custom DeleteBitSet on merge * forward delete bitsets on merge, add tests * add demux operation and tests	2021-10-06 16:05:16 +09:00
Pascal Seitz	d7a6a409a1	renames	2021-09-23 20:33:11 +08:00
Pascal Seitz	a1f5cead96	AliveBitSet instead of DeleteBitSet	2021-09-23 20:03:57 +08:00
Pascal Seitz	3265f7bec3	dissolve common module	2021-08-19 23:26:34 +01:00
Pascal Seitz	10f056fbb4	apply clippy fixes	2021-07-01 17:08:44 +02:00
Andre-Philippe Paquet	57ae5b27dc	fix store reader iterator, take 2	2021-06-16 07:51:39 -04:00
Andre-Philippe Paquet	473a346814	remove debugging	2021-06-13 16:49:44 -04:00
Andre-Philippe Paquet	511dc8f87f	fix store reader iterator	2021-06-13 16:00:13 -04:00
PSeitz	8d32c3ba3a	Change Footer version handling, Make compression dynamic (#1060) Change Footer version handling, Make compression dynamic Change Footer version handling Simplify version handling by switching to JSON instead of binary serialization. fixes #1058 Make compression dynamic Instead of choosing the compression during compile time via a feature flag, you can now have multiple compression algorithms enabled and decide during runtime which one to choose via IndexSettings. Changing the compression algorithm on an index is also supported. The information which algorithm was used in the doc store is stored in the DocStoreFooter. The default is the lz4 block format. fixes #904 Handle merging of different compressors Fix feature flag names Add doc store test for all compressors	2021-05-28 14:57:20 +09:00
PSeitz	249bc6cf72	upgrade lz4_flex to 0.8 (#1049) * upgrade lz4_flex to 0.8 * fix set_len	2021-05-19 10:46:01 +09:00
PSeitz	1c0af5765d	fix doc store iter error handling, fixes #1047 (#1051)	2021-05-18 21:43:57 +09:00
Paul Masurel	7ba771ed1b	Replaced RawDocument by OwnedBytes (#1046)	2021-05-18 14:33:36 +09:00
PSeitz	a4002622f8	add iterator over documents in docstore (#1044) * add iterator over documents in docstore When profiling, I saw that around 8% of the time in a merge was spent in look-ups into the skip index. Since the documents in the merge case are read continuously, we can replace the random access with an iterator over the documents. Merge Time on Sorted Index Before/After: 24s / 19s Merge Time on Unsorted Index Before/After: 15s / 13.5s So we can expect 10-20% faster merges. This iterator is also important if we add sorting based on a field in the documents. * Update reader.rs Co-authored-by: Paul Masurel <paul@quickwit.io>	2021-05-18 10:29:02 +09:00
PSeitz	d523543dc7	Sort Index/Docids By Field (#1026) * sort index by field add sort info to IndexSettings generate docid mapping for sorted field (only fastfield) remap singlevalue fastfield * support docid mapping in multivalue fastfield move docid mapping to serialization step (less intermediate data for mapping) add support for docid mapping in multivalue fastfield * handle docid map in bytes fastfield * forward docid mapping, remap postings * fix merge conflicts * move test to index_sorter * add docid index mapping old->new add docid mapping for both directions old->new (used in postings) and new->old (used in fast field) handle mapping in postings recorder warn instead of info for MAX_TOKEN_LEN * remap docid in fieldnorm * resort docids in recorder, more extensive tests * handle index sorting in docstore handle index sort in docstore, by saving all the docs in a temp docstore file (SegmentComponent::TempStore). On serialization the docid mapping is used to create a docstore in the correct order by reading the old docstore. 
add docstore sort tests refactor tests * refactor rename docid doc_id rename docid_map doc_id_map rename DocidMapping DocIdMapping fix typo * u32 to DocId * better doc_id_map creation remove unstable sort * add non mut method to FastFieldWriters add _mut prefix to &mut methods * remove sort_index * fix clippy issues * fix SegmentComponent iterator use std::mem::replace * fix test * fmt * handle indexsettings deserialize * add reading, writing bytes to doc store get bytes of document in doc store add store_bytes method doc writer to accept serialized document add serialization index settings test * rename index_sorter to doc_id_mapping use bufferlender in recorder * fix compile issue, make sort_by_field optional * fix test compile * validate index settings on merge validate index settings on merge forward merge info to SegmentSerializer (for TempStore) * fix doctest * add itertools, use kmerge add itertools, use kmerge push because rustfmt fails * implement/test merge for fastfield implement/test merge for fastfield rename len to num_deleted in DeleteBitSet * Use precalculated docid mapping in merger Use precalculated docid mapping in merger for sorted indices instead of on the fly calculation Add index creation macro benchmark, but commented out for now, since it is not really usable due to long runtimes, and extreme fluctuations. 
May be better suited in criterion or an external bench bin * fix fast field reader docs fix fast field reader docs, Error instead of None returned add u64s_lenient to fastreader add create docid mapping benchmark * add test for multifast field merge refactor test add test for multifast field merge * add num_bytes to BytesFastFieldReader equivalent to num_vals in MultiValuedFastFieldReader * add MultiValueLength trait add MultiValueLength trait in order to unify index creation for BytesFastFieldReader and MultiValuedFastFieldReader in merger * Add ReaderWithOrdinal, fix Add ReaderWithOrdinal to associate data to a reader in merger Fix bytes offset index creation in merger * add test for merging bytes with sorted docids * Merge fieldnorm for sorted index * handle posting list in merge in sorted index handle posting list in merge in sorted index by using doc id mapping for sorting reuse SegmentOrdinal type * handle doc store order in merge in sorted index * fix typo, cleanup * make IndexSetting non-optional * fix type, rename test file fix type rename test file add type * remove SegmentReaderWithOrdinal accessors * cargo fmt * add index sort & merge test to include deletes * Fix posting list merge issue Fix posting list merge issue - ensure serializer always gets monotonically increasing doc ids handle sorting and merging for facets field * performance: cache field readers, use bytes for doc store merge * change facet merge test to cover index sorting * add RawDocument abstraction to access bytes in doc store * fix deserialization, update changelog fix deserialization update changelog forward error on merge failed * cache store readers to utilize lru cache (4x performance) cache store readers, to utilize lru cache (4x faster performance, due to less decompress calls on the block) * add include_temp_doc_store flag in InnerSegmentMeta unset flag on deserialization and after finalize of a segment set flag when creating new instances	2021-05-17 22:20:57 +09:00
Paul Masurel	39dd8cfe24	Cargo clippy. Acronym should not be full uppercase apparently.	2021-04-26 11:49:18 +09:00
Pascal Seitz	a00049b879	add lz4 block format compressor as default docstore compressor add lz4 block compressor using lz4_flex, add lz4-block-compression feature flag add snappy-compression feature flag for snap compressor, make snap crate optional set lz4-block-compression as default feature flag	2021-04-16 15:24:35 +02:00
Paul Masurel	31137beea6	Replacing (start, end) by Range	2021-03-10 14:06:21 +09:00
Paul Masurel	aa9e79f957	Clippy warnings.	2021-01-21 18:23:20 +09:00
Paul Masurel	8ca0954b3b	Added a functional long running test to test store merging.	2021-01-07 14:07:15 +09:00
Paul Masurel	7f0e61b173	Refactoring of the skip index. The skip index now identifies both the start and the end offset of blocks. Checkpoints are compressed in blocks, reaching better compression.	2020-11-17 16:05:11 +09:00
Adrien Guillo	9ab25d2575	Cache store reader blocks in an LRU fashion	2020-11-16 19:09:10 -08:00
Pasha Podolsky	80cbe889ba	[tantivy] Add brotli codec for row storage (#885) * [tantivy] Add brotli codec for row storage * [tantivy] Fix not actual comments for code * [CR] Fixes for comment and cursor	2020-10-09 14:51:42 +09:00
Paul Masurel	c23a03ad81	Large API Change in the Directory API. (#901) Tantivy used to assume that all files could be somehow memory mapped. After this change, Directory returns a `FileSlice` that can be reduced and eventually read into an `OwnedBytes` object. Long and blocking IO operations are still required but they do not span over the entire file.	2020-10-08 16:36:51 +09:00
Paul Masurel	ae14022bf0	Removed `use::Result`. (#771)	2020-01-31 18:47:02 +09:00
Paul Masurel	498057c5b7	Refactor deletes (#597) * Refactor deletes * Removing generation from SegmentUpdater. These have been obsolete for a long time * Number literal clippy * Removed clippy useless allow statement	2019-07-17 13:06:44 +09:00
Paul Masurel	462774b15c	Tiqb feature/2018 (#583) * rust 2018 * Added CHANGELOG comment	2019-07-01 10:01:46 +09:00
Paul Masurel	279a9eb5e3	Closes #449 (#450) Clippy working on stable. Clippy warnings addressed	2018-12-10 12:20:59 +09:00
Jason Wolfe	0098e3d428	Compute space usage of a Searcher / SegmentReader / CompositeFile (#282) * Compute space usage of a Searcher / SegmentReader / CompositeFile * Fix typo * Add serde Serialize/Deserialize for all the SpaceUsage structs * Fix indexing * Public methods for consuming space usage information * #281: Add a space usage method that takes a SegmentComponent to support code that is unaware of particular segment components, and to make it more likely to update methods when a new component type is added. * Add support for space usage computation of positions skip index file (#281) * Add some tests for space usage computation (#281)	2018-10-15 09:04:36 +09:00
Paul Masurel	10f6c07c53	Clippy (#422) * Cargo Format * Clippy	2018-09-15 20:20:22 +09:00
Paul Masurel	06e7bd18e7	Clippy (#421) * Cargo Format * Clippy * bugfix * still clippy stuff * clippy step 2	2018-09-15 14:56:14 +09:00
Paul Masurel	8ccbfdea5d	Preparing for release	2018-06-22 14:27:46 +09:00
Paul Masurel	0465876854	Issue/257 (#310) * Replaced lz4 by a pure rust implementation of snappy. Closes #257 * snappy is the default compression. One can use lz4 by enabling the lz4 feature flag. * Removed Compression trait	2018-06-12 19:02:57 +09:00
Paul Masurel	b59132966f	Better heap (#311) * Changed the heap to a paged memory arena. * Trying to simplify the indexing term hashmap * Exploding datastruct * Removed some complexity in bitpacker	2018-06-04 09:39:18 +09:00
Paul Masurel	78673172d0	Cargo fmt	2018-04-21 20:05:36 +09:00
Paul Masurel	0540ebb49e	Cargo clippy	2018-02-19 12:36:24 +09:00
Paul Masurel	9370427ae2	Terminfo blocks (#244) * Using u64 key in the store * Using Option<> for the next element, as opposed to u64 * Code simplification. * Added TermInfoStoreWriter. * Added a TermInfoStore * Added FixedSized for BinarySerialized.	2018-02-12 10:24:58 +09:00
Paul Masurel	df53dc4ceb	Format	2018-02-03 00:21:05 +09:00
Paul Masurel	930010aa88	Unit test passing	2018-01-28 00:03:51 +09:00
Paul Masurel	7f5b07d4e7	Fixing unit tests	2018-01-25 14:55:29 +09:00
Paul Masurel	3edb3dce6a	Test not passing	2018-01-25 12:46:32 +09:00
Paul Masurel	cb11b92505	Added comments	2018-01-04 12:27:14 +09:00
Paul Masurel	db7d784573	Issue 227 Faster merge when there are no deletes	2017-12-21 22:04:05 +09:00
Paul Masurel	79132e803a	NOBUG Switched to 64 bits addr	2017-12-21 11:06:46 +09:00
Paul Masurel	1e55189db1	NOBUG rustfmt	2017-12-14 19:30:31 +09:00
Paul Masurel	f8710bd4b0	Format	2017-08-28 18:22:41 +09:00
Paul Masurel	ac0b1a21eb	Term as a wrapper Small changes Plastic	2017-05-25 23:49:54 +09:00
Paul Masurel	6bbc789d84	Fmt fix	2017-05-25 23:49:54 +09:00
Paul Masurel	87152daef3	issue/174 Added doc, and made field private	2017-05-25 23:49:54 +09:00
Paul Masurel	57a5547ae8	Comments and cleaning up API	2017-05-19 11:20:27 +09:00


66 Commits