tantivy

mirror of https://github.com/quickwit-oss/tantivy.git synced 2026-01-06 09:12:55 +00:00

Author	SHA1	Message	Date
Stéphane Campinas	41ea14840d	add benchmark of term streams merge (#1024) * add benchmark of term streams merge * use union based on FST for merging the term dictionaries * Rename TermMerger benchmark	2021-05-31 23:15:01 +09:00
PSeitz	dff0ffd38a	prepare for multiple fastfield codecs (#1063) * prepare for multiple fastfield codecs prepare for multiple fastfield codecs by wrapping the codecs in an enum #1042 * add FastFieldSerializer trait, add DynamicFastFieldSerializer add FastFieldSerializer trait add DynamicFastFieldSerializer enum to wrap all implementors of the FastFieldSerializer trait * add estimation for fastfield bitpacker	2021-05-31 23:14:14 +09:00
PSeitz	8d32c3ba3a	Change Footer version handling, Make compression dynamic (#1060) Change Footer version handling, Make compression dynamic Change Footer version handling Simplify version handling by switching to JSON instead of binary serialization. fixes #1058 Make compression dynamic Instead of choosing the compression during compile time via a feature flag, you can now have multiple compression algorithms enabled and decide during runtime which one to choose via IndexSettings. Changing the compression algorithm on an index is also supported. The information which algorithm was used in the doc store is stored in the DocStoreFooter. The default is the lz4 block format. fixes #904 Handle merging of different compressors Fix feature flag names Add doc store test for all compressors	2021-05-28 14:57:20 +09:00
Moriyoshi Koizumi	4afba005f9	Provide a means to deal with malformed facet text representation for the query parser (#1056) * Provide a means to deal with malformed facet text representation for the query parser. * Specific error enum for the facet parse error.	2021-05-27 12:16:49 +09:00
PSeitz	85fb0cc20a	cache field norm reader in merge (#1061)	2021-05-25 21:48:02 +09:00
PSeitz	5ef2d56ec2	Avoid docstore stacking for small segments, fixes #1053 (#1055)	2021-05-24 15:38:49 +09:00
Paul Masurel	fd8e5bdf57	Rename more like this	2021-05-21 16:32:39 +09:00
PSeitz	4f8481a1e4	Detect if segments are stackable with sorting, fixes #1038 (#1054) * Detect if segments are stackable with sorting, fixes #1038 Detect if segments are stackable when their data ranges on the sort property are disjunct. Presort segments by their min value on merge, to enable easier stacking. * move code to function	2021-05-21 15:23:17 +09:00
PSeitz	bcd72e5c14	fix and refactor log merge policy, fixes #1035 (#1043) * fix and refactor log merge policy, fixes #1035 fixes a bug in log merge policy where an index was wrongly referenced by its index * cleanup * fix sort order, improve method names * use itertools groupby, fix serialization test * minor improvements * update names	2021-05-19 10:48:46 +09:00
PSeitz	249bc6cf72	upgrade lz4_flex to 0.8 (#1049) * upgrade lz4_flex to 0.8 * fix set_len	2021-05-19 10:46:01 +09:00
PSeitz	1c0af5765d	fix doc store iter error handling, fixes #1047 (#1051)	2021-05-18 21:43:57 +09:00
Paul Masurel	7ba771ed1b	Replaced RawDocument by OwnedBytes (#1046)	2021-05-18 14:33:36 +09:00
PSeitz	a4002622f8	add iterator over documents in docstore (#1044) * add iterator over documents in docstore When profiling, I saw that around 8% of the time in a merge was spent in look-ups into the skip index. Since the documents in the merge case are read continuously, we can replace the random access with an iterator over the documents. Merge Time on Sorted Index Before/After: 24s / 19s. Merge Time on Unsorted Index Before/After: 15s / 13.5s. So we can expect 10-20% faster merges. This iterator is also important if we add sorting based on a field in the documents. * Update reader.rs Co-authored-by: Paul Masurel <paul@quickwit.io>	2021-05-18 10:29:02 +09:00
Kornel	8e21087ad7	Don't use overly-minimal dependencies (#1037)	2021-05-17 22:30:04 +09:00
PSeitz	d523543dc7	Sort Index/Docids By Field (#1026) * sort index by field add sort info to IndexSettings generate docid mapping for sorted field (only fastfield) remap singlevalue fastfield * support docid mapping in multivalue fastfield move docid mapping to serialization step (less intermediate data for mapping) add support for docid mapping in multivalue fastfield * handle docid map in bytes fastfield * forward docid mapping, remap postings * fix merge conflicts * move test to index_sorter * add docid index mapping old->new add docid mapping for both directions old->new (used in postings) and new->old (used in fast field) handle mapping in postings recorder warn instead of info for MAX_TOKEN_LEN * remap docid in fieldnorm * resort docids in recorder, more extensive tests * handle index sorting in docstore handle index sort in docstore, by saving all the docs in a temp docstore file (SegmentComponent::TempStore). On serialization the docid mapping is used to create a docstore in the correct order by reading the old docstore.
add docstore sort tests refactor tests * refactor rename docid doc_id rename docid_map doc_id_map rename DocidMapping DocIdMapping fix typo * u32 to DocId * better doc_id_map creation remove unstable sort * add non mut method to FastFieldWriters add _mut prefix to &mut methods * remove sort_index * fix clippy issues * fix SegmentComponent iterator use std::mem::replace * fix test * fmt * handle indexsettings deserialize * add reading, writing bytes to doc store get bytes of document in doc store add store_bytes method doc writer to accept serialized document add serialization index settings test * rename index_sorter to doc_id_mapping use bufferlender in recorder * fix compile issue, make sort_by_field optional * fix test compile * validate index settings on merge validate index settings on merge forward merge info to SegmentSerializer (for TempStore) * fix doctest * add itertools, use kmerge add itertools, use kmerge push because rustfmt fails * implement/test merge for fastfield implement/test merge for fastfield rename len to num_deleted in DeleteBitSet * Use precalculated docid mapping in merger Use precalculated docid mapping in merger for sorted indices instead of on the fly calculation Add index creation macro benchmark, but commented out for now, since it is not really usable due to long runtimes, and extreme fluctuations. 
May be better suited in criterion or an external bench bin * fix fast field reader docs fix fast field reader docs, Error instead of None returned add u64s_lenient to fastreader add create docid mapping benchmark * add test for multifast field merge refactor test add test for multifast field merge * add num_bytes to BytesFastFieldReader equivalent to num_vals in MultiValuedFastFieldReader * add MultiValueLength trait add MultiValueLength trait in order to unify index creation for BytesFastFieldReader and MultiValuedFastFieldReader in merger * Add ReaderWithOrdinal, fix Add ReaderWithOrdinal to associate data to a reader in merger Fix bytes offset index creation in merger * add test for merging bytes with sorted docids * Merge fieldnorm for sorted index * handle posting list in merge in sorted index handle posting list in merge in sorted index by using doc id mapping for sorting reuse SegmentOrdinal type * handle doc store order in merge in sorted index * fix typo, cleanup * make IndexSetting non-optional * fix type, rename test file fix type rename test file add type * remove SegmentReaderWithOrdinal accessors * cargo fmt * add index sort & merge test to include deletes * Fix posting list merge issue Fix posting list merge issue - ensure serializer always gets monotonically increasing doc ids handle sorting and merging for facets field * performance: cache field readers, use bytes for doc store merge * change facet merge test to cover index sorting * add RawDocument abstraction to access bytes in doc store * fix deserialization, update changelog fix deserialization update changelog forward error on merge failed * cache store readers to utilize lru cache (4x performance) cache store readers, to utilize lru cache (4x faster performance, due to less decompress calls on the block) * add include_temp_doc_store flag in InnerSegmentMeta unset flag on deserialization and after finalize of a segment set flag when creating new instances	2021-05-17 22:20:57 +09:00
Abderrahmen Hanafi	6ca27b6dd4	link collector header in introduction section (#1036)	2021-05-17 22:15:48 +09:00
Evance Soumaoro	8d51e9cc91	Capping IndexWriter Num thread (#1033) * capping num threads of index writer to MAX_NUM_THREAD = 8 * fixed formatting * run ci * fix bug from max to min	2021-05-06 20:44:39 +09:00
Paul Masurel	2aced2d958	Merge pull request #1028 from tantivy-search/issue-more-like-this-query Support MoreLikeThisQuery	2021-05-04 22:15:43 +09:00
Paul Masurel	3fcba00a1f	Merge pull request #1029 from tantivy-search/dependabot/add-v2-config-file Upgrade to GitHub-native Dependabot	2021-05-03 21:11:06 +09:00
Evance Souamoro	372d12766a	fix cargo fmt	2021-05-03 10:26:56 +00:00
Evance Soumaoro	dfed8896b9	Merge branch 'main' into issue-more-like-this-query	2021-05-03 10:08:38 +00:00
Evance Souamoro	d71aa57077	reusing idf from bm25 module as it was the same logic	2021-05-03 10:05:40 +00:00
Paul Masurel	3e85fe57ac	Merge pull request #1031 from PSeitz/bitpack_writer update CHANGELOG	2021-05-03 16:29:19 +09:00
Pascal Seitz	537021e12d	update CHANGELOG	2021-05-03 09:09:42 +02:00
Paul Masurel	ec4834cd73	Merge pull request #1030 from PSeitz/bitpack_writer add BlockedBitpacker	2021-05-03 14:19:17 +09:00
Evance Souamoro	712c01aa93	fixed term sorting & moved it to a better place	2021-05-01 05:40:59 +00:00
Evance Souamoro	cde324d4b4	fixed issues based on comment, still need to check BM25 suggestion	2021-04-30 21:14:19 +00:00
Pascal Seitz	478571ebb4	move minmax to bitpacker move minmax to bitpacker use minmax in blocked bitpacker	2021-04-30 17:07:30 +02:00
Pascal Seitz	fde9d27482	refactor	2021-04-30 16:29:02 +02:00
Pascal Seitz	f38daab7f7	add base value to blocked bitpacker	2021-04-30 14:47:58 +02:00
Pascal Seitz	25b9429929	calc mem_usage of more structs calc mem_usage of more structs in index creation add some comments	2021-04-30 14:16:39 +02:00
Pascal Seitz	83cf638a2e	use 64bit encoded metadata fix memory_usage calculation	2021-04-30 07:23:44 +02:00
Pascal Seitz	a04e0bdaf1	use flushfree blocked bitpacker (10% slower)	2021-04-29 19:57:17 +02:00
Pascal Seitz	c200d59d1e	add blocked bitpacker, add benches	2021-04-29 19:53:54 +02:00
dependabot-preview[bot]	bbeac5888c	Upgrade to GitHub-native Dependabot	2021-04-29 15:02:36 +00:00
Pascal Seitz	daa53522b5	move tantivy bitpacker to crate, refactor bitpacker remove byteorder dependency	2021-04-29 16:40:11 +02:00
Evance Souamoro	2c0f6e3319	add builder to the public for documentation	2021-04-29 12:38:16 +00:00
Evance Souamoro	27f587aa13	applied cargo fmt	2021-04-29 12:15:34 +00:00
Evance Souamoro	cfc27c9665	add support for more like this query	2021-04-29 11:49:27 +00:00
Paul Masurel	88a1a90c3c	Merge pull request #1025 from tamuhey/patch-1 Typo in readme README.md	2021-04-28 15:31:53 +09:00
Yohei Tamura	6d8581baae	Update README.md typo	2021-04-28 15:10:59 +09:00
Paul Masurel	2b4b16ae90	Merge pull request #1021 from PSeitz/indexmeta add Index::builder, add index_settings to IndexMeta	2021-04-27 16:13:48 +09:00
Paul Masurel	075c23eb8c	Disabling fetching fieldnorm in phrasequery if scoring is disabled.	2021-04-27 14:06:41 +09:00
Pascal Seitz	cbf805c3e6	fix build, skip serialize None	2021-04-26 13:30:34 +02:00
Pascal Seitz	46beb2a989	index_settings should be optional	2021-04-26 11:34:19 +02:00
Pascal Seitz	c01c175744	rename fix	2021-04-26 09:45:12 +02:00
Paul Masurel	eca496ee24	Merge branch 'main' into indexmeta	2021-04-26 14:34:58 +09:00
Paul Masurel	083bb3ec3f	Merge pull request #1023 from tantivy-search/issue/simpler-positions Issue/simpler positions Closes #1022	2021-04-26 14:02:11 +09:00
Paul Masurel	2dc5403e7b	Closes #1022	2021-04-26 14:01:14 +09:00
Paul Masurel	aead5d4068	First stab	2021-04-26 12:46:06 +09:00


1903 Commits