tantivy

Mirror of https://github.com/quickwit-oss/tantivy.git (synced 2026-05-27 21:50:41 +00:00)

Author	SHA1	Message	Date
Pascal Seitz	fb9f03118d	switch total_num_val to u32	2022-11-11 17:35:52 +08:00
Paul Masurel	3edf0a2724	Using the manual reload policy in IndexWriter. (#1667)	2022-11-09 11:20:41 +01:00
Pascal Seitz	83325d8f3f	move multivalue index to own file start_doc parameter in positions to docids	2022-11-01 10:36:13 +08:00
Pascal Seitz	e772d3170d	switch get_val() to u32 Fixes #1638	2022-10-24 19:05:57 +08:00
Pascal Seitz	9cb8cfbea8	return Error instead panic in fastfields fixes #1572	2022-10-11 14:15:22 +08:00
Bruce Mitchener	cb252a42af	docs: "associated to" -> "associated with" (#1557) This reads better this way.	2022-09-26 20:23:37 +09:00
Pascal Seitz	f757471077	prepare for ip field	2022-09-26 16:27:35 +08:00
Bruce Mitchener	6a88ac3fe3	Documentation improvements. Fix some linking, some grammar, some typos, etc.	2022-09-18 18:05:37 +07:00
Pascal Seitz	edd9155b88	return `Write`, add documentation	2022-09-08 12:41:55 +08:00
Pascal Seitz	29d56111de	refactor, fix api refactor fix clippy fix docs remove unused code fix bytesfield index api flaw	2022-09-07 18:43:04 +08:00
Paul Masurel	c632fc014e	Refactoring fast fields codecs. This removes the GCD part as a codec, and makes it so that fastfield codecs all share the same normalization part (shift + gcd).	2022-09-05 23:07:12 +09:00
Paul Masurel	8e775b6c3d	Refactoring dyn Column (#1502)	2022-09-02 17:26:30 +09:00
Paul Masurel	5331be800b	Introducing a column trait	2022-08-28 14:14:27 +02:00
PSeitz	23fe73a6c0	remove searcher pool and make Searcher cloneable (#1411) * remove searcher pool and make Searcher cloneable closes #1410 * use SearcherInner in InnerIndexReader	2022-07-12 18:07:48 +09:00
Saroh	cae34ffe47	update fastfield doc	2022-03-02 16:04:15 +01:00
PSeitz	972cb6c26d	Aggregation (#1276) Added support for aggregation compatible with Elasticsearch's API.	2022-02-21 09:59:11 +09:00
PSeitz	cef145790c	Fix opening bytes index with dynamic codec (#1279) * Fix opening bytes index with dynamic codec Fix #1278 * extend proptest to cover bytes field codec bug	2022-02-18 20:44:21 +09:00
Paul Masurel	eca6628b3c	Minor refactoring (#1266)	2022-01-28 15:55:55 +09:00
Paul Masurel	7234bef0eb	Issue/1198 (#1201) * Unit test reproducing #1198 * Fixing unit test to handle the error from add_document. * Bump project version	2021-11-11 16:42:19 +09:00
Paul Masurel	de92f094aa	Closes #1101 fix delete documents with sort by field Closes #1101 * fix delete documents with sort by field Co-authored-by: Andre-Philippe Paquet <appaquet@gmail.com>	2021-06-30 15:51:32 +09:00
Pascal Seitz	b999e836b2	replace BitpackedFastFieldReader, delete FastFieldSerializer trait	2021-06-14 13:56:40 +02:00
PSeitz	dff0ffd38a	prepare for multiple fastfield codecs (#1063) * prepare for multiple fastfield codecs prepare for multiple fastfield codecs by wrapping the codecs in an enum #1042 * add FastFieldSerializer trait, add DynamicFastFieldSerializer add FastFieldSerializer trait add DynamicFastFieldSerializer enum to wrap all implementors of the FastFieldSerializer trait * add estimation for fastfield bitpacker	2021-05-31 23:14:14 +09:00
PSeitz	d523543dc7	Sort Index/Docids By Field (#1026) * sort index by field add sort info to IndexSettings generate docid mapping for sorted field (only fastfield) remap singlevalue fastfield * support docid mapping in multivalue fastfield move docid mapping to serialization step (less intermediate data for mapping) add support for docid mapping in multivalue fastfield * handle docid map in bytes fastfield * forward docid mapping, remap postings * fix merge conflicts * move test to index_sorter * add docid index mapping old->new add docid mapping for both directions old->new (used in postings) and new->old (used in fast field) handle mapping in postings recorder warn instead of info for MAX_TOKEN_LEN * remap docid in fielnorm * resort docids in recorder, more extensive tests * handle index sorting in docstore handle index sort in docstore, by saving all the docs in a temp docstore file (SegmentComponent::TempStore). On serialization the docid mapping is used to create a docstore in the correct order by reader the old docstore. add docstore sort tests refactor tests * refactor rename docid doc_id rename docid_map doc_id_map rename DocidMapping DocIdMapping fix typo * u32 to DocId * better doc_id_map creation remove unstable sort * add non mut method to FastFieldWriters add _mut prefix to &mut methods * remove sort_index * fix clippy issues * fix SegmentComponent iterator use std::mem::replace * fix test * fmt * handle indexsettings deserialize * add reading, writing bytes to doc store get bytes of document in doc store add store_bytes method doc writer to accept serialized document add serialization index settings test * rename index_sorter to doc_id_mapping use bufferlender in recorder * fix compile issue, make sort_by_field optional * fix test compile * validate index settings on merge validate index settings on merge forward merge info to SegmentSerializer (for TempStore) * fix doctest * add itertools, use kmerge add itertools, use kmerge push because rustfmt fails * implement/test merge for fastfield implement/test merge for fastfield rename len to num_deleted in DeleteBitSet * Use precalculated docid mapping in merger Use precalculated docid mapping in merger for sorted indices instead of on the fly calculation Add index creation macro benchmark, but commented out for now, since it is not really usable due to long runtimes, and extreme fluctuations. May be better suited in criterion or an external bench bin * fix fast field reader docs fix fast field reader docs, Error instead of None returned add u64s_lenient to fastreader add create docid mapping benchmark * add test for multifast field merge refactor test add test for multifast field merge * add num_bytes to BytesFastFieldReader equivalent to num_vals in MultiValuedFastFieldReader * add MultiValueLength trait add MultiValueLength trait in order to unify index creation for BytesFastFieldReader and MultiValuedFastFieldReader in merger * Add ReaderWithOrdinal, fix Add ReaderWithOrdinal to associate data to a reader in merger Fix bytes offset index creation in merger * add test for merging bytes with sorted docids * Merge fieldnorm for sorted index * handle posting list in merge in sorted index handle posting list in merge in sorted index by using doc id mapping for sorting reuse SegmentOrdinal type * handle doc store order in merge in sorted index * fix typo, cleanup * make IndexSetting non-optional * fix type, rename test file fix type rename test file add type * remove SegmentReaderWithOrdinal accessors * cargo fmt * add index sort & merge test to include deletes * Fix posting list merge issue Fix posting list merge issue - ensure serializer always gets monotonically increasing doc ids handle sorting and merging for facets field * performance: cache field readers, use bytes for doc store merge * change facet merge test to cover index sorting * add RawDocument abstraction to access bytes in doc store * fix deserialization, update changelog fix deserialization update changelog forward error on merge failed * cache store readers to utilize lru cache (4x performance) cache store readers, to utilize lru cache (4x faster performance, due to less decompress calls on the block) * add include_temp_doc_store flag in InnerSegmentMeta unset flag on deserialization and after finalize of a segment set flag when creating new instances	2021-05-17 22:20:57 +09:00
Pascal Seitz	25b9429929	calc mem_usage of more structs calc mem_usage of more structs in index creation add some comments	2021-04-30 14:16:39 +02:00
Stéphane Campinas	a0ec6e1e9d	Expand the DocAddress struct with named fields	2021-03-28 19:00:23 +02:00
Paul Masurel	52b1eb2c37	Clippy fix	2021-03-10 14:35:51 +09:00
Paul Masurel	d23aee76c9	Avoid loading fieldnorms when not necessary	2020-11-09 15:50:16 +09:00
Paul Masurel	36a0520a48	Added failing proptest and fixed it.	2020-11-05 15:40:00 +09:00
Paul Masurel	c23a03ad81	Large API Change in the Directory API. (#901) Tantivy used to assume that all files could be somehow memory mapped. After this change, Directory return a `FileSlice` that can be reduced and eventually read into an `OwnedBytes` object. Long and blocking io operation are still required by they do not span over the entire file.	2020-10-08 16:36:51 +09:00
Paul Masurel	96f946d4c3	Raultang master (#879) * add support for indexed bytes fast field * remove backup code file * refine test cases * Simplified unit test. Renamed it as it is testing the storable part. Not the indexed part. * Small refactoring and added unit test. If multivalued we only retain the first FAST value. Co-authored-by: Raul <raul.tang.lc@gmail.com>	2020-10-01 18:03:18 +09:00
Paul Masurel	439d6956a9	Returning Result in some of the API (#880) * Returning Result in some of the API * Introducing `.writer_for_test(..)`	2020-09-07 15:52:34 +09:00
Paul Masurel	462774b15c	Tiqb feature/2018 (#583) * rust 2018 * Added CHANGELOG comment	2019-07-01 10:01:46 +09:00
Paul Masurel	66b4615e4e	Issue/542 (#543) * Closes 542. Fast fields are all loaded when the segment reader is created.	2019-05-05 13:52:43 +09:00
Paul Masurel	663dd89c05	Feature/reader (#517) Adding IndexReader to the API. Making it possible to watch for changes. * Closes #500	2019-03-20 08:39:22 +09:00
Paul Masurel	07d87e154b	Collector refactoring and multithreaded search (#437) * Split Collector into an overall Collector and a per-segment SegmentCollector. Precursor to cross-segment parallelism, and as a side benefit cleans up any per-segment fields from being Option<T> to just T. * Attempt to add MultiCollector back * working. Chained collector is broken though * Fix chained collector * Fix test * Make Weight Send+Sync for parallelization purposes * Expose parameters of RangeQuery for external usage * Removed &mut self * fixing tests * Restored TestCollectors * blop * multicollector working * chained collector working * test broken * fixing unit test * blop * blop * Blop * simplifying APi * blop * better syntax * Simplifying top_collector * refactoring * blop * Sync with master * Added multithread search * Collector refactoring * Schema::builder * CR and rustdoc * CR comments * blop * Added an executor * Sorted the segment readers in the searcher * Update searcher.rs * Fixed unit testst * changed the place where we have the sort-segment-by-count heuristic * using crossbeam::channel * inlining * Comments about panics propagating * Added unit test for executor panicking * Readded default * Removed Default impl * Added unit test for executor	2018-11-30 22:46:59 +09:00
Paul Masurel	10f6c07c53	Clippy (#422) * Cargo Format * Clippy	2018-09-15 20:20:22 +09:00
Paul Masurel	9a0b7f9855	Rustfmt	2018-05-07 19:50:35 -07:00
Jason Wolfe	8e343b1ca3	Add fast field for associating arbitrary bytes to a document (#275) * Add fast field for associating arbitrary bytes to a document * Fix unused macro_use warning * Improvements from code review * Make BytesFastFieldWriter public * Fix json parsing validation failure * Add bytes fast field to CHANGELOG.md * Fix compile errors from merge * Support merging * Address misc code review comments * Fix comments from CR	2018-05-07 19:30:31 -07:00

38 Commits