tantivy

mirror of https://github.com/quickwit-oss/tantivy.git synced 2026-05-21 02:30:43 +00:00

Author	SHA1	Message	Date
François Massot	f4b2e71800	Handle field names with any characters with a known set of special (#1109) * Handle field names with any characters with a known set of special characters and an escape one * Update field name validation rule to check only if it has at least one character and does not start with `-` Closes #1087.	2021-07-05 22:31:36 +09:00
Pascal Seitz	9b3e508753	fix clippy	2021-07-01 18:06:09 +02:00
Pascal Seitz	1e4df54ab3	fix clippy	2021-07-01 17:41:53 +02:00
Pascal Seitz	86d0727659	add facet test closes #1100	2021-07-01 15:36:17 +02:00
PSeitz	f05e84f964	add FieldEntry constructor, closes #1086 (#1090)	2021-06-17 10:15:48 +09:00
Moriyoshi Koizumi	4afba005f9	Provide a means to deal with malformed facet text representation for the query parser (#1056) * Provide a means to deal with malformed facet text representation for the query parser. * Specific error enum for the facet parse error.	2021-05-27 12:16:49 +09:00
PSeitz	d523543dc7	Sort Index/Docids By Field (#1026) * sort index by field add sort info to IndexSettings generate docid mapping for sorted field (only fastfield) remap singlevalue fastfield * support docid mapping in multivalue fastfield move docid mapping to serialization step (less intermediate data for mapping) add support for docid mapping in multivalue fastfield * handle docid map in bytes fastfield * forward docid mapping, remap postings * fix merge conflicts * move test to index_sorter * add docid index mapping old->new add docid mapping for both directions old->new (used in postings) and new->old (used in fast field) handle mapping in postings recorder warn instead of info for MAX_TOKEN_LEN * remap docid in fielnorm * resort docids in recorder, more extensive tests * handle index sorting in docstore handle index sort in docstore, by saving all the docs in a temp docstore file (SegmentComponent::TempStore). On serialization the docid mapping is used to create a docstore in the correct order by reader the old docstore. add docstore sort tests refactor tests * refactor rename docid doc_id rename docid_map doc_id_map rename DocidMapping DocIdMapping fix typo * u32 to DocId * better doc_id_map creation remove unstable sort * add non mut method to FastFieldWriters add _mut prefix to &mut methods * remove sort_index * fix clippy issues * fix SegmentComponent iterator use std::mem::replace * fix test * fmt * handle indexsettings deserialize * add reading, writing bytes to doc store get bytes of document in doc store add store_bytes method doc writer to accept serialized document add serialization index settings test * rename index_sorter to doc_id_mapping use bufferlender in recorder * fix compile issue, make sort_by_field optional * fix test compile * validate index settings on merge validate index settings on merge forward merge info to SegmentSerializer (for TempStore) * fix doctest * add itertools, use kmerge add itertools, use kmerge push because rustfmt fails * implement/test merge for fastfield implement/test merge for fastfield rename len to num_deleted in DeleteBitSet * Use precalculated docid mapping in merger Use precalculated docid mapping in merger for sorted indices instead of on the fly calculation Add index creation macro benchmark, but commented out for now, since it is not really usable due to long runtimes, and extreme fluctuations. May be better suited in criterion or an external bench bin * fix fast field reader docs fix fast field reader docs, Error instead of None returned add u64s_lenient to fastreader add create docid mapping benchmark * add test for multifast field merge refactor test add test for multifast field merge * add num_bytes to BytesFastFieldReader equivalent to num_vals in MultiValuedFastFieldReader * add MultiValueLength trait add MultiValueLength trait in order to unify index creation for BytesFastFieldReader and MultiValuedFastFieldReader in merger * Add ReaderWithOrdinal, fix Add ReaderWithOrdinal to associate data to a reader in merger Fix bytes offset index creation in merger * add test for merging bytes with sorted docids * Merge fieldnorm for sorted index * handle posting list in merge in sorted index handle posting list in merge in sorted index by using doc id mapping for sorting reuse SegmentOrdinal type * handle doc store order in merge in sorted index * fix typo, cleanup * make IndexSetting non-optional * fix type, rename test file fix type rename test file add type * remove SegmentReaderWithOrdinal accessors * cargo fmt * add index sort & merge test to include deletes * Fix posting list merge issue Fix posting list merge issue - ensure serializer always gets monotonically increasing doc ids handle sorting and merging for facets field * performance: cache field readers, use bytes for doc store merge * change facet merge test to cover index sorting * add RawDocument abstraction to access bytes in doc store * fix deserialization, update changelog fix deserialization update changelog forward error on merge failed * cache store readers to utilize lru cache (4x performance) cache store readers, to utilize lru cache (4x faster performance, due to less decompress calls on the block) * add include_temp_doc_store flag in InnerSegmentMeta unset flag on deserialization and after finalize of a segment set flag when creating new instances	2021-05-17 22:20:57 +09:00
Paul Masurel	39dd8cfe24	Cargo clippy. Acronym should not be full uppercase apparently.	2021-04-26 11:49:18 +09:00
Evance Souamoro	f82922b354	added a scratched of implementation but still need to craft one detail and write test to validate	2021-04-06 11:46:17 +00:00
Stéphane Campinas	a0ec6e1e9d	Expand the DocAddress struct with named fields	2021-03-28 19:00:23 +02:00
Laurent Pouget	4b34231f28	Make facet indexation and storage optional Added a FacetOptions for HierarchicalFacet which add indexed and stored flags to it. Propagate change and update tests accordingly Added a test to ensure that a not indexed flag was taken care of. Added on Value implem the `path()` function to return the stored facet.	2021-03-24 14:56:27 +01:00
Paul Masurel	fe3faf5b3f	Cargo fmt	2021-02-22 14:29:03 +09:00
Minoru Osuka	670b6eaff6	Make NamedFieldDocument deserializable	2020-12-21 16:51:31 +09:00
Paul Masurel	af6dfa1856	Small refactoring	2020-12-03 14:27:05 +09:00
Paul Masurel	9e27da8b4e	Added CR comments. Added Unit tests.	2020-10-28 17:35:34 +09:00
Pasha Podolsky	71c66a5405	[tantivy] Run clippy linter (#914)	2020-10-27 14:36:02 +09:00
Paul Masurel	91e92fa8a3	Made public.	2020-10-20 14:59:41 +09:00
Pasha Podolsky	80cbe889ba	[tantivy] Add brotli codec for row storage (#885) * [tantivy] Add brotli codec for row storage * [tantivy] Fix not actual comments for code * [CR] Fixes for comment and cursor	2020-10-09 14:51:42 +09:00
Paul Masurel	c23a03ad81	Large API Change in the Directory API. (#901) Tantivy used to assume that all files could be somehow memory mapped. After this change, Directory return a `FileSlice` that can be reduced and eventually read into an `OwnedBytes` object. Long and blocking io operation are still required by they do not span over the entire file.	2020-10-08 16:36:51 +09:00
Paul Masurel	579e3d1ed8	Removed dev-deps to serde_yaml	2020-10-06 10:04:06 +09:00
Pasha Podolsky	687a36a49c	[tantivy] Fix for schema deserialization error (#902) Co-authored-by: Pasha <pasha@izihawa.net>	2020-10-05 11:24:48 +09:00
Paul Masurel	96f946d4c3	Raultang master (#879) * add support for indexed bytes fast field * remove backup code file * refine test cases * Simplified unit test. Renamed it as it is testing the storable part. Not the indexed part. * Small refactoring and added unit test. If multivalued we only retain the first FAST value. Co-authored-by: Raul <raul.tang.lc@gmail.com>	2020-10-01 18:03:18 +09:00
Paul Masurel	674cae8ee2	Issue/822 TopDocs sorted by i64, and date fastfield (in addition to u64) (#890) * Unsatisfactory implementation. The fastfield are hit. But for performance, we want the comparison to happen on u64, and the conversion to the FastType to be done only on the selected TopK elements. For i64, the current approach might be ok. For DateTime, it is most likely catastrophic. Closes #822 * Decoupled SegmentCollector Fruit from Collector Fruit. Deferred conversion from u64 to the proper FastField type to after the overall collection. (tantivy guarantees that u64 encoding is consistent with the original ordering of the fastfield) Closes #882	2020-09-30 17:51:11 +09:00
Paul Masurel	838c476733	Hirevo move to thiserror (#889) * Migrated from `failure` to `thiserror` * Refactoring Co-authored-by: Nicolas Polomack <nicolas@polomack.eu>	2020-09-30 16:34:10 +09:00
Paul Masurel	70bae7ce4c	Removing Term Vec allocation (#881)	2020-09-08 23:11:00 +09:00
Paul Masurel	439d6956a9	Returning Result in some of the API (#880) * Returning Result in some of the API * Introducing `.writer_for_test(..)`	2020-09-07 15:52:34 +09:00
Paul Masurel	6530bf0eae	Make field types less strict when populating documents.	2020-09-06 10:24:03 +09:00
Paul Masurel	3a72b1cb98	Accept dash within field names. (#874) Accept dash in field names and enforce field names constraint at the creation of the schema. Closes #796	2020-09-01 13:38:52 +09:00
Paul Masurel	8e74bb98b5	Added field norm readers (#854)	2020-07-20 13:05:05 +09:00
Fisher Darling	8b67877cd5	Made field methods const fns (#823)	2020-05-16 10:59:50 +09:00
Paul Masurel	1e5ebdbf3c	Format and remove useless import (#819)	2020-04-27 11:56:49 +09:00
Paul Masurel	262957717b	unit test fix and use of matches	2020-03-15 00:20:17 +09:00
Paul Masurel	873a808321	Removed itertools (#792)	2020-03-11 18:41:04 +09:00
Paul Masurel	486b8fa9c5	Removing serde-derive dependency (#786)	2020-03-06 23:33:58 +09:00
Paul Masurel	7d6cfa58e1	[WIP] Alternative take on boosted queries (#772) * Alternative take on boosted queries * Fixing unit test * Added boosting to the query grammar. * Made BoostQuery public. * Added support for boosting field in QueryParser Closes #547	2020-02-19 11:04:38 +09:00
Halvor Fladsrud Bø	ab13ffe377	Facet path string (#759) * Added to_path_string * Fixed logic. Found strange behavior with string comparisons. * ran formatter * Fixed test * Fixed format * Fixed comment	2020-01-30 10:11:29 +09:00
Minoru Osuka	749432f949	Make SchemaBuilder::add_field() public (#742) * Make add_field() to public * cargo format	2019-12-25 20:37:34 +09:00
Paul Masurel	401f74f7ae	Implement fast field for DateTime. (#736)	2019-12-20 21:20:15 +09:00
Paul Masurel	daf64487b4	Fixing JSON se/deserialization of dates. (#721) Closes #719	2019-12-09 13:31:35 +09:00
Ximo Guanter	00816f5529	Fix outdated reference in documentation (#720)	2019-12-08 18:10:50 +09:00
Paul Masurel	afe0134d0f	Kkoziara remove tokens from doc store (#715) * Prevent tokens from being stored in the document store. Commit adds prepare_for_store method to Document, which changes all PreTokenizedString values into String values. The method is called before adding document to the document store to prevent tokens from being saved there. Commit also adds small changes to comments in pre_tokenized_text example. * Avoid storing the pretokenized text.	2019-11-25 22:39:12 +09:00
Paul Masurel	ef3eddf3da	clippy first stab (#711)	2019-11-22 13:09:35 +09:00
Paul Masurel	fb3d6fa332	Adding Value::From<PretokenizedText> (#697)	2019-11-10 14:39:44 +09:00
kkoziara	0519056bd8	Added handling of pre-tokenized text fields (#642). (#669) * Added handling of pre-tokenized text fields (#642). * * Updated changelog and examples concerning #642. * Added tokenized_text method to Value implementation. * Implemented From<TokenizedString> for TokenizedStream. * * Removed tokenized flag from TextOptions and code reliance on the flag. * Changed naming to use word "pre-tokenized" instead of "tokenized". * Updated example code. * Fixed comments. * Minor code refactoring. Test improvements.	2019-11-07 10:10:56 +09:00
Paul Masurel	67bce6cbf2	Fixing the construction of the DeleteBitset. (#683) Closes #681	2019-11-04 15:39:11 +09:00
xiaoniu-578fa6bff964d005	e5316a4388	Reduce unnecessary clone. (#684)	2019-11-04 13:57:59 +09:00
Paul Masurel	7b21b3f25a	Refactoring around Field (#673) * Refactoring around Field Removing the contract about the order of the field, and the field id allocation. * Update delete_queue.rs * Update field.rs	2019-10-25 09:06:44 +09:00
petr-tik	1187a02a3e	Fixed #664 (#667) Removed references to u8 and old documentation	2019-10-22 09:34:10 +09:00
Paul Masurel	5c6580eb15	fmt (#661)	2019-10-04 12:10:01 +09:00
fdb-hiroshima	7e08e0047b	fix Term documentation (#655) u64-based fields are actually 4+8=12 bytes long	2019-09-11 18:49:35 +09:00

222 Commits