tantivy

mirror of https://github.com/quickwit-oss/tantivy.git synced 2026-05-20 10:10:42 +00:00

Author	SHA1	Message	Date
Paul Masurel	61e955039d	Added JSON type JSON field	2022-02-21 13:55:16 +09:00
Paul Masurel	bdedefe07d	Adding an IndexingContext object (#1268 )	2022-02-04 15:08:01 +09:00
Paul Masurel	2069e3e52b	Fixing clippy comments	2022-02-01 10:24:05 +09:00
Paul Masurel	eca6628b3c	Minor refactoring (#1266 )	2022-01-28 15:55:55 +09:00
Paul Masurel	732f6847c0	Field type with codes (#1255 ) * Term are now typed. This change is backward compatible: While the Term has a byte representation that is modified, a Term itself is a transient object that is not serialized as is in the index. Its .field() and .value_bytes() on the other hand are unchanged. This change offers better Debug information for terms. While not necessary it also will help in the support for JSON types. * Renamed Hierarchical Facet -> Facet	2022-01-07 20:49:00 +09:00
Paul Masurel	1c6d9bdc6a	Comparison of Value based on serialization. (#1250 )	2022-01-07 20:31:26 +09:00
Paul Masurel	3ea6800ac5	Pleasing clippy (#1253 )	2022-01-06 16:41:24 +09:00
Paul Masurel	c81b3030fa	Issue/922b (#1233 ) * Add a NORMED options on field Make fieldnorm indexation optional: * for all types except text => added a NORMED options * for text field if STRING, field has not fieldnorm retained if TEXT, field has fieldnorm computed * Finalize making fieldnorm optional for all field types. - Using Option for fieldnorm readers.	2021-12-10 21:12:29 +09:00
Paul Masurel	1d4e9a29db	Cargo fmt	2021-12-02 15:51:44 +09:00
Paul Masurel	dde49ac8e2	Closes #1195 (#1222 ) Removes the indexed option for facets. Facets are now always indexed. Closes #1195	2021-12-02 14:37:19 +09:00
PSeitz	c503c6e4fa	Switch to non-strict schema (#1216 ) Fixes #1211	2021-11-29 10:38:59 +09:00
Paul Masurel	7234bef0eb	Issue/1198 (#1201 ) * Unit test reproducing #1198 * Fixing unit test to handle the error from add_document. * Bump project version	2021-11-11 16:42:19 +09:00
azerowall	fcff91559b	Fix the deserialization error of FieldEntry when the 'options' field appears before the 'type' field (#1199 ) Co-authored-by: quel <azerowall>	2021-11-10 18:39:58 +09:00
Paul Masurel	02cffa4dea	Code simplification. (#1169 ) Code simplification and Clippy	2021-10-07 14:11:44 +09:00
sigaloid	096ce7488e	Resolve some clippys, format (#1144 ) * cargo +nightly clippy --fix -Z unstable-options	2021-08-26 08:46:00 +09:00
Pascal Seitz	3265f7bec3	dissolve common module	2021-08-19 23:26:34 +01:00
Pascal Seitz	dc141cdb29	more docs detail remove code duplicate	2021-08-13 17:40:13 +01:00
François Massot	f4b2e71800	Handle field names with any characters with a known set of special (#1109 ) * Handle field names with any characters with a known set of special characters and an escape one * Update field name validation rule to check only if it has at least one character and does not start with `-` Closes #1087.	2021-07-05 22:31:36 +09:00
Pascal Seitz	9b3e508753	fix clippy	2021-07-01 18:06:09 +02:00
Pascal Seitz	1e4df54ab3	fix clippy	2021-07-01 17:41:53 +02:00
Pascal Seitz	86d0727659	add facet test closes #1100	2021-07-01 15:36:17 +02:00
PSeitz	f05e84f964	add FieldEntry constructor, closes #1086 (#1090 )	2021-06-17 10:15:48 +09:00
Moriyoshi Koizumi	4afba005f9	Provide a means to deal with malformed facet text representation for the query parser (#1056 ) * Provide a means to deal with malformed facet text representation for the query parser. * Specific error enum for the facet parse error.	2021-05-27 12:16:49 +09:00
PSeitz	d523543dc7	Sort Index/Docids By Field (#1026 ) * sort index by field add sort info to IndexSettings generate docid mapping for sorted field (only fastfield) remap singlevalue fastfield * support docid mapping in multivalue fastfield move docid mapping to serialization step (less intermediate data for mapping) add support for docid mapping in multivalue fastfield * handle docid map in bytes fastfield * forward docid mapping, remap postings * fix merge conflicts * move test to index_sorter * add docid index mapping old->new add docid mapping for both directions old->new (used in postings) and new->old (used in fast field) handle mapping in postings recorder warn instead of info for MAX_TOKEN_LEN * remap docid in fielnorm * resort docids in recorder, more extensive tests * handle index sorting in docstore handle index sort in docstore, by saving all the docs in a temp docstore file (SegmentComponent::TempStore). On serialization the docid mapping is used to create a docstore in the correct order by reader the old docstore. 
add docstore sort tests refactor tests * refactor rename docid doc_id rename docid_map doc_id_map rename DocidMapping DocIdMapping fix typo * u32 to DocId * better doc_id_map creation remove unstable sort * add non mut method to FastFieldWriters add _mut prefix to &mut methods * remove sort_index * fix clippy issues * fix SegmentComponent iterator use std::mem::replace * fix test * fmt * handle indexsettings deserialize * add reading, writing bytes to doc store get bytes of document in doc store add store_bytes method doc writer to accept serialized document add serialization index settings test * rename index_sorter to doc_id_mapping use bufferlender in recorder * fix compile issue, make sort_by_field optional * fix test compile * validate index settings on merge validate index settings on merge forward merge info to SegmentSerializer (for TempStore) * fix doctest * add itertools, use kmerge add itertools, use kmerge push because rustfmt fails * implement/test merge for fastfield implement/test merge for fastfield rename len to num_deleted in DeleteBitSet * Use precalculated docid mapping in merger Use precalculated docid mapping in merger for sorted indices instead of on the fly calculation Add index creation macro benchmark, but commented out for now, since it is not really usable due to long runtimes, and extreme fluctuations. 
May be better suited in criterion or an external bench bin * fix fast field reader docs fix fast field reader docs, Error instead of None returned add u64s_lenient to fastreader add create docid mapping benchmark * add test for multifast field merge refactor test add test for multifast field merge * add num_bytes to BytesFastFieldReader equivalent to num_vals in MultiValuedFastFieldReader * add MultiValueLength trait add MultiValueLength trait in order to unify index creation for BytesFastFieldReader and MultiValuedFastFieldReader in merger * Add ReaderWithOrdinal, fix Add ReaderWithOrdinal to associate data to a reader in merger Fix bytes offset index creation in merger * add test for merging bytes with sorted docids * Merge fieldnorm for sorted index * handle posting list in merge in sorted index handle posting list in merge in sorted index by using doc id mapping for sorting reuse SegmentOrdinal type * handle doc store order in merge in sorted index * fix typo, cleanup * make IndexSetting non-optional * fix type, rename test file fix type rename test file add type * remove SegmentReaderWithOrdinal accessors * cargo fmt * add index sort & merge test to include deletes * Fix posting list merge issue Fix posting list merge issue - ensure serializer always gets monotonically increasing doc ids handle sorting and merging for facets field * performance: cache field readers, use bytes for doc store merge * change facet merge test to cover index sorting * add RawDocument abstraction to access bytes in doc store * fix deserialization, update changelog fix deserialization update changelog forward error on merge failed * cache store readers to utilize lru cache (4x performance) cache store readers, to utilize lru cache (4x faster performance, due to less decompress calls on the block) * add include_temp_doc_store flag in InnerSegmentMeta unset flag on deserialization and after finalize of a segment set flag when creating new instances	2021-05-17 22:20:57 +09:00
Paul Masurel	39dd8cfe24	Cargo clippy. Acronym should not be full uppercase apparently.	2021-04-26 11:49:18 +09:00
Evance Souamoro	f82922b354	added a scratched of implementation but still need to craft one detail and write test to validate	2021-04-06 11:46:17 +00:00
Stéphane Campinas	a0ec6e1e9d	Expand the DocAddress struct with named fields	2021-03-28 19:00:23 +02:00
Laurent Pouget	4b34231f28	Make facet indexation and storage optional Added a FacetOptions for HierarchicalFacet which add indexed and stored flags to it. Propagate change and update tests accordingly Added a test to ensure that a not indexed flag was taken care of. Added on Value implem the `path()` function to return the stored facet.	2021-03-24 14:56:27 +01:00
Paul Masurel	fe3faf5b3f	Cargo fmt	2021-02-22 14:29:03 +09:00
Minoru Osuka	670b6eaff6	Make NamedFieldDocument deserializable	2020-12-21 16:51:31 +09:00
Paul Masurel	af6dfa1856	Small refactoring	2020-12-03 14:27:05 +09:00
Paul Masurel	9e27da8b4e	Added CR comments. Added Unit tests.	2020-10-28 17:35:34 +09:00
Pasha Podolsky	71c66a5405	[tantivy] Run clippy linter (#914 )	2020-10-27 14:36:02 +09:00
Paul Masurel	91e92fa8a3	Made public.	2020-10-20 14:59:41 +09:00
Pasha Podolsky	80cbe889ba	[tantivy] Add brotli codec for row storage (#885 ) * [tantivy] Add brotli codec for row storage * [tantivy] Fix not actual comments for code * [CR] Fixes for comment and cursor	2020-10-09 14:51:42 +09:00
Paul Masurel	c23a03ad81	Large API Change in the Directory API. (#901 ) Tantivy used to assume that all files could be somehow memory mapped. After this change, Directory return a `FileSlice` that can be reduced and eventually read into an `OwnedBytes` object. Long and blocking io operation are still required by they do not span over the entire file.	2020-10-08 16:36:51 +09:00
Paul Masurel	579e3d1ed8	Removed dev-deps to serde_yaml	2020-10-06 10:04:06 +09:00
Pasha Podolsky	687a36a49c	[tantivy] Fix for schema deserialization error (#902 ) Co-authored-by: Pasha <pasha@izihawa.net>	2020-10-05 11:24:48 +09:00
Paul Masurel	96f946d4c3	Raultang master (#879 ) * add support for indexed bytes fast field * remove backup code file * refine test cases * Simplified unit test. Renamed it as it is testing the storable part. Not the indexed part. * Small refactoring and added unit test. If multivalued we only retain the first FAST value. Co-authored-by: Raul <raul.tang.lc@gmail.com>	2020-10-01 18:03:18 +09:00
Paul Masurel	674cae8ee2	Issue/822 TopDocs sorted by i64, and date fastfield (in addition to u64) (#890 ) * Unsatisfactory implementation. The fastfield are hit. But for performance, we want the comparison to happen on u64, and the conversion to the FastType to be done only on the selected TopK elements. For i64, the current approach might be ok. For DateTime, it is most likely catastrophic. Closes #822 * Decoupled SegmentCollector Fruit from Collector Fruit. Deferred conversion from u64 to the proper FastField type to after the overall collection. (tantivy guarantees that u64 encoding is consistent with the original ordering of the fastfield) Closes #882	2020-09-30 17:51:11 +09:00
Paul Masurel	838c476733	Hirevo move to thiserror (#889 ) * Migrated from `failure` to `thiserror` * Refactoring Co-authored-by: Nicolas Polomack <nicolas@polomack.eu>	2020-09-30 16:34:10 +09:00
Paul Masurel	70bae7ce4c	Removing Term Vec allocation (#881 )	2020-09-08 23:11:00 +09:00
Paul Masurel	439d6956a9	Returning Result in some of the API (#880 ) * Returning Result in some of the API * Introducing `.writer_for_test(..)`	2020-09-07 15:52:34 +09:00
Paul Masurel	6530bf0eae	Make field types less strict when populating documents.	2020-09-06 10:24:03 +09:00
Paul Masurel	3a72b1cb98	Accept dash within field names. (#874 ) Accept dash in field names and enforce field names constraint at the creation of the schema. Closes #796	2020-09-01 13:38:52 +09:00
Paul Masurel	8e74bb98b5	Added field norm readers (#854 )	2020-07-20 13:05:05 +09:00
Fisher Darling	8b67877cd5	Made field methods const fns (#823 )	2020-05-16 10:59:50 +09:00
Paul Masurel	1e5ebdbf3c	Format and remove useless import (#819 )	2020-04-27 11:56:49 +09:00
Paul Masurel	262957717b	unit test fix and use of matches	2020-03-15 00:20:17 +09:00
Paul Masurel	873a808321	Removed itertools (#792 )	2020-03-11 18:41:04 +09:00

239 Commits