tantivy

mirror of https://github.com/quickwit-oss/tantivy.git synced 2026-06-05 01:50:42 +00:00

Author	SHA1	Message	Date
PSeitz	24841f0b2a	update bitpacker dep (#2269 )	2023-12-01 13:45:52 +01:00
PSeitz	1a9fc10be9	add fields_metadata to SegmentReader, add columnar docs (#2222 ) * add fields_metadata to SegmentReader, add columnar docs * use schema to resolve field, add test * normalize paths * merge for FieldsMetadata, add fields_metadata on Index * Update src/core/segment_reader.rs Co-authored-by: Paul Masurel <paul@quickwit.io> * merge code paths * add Hash * move function outside --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2023-11-22 12:29:53 +01:00
PSeitz	07573a7f19	update fst (#2267 ) update fst to 0.5 (deduplicates regex-syntax in the dep tree) deps cleanup	2023-11-21 16:06:57 +01:00
BlackHoleFox	daad2dc151	Take string references instead of owned values building Facet paths (#2265 )	2023-11-20 09:40:44 +01:00
PSeitz	054f49dc31	support escaped dot, add agg test (#2250 ) add agg test for nested JSON allow escaping of dot	2023-11-20 03:00:57 +01:00
PSeitz	47009ed2d3	remove unused deps (#2264 ) found with cargo machete remove pprof (doesn't work)	2023-11-20 02:59:59 +01:00
PSeitz	0aae31d7d7	reduce number of allocations (#2257 ) * reduce number of allocations Explanation makes up around 50% of all allocations (numbers not perf). It's created during serialization but not called. - Make Explanation optional in BM25 - Avoid allocations when using Explanation * use Cow	2023-11-16 13:47:36 +01:00
Paul Masurel	9caab45136	Preparing for 0.21.2 release. (#2256 )	2023-11-15 10:43:36 +09:00
Chris Tam	6d9a7b7eb0	Derive Debug for SchemaBuilder (#2254 )	2023-11-15 01:03:44 +01:00
dependabot[bot]	7a2c5804b1	Update itertools requirement from 0.11.0 to 0.12.0 (#2255 ) Updates the requirements on [itertools](https://github.com/rust-itertools/itertools) to permit the latest version. - [Changelog](https://github.com/rust-itertools/itertools/blob/master/CHANGELOG.md) - [Commits](https://github.com/rust-itertools/itertools/compare/v0.11.0...v0.12.0) --- updated-dependencies: - dependency-name: itertools dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-11-15 01:03:08 +01:00
François Massot	5319977171	Merge pull request #2253 from quickwit-oss/issue/2251-bug-merge-json-object-with-number Fix bug occurring when merging JSON object indexed with positions.	2023-11-14 17:28:29 +01:00
trinity-1686a	828632e8c4	rustfmt	2023-11-14 15:05:16 +01:00
Paul Masurel	6b59ec6fd5	Fix bug occurring when merging JSON object indexed with positions. In a JSON object field, the presence of term frequencies depends on the field. Typically, a string with positions indexed will have positions while numbers won't. The presence or absence of term freqs for a given term is unfortunately encoded in a very passive way. It is given by the presence of extra information in the skip info, or the lack of term freqs after decoding vint blocks. Before, after writing a segment, we would encode the segment correctly (without any term freq for numbers in a JSON object field). However, during merge, we would get the default term freq=1 value (this is the default in the absence of encoded term freqs). The merger would then proceed and attempt to decode 1 position when there are in fact none. This PR requires explicitly telling the posting serializer whether term frequencies should be serialized for each new term. Closes #2251	2023-11-14 22:41:48 +09:00
PSeitz	b60d862150	docid deltas while indexing (#2249 ) * docid deltas while indexing storing deltas is especially helpful for repetitive data like logs. In those cases, recording a doc on a term used to cost 4 bytes and now costs 1 byte. HDFS Indexing 1.1GB Total memory consumption: Before: 760 MB Now: 590 MB * use scan for delta decoding	2023-11-13 05:14:27 +01:00
PSeitz	4837c7811a	add missing inlines (#2245 )	2023-11-10 08:00:42 +01:00
PSeitz	5a2397d57e	add sstable ord_to_term benchmark (#2242 )	2023-11-10 07:27:48 +01:00
PSeitz	927b4432c9	Perf: use term hashmap in fastfield (#2243 ) * add shared arena hashmap * bench fastfield indexing * use shared arena hashmap in columnar lower minimum resize in hashtable * clippy * add comments	2023-11-09 13:44:02 +01:00
trinity-1686a	7a0064db1f	bump index version (#2237 ) * bump index version and add constant for lowest supported version * use range instead of handcoded bounds	2023-11-06 19:02:37 +01:00
PSeitz	2e7327205d	fix coverage run (#2232 ) coverage run uses the compare_hash_only feature which is not compatible with the test_hashmap_size test	2023-11-06 11:18:38 +00:00
Paul Masurel	7bc5bf78e2	Fixing functional tests. (#2239 )	2023-11-05 18:18:39 +09:00
giovannicuccu	ef603c8c7e	rename ReloadPolicy onCommit to onCommitWithDelay (#2235 ) * rename ReloadPolicy onCommit to onCommitWithDelay * fix format issues --------- Co-authored-by: Giovanni Cuccu <gcuccu@imolainformatica.it>	2023-11-03 12:22:10 +01:00
PSeitz	28dd6b6546	collect json paths in indexing (#2231 ) * collect json paths in indexing * remove unsafe iter_mut_keys	2023-11-01 11:25:17 +01:00
trinity-1686a	1dda2bb537	handle * inside term in query parser (#2228 )	2023-10-27 08:57:02 +02:00
PSeitz	bf6544cf28	fix mmap::Advice reexport (#2230 )	2023-10-27 14:09:25 +09:00
PSeitz	ccecf946f7	tantivy 0.21.1 (#2227 )	2023-10-27 05:01:44 +02:00
PSeitz	19a859d6fd	term hashmap remove copy in is_empty, unused unordered_id (#2229 )	2023-10-27 05:01:32 +02:00
PSeitz	83af14caa4	Fix range query (#2226 ) Fix range query end check in advance Rename vars to reduce ambiguity add tests Fixes #2225	2023-10-25 09:17:31 +02:00
PSeitz	4feeb2323d	fix clippy (#2223 )	2023-10-24 10:05:22 +02:00
PSeitz	07bf66a197	json path writer (#2224 ) * refactor logic to JsonPathWriter * use in encode_column_name * add inlines * move unsafe block	2023-10-24 09:45:50 +02:00
trinity-1686a	0d4589219b	encode some part of posting list as -1 instead of direct values (#2185 ) * add support for delta-1 encoding posting list * encode term frequency minus one * don't emit tf for json integer terms * make skipreader not pub(crate) mutable	2023-10-20 16:58:26 +02:00
PSeitz	c2b0469180	improve docs, rework exports (#2220 ) * rework exports move snippet and advice make indexer pub, remove indexer reexports * add deprecation warning * add architecture overview	2023-10-18 09:22:24 +02:00
PSeitz	7e1980b218	run coverage only after merge (#2212 ) * run coverage only after merge coverage is a quite slow step in CI. It can be run only after merging * Apply suggestions from code review Co-authored-by: Paul Masurel <paul@quickwit.io> --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2023-10-18 07:19:36 +02:00
PSeitz	ecb9a89a9f	add compat mode for JSON (#2219 )	2023-10-17 10:00:55 +02:00
PSeitz	5e06e504e6	split into ReferenceValueLeaf (#2217 )	2023-10-16 16:31:30 +02:00
PSeitz	182f58cea6	remove Document: DocumentDeserialize dependency (#2211 ) * remove Document: DocumentDeserialize dependency The dependency requires users to implement an API they may not use. * remove unnecessary Document bounds	2023-10-13 07:59:54 +02:00
dependabot[bot]	337ffadefd	Update lru requirement from 0.11.0 to 0.12.0 (#2208 ) Updates the requirements on [lru](https://github.com/jeromefroe/lru-rs) to permit the latest version. - [Changelog](https://github.com/jeromefroe/lru-rs/blob/master/CHANGELOG.md) - [Commits](https://github.com/jeromefroe/lru-rs/compare/0.11.0...0.12.0) --- updated-dependencies: - dependency-name: lru dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-10-12 12:09:56 +02:00
dependabot[bot]	22aa4daf19	Update zstd requirement from 0.12 to 0.13 (#2214 ) Updates the requirements on [zstd](https://github.com/gyscos/zstd-rs) to permit the latest version. - [Release notes](https://github.com/gyscos/zstd-rs/releases) - [Commits](https://github.com/gyscos/zstd-rs/compare/v0.12.0...v0.13.0) --- updated-dependencies: - dependency-name: zstd dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-10-12 04:24:44 +02:00
PSeitz	493f9b2f2a	Read list of JSON fields encoded in dictionary (#2184 ) * Read list of JSON fields encoded in dictionary add method to get list of fields on InvertedIndexReader * add field type	2023-10-09 12:06:22 +02:00
PSeitz	e246e5765d	replace ReferenceValue with Self in Value (#2210 )	2023-10-06 08:22:15 +02:00
PSeitz	6097235eff	fix numeric order, refactor Document (#2209 ) fix numeric order to prefer i64 rename and move Document stuff	2023-10-05 16:39:56 +02:00
PSeitz	b700c42246	add AsRef, expose object and array iter on Value (#2207 ) add AsRef expose object and array iter add to_json on Document	2023-10-05 03:55:35 +02:00
PSeitz	5b1bf1a993	replace Field with field name (#2196 )	2023-10-04 06:21:40 +02:00
PSeitz	041d4fced7	move to_named_doc to Document trait (#2205 )	2023-10-04 06:03:07 +02:00
dependabot[bot]	166fc15239	Update memmap2 requirement from 0.7.1 to 0.9.0 (#2204 ) Updates the requirements on [memmap2](https://github.com/RazrFalcon/memmap2-rs) to permit the latest version. - [Changelog](https://github.com/RazrFalcon/memmap2-rs/blob/master/CHANGELOG.md) - [Commits](https://github.com/RazrFalcon/memmap2-rs/compare/v0.7.1...v0.9.0) --- updated-dependencies: - dependency-name: memmap2 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-10-04 05:00:46 +02:00
PSeitz	514a6e7fef	fix bench compile, fix Document reexport (#2203 )	2023-10-03 17:28:36 +02:00
dependabot[bot]	82d9127191	Update fs4 requirement from 0.6.3 to 0.7.0 (#2199 ) Updates the requirements on [fs4](https://github.com/al8n/fs4-rs) to permit the latest version. - [Release notes](https://github.com/al8n/fs4-rs/releases) - [Commits](https://github.com/al8n/fs4-rs/commits/0.7.0) --- updated-dependencies: - dependency-name: fs4 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-10-03 04:43:09 +02:00
PSeitz	03a1f40767	rename DocValue to Value (#2197 ) rename DocValue to Value to avoid confusion with lucene DocValues rename Value to OwnedValue	2023-10-02 17:03:00 +02:00
Harrison Burt	1c7c6fd591	POC: Tantivy documents as a trait (#2071 ) * fix windows build (#1) * Fix windows build * Add doc traits * Add field value iter * Add value and serialization * Adjust order * Fix bug * Correct type * Fix generic bugs * Reformat code * Add generic to index writer which I forgot about * Fix missing generics on single segment writer * Add missing type export * Add default methods for convenience * Cleanup * Fix more-like-this query to use standard types * Update API and fix tests * Add doc traits * Add field value iter * Add value and serialization * Adjust order * Fix bug * Correct type * Rebase main and fix conflicts * Reformat code * Merge upstream * Fix missing generics on single segment writer * Add missing type export * Add default methods for convenience * Cleanup * Fix more-like-this query to use standard types * Update API and fix tests * Add tokenizer improvements from previous commits * Add tokenizer improvements from previous commits * Reformat * Fix unit tests * Fix unit tests * Use enum in changes * Stage changes * Add new deserializer logic * Add serializer integration * Add document deserializer * Implement new (de)serialization api for existing types * Fix bugs and type errors * Add helper implementations * Fix errors * Reformat code * Add unit tests and some code organisation for serialization * Add unit tests to deserializer * Add some small docs * Add support for deserializing serde values * Reformat * Fix typo * Fix typo * Change repr of facet * Remove unused trait methods * Add child value type * Resolve comments * Fix build * Fix more build errors * Fix more build errors * Fix the tests I missed * Fix examples * fix numerical order, serialize PreTok Str * fix coverage * rename Document to TantivyDocument, rename DocumentAccess to Document add Binary prefix to binary de/serialization * fix coverage --------- Co-authored-by: Pascal Seitz <pascal.seitz@gmail.com>	2023-10-02 10:01:16 +02:00
PSeitz	b525f653c0	replace BinaryHeap for TopN (#2186 ) * replace BinaryHeap for TopN replace BinaryHeap for TopN with variant that selects the median with QuickSort, which runs in O(n) time. add merge_fruits fast path * call truncate unconditionally, extend test * remove special early exit * add TODO, fmt * truncate top n instead median, return vec * simplify code	2023-09-27 09:25:30 +02:00
ethever.eth	90586bc1e2	chore: remove unused Seek impl for Writers (#2187 ) (#2189 ) Co-authored-by: famouscat <onismaa@gmail.com>	2023-09-26 17:03:28 +09:00

3120 Commits