tantivy

mirror of https://github.com/quickwit-oss/tantivy.git synced 2026-01-05 00:32:55 +00:00

Author	SHA1	Message	Date
Pascal Seitz	6c53f9f37f	Fix off by one in optional index Fixes #2293 Fixes an off by one error in the metadata resize of the optional index when loading the index. Merge variables with the same meaning but different names	2024-01-09 15:09:49 +08:00
Adam Reichold	53f2fe1fbe	Forward regex parser errors to enable understanding their reason. (#2288 )	2023-12-22 11:01:10 +01:00
PSeitz	9c75942aaf	fix merge panic for JSON fields (#2284 ) Root cause was the positions buffer had residue positions from the previous term, when the terms were alternating between having and not having positions in JSON (terms have positions, but not numerics). Fixes #2283	2023-12-21 11:05:34 +01:00
PSeitz	bff7c58497	improve indexing benchmark (#2275 )	2023-12-11 09:04:42 +01:00
trinity-1686a	9ebc5ed053	use fst for sstable index (#2268 ) * read path for new fst based index * implement BlockAddrStoreWriter * extract slop/derivation computation * use better linear approximator and allow negative correction to approximator * document format and reorder some fields * optimize single block sstable size * plug backward compat	2023-12-04 15:13:15 +01:00
PSeitz	0b56c88e69	Revert "Preparing for 0.21.2 release." (#2258 ) * Revert "Preparing for 0.21.2 release. (#2256)" This reverts commit `9caab45136`. * bump version to 0.21.1 * set version to 0.22.0-dev	2023-12-01 13:46:12 +01:00
PSeitz	24841f0b2a	update bitpacker dep (#2269 )	2023-12-01 13:45:52 +01:00
PSeitz	1a9fc10be9	add fields_metadata to SegmentReader, add columnar docs (#2222 ) * add fields_metadata to SegmentReader, add columnar docs * use schema to resolve field, add test * normalize paths * merge for FieldsMetadata, add fields_metadata on Index * Update src/core/segment_reader.rs Co-authored-by: Paul Masurel <paul@quickwit.io> * merge code paths * add Hash * move function outside --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2023-11-22 12:29:53 +01:00
PSeitz	07573a7f19	update fst (#2267 ) update fst to 0.5 (deduplicates regex-syntax in the dep tree) deps cleanup	2023-11-21 16:06:57 +01:00
BlackHoleFox	daad2dc151	Take string references instead of owned values building Facet paths (#2265 )	2023-11-20 09:40:44 +01:00
PSeitz	054f49dc31	support escaped dot, add agg test (#2250 ) add agg test for nested JSON allow escaping of dot	2023-11-20 03:00:57 +01:00
PSeitz	47009ed2d3	remove unused deps (#2264 ) found with cargo machete remove pprof (doesn't work)	2023-11-20 02:59:59 +01:00
PSeitz	0aae31d7d7	reduce number of allocations (#2257 ) * reduce number of allocations Explanation makes up around 50% of all allocations (numbers not perf). It's created during serialization but not called. - Make Explanation optional in BM25 - Avoid allocations when using Explanation * use Cow	2023-11-16 13:47:36 +01:00
Paul Masurel	9caab45136	Preparing for 0.21.2 release. (#2256 )	2023-11-15 10:43:36 +09:00
Chris Tam	6d9a7b7eb0	Derive Debug for SchemaBuilder (#2254 )	2023-11-15 01:03:44 +01:00
dependabot[bot]	7a2c5804b1	Update itertools requirement from 0.11.0 to 0.12.0 (#2255 ) Updates the requirements on [itertools](https://github.com/rust-itertools/itertools) to permit the latest version. - [Changelog](https://github.com/rust-itertools/itertools/blob/master/CHANGELOG.md) - [Commits](https://github.com/rust-itertools/itertools/compare/v0.11.0...v0.12.0) --- updated-dependencies: - dependency-name: itertools dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-11-15 01:03:08 +01:00
François Massot	5319977171	Merge pull request #2253 from quickwit-oss/issue/2251-bug-merge-json-object-with-number Fix bug occurring when merging JSON object indexed with positions.	2023-11-14 17:28:29 +01:00
trinity-1686a	828632e8c4	rustfmt	2023-11-14 15:05:16 +01:00
Paul Masurel	6b59ec6fd5	Fix bug occurring when merging JSON object indexed with positions. In JSON Object field the presence of term frequencies depends on the field. Typically, a string with positions indexed will have positions while numbers won't. The presence or absence of term freqs for a given term is unfortunately encoded in a very passive way. It is given by the presence of extra information in the skip info, or the lack of term freqs after decoding vint blocks. Before, after writing a segment, we would encode the segment correctly (without any term freq for number in json object field). However during merge, we would get the default term freq=1 value. (this is default in the absence of encoded term freqs) The merger would then proceed and attempt to decode 1 position when there are in fact none. This PR requires explicitly telling the posting serializer whether term frequencies should be serialized for each new term. Closes #2251	2023-11-14 22:41:48 +09:00
PSeitz	b60d862150	docid deltas while indexing (#2249 ) * docid deltas while indexing storing deltas is especially helpful for repetitive data like logs. In those cases, recording a doc on a term previously cost 4 bytes and now costs 1 byte. HDFS Indexing 1.1GB Total memory consumption: Before: 760 MB Now: 590 MB * use scan for delta decoding	2023-11-13 05:14:27 +01:00
PSeitz	4837c7811a	add missing inlines (#2245 )	2023-11-10 08:00:42 +01:00
PSeitz	5a2397d57e	add sstable ord_to_term benchmark (#2242 )	2023-11-10 07:27:48 +01:00
PSeitz	927b4432c9	Perf: use term hashmap in fastfield (#2243 ) * add shared arena hashmap * bench fastfield indexing * use shared arena hashmap in columnar lower minimum resize in hashtable * clippy * add comments	2023-11-09 13:44:02 +01:00
trinity-1686a	7a0064db1f	bump index version (#2237 ) * bump index version and add constant for lowest supported version * use range instead of handcoded bounds	2023-11-06 19:02:37 +01:00
PSeitz	2e7327205d	fix coverage run (#2232 ) coverage run uses the compare_hash_only feature which is not compatible with the test_hashmap_size test	2023-11-06 11:18:38 +00:00
Paul Masurel	7bc5bf78e2	Fixing functional tests. (#2239 )	2023-11-05 18:18:39 +09:00
giovannicuccu	ef603c8c7e	rename ReloadPolicy onCommit to onCommitWithDelay (#2235 ) * rename ReloadPolicy onCommit to onCommitWithDelay * fix format issues --------- Co-authored-by: Giovanni Cuccu <gcuccu@imolainformatica.it>	2023-11-03 12:22:10 +01:00
PSeitz	28dd6b6546	collect json paths in indexing (#2231 ) * collect json paths in indexing * remove unsafe iter_mut_keys	2023-11-01 11:25:17 +01:00
trinity-1686a	1dda2bb537	handle * inside term in query parser (#2228 )	2023-10-27 08:57:02 +02:00
PSeitz	bf6544cf28	fix mmap::Advice reexport (#2230 )	2023-10-27 14:09:25 +09:00
PSeitz	ccecf946f7	tantivy 0.21.1 (#2227 )	2023-10-27 05:01:44 +02:00
PSeitz	19a859d6fd	term hashmap remove copy in is_empty, unused unordered_id (#2229 )	2023-10-27 05:01:32 +02:00
PSeitz	83af14caa4	Fix range query (#2226 ) Fix range query end check in advance Rename vars to reduce ambiguity add tests Fixes #2225	2023-10-25 09:17:31 +02:00
PSeitz	4feeb2323d	fix clippy (#2223 )	2023-10-24 10:05:22 +02:00
PSeitz	07bf66a197	json path writer (#2224 ) * refactor logic to JsonPathWriter * use in encode_column_name * add inlines * move unsafe block	2023-10-24 09:45:50 +02:00
trinity-1686a	0d4589219b	encode some part of posting list as -1 instead of direct values (#2185 ) * add support for delta-1 encoding posting list * encode term frequency minus one * don't emit tf for json integer terms * make skipreader not pub(crate) mutable	2023-10-20 16:58:26 +02:00
PSeitz	c2b0469180	improve docs, rework exports (#2220 ) * rework exports move snippet and advice make indexer pub, remove indexer reexports * add deprecation warning * add architecture overview	2023-10-18 09:22:24 +02:00
PSeitz	7e1980b218	run coverage only after merge (#2212 ) * run coverage only after merge coverage is a quite slow step in CI. It can be run only after merging * Apply suggestions from code review Co-authored-by: Paul Masurel <paul@quickwit.io> --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2023-10-18 07:19:36 +02:00
PSeitz	ecb9a89a9f	add compat mode for JSON (#2219 )	2023-10-17 10:00:55 +02:00
PSeitz	5e06e504e6	split into ReferenceValueLeaf (#2217 )	2023-10-16 16:31:30 +02:00
PSeitz	182f58cea6	remove Document: DocumentDeserialize dependency (#2211 ) * remove Document: DocumentDeserialize dependency The dependency requires users to implement an API they may not use. * remove unnecessary Document bounds	2023-10-13 07:59:54 +02:00
dependabot[bot]	337ffadefd	Update lru requirement from 0.11.0 to 0.12.0 (#2208 ) Updates the requirements on [lru](https://github.com/jeromefroe/lru-rs) to permit the latest version. - [Changelog](https://github.com/jeromefroe/lru-rs/blob/master/CHANGELOG.md) - [Commits](https://github.com/jeromefroe/lru-rs/compare/0.11.0...0.12.0) --- updated-dependencies: - dependency-name: lru dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-10-12 12:09:56 +02:00
dependabot[bot]	22aa4daf19	Update zstd requirement from 0.12 to 0.13 (#2214 ) Updates the requirements on [zstd](https://github.com/gyscos/zstd-rs) to permit the latest version. - [Release notes](https://github.com/gyscos/zstd-rs/releases) - [Commits](https://github.com/gyscos/zstd-rs/compare/v0.12.0...v0.13.0) --- updated-dependencies: - dependency-name: zstd dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-10-12 04:24:44 +02:00
PSeitz	493f9b2f2a	Read list of JSON fields encoded in dictionary (#2184 ) * Read list of JSON fields encoded in dictionary add method to get list of fields on InvertedIndexReader * add field type	2023-10-09 12:06:22 +02:00
PSeitz	e246e5765d	replace ReferenceValue with Self in Value (#2210 )	2023-10-06 08:22:15 +02:00
PSeitz	6097235eff	fix numeric order, refactor Document (#2209 ) fix numeric order to prefer i64 rename and move Document stuff	2023-10-05 16:39:56 +02:00
PSeitz	b700c42246	add AsRef, expose object and array iter on Value (#2207 ) add AsRef expose object and array iter add to_json on Document	2023-10-05 03:55:35 +02:00
PSeitz	5b1bf1a993	replace Field with field name (#2196 )	2023-10-04 06:21:40 +02:00
PSeitz	041d4fced7	move to_named_doc to Document trait (#2205 )	2023-10-04 06:03:07 +02:00
dependabot[bot]	166fc15239	Update memmap2 requirement from 0.7.1 to 0.9.0 (#2204 ) Updates the requirements on [memmap2](https://github.com/RazrFalcon/memmap2-rs) to permit the latest version. - [Changelog](https://github.com/RazrFalcon/memmap2-rs/blob/master/CHANGELOG.md) - [Commits](https://github.com/RazrFalcon/memmap2-rs/compare/v0.7.1...v0.9.0) --- updated-dependencies: - dependency-name: memmap2 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-10-04 05:00:46 +02:00

3126 Commits