tantivy

mirror of https://github.com/quickwit-oss/tantivy.git synced 2026-01-07 01:32:53 +00:00

Author	SHA1	Message	Date
PSeitz	5a2397d57e	add sstable ord_to_term benchmark (#2242 )	2023-11-10 07:27:48 +01:00
PSeitz	927b4432c9	Perf: use term hashmap in fastfield (#2243 ) * add shared arena hashmap * bench fastfield indexing * use shared arena hashmap in columnar lower minimum resize in hashtable * clippy * add comments	2023-11-09 13:44:02 +01:00
trinity-1686a	7a0064db1f	bump index version (#2237 ) * bump index version and add constant for lowest supported version * use range instead of handcoded bounds	2023-11-06 19:02:37 +01:00
PSeitz	2e7327205d	fix coverage run (#2232 ) coverage run uses the compare_hash_only feature which is not compatible with the test_hashmap_size test	2023-11-06 11:18:38 +00:00
Paul Masurel	7bc5bf78e2	Fixing functional tests. (#2239 )	2023-11-05 18:18:39 +09:00
giovannicuccu	ef603c8c7e	rename ReloadPolicy onCommit to onCommitWithDelay (#2235 ) * rename ReloadPolicy onCommit to onCommitWithDelay * fix format issues --------- Co-authored-by: Giovanni Cuccu <gcuccu@imolainformatica.it>	2023-11-03 12:22:10 +01:00
PSeitz	28dd6b6546	collect json paths in indexing (#2231 ) * collect json paths in indexing * remove unsafe iter_mut_keys	2023-11-01 11:25:17 +01:00
trinity-1686a	1dda2bb537	handle * inside term in query parser (#2228 )	2023-10-27 08:57:02 +02:00
PSeitz	bf6544cf28	fix mmap::Advice reexport (#2230 )	2023-10-27 14:09:25 +09:00
PSeitz	ccecf946f7	tantivy 0.21.1 (#2227 )	2023-10-27 05:01:44 +02:00
PSeitz	19a859d6fd	term hashmap remove copy in is_empty, unused unordered_id (#2229 )	2023-10-27 05:01:32 +02:00
PSeitz	83af14caa4	Fix range query (#2226 ) Fix range query end check in advance Rename vars to reduce ambiguity add tests Fixes #2225	2023-10-25 09:17:31 +02:00
PSeitz	4feeb2323d	fix clippy (#2223 )	2023-10-24 10:05:22 +02:00
PSeitz	07bf66a197	json path writer (#2224 ) * refactor logic to JsonPathWriter * use in encode_column_name * add inlines * move unsafe block	2023-10-24 09:45:50 +02:00
trinity-1686a	0d4589219b	encode some part of posting list as -1 instead of direct values (#2185 ) * add support for delta-1 encoding posting list * encode term frequency minus one * don't emit tf for json integer terms * make skipreader not pub(crate) mutable	2023-10-20 16:58:26 +02:00
PSeitz	c2b0469180	improve docs, rework exports (#2220 ) * rework exports move snippet and advice make indexer pub, remove indexer reexports * add deprecation warning * add architecture overview	2023-10-18 09:22:24 +02:00
PSeitz	7e1980b218	run coverage only after merge (#2212 ) * run coverage only after merge coverage is a quite slow step in CI. It can be run only after merging * Apply suggestions from code review Co-authored-by: Paul Masurel <paul@quickwit.io> --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2023-10-18 07:19:36 +02:00
PSeitz	ecb9a89a9f	add compat mode for JSON (#2219 )	2023-10-17 10:00:55 +02:00
PSeitz	5e06e504e6	split into ReferenceValueLeaf (#2217 )	2023-10-16 16:31:30 +02:00
PSeitz	182f58cea6	remove Document: DocumentDeserialize dependency (#2211 ) * remove Document: DocumentDeserialize dependency The dependency requires users to implement an API they may not use. * remove unnecessary Document bounds	2023-10-13 07:59:54 +02:00
dependabot[bot]	337ffadefd	Update lru requirement from 0.11.0 to 0.12.0 (#2208 ) Updates the requirements on [lru](https://github.com/jeromefroe/lru-rs) to permit the latest version. - [Changelog](https://github.com/jeromefroe/lru-rs/blob/master/CHANGELOG.md) - [Commits](https://github.com/jeromefroe/lru-rs/compare/0.11.0...0.12.0) --- updated-dependencies: - dependency-name: lru dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-10-12 12:09:56 +02:00
dependabot[bot]	22aa4daf19	Update zstd requirement from 0.12 to 0.13 (#2214 ) Updates the requirements on [zstd](https://github.com/gyscos/zstd-rs) to permit the latest version. - [Release notes](https://github.com/gyscos/zstd-rs/releases) - [Commits](https://github.com/gyscos/zstd-rs/compare/v0.12.0...v0.13.0) --- updated-dependencies: - dependency-name: zstd dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-10-12 04:24:44 +02:00
PSeitz	493f9b2f2a	Read list of JSON fields encoded in dictionary (#2184 ) * Read list of JSON fields encoded in dictionary add method to get list of fields on InvertedIndexReader * add field type	2023-10-09 12:06:22 +02:00
PSeitz	e246e5765d	replace ReferenceValue with Self in Value (#2210 )	2023-10-06 08:22:15 +02:00
PSeitz	6097235eff	fix numeric order, refactor Document (#2209 ) fix numeric order to prefer i64 rename and move Document stuff	2023-10-05 16:39:56 +02:00
PSeitz	b700c42246	add AsRef, expose object and array iter on Value (#2207 ) add AsRef expose object and array iter add to_json on Document	2023-10-05 03:55:35 +02:00
PSeitz	5b1bf1a993	replace Field with field name (#2196 )	2023-10-04 06:21:40 +02:00
PSeitz	041d4fced7	move to_named_doc to Document trait (#2205 )	2023-10-04 06:03:07 +02:00
dependabot[bot]	166fc15239	Update memmap2 requirement from 0.7.1 to 0.9.0 (#2204 ) Updates the requirements on [memmap2](https://github.com/RazrFalcon/memmap2-rs) to permit the latest version. - [Changelog](https://github.com/RazrFalcon/memmap2-rs/blob/master/CHANGELOG.md) - [Commits](https://github.com/RazrFalcon/memmap2-rs/compare/v0.7.1...v0.9.0) --- updated-dependencies: - dependency-name: memmap2 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-10-04 05:00:46 +02:00
PSeitz	514a6e7fef	fix bench compile, fix Document reexport (#2203 )	2023-10-03 17:28:36 +02:00
dependabot[bot]	82d9127191	Update fs4 requirement from 0.6.3 to 0.7.0 (#2199 ) Updates the requirements on [fs4](https://github.com/al8n/fs4-rs) to permit the latest version. - [Release notes](https://github.com/al8n/fs4-rs/releases) - [Commits](https://github.com/al8n/fs4-rs/commits/0.7.0) --- updated-dependencies: - dependency-name: fs4 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-10-03 04:43:09 +02:00
PSeitz	03a1f40767	rename DocValue to Value (#2197 ) rename DocValue to Value to avoid confusion with lucene DocValues rename Value to OwnedValue	2023-10-02 17:03:00 +02:00
Harrison Burt	1c7c6fd591	POC: Tantivy documents as a trait (#2071 ) * fix windows build (#1) * Fix windows build * Add doc traits * Add field value iter * Add value and serialization * Adjust order * Fix bug * Correct type * Fix generic bugs * Reformat code * Add generic to index writer which I forgot about * Fix missing generics on single segment writer * Add missing type export * Add default methods for convenience * Cleanup * Fix more-like-this query to use standard types * Update API and fix tests * Add doc traits * Add field value iter * Add value and serialization * Adjust order * Fix bug * Correct type * Rebase main and fix conflicts * Reformat code * Merge upstream * Fix missing generics on single segment writer * Add missing type export * Add default methods for convenience * Cleanup * Fix more-like-this query to use standard types * Update API and fix tests * Add tokenizer improvements from previous commits * Add tokenizer improvements from previous commits * Reformat * Fix unit tests * Fix unit tests * Use enum in changes * Stage changes * Add new deserializer logic * Add serializer integration * Add document deserializer * Implement new (de)serialization api for existing types * Fix bugs and type errors * Add helper implementations * Fix errors * Reformat code * Add unit tests and some code organisation for serialization * Add unit tests to deserializer * Add some small docs * Add support for deserializing serde values * Reformat * Fix typo * Fix typo * Change repr of facet * Remove unused trait methods * Add child value type * Resolve comments * Fix build * Fix more build errors * Fix more build errors * Fix the tests I missed * Fix examples * fix numerical order, serialize PreTok Str * fix coverage * rename Document to TantivyDocument, rename DocumentAccess to Document add Binary prefix to binary de/serialization * fix coverage --------- Co-authored-by: Pascal Seitz <pascal.seitz@gmail.com>	2023-10-02 10:01:16 +02:00
PSeitz	b525f653c0	replace BinaryHeap for TopN (#2186 ) * replace BinaryHeap for TopN replace BinaryHeap for TopN with variant that selects the median with QuickSort, which runs in O(n) time. add merge_fruits fast path * call truncate unconditionally, extend test * remove special early exit * add TODO, fmt * truncate top n instead median, return vec * simplify code	2023-09-27 09:25:30 +02:00
ethever.eth	90586bc1e2	chore: remove unused Seek impl for Writers (#2187 ) (#2189 ) Co-authored-by: famouscat <onismaa@gmail.com>	2023-09-26 17:03:28 +09:00
PSeitz	832f1633de	handle exclusive out of bounds ranges on fastfield range queries (#2174 ) closes https://github.com/quickwit-oss/quickwit/issues/3790	2023-09-26 08:00:40 +02:00
PSeitz	38db53c465	make column_index pub (#2181 )	2023-09-22 08:06:45 +02:00
PSeitz	34920d31f5	Fix DateHistogram bucket gap (#2183 ) * Fix DateHistogram bucket gap Fixes a computation issue of the number of buckets needed in the DateHistogram. This is due to a missing normalization from request values (ms) to fast field values (ns), when converting an intermediate result to the final result. This results in a wrong computation by a factor of 1_000_000. The Histogram normalizes values to nanoseconds, to make the user input like extended_bounds (ms precision) and the values from the fast field (ns precision for date type) compatible. This normalization happens only for date type fields, as other field types don't have precision settings. The normalization does not happen due to a missing `column_type`, which is not correctly passed after merging an empty aggregation (which does not have a `column_type` set), with a regular aggregation. Another related issue is an empty aggregation, which will not have `column_type` set, will not convert the result to human readable format. This PR fixes the issue by: - Limit the allowed field types of DateHistogram to DateType - Instead of passing the column_type, which is only available on the segment level, we flag the aggregation as `is_date_agg`. - Fix the merge logic Add a flag to do normalization only once. This is not an issue currently, but it could easily become one. closes https://github.com/quickwit-oss/quickwit/issues/3837 * use older nightly for time crate (breaks build)	2023-09-21 10:41:35 +02:00
trinity-1686a	0241a05b90	add support for exists query syntax in query parser (#2170 ) * add support for exists query syntax in query parser * rustfmt * make Exists require a field	2023-09-19 11:10:39 +02:00
PSeitz	e125f3b041	fix test (#2178 )	2023-09-19 08:21:50 +02:00
PSeitz	c520ac46fc	add support for date in term agg (#2172 ) support DateTime in TermsAggregation Format dates with Rfc3339	2023-09-14 09:22:18 +02:00
PSeitz	2d7390341c	increase min memory to 15MB for indexing (#2176 ) With tantivy 0.20 the minimum memory consumption per SegmentWriter increased to 12MB. 7MB are for the different fast field collectors types (they could be lazily created). Increase the minimum memory from 3MB to 15MB. Change memory variable naming from arena to budget. closes #2156	2023-09-13 07:38:34 +02:00
dependabot[bot]	03fcdce016	Bump actions/checkout from 3 to 4 (#2171 ) Bumps [actions/checkout](https://github.com/actions/checkout) from 3 to 4. - [Release notes](https://github.com/actions/checkout/releases) - [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md) - [Commits](https://github.com/actions/checkout/compare/v3...v4) --- updated-dependencies: - dependency-name: actions/checkout dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-09-11 10:47:33 +02:00
Ping Xia	e4e416ac42	extend FuzzyTermQuery to support json field (#2173 ) * extend fuzzy search for json field * comments * comments * fmt fix * comments	2023-09-11 05:59:40 +02:00
Igor Motov	19325132b7	Fast-field based implementation of ExistsQuery (#2160 ) Adds an implementation of ExistsQuery that takes advantage of fast fields. Fixes #2159	2023-09-07 11:51:49 +09:00
Paul Masurel	389d36f760	Added comments	2023-09-04 11:06:56 +09:00
PSeitz	49448b31c6	chore: Release (#2168 ) * chore: Release * update CHANGELOG 0.21	2023-09-01 13:58:58 +02:00
PSeitz	ebede0bed7	update CHANGELOG (#2167 )	2023-08-31 10:01:44 +02:00
PSeitz	b1d8b072db	add missing aggregation part 2 (#2149 ) * add missing aggregation part 2 Add missing support for: - Mixed types columns - Key of type string on numerical fields The special aggregation is slower than the integrated one in TermsAggregation and therefore not chosen by default, although it can cover all use cases. * simplify, add num_docs to empty	2023-08-31 07:55:33 +02:00
ethever.eth	ee6a7c2bbb	fix a small typo (#2165 ) Co-authored-by: famouscat <onismaa@gmail.com>	2023-08-30 20:14:26 +02:00

3105 Commits