tantivy

mirror of https://github.com/quickwit-oss/tantivy.git synced 2026-01-17 06:22:54 +00:00

Author	SHA1	Message	Date
PSeitz	67ebba3c3c	expose collect_block buffer size (#2326 ) * expose buffer of collect_block * flip shard_size segment_size	2024-03-15 08:02:08 +01:00
PSeitz	48630ceec9	move into new index module (#2259 ) move core modules to index module	2024-01-31 10:30:04 +01:00
Adam Reichold	53f2fe1fbe	Forward regex parser errors to enable understanding their reason. (#2288 )	2023-12-22 11:01:10 +01:00
PSeitz	0aae31d7d7	reduce number of allocations (#2257 ) * reduce number of allocations Explanation makes up around 50% of all allocations (numbers not perf). It's created during serialization but not called. - Make Explanation optional in BM25 - Avoid allocations when using Explanation * use Cow	2023-11-16 13:47:36 +01:00
Paul Masurel	7bc5bf78e2	Fixing functional tests. (#2239 )	2023-11-05 18:18:39 +09:00
PSeitz	83af14caa4	Fix range query (#2226 ) Fix range query end check in advance Rename vars to reduce ambiguity add tests Fixes #2225	2023-10-25 09:17:31 +02:00
trinity-1686a	0d4589219b	encode some part of posting list as -1 instead of direct values (#2185 ) * add support for delta-1 encoding posting list * encode term frequency minus one * don't emit tf for json integer terms * make skipreader not pub(crate) mutable	2023-10-20 16:58:26 +02:00
PSeitz	03a1f40767	rename DocValue to Value (#2197 ) rename DocValue to Value to avoid confusion with lucene DocValues rename Value to OwnedValue	2023-10-02 17:03:00 +02:00
Harrison Burt	1c7c6fd591	POC: Tantivy documents as a trait (#2071 ) * fix windows build (#1) * Fix windows build * Add doc traits * Add field value iter * Add value and serialization * Adjust order * Fix bug * Correct type * Fix generic bugs * Reformat code * Add generic to index writer which I forgot about * Fix missing generics on single segment writer * Add missing type export * Add default methods for convenience * Cleanup * Fix more-like-this query to use standard types * Update API and fix tests * Add doc traits * Add field value iter * Add value and serialization * Adjust order * Fix bug * Correct type * Rebase main and fix conflicts * Reformat code * Merge upstream * Fix missing generics on single segment writer * Add missing type export * Add default methods for convenience * Cleanup * Fix more-like-this query to use standard types * Update API and fix tests * Add tokenizer improvements from previous commits * Add tokenizer improvements from previous commits * Reformat * Fix unit tests * Fix unit tests * Use enum in changes * Stage changes * Add new deserializer logic * Add serializer integration * Add document deserializer * Implement new (de)serialization api for existing types * Fix bugs and type errors * Add helper implementations * Fix errors * Reformat code * Add unit tests and some code organisation for serialization * Add unit tests to deserializer * Add some small docs * Add support for deserializing serde values * Reformat * Fix typo * Fix typo * Change repr of facet * Remove unused trait methods * Add child value type * Resolve comments * Fix build * Fix more build errors * Fix more build errors * Fix the tests I missed * Fix examples * fix numerical order, serialize PreTok Str * fix coverage * rename Document to TantivyDocument, rename DocumentAccess to Document add Binary prefix to binary de/serialization * fix coverage --------- Co-authored-by: Pascal Seitz <pascal.seitz@gmail.com>	2023-10-02 10:01:16 +02:00
PSeitz	832f1633de	handle exclusive out of bounds ranges on fastfield range queries (#2174 ) closes https://github.com/quickwit-oss/quickwit/issues/3790	2023-09-26 08:00:40 +02:00
trinity-1686a	0241a05b90	add support for exists query syntax in query parser (#2170 ) * add support for exists query syntax in query parser * rustfmt * make Exists require a field	2023-09-19 11:10:39 +02:00
PSeitz	2d7390341c	increase min memory to 15MB for indexing (#2176 ) With tantivy 0.20 the minimum memory consumption per SegmentWriter increased to 12MB. 7MB are for the different fast field collectors types (they could be lazily created). Increase the minimum memory from 3MB to 15MB. Change memory variable naming from arena to budget. closes #2156	2023-09-13 07:38:34 +02:00
Ping Xia	e4e416ac42	extend FuzzyTermQuery to support json field (#2173 ) * extend fuzzy search for json field * comments * comments * fmt fix * comments	2023-09-11 05:59:40 +02:00
Igor Motov	19325132b7	Fast-field based implementation of ExistsQuery (#2160 ) Adds an implementation of ExistsQuery that takes advantage of fast fields. Fixes #2159	2023-09-07 11:51:49 +09:00
PSeitz	c4e2708901	fix clippy, fmt (#2162 )	2023-08-30 08:04:26 +02:00
PSeitz	48d4847b38	Improve aggregation error message (#2150 ) * Improve aggregation error message Improve aggregation error message by wrapping the deserialization with a custom struct. This deserialization variant is slower, since we need to keep the deserialized data around twice with this approach. For now the valid variants list is manually updated. This could be replaced with a proc macro. closes #2143 * Simpler implementation --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2023-08-23 20:52:15 +02:00
PSeitz	480763db0d	track memory arena memory usage (#2148 )	2023-08-16 18:19:42 +02:00
Caleb Hattingh	47b315ff18	doc: escape the backslash (#2144 )	2023-08-14 19:10:07 +02:00
Adam Reichold	22c35b1e00	Fix explanation of boost queries seeking beyond query result. (#2142 ) * Make current nightly Clippy happy. * Fix explanation of boost queries seeking beyond query result.	2023-08-14 11:59:11 +09:00
trinity-1686a	b92082b748	implement lenient parser (#2129 ) * move query parser to nom * add support for term grouping * initial work on infallible parser * fmt * add tests and fix minor parsing bugs * address review comments * add support for lenient queries in tantivy * make lenient parser report errors * allow mixing occur and bool in query	2023-08-08 15:41:29 +02:00
Adam Reichold	42acd334f4	Fixes the new deny-by-default incorrect_partial_ord_impl_on_ord_type Clippy lint (#2131 )	2023-07-21 11:36:17 +09:00
Adam Reichold	5fafe4b1ab	Add missing query_terms impl for TermSetQuery. (#2120 )	2023-07-13 14:54:29 +02:00
François Massot	0a23201338	Fix stackoverflow and add docs.	2023-07-03 22:05:11 +09:00
Paul Masurel	ad4c940fa3	proof of concept for dynamic tokenizer.	2023-07-03 22:05:10 +09:00
Paul Masurel	910b0b0c61	Cargo fmt	2023-07-03 22:03:31 +09:00
PSeitz	3fef052bf1	fix flaky test (#2107 ) closes #2099	2023-06-29 14:30:56 +08:00
François Massot	0cb53207ec	Fix tests.	2023-06-11 12:13:35 +02:00
PSeitz	fdecb79273	tokenizer-api: reduce Tokenizer overhead (#2062 ) * tokenizer-api: reduce Tokenizer overhead Previously a new `Token` for each text encountered was created, which contains `String::with_capacity(200)` In the new API the token_stream gets mutable access to the tokenizer, this allows state to be shared (in this PR Token is shared). Ideally the allocation for the BoxTokenStream would also be removed, but this may require some lifetime tricks. * simplify api * move lowercase and ascii folding buffer to global * empty Token text as default	2023-06-08 18:37:58 +08:00
Adam Reichold	b325d569ad	Expose phrase-prefix queries via the built-in query parser (#2044 ) * Expose phrase-prefix queries via the built-in query parser This proposes the less-than-imaginative syntax `field:"phrase ter"` to perform a phrase prefix query against `field` using `phrase` and `ter` as the terms. The aim of this is to make this type of query more discoverable and simplify manual testing. I did consider exposing the `max_expansions` parameter similar to how slop is handled, but I think that this is rather something that should be configured via the query parser (similar to `set_field_boost` and `set_field_fuzzy`) as choosing it requires rather intimate knowledge of the backing index. Prevent construction of zero or one term phrase-prefix queries via the query parser. * Add example using phrase-prefix search via surface API to improve feature discoverability.	2023-06-01 13:03:16 +02:00
trinity-1686a	6564e0c467	fix phrase prefix query (#2043 ) * fix phrase prefix query it would fail spectacularly when no doc in the segment would match the phrase part of the query * clippy	2023-05-22 12:36:20 +02:00
Paul Masurel	62709b8094	Change in the query grammar. (#2050 ) * Change in the query grammar. Quotation mark can now be used for phrase queries. The delimiter is part of the `UserInputLeaf`. That information is meant to be used in Quickwit to solve #3364. This PR also adds support for quotation marks escaping in phrase queries. * Apply suggestions from code review	2023-05-19 12:07:10 +09:00
Adam Reichold	fedd9559e7	Expose create a query from a user input AST. (#2039 )	2023-05-11 21:53:18 +09:00
PSeitz	0eafbaab8e	fix slop (#2031 ) Fix slop by carrying slop so far for multiterms. Define slop contract in the API	2023-05-10 11:45:14 +02:00
Yuri Astrakhan	74275b76a6	Inline format arguments where makes sense (#2038 ) Applied this command to the code, making it a bit shorter and slightly more readable. `cargo +nightly clippy --all-features --benches --tests --workspace --fix -- -A clippy::all -W clippy::uninlined_format_args` `cargo +nightly fmt --all`	2023-05-10 18:03:59 +09:00
PSeitz	4c58b0086d	allow slop in both directions (#2020 ) * allow slop in both directions allow slop in both directions so "big wolf"~3 can also match "wolf big" This also fixes #1934, when the docsets were reordered by size and didn't match the terms. * remove count * add test for repeating tokens, unduplicate tests	2023-05-07 12:05:21 +09:00
Paul Masurel	f28ddb711e	Exposing u64-based FastFieldRangeWeight (#2024 )	2023-05-03 18:32:00 +09:00
PSeitz	ba309e18a1	switch to nanosecond precision (#2016 )	2023-05-01 03:32:20 +02:00
PSeitz	74f9eafefc	refactor Term (#2006 ) * refactor Term add ValueBytes for serialized term values add missing debug for ip skip unnecessary json path validation remove code duplication add DATE_TIME_PRECISION_INDEXED constant add missing Term clarification remove weird value_bytes_mut() API * fix naming	2023-04-20 15:31:43 +02:00
RT_Enzyme	ff3d3313c4	fix BooleanQuery document (#1999 ) * fix BooleanQuery document * Update src/query/boolean_query/boolean_query.rs --------- Co-authored-by: Paul Masurel <paul@quickwit.io>	2023-04-20 11:37:20 +02:00
Paul Masurel	fbda511a1a	Making more things public for quickwit. (#2005 )	2023-04-20 11:37:45 +09:00
Paul Masurel	4b01cc4c49	Made BooleanWeight and BoostWeight public (#1991 )	2023-04-12 10:26:30 +09:00
PSeitz	5c380b76e7	Better mixed types support in aggs and fix serialization issue (#1971 ) * Better mixed types support in aggs and fix serialization issue - Improve support for mixed types in JSON field aggregations (pick the right field, #1913) - Resolve the issue with JSON serialization for numeric keys (fixes #1967) - Add JSON round-trip test for term buckets - Remove `u64_lenient`, as this is a footgun without the type - move aggregation benchmarks * remove shadowing	2023-03-31 05:52:11 +02:00
PSeitz	6a7a1106d6	work in batches of docs (#1937 ) * work in batches of docs * add fill_buffer test	2023-03-21 06:57:44 +01:00
trinity-1686a	064518156f	refactor tokenization pipeline to use GATs (#1924 ) * refactor tokenization pipeline to use GATs * fix doctests * fix clippy lints * remove commented code	2023-03-09 09:39:37 +01:00
Paul Masurel	7fae4d98d7	Adapting for quickwit2 (#1912 ) * Adapting tantivy to make it possible to be plugged to quickwit. * Apply suggestions from code review Co-authored-by: PSeitz <PSeitz@users.noreply.github.com> * Added unit test --------- Co-authored-by: PSeitz <PSeitz@users.noreply.github.com>	2023-03-01 16:27:46 +09:00
trinity-1686a	8a71e00da3	allow limiting the number of matched term in range query (#1899 )	2023-02-27 10:44:08 +01:00
Paul Masurel	d25fc155b2	Making some of the column/termdict operations async-friendly (#1902 )	2023-02-27 15:34:47 +09:00
Paul Masurel	66ff53b0f4	Various minor code cleanup (#1909 )	2023-02-27 13:48:34 +09:00
Paul Masurel	d002698008	Re-export of query grammar. (#1908 )	2023-02-27 12:26:34 +09:00
trinity-1686a	533ad99cd5	add PhrasePrefixQuery (#1842 ) * add PhrasePrefixQuery	2023-02-22 11:18:33 +01:00

539 Commits