tantivy

mirror of https://github.com/quickwit-oss/tantivy.git synced 2026-01-07 09:32:54 +00:00

Author	SHA1	Message	Date
PSeitz	735c588f4f	fix union performance regression (#2790) * add inlines * fix union performance regression Remove unwrap from hotpath generates better assembly. closes #2788	2026-01-02 12:06:51 +01:00
PSeitz	923f0508f2	seek_exact + cost based intersection (#2538) * seek_exact + cost based intersection Adds `seek_exact` and `cost` to `DocSet` for a more efficient intersection. Unlike `seek`, `seek_exact` does not require the DocSet to advance to the next hit if the target does not exist. `cost` accounts for the different DocSet types and their cost models and is used to determine the DocSet that drives the intersection. E.g. fast field range queries may do a full scan. Phrase queries load the positions to check if we have a hit. They both have a higher cost than their size_hint would suggest. Improves `size_hint` estimation for intersection and union by using an estimation based on random distribution with a co-location factor. Refactor range query benchmark. Closes #2531 Future Work Implement `seek_exact` for BufferedUnionScorer and RangeDocSet (fast field range queries) Evaluate replacing `seek` with `seek_exact` to reduce code complexity * Apply suggestions from code review Co-authored-by: Paul Masurel <paul@quickwit.io> * add API contract verification * impl seek_exact on union * rename seek_exact * add mixed AND OR test, fix buffered_union * Add a proptest of BooleanQuery. (#2690) * fix build * Increase the document count. * fix merge conflict * fix debug assert * Fix compilation errors after rebase - Remove duplicate proptest_boolean_query module - Remove duplicate cost() method implementations - Fix TopDocs API usage (add .order_by_score()) - Remove duplicate imports - Remove unused variable assignments --------- Co-authored-by: Paul Masurel <paul@quickwit.io> Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com> Co-authored-by: Stu Hood <stuhood@gmail.com>	2025-12-30 14:43:25 +01:00
PSeitz	33835b6a01	Add DocSet::cost() (#2707) * query: add DocSet cost hint and use it for intersection ordering - Add DocSet::cost() - Use cost() instead of size_hint() to order scorers in intersect_scorers This isolates cost-related changes without the new seek APIs from PR #2538 * add comments --------- Co-authored-by: Pascal Seitz <pascal.seitz@datadoghq.com>	2025-10-13 16:25:49 +02:00
PSeitz	21d057059e	clippy (#2527) * clippy * clippy * clippy * clippy * convert allow to expect and remove unused * cargo fmt * cleanup * export sample * clippy	2024-10-22 09:26:54 +08:00
PSeitz	dca508b4ca	remove read_postings_no_deletes (#2526) closes #2525	2024-10-22 09:52:43 +09:00
Bruce Mitchener	c17e513377	Reduce typo count. (#2510)	2024-10-10 09:55:37 +08:00
trinity-1686a	85395d942a	fix clippy lints from 1.80-1.81 (#2488) * fix some clippy lints * fix clippy::doc_lazy_continuation * fix some lints for 1.82	2024-09-05 14:33:05 +02:00
PSeitz	3d1c4b313a	support ff range queries on json fields (#2456) * support ff range queries on json fields * fix term date truncation * use inverted index range query for phrase prefix queries * rename to InvertedIndexRangeQuery * fix column filter, add mixed column test	2024-08-02 00:06:50 +08:00
PSeitz	1b4076691f	refactor fast field query (#2452) As preparation of #2023 and #1709 * Use Term to pass parameters * merge u64 and ip fast field range query Side note: I did not rename range_query_u64_fastfield, because then git can't track the changes.	2024-07-15 18:08:05 +08:00
Paul Masurel	e453848134	Recycling buffer in PrefixPhraseScorer (#2443)	2024-06-24 17:11:53 +09:00
PSeitz	48630ceec9	move into new index module (#2259) move core modules to index module	2024-01-31 10:30:04 +01:00
Harrison Burt	1c7c6fd591	POC: Tantivy documents as a trait (#2071) * fix windows build (#1) * Fix windows build * Add doc traits * Add field value iter * Add value and serialization * Adjust order * Fix bug * Correct type * Fix generic bugs * Reformat code * Add generic to index writer which I forgot about * Fix missing generics on single segment writer * Add missing type export * Add default methods for convenience * Cleanup * Fix more-like-this query to use standard types * Update API and fix tests * Add doc traits * Add field value iter * Add value and serialization * Adjust order * Fix bug * Correct type * Rebase main and fix conflicts * Reformat code * Merge upstream * Fix missing generics on single segment writer * Add missing type export * Add default methods for convenience * Cleanup * Fix more-like-this query to use standard types * Update API and fix tests * Add tokenizer improvements from previous commits * Add tokenizer improvements from previous commits * Reformat * Fix unit tests * Fix unit tests * Use enum in changes * Stage changes * Add new deserializer logic * Add serializer integration * Add document deserializer * Implement new (de)serialization api for existing types * Fix bugs and type errors * Add helper implementations * Fix errors * Reformat code * Add unit tests and some code organisation for serialization * Add unit tests to deserializer * Add some small docs * Add support for deserializing serde values * Reformat * Fix typo * Fix typo * Change repr of facet * Remove unused trait methods * Add child value type * Resolve comments * Fix build * Fix more build errors * Fix more build errors * Fix the tests I missed * Fix examples * fix numerical order, serialize PreTok Str * fix coverage * rename Document to TantivyDocument, rename DocumentAccess to Document add Binary prefix to binary de/serialization * fix coverage --------- Co-authored-by: Pascal Seitz <pascal.seitz@gmail.com>	2023-10-02 10:01:16 +02:00
Ping Xia	e4e416ac42	extend FuzzyTermQuery to support json field (#2173) * extend fuzzy search for json field * comments * comments * fmt fix * comments	2023-09-11 05:59:40 +02:00
Adam Reichold	b325d569ad	Expose phrase-prefix queries via the built-in query parser (#2044) * Expose phrase-prefix queries via the built-in query parser This proposes the less-than-imaginative syntax `field:"phrase ter"` to perform a phrase prefix query against `field` using `phrase` and `ter` as the terms. The aim of this is to make this type of query more discoverable and simplify manual testing. I did consider exposing the `max_expansions` parameter similar to how slop is handled, but I think that this is rather something that should be configured via the query parser (similar to `set_field_boost` and `set_field_fuzzy`) as choosing it requires rather intimate knowledge of the backing index. Prevent construction of zero or one term phrase-prefix queries via the query parser. * Add example using phrase-prefix search via surface API to improve feature discoverability.	2023-06-01 13:03:16 +02:00
trinity-1686a	6564e0c467	fix phrase prefix query (#2043) * fix phrase prefix query it would fail spectacularly when no doc in the segment would match the phrase part of the query * clippy	2023-05-22 12:36:20 +02:00
Yuri Astrakhan	74275b76a6	Inline format arguments where makes sense (#2038) Applied this command to the code, making it a bit shorter and slightly more readable. ``` cargo +nightly clippy --all-features --benches --tests --workspace --fix -- -A clippy::all -W clippy::uninlined_format_args cargo +nightly fmt --all ```	2023-05-10 18:03:59 +09:00
PSeitz	74f9eafefc	refactor Term (#2006) * refactor Term add ValueBytes for serialized term values add missing debug for ip skip unnecessary json path validation remove code duplication add DATE_TIME_PRECISION_INDEXED constant add missing Term clarification remove weird value_bytes_mut() API * fix naming	2023-04-20 15:31:43 +02:00
trinity-1686a	8a71e00da3	allow limiting the number of matched term in range query (#1899)	2023-02-27 10:44:08 +01:00
Paul Masurel	d25fc155b2	Making some of the column/termdict operations async-friendly (#1902)	2023-02-27 15:34:47 +09:00
trinity-1686a	533ad99cd5	add PhrasePrefixQuery (#1842) * add PhrasePrefixQuery	2023-02-22 11:18:33 +01:00

20 Commits