tantivy

mirror of https://github.com/quickwit-oss/tantivy.git synced 2026-01-06 17:22:54 +00:00

Author	SHA1	Message	Date
Paul Masurel	63c66005db	Lazy scorers (#2726 ) * Refactoring of the score tweaker into `SortKeyComputer`s to unlock two features. - Allow lazy evaluation of score. As soon as we identified that a doc won't reach the topK threshold, we can stop the evaluation. - Allow for a different segment level score, segment level score and their conversion. This PR breaks public API, but fixing code is straightforward. * Bumping tantivy version --------- Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>	2025-12-01 15:38:57 +01:00
Paul Masurel	f88b7200b2	Optimization when posting list are saturated. (#2745 ) * Optimization when posting list are saturated. If a posting list doc freq is the segment reader's max_doc, and if scoring does not matter, we can replace it by a AllScorer. In turn, in a boolean query, we can dismiss all scorers and empty scorers, to accelerate the request. * Added range query optimization * CR comment * CR comments * CR comment --------- Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>	2025-11-26 15:50:57 +01:00
PSeitz-dd	7963b0b4aa	Add fast field fallback for term query if not indexed (#2693 ) * Add fast field fallback for term query if not indexed * only fallback without scores	2025-09-12 14:58:21 +02:00
Paul Masurel	7bc5bf78e2	Fixing functional tests. (#2239 )	2023-11-05 18:18:39 +09:00
Harrison Burt	1c7c6fd591	POC: Tantivy documents as a trait (#2071 ) * fix windows build (#1) * Fix windows build * Add doc traits * Add field value iter * Add value and serialization * Adjust order * Fix bug * Correct type * Fix generic bugs * Reformat code * Add generic to index writer which I forgot about * Fix missing generics on single segment writer * Add missing type export * Add default methods for convenience * Cleanup * Fix more-like-this query to use standard types * Update API and fix tests * Add doc traits * Add field value iter * Add value and serialization * Adjust order * Fix bug * Correct type * Rebase main and fix conflicts * Reformat code * Merge upstream * Fix missing generics on single segment writer * Add missing type export * Add default methods for convenience * Cleanup * Fix more-like-this query to use standard types * Update API and fix tests * Add tokenizer improvements from previous commits * Add tokenizer improvements from previous commits * Reformat * Fix unit tests * Fix unit tests * Use enum in changes * Stage changes * Add new deserializer logic * Add serializer integration * Add document deserializer * Implement new (de)serialization api for existing types * Fix bugs and type errors * Add helper implementations * Fix errors * Reformat code * Add unit tests and some code organisation for serialization * Add unit tests to deserializer * Add some small docs * Add support for deserializing serde values * Reformat * Fix typo * Fix typo * Change repr of facet * Remove unused trait methods * Add child value type * Resolve comments * Fix build * Fix more build errors * Fix more build errors * Fix the tests I missed * Fix examples * fix numerical order, serialize PreTok Str * fix coverage * rename Document to TantivyDocument, rename DocumentAccess to Document add Binary prefix to binary de/serialization * fix coverage --------- Co-authored-by: Pascal Seitz <pascal.seitz@gmail.com>	2023-10-02 10:01:16 +02:00
Yuri Astrakhan	74275b76a6	Inline format arguments where makes sense (#2038 ) Applied these commands to the code, making it a bit shorter and slightly more readable: `cargo +nightly clippy --all-features --benches --tests --workspace --fix -- -A clippy::all -W clippy::uninlined_format_args` and `cargo +nightly fmt --all`	2023-05-10 18:03:59 +09:00
PSeitz	74f9eafefc	refactor Term (#2006 ) * refactor Term add ValueBytes for serialized term values add missing debug for ip skip unnecessary json path validation remove code duplication add DATE_TIME_PRECISION_INDEXED constant add missing Term clarification remove weird value_bytes_mut() API * fix naming	2023-04-20 15:31:43 +02:00
Alex Cole	f2f38c43ce	Make BM25 scoring more flexible (#1855 ) * Introduce Bm25StatisticsProvider to inject statistics * fix formatting I accidentally changed	2023-02-16 19:14:12 +09:00
Shikhar Bhushan	2650111b76	EnableScoring::Disabled - optional Searcher (#1780 )	2023-01-12 09:26:50 -05:00
Paul Masurel	3edf0a2724	Using the manual reload policy in IndexWriter. (#1667 )	2022-11-09 11:20:41 +01:00
PSeitz	23fe73a6c0	remove searcher pool and make Searcher cloneable (#1411 ) * remove searcher pool and make Searcher cloneable closes #1410 * use SearcherInner in InnerIndexReader	2022-07-12 18:07:48 +09:00
Paul Masurel	d7b46d2137	Added JSON Type (#1270 ) - Removed useless copy when ingesting JSON. - Bugfix in phrase query with a missing field norms. - Disabled range query on default fields Closes #1251	2022-02-24 16:25:22 +09:00
Paul Masurel	2069e3e52b	Fixing clippy comments	2022-02-01 10:24:05 +09:00
Paul Masurel	eca6628b3c	Minor refactoring (#1266 )	2022-01-28 15:55:55 +09:00
Paul Masurel	732f6847c0	Field type with codes (#1255 ) * Term are now typed. This change is backward compatible: While the Term has a byte representation that is modified, a Term itself is a transient object that is not serialized as is in the index. Its .field() and .value_bytes() on the other hand are unchanged. This change offers better Debug information for terms. While not necessary it also will help in the support for JSON types. * Renamed Hierarchical Facet -> Facet	2022-01-07 20:49:00 +09:00
Paul Masurel	7234bef0eb	Issue/1198 (#1201 ) * Unit test reproducing #1198 * Fixing unit test to handle the error from add_document. * Bump project version	2021-11-11 16:42:19 +09:00
Stéphane Campinas	a0ec6e1e9d	Expand the DocAddress struct with named fields	2021-03-28 19:00:23 +02:00
Paul Masurel	36a0520a48	Added failing proptest and fixed it.	2020-11-05 15:40:00 +09:00
Paul Masurel	730ccefffb	Fixes a bug in TermQuery::explain. Closes #915	2020-10-28 22:29:15 +09:00
Paul Masurel	439d6956a9	Returning Result in some of the API (#880 ) * Returning Result in some of the API * Introducing `.writer_for_test(..)`	2020-09-07 15:52:34 +09:00
Paul Masurel	2481c87be8	Block wand (#856 )	2020-08-19 22:36:36 +09:00
Paul Masurel	6db8bb49d6	Assert nearly equals macro (#853 ) * Assert nearly equals macro * Renamed specialized_scorer in TermScorer	2020-07-17 16:40:41 +09:00
Paul Masurel	f71b04acb0	Bugfix. (#849 ) go_to_first_doc was typically calling seek with a target smaller than doc. Since SegmentPostings typically do a linear search on the full block, regardless of the current position, it could have our segment postings go backward.	2020-07-16 10:57:51 +09:00
Paul Masurel	e25284bafe	Major change in the DocSet/Scorer API (#824 ) - Change in the DocSet and Scorer API. (@fulmicoton). A freshly created DocSet points directly to its first doc. A sentinel value called TERMINATED marks the end of a DocSet. `.advance()` returns the new DocId. `Scorer::skip(target)` has been replaced by `Scorer::seek(target)` and returns the resulting DocId. As a result, iterating through a DocSet now looks as follows: `let mut doc = docset.doc(); while doc != TERMINATED { /* ... */ doc = docset.advance(); }` The change made it possible to greatly simplify a lot of the docset's code. - Misc internal optimization and introduction of the `Scorer::for_each_pruning` function. (@fulmicoton)	2020-05-16 16:33:36 +09:00
Paul Masurel	7d6cfa58e1	[WIP] Alternative take on boosted queries (#772 ) * Alternative take on boosted queries * Fixing unit test * Added boosting to the query grammar. * Made BoostQuery public. * Added support for boosting field in QueryParser Closes #547	2020-02-19 11:04:38 +09:00
Paul Masurel	7b21b3f25a	Refactoring around Field (#673 ) * Refactoring around Field Removing the contract about the order of the field, and the field id allocation. * Update delete_queue.rs * Update field.rs	2019-10-25 09:06:44 +09:00
Paul Masurel	b3b0138b82	Change for tantivy-py Schema.convert_named_doc Better Debug string for Terms and TermQueries	2019-08-14 17:44:25 +09:00
Paul Masurel	462774b15c	Tiqb feature/2018 (#583 ) * rust 2018 * Added CHANGELOG comment	2019-07-01 10:01:46 +09:00
Paul Masurel	b7c2d0de97	Clippy2 (#534 ) * Clippy comments Clippy complains about the cast of &[u32] to a const __m128i, because of the lack of alignment constraints. This commit passes the OutputBuffer object (which enforces proper alignment) instead of `&[u32]`. Clippy. Block alignment * Code simplification * Added comment. Code simplification * Removed the extraneous freq block len hack.	2019-04-24 12:31:32 +09:00
Paul Masurel	d823163d52	Closes #527 . (#529 ) Fixing the bug that affects the result of `query.count()` in presence of deletes.	2019-04-19 09:19:50 +09:00
Paul Masurel	663dd89c05	Feature/reader (#517 ) Adding IndexReader to the API. Making it possible to watch for changes. * Closes #500	2019-03-20 08:39:22 +09:00
Paul Masurel	63b593bd0a	Lower RAM usage in tests.	2019-01-24 09:10:38 +09:00
Paul Masurel	279a9eb5e3	Closes #449 (#450 ) Clippy working on stable. Clippy warnings addressed	2018-12-10 12:20:59 +09:00
Paul Masurel	a6e767c877	Cargo fmt	2018-11-30 22:52:45 +09:00
Paul Masurel	07d87e154b	Collector refactoring and multithreaded search (#437 ) * Split Collector into an overall Collector and a per-segment SegmentCollector. Precursor to cross-segment parallelism, and as a side benefit cleans up any per-segment fields from being Option<T> to just T. * Attempt to add MultiCollector back * working. Chained collector is broken though * Fix chained collector * Fix test * Make Weight Send+Sync for parallelization purposes * Expose parameters of RangeQuery for external usage * Removed &mut self * fixing tests * Restored TestCollectors * blop * multicollector working * chained collector working * test broken * fixing unit test * blop * blop * Blop * simplifying APi * blop * better syntax * Simplifying top_collector * refactoring * blop * Sync with master * Added multithread search * Collector refactoring * Schema::builder * CR and rustdoc * CR comments * blop * Added an executor * Sorted the segment readers in the searcher * Update searcher.rs * Fixed unit testst * changed the place where we have the sort-segment-by-count heuristic * using crossbeam::channel * inlining * Comments about panics propagating * Added unit test for executor panicking * Readded default * Removed Default impl * Added unit test for executor	2018-11-30 22:46:59 +09:00
pentlander	8600b8ea25	Top collector (#413 ) * Make TopCollector generic Make TopCollector take a generic type instead of only being tied to score. This will allow for sharing code between a TopCollector that sorts results by Score and a TopCollector that sorts documents by a fast field. This commit makes no functional changes to TopCollector. * Add TopFieldCollector and TopScoreCollector Create two new collectors that use the refactored TopCollector. TopFieldCollector has the same functionality that TopCollector originally had. TopFieldCollector allows for sorting results by a given fast field. Closes tantivy-search/tantivy#388 * Make TopCollector private Make TopCollector package private and export TopFieldCollector as TopCollector to maintain backwards compatibility. Mark TopCollector as deprecated to encourage use of the non-aliased TopFieldCollector. Remove Collector implementation for TopCollector since it is no longer used.	2018-09-14 09:22:17 +09:00
Paul Masurel	78673172d0	Cargo fmt	2018-04-21 20:05:36 +09:00
Paul Masurel	1d9566e73c	Making mmap a feature	2018-03-31 13:23:43 +09:00
Paul Masurel	ffa03bad71	TermScorer does not handle deletes	2018-03-27 17:35:20 +09:00
Paul Masurel	3ae03b91ae	PhraseScorer's score aligned with that of Lucene.	2018-03-25 12:44:16 +09:00
Paul Masurel	b7f8884246	Closes #245 = BM25. (#260 ) * Closes #245 = BM25. Scores are the same as Lucene. * Fixing travis conf	2018-03-22 15:06:56 +09:00
Paul Masurel	e22f767fda	Backmerge	2018-03-21 21:18:46 +09:00
Paul Masurel	3ecfc36e53	Total field norm fixed.	2018-03-21 20:43:02 +09:00
Paul Masurel	1c9450174e	Fieldnorm reader working except merge	2018-03-21 17:36:16 +09:00
Paul Masurel	4ee2db25a0	Generic on Postings rather than deletes in TermScorer	2018-02-22 08:26:45 +09:00
Paul Masurel	f16cc6367e	Refactoring of fastfields	2018-02-20 12:52:30 +09:00
Paul Masurel	2f242d5f52	Moving docset around	2018-02-19 12:07:05 +09:00
Paul Masurel	eb50e92ec4	Removed specialized postings on SegmentPostings	2018-02-18 00:09:15 +09:00
Paul Masurel	292bb17346	Disable scoring - Disabling scoring is an argument of the `.weight()` method - Collectors declare whether they need scoring	2018-02-17 12:43:16 +09:00
Paul Masurel	1e55189db1	NOBUG rustfmt	2017-12-14 19:30:31 +09:00

59 Commits