tantivy

mirror of https://github.com/quickwit-oss/tantivy.git synced 2026-01-08 10:02:55 +00:00

Author	SHA1	Message	Date
PSeitz	aebae9965d	add RegexPhraseQuery (#2516 ) * add RegexPhraseQuery RegexPhraseQuery supports phrase queries with regex. It supports regex and wildcards. E.g. a query with wildcards: "b* b* wolf" matches "big bad wolf" Slop is supported as well: "b* wolf"~2 matches "big bad wolf" Regex queries may match a lot of terms where we still need to keep track which term hit to load the positions. The phrase query algorithm groups terms by their frequency together in the union to prefilter groups early. This PR comes with some new datastructures: SimpleUnion - A union docset for a list of docsets. It doesn't do any caching and is therefore well suited for datasets with lots of skipping. (phrase search, but intersections in general) LoadedPostings - Like SegmentPostings, but all docs and positions are loaded in memory. SegmentPostings uses 1840 bytes per instance with its caches, which is equivalent to 460 docids. LoadedPostings is used for terms which have less than 100 docs. LoadedPostings is only used to reduce memory consumption. BitSetPostingUnion - Creates a `Posting` that uses the bitset for docid hits and the docsets for positions. The BitSet is the precalculated union of the docsets In the RegexPhraseQuery there is a size limit of 512 docsets per PreAggregatedUnion, before creating a new one. Renamed Union to BufferedUnionScorer Added proptests to test different union types. * cleanup * use Box instead of Vec * use RefCell instead of term_freq(&mut) * remove wildcard mode * move RefCell to outer * clippy	2024-10-21 18:29:17 +08:00
PSeitz	27be6aed91	lift clauses in LogicalAst (#2449 ) (a OR b) OR (c OR d) can be simplified to (a OR b OR c OR d) (a AND b) AND (c AND d) can be simplified to (a AND b AND c AND d) This directly affects how queries are executed remove unused SumWithCoordsCombiner the number of fields is unused and private	2024-08-14 19:21:26 +02:00
PSeitz	3d1c4b313a	support ff range queries on json fields (#2456 ) * support ff range queries on json fields * fix term date truncation * use inverted index range query for phrase prefix queries * rename to InvertedIndexRangeQuery * fix column filter, add mixed column test	2024-08-02 00:06:50 +08:00
PSeitz	1b4076691f	refactor fast field query (#2452 ) As preparation of #2023 and #1709 * Use Term to pass parameters * merge u64 and ip fast field range query Side note: I did not rename range_query_u64_fastfield, because then git can't track the changes.	2024-07-15 18:08:05 +08:00
落叶乌龟	f9ae295507	feat(query): Make `BooleanQuery` supports `minimum_number_should_match` (#2405 ) * feat(query): Make `BooleanQuery` supports `minimum_number_should_match`. see issue #2398 In this commit, a novel scorer named DisjunctionScorer is introduced, which performs the union of inverted chains with the minimal required elements. BTW, it's implemented via a min-heap. Necessary modifications on `BooleanQuery` and `BooleanWeight` are performed as well. * fixup! fix test * fixup!: refactor code. 1. More meaningful names. 2. Add Cache for `Disjunction`'s scorers, and fix bug. 3. Optimize `BooleanWeight::complex_scorer` Thanks Paul Masurel <paul@quickwit.io> * squash!: come up with better variable naming. * squash!: fix naming issues. * squash!: fix typo. * squash!: Remove CombinationMethod::FullIntersection	2024-07-01 15:39:41 +08:00
Igor Motov	19325132b7	Fast-field based implementation of ExistsQuery (#2160 ) Adds an implementation of ExistsQuery that takes advantage of fast fields. Fixes #2159	2023-09-07 11:51:49 +09:00
Paul Masurel	f28ddb711e	Exposing u64-based FastFieldRangeWeight (#2024 )	2023-05-03 18:32:00 +09:00
Paul Masurel	4b01cc4c49	Made BooleanWeight and BoostWeight public (#1991 )	2023-04-12 10:26:30 +09:00
Paul Masurel	d002698008	Re-export of query grammar. (#1908 )	2023-02-27 12:26:34 +09:00
trinity-1686a	533ad99cd5	add PhrasePrefixQuery (#1842 ) * add PhrasePrefixQuery	2023-02-22 11:18:33 +01:00
Alex Cole	f2f38c43ce	Make BM25 scoring more flexible (#1855 ) * Introduce Bm25StatisticsProvider to inject statistics * fix formatting I accidentally changed	2023-02-16 19:14:12 +09:00
PSeitz	07a51eb7c8	refactor multivalue fastfield, refactor range query (#1749 ) Introduce MakeZero trait, remove make_zero from FastValue Merge two multivalue fastfield implementations into one prepare range query on fastfield for different types	2023-01-05 12:09:50 +01:00
Paul Masurel	3edf0a2724	Using the manual reload policy in IndexWriter. (#1667 )	2022-11-09 11:20:41 +01:00
Pascal Seitz	6bb73a527f	add range query via ip fast field	2022-10-24 16:00:38 +08:00
trinity-1686a	a86b0df6f4	Add query matching terms in a set (#1539 )	2022-09-28 09:43:18 +02:00
Shikhar Bhushan	4c6c6e4a9c	`ConstScoreQuery` (#1463 )	2022-08-24 06:37:34 +09:00
Adam Reichold	71ab482720	RFC: Use a more general but still object-safe signature for Query::query_terms. (#1468 ) * Use a more general but still object-safe signature for Query::query_terms. * Further constraint the generalized Query::query_terms signature to allow extracting references to terms.	2022-08-24 06:34:07 +09:00
Pasha Podolsky	09aae134e6	[feat] Implement `DisjunctionMaxQuery` and refactor `ScoreCombiner`	2022-07-28 20:47:20 +03:00
Paul Masurel	eca6628b3c	Minor refactoring (#1266 )	2022-01-28 15:55:55 +09:00
Paul Masurel	6e4b61154f	Issue/1070 (#1071 ) Add a boolean flag in the Query::query_terms informing on whether position information is required. Closes #1070	2021-06-03 22:33:20 +09:00
Paul Masurel	fd8e5bdf57	Rename more like this	2021-05-21 16:32:39 +09:00
Evance Souamoro	2c0f6e3319	add builder to the public for documentation	2021-04-29 12:38:16 +00:00
Evance Souamoro	cfc27c9665	add support for more like this query	2021-04-29 11:49:27 +00:00
Paul Masurel	39dd8cfe24	Cargo clippy. Acronym should not be full uppercase apparently.	2021-04-26 11:49:18 +09:00
Paul Masurel	9e27da8b4e	Added CR comments. Added Unit tests.	2020-10-28 17:35:34 +09:00
Paul Masurel	2481c87be8	Block wand (#856 )	2020-08-19 22:36:36 +09:00
Paul Masurel	c0be461191	Removing tantivy-fst conf and removing warning. (#813 )	2020-04-18 20:19:23 +09:00
Paul Masurel	186d7fc20e	Fix build	2020-04-01 09:32:45 +09:00
Paul Masurel	7d6cfa58e1	[WIP] Alternative take on boosted queries (#772 ) * Alternative take on boosted queries * Fixing unit test * Added boosting to the query grammar. * Made BoostQuery public. * Added support for boosting field in QueryParser Closes #547	2020-02-19 11:04:38 +09:00
Paul Masurel	4b9c1dce69	Moving query grammar to a different crate. (#645 )	2019-09-05 09:37:28 +09:00
Paul Masurel	462774b15c	Tiqb feature/2018 (#583 ) * rust 2018 * Added CHANGELOG comment	2019-07-01 10:01:46 +09:00
Paul Masurel	4822940b19	Issue/36 (#559 ) * Added explanation * Explain * Splitting weight and idf * Added comments Closes #36	2019-06-06 10:03:54 +09:00
Paul Masurel	a6e767c877	Cargo fmt	2018-11-30 22:52:45 +09:00
Paul Masurel	07d87e154b	Collector refactoring and multithreaded search (#437 ) * Split Collector into an overall Collector and a per-segment SegmentCollector. Precursor to cross-segment parallelism, and as a side benefit cleans up any per-segment fields from being Option<T> to just T. * Attempt to add MultiCollector back * working. Chained collector is broken though * Fix chained collector * Fix test * Make Weight Send+Sync for parallelization purposes * Expose parameters of RangeQuery for external usage * Removed &mut self * fixing tests * Restored TestCollectors * blop * multicollector working * chained collector working * test broken * fixing unit test * blop * blop * Blop * simplifying API * blop * better syntax * Simplifying top_collector * refactoring * blop * Sync with master * Added multithread search * Collector refactoring * Schema::builder * CR and rustdoc * CR comments * blop * Added an executor * Sorted the segment readers in the searcher * Update searcher.rs * Fixed unit tests * changed the place where we have the sort-segment-by-count heuristic * using crossbeam::channel * inlining * Comments about panics propagating * Added unit test for executor panicking * Readded default * Removed Default impl * Added unit test for executor	2018-11-30 22:46:59 +09:00
Paul Masurel	5449ec3c11	Snippet term score (#423 )	2018-09-16 10:21:02 +09:00
Paul Masurel	37e4280c0a	Cargo Format (#420 )	2018-09-15 07:44:22 +09:00
Paul Masurel	e32dba1a97	Phrase weight	2018-09-10 09:26:33 +09:00
Paul Masurel	6704ab6987	Added methods to extract the matching terms. First stab	2018-08-30 09:47:19 +09:00
Paul Masurel	a0a284fe91	Added a full-fledged empty query and relying on it in QueryParser, instead of using an empty clause.	2018-08-20 09:21:32 +09:00
Dru Sellers	e301e0bc87	Add some simple doc tests (#320 ) * Add TopCollector doc test * Add CountCollector Doc Test * Add Doc Test for MultiCollector * Add ChainedCollector Doc Test * Expose Fuzzy Query where it should be * Add FuzzyTermQuery Doc Test * Expose RegexQuery * Regex Query Doc Test * Add TermQuery Doc Test * Add doc comments * fix test 🤦 * Added explanation about the complexity variables * Fixing unit tests * Single threads if you check docids	2018-06-19 10:45:20 +09:00
Dru Sellers	317baf4e75	Add in simple regex query support (#319 ) * Add fst_regex crate in * Reduce API surface area This doesn't need to be public * better test name * Pull Automaton weight out so it can be shared * Implement Regex Query	2018-06-16 14:08:30 +09:00
Dru Sellers	6f7b099370	Add AutomatonWeight to a fuzzy_search module and FuzzyQuery (#300 ) * Add AutomatonWeight to a fuzzy_search module * Hacking around ownership issues * Working through lifetime issues * Working through tests * fix test by lower casing the words (reducing distance) * code review changes * Suggestion on how to solve the borrow problem * clean up	2018-06-11 22:23:03 +09:00
Paul Masurel	78673172d0	Cargo fmt	2018-04-21 20:05:36 +09:00
Paul Masurel	3ae03b91ae	PhraseScorer's score aligned with that of Lucene.	2018-03-25 12:44:16 +09:00
Paul Masurel	1b94a3e382	Phrase query optimisation	2018-02-23 00:00:22 +09:00
Paul Masurel	e423784fd0	Added specialized SegmentPostings when there are no DeleteSet	2018-02-21 23:49:20 +09:00
Paul Masurel	2f242d5f52	Moving docset around	2018-02-19 12:07:05 +09:00
Paul Masurel	1da06d867b	Using the same logic when score is enabled.	2018-02-16 17:36:33 +09:00
Paul Masurel	76e8db6ed3	blop	2018-02-16 14:57:08 +09:00
Paul Masurel	31e5580bfa	Renaming intersection / exclude	2018-02-16 11:55:56 +09:00
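
Note on commit 27be6aed91 (lift clauses in LogicalAst): the simplification it describes, where (a OR b) OR (c OR d) becomes (a OR b OR c OR d), can be illustrated with a minimal sketch. The `Ast` enum and the `lift` and `leaf` functions below are hypothetical stand-ins invented for illustration; they are not tantivy's actual `LogicalAst` type or API, and the real implementation may differ.

```rust
/// Hypothetical, simplified stand-in for a logical query AST.
/// This is NOT tantivy's `LogicalAst`; it only illustrates the idea.
#[derive(Debug, Clone, PartialEq)]
enum Ast {
    Leaf(String),
    And(Vec<Ast>),
    Or(Vec<Ast>),
}

/// Recursively lifts nested clauses of the same kind into their parent,
/// so that And(And(a, b), And(c, d)) becomes And(a, b, c, d),
/// and likewise for Or. Clauses of a different kind are left nested.
fn lift(ast: Ast) -> Ast {
    match ast {
        Ast::Leaf(term) => Ast::Leaf(term),
        Ast::And(children) => {
            let mut flat = Vec::new();
            for child in children {
                // Lift the child first so its own nesting is already flat.
                match lift(child) {
                    Ast::And(grandchildren) => flat.extend(grandchildren),
                    other => flat.push(other),
                }
            }
            Ast::And(flat)
        }
        Ast::Or(children) => {
            let mut flat = Vec::new();
            for child in children {
                match lift(child) {
                    Ast::Or(grandchildren) => flat.extend(grandchildren),
                    other => flat.push(other),
                }
            }
            Ast::Or(flat)
        }
    }
}

/// Hypothetical helper to build a leaf from a string slice.
fn leaf(s: &str) -> Ast {
    Ast::Leaf(s.to_string())
}
```

Calling `lift` on `Or([Or([a, b]), Or([c, d])])` yields a single `Or` with four leaves, while a mixed tree such as `And([Or([a, b]), c])` is returned unchanged, since only clauses of the same kind are merged.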

84 Commits