Commit Graph

70 Commits

Author SHA1 Message Date
Harrison Burt
1c7c6fd591 POC: Tantivy documents as a trait (#2071)
* fix windows build (#1)

* Fix windows build

* Add doc traits

* Add field value iter

* Add value and serialization

* Adjust order

* Fix bug

* Correct type

* Fix generic bugs

* Reformat code

* Add generic to index writer which I forgot about

* Fix missing generics on single segment writer

* Add missing type export

* Add default methods for convenience

* Cleanup

* Fix more-like-this query to use standard types

* Update API and fix tests

* Add doc traits

* Add field value iter

* Add value and serialization

* Adjust order

* Fix bug

* Correct type

* Rebase main and fix conflicts

* Reformat code

* Merge upstream

* Fix missing generics on single segment writer

* Add missing type export

* Add default methods for convenience

* Cleanup

* Fix more-like-this query to use standard types

* Update API and fix tests

* Add tokenizer improvements from previous commits

* Add tokenizer improvements from previous commits

* Reformat

* Fix unit tests

* Fix unit tests

* Use enum in changes

* Stage changes

* Add new deserializer logic

* Add serializer integration

* Add document deserializer

* Implement new (de)serialization api for existing types

* Fix bugs and type errors

* Add helper implementations

* Fix errors

* Reformat code

* Add unit tests and some code organisation for serialization

* Add unit tests to deserializer

* Add some small docs

* Add support for deserializing serde values

* Reformat

* Fix typo

* Fix typo

* Change repr of facet

* Remove unused trait methods

* Add child value type

* Resolve comments

* Fix build

* Fix more build errors

* Fix more build errors

* Fix the tests I missed

* Fix examples

* fix numerical order, serialize PreTok Str

* fix coverage

* rename Document to TantivyDocument, rename DocumentAccess to Document

add Binary prefix to binary de/serialization

* fix coverage

---------

Co-authored-by: Pascal Seitz <pascal.seitz@gmail.com>
2023-10-02 10:01:16 +02:00
PSeitz
2d7390341c increase min memory to 15MB for indexing (#2176)
With tantivy 0.20 the minimum memory consumption per SegmentWriter increased to
12MB. 7MB are for the different fast field collectors types (they could be
lazily created). Increase the minimum memory from 3MB to 15MB.

Change memory variable naming from arena to budget.

closes #2156
2023-09-13 07:38:34 +02:00
Paul Masurel
4b01cc4c49 Made BooleanWeight and BoostWeight public (#1991) 2023-04-12 10:26:30 +09:00
Alex Cole
f2f38c43ce Make BM25 scoring more flexible (#1855)
* Introduce Bm25StatisticsProvider to inject statistics

* fix formatting I accidentally changed
2023-02-16 19:14:12 +09:00
Shikhar Bhushan
2650111b76 EnableScoring::Disabled - optional Searcher (#1780) 2023-01-12 09:26:50 -05:00
Paul Masurel
3edf0a2724 Using the manual reload policy in IndexWriter. (#1667) 2022-11-09 11:20:41 +01:00
Pasha Podolsky
09aae134e6 [feat] Implement DisjunctionMaxQuery and refactor ScoreCombiner 2022-07-28 20:47:20 +03:00
Paul Masurel
eca6628b3c Minor refactoring (#1266) 2022-01-28 15:55:55 +09:00
Paul Masurel
7234bef0eb Issue/1198 (#1201)
* Unit test reproducing #1198
* Fixing unit test to handle the error from add_document.
* Bump project version
2021-11-11 16:42:19 +09:00
François Massot
0462754673 Optimize block wand for one and several TermScorer. (#1190)
* Added optimisation using block wand for single TermScorer.

A proptest was also added.

* Fix block wand algorithm by taking the last doc id of scores until the pivot scorer (included).
* In block wand, when block max score is lower than the threshold, advance the scorer with best score.
* Fix wrong condition in block_wand_single_scorer and add debug_assert to have an equality check on doc to break the loop.
2021-11-01 09:18:05 +09:00
sigaloid
096ce7488e Resolve some clippys, format (#1144)
* cargo +nightly clippy --fix -Z unstable-options
2021-08-26 08:46:00 +09:00
Stéphane Campinas
a0ec6e1e9d Expand the DocAddress struct with named fields 2021-03-28 19:00:23 +02:00
Paul Masurel
36a0520a48 Added failing proptest and fixed it. 2020-11-05 15:40:00 +09:00
Paul Masurel
730ccefffb Fixes a bug in TermQuery::explain.
Closes #915
2020-10-28 22:29:15 +09:00
Adrien Guillo
7f373f232a Add helper methods for BooleanQuery 2020-10-28 17:35:34 +09:00
Paul Masurel
439d6956a9 Returning Result in some of the API (#880)
* Returning Result in some of the API

* Introducing `.writer_for_test(..)`
2020-09-07 15:52:34 +09:00
Paul Masurel
2481c87be8 Block wand (#856) 2020-08-19 22:36:36 +09:00
Paul Masurel
6db8bb49d6 Assert nearly equals macro (#853)
* Assert nearly equals macro

* Renamed specialized_scorer in TermScorer
2020-07-17 16:40:41 +09:00
Paul Masurel
f71b04acb0 Bugfix. (#849)
go_to_first_doc was typically calling seek with a target smaller than
doc.

Since SegmentPostings typically do a linear search on the full block,
regardless of the current position, it could have our segment postings
go backward.
2020-07-16 10:57:51 +09:00
Ype Kingma
7d773abc92 Boolean query: do not combine excluded scores. (#840)
* Do nothing when combining score values of excluded scores.

* Add test case for two excluded.

* Test score for two excluded terms.

* Use TopDocs in test_boolean_query_two_excluded
2020-06-08 20:01:19 +09:00
Paul Masurel
e25284bafe Major change in the DocSet/Scorer API (#824)
- Change in the DocSet and Scorer API. (@fulmicoton). 
A freshly created DocSet point directly to their first doc. A sentinel value called TERMINATED marks the end of a DocSet.
`.advance()` returns the new DocId. `Scorer::skip(target)` has been replaced by `Scorer::seek(target)` and returns the resulting DocId.
As a result, iterating through DocSet now looks as follows
```rust
let mut doc = docset.doc();
while doc != TERMINATED {
   // ...
   doc = docset.advance();
}
```
The change made it possible to greatly simplify a lot of the docset's code.
- Misc internal optimization and introduction of the `Scorer::for_each_pruning` function. (@fulmicoton)
2020-05-16 16:33:36 +09:00
Paul Masurel
d04368b1d4 Closes #788. OR not working when using conjunction by default. (#802) 2020-03-31 21:13:50 +09:00
Paul Masurel
7d6cfa58e1 [WIP] Alternative take on boosted queries (#772)
* Alternative take on boosted queries

* Fixing unit test

* Added boosting to the query grammar.

* Made BoostQuery public.

* Added support for boosting field in QueryParser

Closes #547
2020-02-19 11:04:38 +09:00
Paul Masurel
5c6580eb15 fmt (#661) 2019-10-04 12:10:01 +09:00
Paul Masurel
c3231ca252 Added phrase query tests (#601) 2019-07-22 13:43:00 +09:00
Paul Masurel
462774b15c Tiqb feature/2018 (#583)
* rust 2018

* Added CHANGELOG comment
2019-07-01 10:01:46 +09:00
Paul Masurel
4822940b19 Issue/36 (#559)
* Added explanation

* Explain

* Splitting weight and idf

* Added comments

Closes #36
2019-06-06 10:03:54 +09:00
Paul Masurel
663dd89c05 Feature/reader (#517)
Adding IndexReader to the API. Making it possible to watch for changes.

* Closes #500
2019-03-20 08:39:22 +09:00
Paul Masurel
50ed6fb534 Code cleanup
Fixed compilation without the mmap directory
2019-02-05 12:39:30 +01:00
Paul Masurel
4c93b096eb Rustfmt 2019-01-29 11:45:30 +01:00
Paul Masurel
6a547b0b5f Issue/483 (#484)
* Downcast_ref

* fixing unit test
2019-01-28 11:43:42 +01:00
Paul Masurel
63b593bd0a Lower RAM usage in tests. 2019-01-24 09:10:38 +09:00
Paul Masurel
a6e767c877 Cargo fmt 2018-11-30 22:52:45 +09:00
Paul Masurel
07d87e154b Collector refactoring and multithreaded search (#437)
* Split Collector into an overall Collector and a per-segment SegmentCollector. Precursor to cross-segment parallelism, and as a side benefit cleans up any per-segment fields from being Option<T> to just T.

* Attempt to add MultiCollector back

* working. Chained collector is broken though

* Fix chained collector

* Fix test

* Make Weight Send+Sync for parallelization purposes

* Expose parameters of RangeQuery for external usage

* Removed &mut self

* fixing tests

* Restored TestCollectors

* blop

* multicollector working

* chained collector working

* test broken

* fixing unit test

* blop

* blop

* Blop

* simplifying APi

* blop

* better syntax

* Simplifying top_collector

* refactoring

* blop

* Sync with master

* Added multithread search

* Collector refactoring

* Schema::builder

* CR and rustdoc

* CR comments

* blop

* Added an executor

* Sorted the segment readers in the searcher

* Update searcher.rs

* Fixed unit testst

* changed the place where we have the sort-segment-by-count heuristic

* using crossbeam::channel

* inlining

* Comments about panics propagating

* Added unit test for executor panicking

* Readded default

* Removed Default impl

* Added unit test for executor
2018-11-30 22:46:59 +09:00
Paul Masurel
0ba1cf93f7 Remove Searcher dereference (#419) 2018-09-14 09:54:26 +09:00
Paul Masurel
78673172d0 Cargo fmt 2018-04-21 20:05:36 +09:00
Paul Masurel
e44782bf14 No more 2018-04-12 13:01:11 +09:00
Paul Masurel
1d9566e73c Making mmap a feature 2018-03-31 13:23:43 +09:00
Paul Masurel
ffa03bad71 TermScorer does not handle deletes 2018-03-27 17:35:20 +09:00
Paul Masurel
9712a75399 Added unit test for intersection score 2018-03-25 12:58:24 +09:00
Paul Masurel
b0e5e1f61d Back merged master 2018-03-19 12:19:08 +09:00
Paul Masurel
a3b44773bb Bugfix and rustfmt 2018-03-10 12:21:50 +09:00
Paul Masurel
4ee2db25a0 Generic on Postings rather than deletes in TermScorer 2018-02-22 08:26:45 +09:00
Paul Masurel
e423784fd0 Added specialized SegmentPostings when there are no DeleteSet 2018-02-21 23:49:20 +09:00
Paul Masurel
2c3e33895a Added unit tests 2018-02-21 00:03:41 +09:00
Paul Masurel
d512b53688 Added handling of parenthesis in query parser 2018-02-20 23:18:02 +09:00
Paul Masurel
eb50e92ec4 Removed specialized postings on SegmentPostings 2018-02-18 00:09:15 +09:00
Paul Masurel
6676fe5717 Added a count method 2018-02-17 15:02:51 +09:00
Paul Masurel
1da06d867b Using the same logic when score is enabled. 2018-02-16 17:36:33 +09:00
Paul Masurel
76e8db6ed3 blop 2018-02-16 14:57:08 +09:00