Commit Graph

153 Commits

Author SHA1 Message Date
PSeitz
5d4535de83 Changelog fix (#1717) 2022-12-12 14:28:42 +09:00
PSeitz
485a8f507e Update CHANGELOG.md 2022-11-28 15:41:31 +01:00
Paul Masurel
8ca12a5683 Added stop word filter to CHANGELOG.md 2022-11-09 17:00:45 +09:00
Pascal Seitz
af839753e0 No score calls if score is not requested 2022-10-26 12:18:35 +08:00
Pascal Seitz
e772d3170d switch get_val() to u32
Fixes #1638
2022-10-24 19:05:57 +08:00
Paul Masurel
94313b62f8 Hotfix issue/1629 - position broken (#1633)
* Bugfix position broken.

For Field with several FieldValues, with a
value that contained no token at all, the token position
was reinitialized to 0.

As a result, PhraseQueries can show some false positives.
In addition, after the computation of the position delta, we can
underflow u32, and end up with gigantic delta.

We haven't been able to actually explain the bug in 1629, but it
is assumed that in some corner case these delta can cause a panic.

Closes #1629
2022-10-20 11:03:55 +09:00
Pascal Seitz
84f9e77e1d update CHANGELOG 2022-10-14 15:10:33 +08:00
Kian-Meng Ang
625bcb4877 Fix typos and markdowns
Found via these commands:

    codespell -L crate,ser,panting,beauti,hart,ue,atleast,childs,ond,pris,hel,mot
    markdownlint *.md doc/src/*.md --disable MD013 MD025 MD033 MD001 MD024 MD036 MD041 MD003
2022-08-13 18:25:47 +08:00
PSeitz
23fe73a6c0 remove searcher pool and make Searcher cloneable (#1411)
* remove searcher pool and make Searcher cloneable

closes #1410

* use SearcherInner in InnerIndexReader
2022-07-12 18:07:48 +09:00
Evance Soumaoro
a4be239d38 Updated DateTime to hold timestamp in microseconds, while making date field precision configurable (#1396) 2022-07-12 10:04:28 +09:00
Ryan Russell
ca836b6414 Improve Docs Readability (#1380)
Signed-off-by: Ryan Russell <git@ryanrussell.org>
2022-06-02 09:32:57 +09:00
Paul Masurel
3171f0b9ba Added ZSTD support in CHANGELOG 2022-05-25 21:51:46 +09:00
Pascal Seitz
24432bf523 add term aggregation 2022-04-13 19:51:18 +08:00
Pascal Seitz
dac73537d2 update changelog 2022-04-04 14:15:40 +08:00
Uwe Klotz
125707dbe0 Replace chrono with time (#1307)
For date values `chrono` has been replaced with `time` 
- The `time` crate is re-exported as `tantivy::time` instead of `tantivy::chrono`.
- The type alias `tantivy::DateTime` has been removed.
- `Value::Date` wraps `time::PrimitiveDateTime` without time zone information.
- Internally date/time values are stored as seconds since UNIX epoch in UTC.
- Converting a `time::OffsetDateTime` to `Value::Date` implicitly converts the value into UTC.
If this is not desired do the time zone conversion yourself and use `time::PrimitiveDateTime`
directly instead.

Closes #1304
2022-03-21 10:50:19 +09:00
Paul Masurel
387592809f Updated CHANGELOG 2022-03-07 15:31:35 +09:00
Paul Masurel
d7b46d2137 Added JSON Type (#1270)
- Removed useless copy when ingesting JSON.
- Bugfix in phrase query with a missing field norms.
- Disabled range query on default fields

Closes #1251
2022-02-24 16:25:22 +09:00
PSeitz
972cb6c26d Aggregation (#1276)
Added support for aggregation compatible with Elasticsearch's API.
2022-02-21 09:59:11 +09:00
Paul Masurel
9679c5f306 Rename quickwit-inc -> quickwit-oss 2022-01-27 15:37:09 +09:00
Shikhar Bhushan
99d4b1a177 Searcher Warming API (#1261)
Adds an API to register Warmers in the IndexReader.

Co-authored-by: Paul Masurel <paul@quickwit.io>
2022-01-20 23:40:25 +09:00
Shikhar Bhushan
e5e252cbc0 LogMergePolicy knob del_docs_percentage_before_merge (#1238)
Add a knob to LogMergePolicy to always merge segments that exceed a threshold of deleted docs

Closes #115
2021-12-20 13:14:56 +09:00
Paul Masurel
c81b3030fa Issue/922b (#1233)
* Add a NORMED options on field

Make fieldnorm indexation optional:

* for all types except text => added a NORMED options
* for text field
** if STRING, field has not fieldnorm retained
** if TEXT, field has fieldnorm computed

* Finalize making fieldnorm optional for all field types.

- Using Option for fieldnorm readers.
2021-12-10 21:12:29 +09:00
Paul Masurel
098eea843a Reducing the number of call to fsync on the directory. (#1228)
This work by introducing a new API method in the Directory
trait. The user needs to explicitely call this method.
(In particular, once before a commmit)

Closes #1225
2021-12-03 03:10:52 +00:00
Paul Masurel
dde49ac8e2 Closes #1195 (#1222)
Removes the indexed option for facets.
Facets are now always indexed.

Closes #1195
2021-12-02 14:37:19 +09:00
PSeitz
c503c6e4fa Switch to non-strict schema (#1216)
Fixes #1211
2021-11-29 10:38:59 +09:00
PSeitz
3a78402496 update links (#1176) 2021-10-18 20:45:40 +09:00
Paul Masurel
b52abbc771 Bugfix transposition_cost_one in FuzzyQuery (#1167) 2021-10-07 09:38:39 +09:00
PSeitz
352e0cc58d Adde demux operation (#1150)
* add merge for DeleteBitSet, allow custom DeleteBitSet on merge
* forward delete bitsets on merge, add tests
* add demux operation and tests
2021-10-06 16:05:16 +09:00
Paul Masurel
46b86a7976 Bounced version and edited changelog 2021-09-10 23:05:09 +09:00
Paul Masurel
3b247fd968 Version bump 2021-08-19 10:12:30 +09:00
Paul Masurel
d58497529b Fixed CHANGELOG to include 0.15.2. 2021-06-30 16:34:47 +09:00
Paul Masurel
de92f094aa Closes #1101 fix delete documents with sort by field
Closes #1101

* fix delete documents with sort by field

Co-authored-by: Andre-Philippe Paquet <appaquet@gmail.com>
2021-06-30 15:51:32 +09:00
Paul Masurel
7ef25ec400 Bump to 0.15.1 to publish bugfix 2021-06-14 18:45:38 +09:00
Paul Masurel
abe6b4baec Bumped tantivy version to 0.15 2021-06-07 09:52:48 +09:00
Paul Masurel
6e4b61154f Issue/1070 (#1071)
Add a boolean flag in the Query::query_terms informing on whether
position information is required.

Closes #1070
2021-06-03 22:33:20 +09:00
Stéphane Campinas
41ea14840d add benchmark of term streams merge (#1024)
* add benchmark of term streams merge
* use union based on FST for merging the term dictionaries
* Rename TermMerger benchmark
2021-05-31 23:15:01 +09:00
PSeitz
8d32c3ba3a Change Footer version handling, Make compression dynamic (#1060)
Change Footer version handling, Make compression dynamic

Change Footer version handling
Simplify version handling by switching to JSON instead of binary serialization.
fixes #1058

Make compression dynamic
Instead of choosing the compression during compile time via a feature flag, you can now have multiple compression algorithms enabled and decide during runtime which one to choose via IndexSettings. Changing the compression algorithm on an index is also supported. The information which algorithm was used in the doc store is stored in the DocStoreFooter. The default is the lz4 block format.
fixes #904

Handle merging of different compressors
Fix feature flag names
Add doc store test for all compressors
2021-05-28 14:57:20 +09:00
PSeitz
d523543dc7 Sort Index/Docids By Field (#1026)
* sort index by field

add sort info to IndexSettings
generate docid mapping for sorted field (only fastfield)
remap singlevalue fastfield

* support docid mapping in multivalue fastfield

move docid mapping to serialization step (less intermediate data for mapping)
add support for docid mapping in multivalue fastfield

* handle docid map in bytes fastfield

* forward docid mapping, remap postings

* fix merge conflicts

* move test to index_sorter

* add docid index mapping old->new

add docid mapping for both directions old->new (used in postings) and new->old (used in fast field)
handle mapping in postings recorder
warn instead of info for MAX_TOKEN_LEN

* remap docid in fielnorm

* resort docids in recorder, more extensive tests

* handle index sorting in docstore

handle index sort in docstore, by saving all the docs in a temp docstore file (SegmentComponent::TempStore). On serialization the docid mapping is used to create a docstore in the correct order by reader the old docstore.

add docstore sort tests
refactor tests

* refactor

rename docid doc_id
rename docid_map doc_id_map
rename DocidMapping DocIdMapping
fix  typo

* u32 to DocId

* better doc_id_map creation

remove unstable sort

* add non mut method to FastFieldWriters

add _mut prefix to &mut methods

* remove sort_index

* fix clippy issues

* fix SegmentComponent iterator

use std::mem::replace

* fix test

* fmt

* handle indexsettings deserialize

* add reading, writing bytes to doc store

get bytes of document in doc store
add store_bytes method doc writer to accept serialized document
add serialization index settings test

* rename index_sorter to doc_id_mapping

use bufferlender in recorder

* fix compile issue, make sort_by_field optional

* fix test compile

* validate index settings on merge

validate index settings on merge
forward merge info to SegmentSerializer (for TempStore)

* fix doctest

* add itertools, use kmerge

add itertools, use kmerge
push because rustfmt fails

* implement/test merge for fastfield

implement/test merge for fastfield
rename len to num_deleted in DeleteBitSet

* Use precalculated docid mapping in merger

Use precalculated docid mapping in merger for sorted indices instead of on the fly calculation 
Add index creation macro benchmark, but commented out for now, since it is not really usable due to long runtimes, and extreme fluctuations. May be better suited in criterion or an external bench bin

* fix fast field reader docs

fix fast field reader docs, Error instead of None returned
add u64s_lenient to fastreader
add create docid mapping benchmark

* add test for multifast field merge

refactor test 
add test for multifast field merge

* add num_bytes to BytesFastFieldReader

equivalent to num_vals in MultiValuedFastFieldReader

* add MultiValueLength trait

add MultiValueLength trait in order to unify index creation for BytesFastFieldReader and MultiValuedFastFieldReader in merger

* Add ReaderWithOrdinal, fix 

Add ReaderWithOrdinal to associate data to a reader in merger
Fix bytes offset index creation in merger

* add test for merging bytes with sorted docids

* Merge fieldnorm for sorted index

* handle posting list in merge in sorted index

handle posting list in merge in sorted index by using doc id mapping for sorting
reuse SegmentOrdinal type

* handle doc store order in merge in sorted index

* fix typo, cleanup

* make IndexSetting non-optional

* fix type, rename test file

fix type
rename test file
add  type

* remove SegmentReaderWithOrdinal accessors

* cargo fmt

* add index sort & merge test to include deletes

* Fix posting list merge issue

Fix posting list merge issue - ensure serializer always gets monotonically increasing doc ids
handle sorting and merging for facets field

* performance: cache field readers, use bytes for doc store merge

* change facet merge test to cover index sorting

* add RawDocument abstraction to access bytes in doc store

* fix deserialization, update changelog

fix deserialization
update changelog
forward error on merge failed

* cache store readers to utilize lru cache (4x performance)

cache store readers, to utilize lru cache (4x faster performance, due to less decompress calls on the block)

* add include_temp_doc_store flag in InnerSegmentMeta

unset flag on deserialization and after finalize of a segment
set flag when creating new instances
2021-05-17 22:20:57 +09:00
Evance Soumaoro
dfed8896b9 Merge branch 'main' into issue-more-like-this-query 2021-05-03 10:08:38 +00:00
Evance Souamoro
d71aa57077 reusing idf from bm25 module as it was the same logic 2021-05-03 10:05:40 +00:00
Pascal Seitz
537021e12d upate CHANGELOG 2021-05-03 09:09:42 +02:00
Paul Masurel
2dc5403e7b Closes #1022 2021-04-26 14:01:14 +09:00
Paul Masurel
39dd8cfe24 Cargo clippy. Acronym should not be full uppercase apparently. 2021-04-26 11:49:18 +09:00
Paul Masurel
fef428a9c6 Updated CHANGELOG 2021-04-19 21:58:52 +09:00
Paul Masurel
39320f953c Update CHANGELOG with date range queries 2021-04-19 10:03:41 +09:00
Paul Masurel
5c1ce5b0e1 Edited CHANGELOG 2021-04-12 12:02:25 +09:00
Paul Masurel
38a20ae269 Renamed SegmentLocalId to SegmentOrdinal for more homogeneity and edited
changelog
2021-03-29 09:25:42 +09:00
Paul Masurel
d2d0873fdb Added support for Option<TCollector>. 2021-03-18 17:28:09 +09:00
Paul Masurel
761298ff00 Added an histogram collector.
Closes #994
2021-03-18 16:54:42 +09:00
Paul Masurel
2ab25d994f Updated Changelog. Closing #991 2021-03-10 14:14:21 +09:00