Commit Graph

263 Commits

Author SHA1 Message Date
boraarslan
38c2ea6a5d Remove unnecessary line 2022-06-07 10:09:37 +03:00
boraarslan
26a0fd1fbe cargo fmt 2022-06-07 10:09:37 +03:00
boraarslan
811b91ecb3 Edit and add tests 2022-06-07 10:09:37 +03:00
boraarslan
bc4cd9ffaa typo fix 2022-06-07 10:09:37 +03:00
boraarslan
7cca7e6a47 Fix of last commit 2022-06-07 10:09:37 +03:00
boraarslan
ef2492dba6 Broken commit 2022-06-07 10:09:37 +03:00
boraarslan
2981e6c1df First commit 2022-06-07 10:09:37 +03:00
Ryan Russell
b33b4c0092 Fix various occurrence var names and references (#1385)
Thank you Ryan!

Signed-off-by: Ryan Russell <git@ryanrussell.org>
2022-06-07 11:08:19 +09:00
Paul Masurel
617ba1f0c0 Bugfix in the document deserialization. (#1368)
Deserializing a json field does not expect the
end of the document anymore.

This behavior is well documented in serde_json.
https://docs.serde.rs/serde_json/fn.from_reader.html

Closes #1366
2022-05-11 11:38:10 +09:00
Paul Masurel
2f1cd7e7f0 Bugfix in the document deserialization. (#1367)
Deserializing a json field does not expect the
end of the document anymore.

This behavior is well documented in serde_json.
https://docs.serde.rs/serde_json/fn.from_reader.html

Closes #1366
2022-05-11 11:27:04 +09:00
Pascal Seitz
bc607a921b add alias shard_size split_size for quickwit
improve some docs
2022-05-06 17:52:36 +08:00
Pascal Seitz
6614a2cba0 fix is_fast for bytes field 2022-04-20 12:02:38 +08:00
Pascal Seitz
706fbd6886 fix DateTime naming, fix docs, cleanup 2022-04-13 13:01:00 +08:00
Pascal Seitz
bb5254de12 always serialize, use enum as param 2022-04-04 13:50:23 +08:00
Pascal Seitz
ec9478830a add text test
move get multiple values to test code
remove sorting term ids per docidi for non facets
2022-03-30 11:31:33 +08:00
Pascal Seitz
8807bfd13d fast field on string
enables FAST on string fields, which creates a fastfield containing the term ordinals
2022-03-29 12:40:10 +08:00
Uwe Klotz
125707dbe0 Replace chrono with time (#1307)
For date values `chrono` has been replaced with `time` 
- The `time` crate is re-exported as `tantivy::time` instead of `tantivy::chrono`.
- The type alias `tantivy::DateTime` has been removed.
- `Value::Date` wraps `time::PrimitiveDateTime` without time zone information.
- Internally date/time values are stored as seconds since UNIX epoch in UTC.
- Converting a `time::OffsetDateTime` to `Value::Date` implicitly converts the value into UTC.
If this is not desired do the time zone conversion yourself and use `time::PrimitiveDateTime`
directly instead.

Closes #1304
2022-03-21 10:50:19 +09:00
Paul Masurel
46d5de920d Removes all usage of block_on, and use a oneshot channel instead. (#1315)
* Removes all usage of block_on, and use a oneshot channel instead.

Calling `block_on` panics in certain context.
For instance, it panics when it is called in a the context of another
call to block.

Using it in tantivy is unnecessary. We replace it by a thin wrapper
around a oneshot channel that supports both async/sync.

* Removing needless uses of async in the API.

Co-authored-by: PSeitz <PSeitz@users.noreply.github.com>
2022-03-18 16:54:58 +09:00
PSeitz
8a5a12d961 add setter to json object options (#1311) 2022-03-16 10:36:30 +09:00
PSeitz
b105bf72e1 use defaults in meta.json (#1310)
This change allows to have unset fields in meta.json and fall back to their defaults
Currently it is required to explicitly put e.g. fieldnorms: false
2022-03-14 13:54:06 +09:00
Antoine G
e37775fe21 iff->if or if and only if (#1298)
* has_xxx is_xxx -> if, these function usualy define equivalence
xxx returns bool -> specify equivalence when appropriate

* fix doc
2022-03-02 11:00:00 +09:00
PSeitz
c4f66eb185 improve validation in aggregation, extend invalid field test (#1292)
* improve validation in aggregation, extend invalid field test

improve validation in aggregation
extend invalid field test
Fixes #1291

* collect fast field names on request structure

* fix visibility of AggregationSegmentCollector
2022-02-25 15:21:19 +09:00
Paul Masurel
d7b46d2137 Added JSON Type (#1270)
- Removed useless copy when ingesting JSON.
- Bugfix in phrase query with a missing field norms.
- Disabled range query on default fields

Closes #1251
2022-02-24 16:25:22 +09:00
Pascal Seitz
704498a1ac rename IntOptions to NumericOptions
keep IntOptions with deprecation warning
Fixes #1286
2022-02-21 22:20:07 +01:00
Paul Masurel
d37633e034 Minor changes in indexing. (#1285) 2022-02-21 17:16:52 +09:00
Paul Masurel
bdedefe07d Adding an IndexingContext object (#1268) 2022-02-04 15:08:01 +09:00
Paul Masurel
2069e3e52b Fixing clippy comments 2022-02-01 10:24:05 +09:00
Paul Masurel
eca6628b3c Minor refactoring (#1266) 2022-01-28 15:55:55 +09:00
Paul Masurel
732f6847c0 Field type with codes (#1255)
* Term are now typed.

This change is backward compatible:
While the Term has a byte representation that is modified, a Term itself
is a transient object that is not serialized as is in the index.

Its .field() and .value_bytes() on the other hand are unchanged.
This change offers better Debug information for terms.

While not necessary it also will help in the support for JSON types.

* Renamed Hierarchical Facet -> Facet
2022-01-07 20:49:00 +09:00
Paul Masurel
1c6d9bdc6a Comparison of Value based on serialization. (#1250) 2022-01-07 20:31:26 +09:00
Paul Masurel
3ea6800ac5 Pleasing clippy (#1253) 2022-01-06 16:41:24 +09:00
Paul Masurel
c81b3030fa Issue/922b (#1233)
* Add a NORMED options on field

Make fieldnorm indexation optional:

* for all types except text => added a NORMED options
* for text field
** if STRING, field has not fieldnorm retained
** if TEXT, field has fieldnorm computed

* Finalize making fieldnorm optional for all field types.

- Using Option for fieldnorm readers.
2021-12-10 21:12:29 +09:00
Paul Masurel
1d4e9a29db Cargo fmt 2021-12-02 15:51:44 +09:00
Paul Masurel
dde49ac8e2 Closes #1195 (#1222)
Removes the indexed option for facets.
Facets are now always indexed.

Closes #1195
2021-12-02 14:37:19 +09:00
PSeitz
c503c6e4fa Switch to non-strict schema (#1216)
Fixes #1211
2021-11-29 10:38:59 +09:00
Paul Masurel
7234bef0eb Issue/1198 (#1201)
* Unit test reproducing #1198
* Fixing unit test to handle the error from add_document.
* Bump project version
2021-11-11 16:42:19 +09:00
azerowall
fcff91559b Fix the deserialization error of FieldEntry when the 'options' field appears before the 'type' field (#1199)
Co-authored-by: quel <azerowall>
2021-11-10 18:39:58 +09:00
Paul Masurel
02cffa4dea Code simplification. (#1169)
Code simplification and Clippy
2021-10-07 14:11:44 +09:00
sigaloid
096ce7488e Resolve some clippys, format (#1144)
* cargo +nightly clippy --fix -Z unstable-options
2021-08-26 08:46:00 +09:00
Pascal Seitz
3265f7bec3 dissolve common module 2021-08-19 23:26:34 +01:00
Pascal Seitz
dc141cdb29 more docs detail
remove code duplicate
2021-08-13 17:40:13 +01:00
François Massot
f4b2e71800 Handle field names with any characters with a known set of special (#1109)
* Handle field names with any characters with a known set of special characters and an escape one

* Update field name validation rule to check only if it has at least one character and does not start with `-`

Closes #1087.
2021-07-05 22:31:36 +09:00
Pascal Seitz
9b3e508753 fix clippy 2021-07-01 18:06:09 +02:00
Pascal Seitz
1e4df54ab3 fix clippy 2021-07-01 17:41:53 +02:00
Pascal Seitz
86d0727659 add facet test
closes #1100
2021-07-01 15:36:17 +02:00
PSeitz
f05e84f964 add FieldEntry constructor, closes #1086 (#1090) 2021-06-17 10:15:48 +09:00
Moriyoshi Koizumi
4afba005f9 Provide a means to deal with malformed facet text representation for the query parser (#1056)
* Provide a means to deal with malformed facet text representation for the query parser.
* Specific error enum for the facet parse error.
2021-05-27 12:16:49 +09:00
PSeitz
d523543dc7 Sort Index/Docids By Field (#1026)
* sort index by field

add sort info to IndexSettings
generate docid mapping for sorted field (only fastfield)
remap singlevalue fastfield

* support docid mapping in multivalue fastfield

move docid mapping to serialization step (less intermediate data for mapping)
add support for docid mapping in multivalue fastfield

* handle docid map in bytes fastfield

* forward docid mapping, remap postings

* fix merge conflicts

* move test to index_sorter

* add docid index mapping old->new

add docid mapping for both directions old->new (used in postings) and new->old (used in fast field)
handle mapping in postings recorder
warn instead of info for MAX_TOKEN_LEN

* remap docid in fielnorm

* resort docids in recorder, more extensive tests

* handle index sorting in docstore

handle index sort in docstore, by saving all the docs in a temp docstore file (SegmentComponent::TempStore). On serialization the docid mapping is used to create a docstore in the correct order by reader the old docstore.

add docstore sort tests
refactor tests

* refactor

rename docid doc_id
rename docid_map doc_id_map
rename DocidMapping DocIdMapping
fix  typo

* u32 to DocId

* better doc_id_map creation

remove unstable sort

* add non mut method to FastFieldWriters

add _mut prefix to &mut methods

* remove sort_index

* fix clippy issues

* fix SegmentComponent iterator

use std::mem::replace

* fix test

* fmt

* handle indexsettings deserialize

* add reading, writing bytes to doc store

get bytes of document in doc store
add store_bytes method doc writer to accept serialized document
add serialization index settings test

* rename index_sorter to doc_id_mapping

use bufferlender in recorder

* fix compile issue, make sort_by_field optional

* fix test compile

* validate index settings on merge

validate index settings on merge
forward merge info to SegmentSerializer (for TempStore)

* fix doctest

* add itertools, use kmerge

add itertools, use kmerge
push because rustfmt fails

* implement/test merge for fastfield

implement/test merge for fastfield
rename len to num_deleted in DeleteBitSet

* Use precalculated docid mapping in merger

Use precalculated docid mapping in merger for sorted indices instead of on the fly calculation 
Add index creation macro benchmark, but commented out for now, since it is not really usable due to long runtimes, and extreme fluctuations. May be better suited in criterion or an external bench bin

* fix fast field reader docs

fix fast field reader docs, Error instead of None returned
add u64s_lenient to fastreader
add create docid mapping benchmark

* add test for multifast field merge

refactor test 
add test for multifast field merge

* add num_bytes to BytesFastFieldReader

equivalent to num_vals in MultiValuedFastFieldReader

* add MultiValueLength trait

add MultiValueLength trait in order to unify index creation for BytesFastFieldReader and MultiValuedFastFieldReader in merger

* Add ReaderWithOrdinal, fix 

Add ReaderWithOrdinal to associate data to a reader in merger
Fix bytes offset index creation in merger

* add test for merging bytes with sorted docids

* Merge fieldnorm for sorted index

* handle posting list in merge in sorted index

handle posting list in merge in sorted index by using doc id mapping for sorting
reuse SegmentOrdinal type

* handle doc store order in merge in sorted index

* fix typo, cleanup

* make IndexSetting non-optional

* fix type, rename test file

fix type
rename test file
add  type

* remove SegmentReaderWithOrdinal accessors

* cargo fmt

* add index sort & merge test to include deletes

* Fix posting list merge issue

Fix posting list merge issue - ensure serializer always gets monotonically increasing doc ids
handle sorting and merging for facets field

* performance: cache field readers, use bytes for doc store merge

* change facet merge test to cover index sorting

* add RawDocument abstraction to access bytes in doc store

* fix deserialization, update changelog

fix deserialization
update changelog
forward error on merge failed

* cache store readers to utilize lru cache (4x performance)

cache store readers, to utilize lru cache (4x faster performance, due to less decompress calls on the block)

* add include_temp_doc_store flag in InnerSegmentMeta

unset flag on deserialization and after finalize of a segment
set flag when creating new instances
2021-05-17 22:20:57 +09:00
Paul Masurel
39dd8cfe24 Cargo clippy. Acronym should not be full uppercase apparently. 2021-04-26 11:49:18 +09:00
Evance Souamoro
f82922b354 added a scratched of implementation but still need to craft one detail and write test to validate 2021-04-06 11:46:17 +00:00