Commit Graph

428 Commits

Author SHA1 Message Date
Bruce Mitchener
cb252a42af docs: "associated to" -> "associated with" (#1557)
This reads better this way.
2022-09-26 20:23:37 +09:00
Bruce Mitchener
ea8e6d7b1d Tidy up clippy config. (#1547)
* Checking cfg_attr is no longer necessary.
* Don't need multiple `clippy::` prefixes on a name.
2022-09-26 09:37:55 +09:00
Bruce Mitchener
6a88ac3fe3 Documentation improvements.
Fix some linking, some grammar, some typos, etc.
2022-09-18 18:05:37 +07:00
Paul Masurel
4e350c5f1b Clippy 2022-09-02 13:05:00 +09:00
Paul Masurel
08c4412d73 Adding dragon API to build index without any thread. (#1496)
Closes #1487
2022-09-01 10:32:36 +09:00
Paul Masurel
a451f6d60d Minor refactoring. (#1495) 2022-08-31 12:00:58 +09:00
Kian-Meng Ang
014b1adc3e cargo +nightly fmt 2022-08-17 22:33:44 +08:00
Kian-Meng Ang
84295d5b35 cargo fmt 2022-08-15 21:07:01 +08:00
Kian-Meng Ang
625bcb4877 Fix typos and markdowns
Found via these commands:

    codespell -L crate,ser,panting,beauti,hart,ue,atleast,childs,ond,pris,hel,mot
    markdownlint *.md doc/src/*.md --disable MD013 MD025 MD033 MD001 MD024 MD036 MD041 MD003
2022-08-13 18:25:47 +08:00
Kanji Yomoda
af84e74284 Replace deprecated std package's constants on floats and integers (#1420) 2022-07-22 08:05:08 +09:00
Antoine G
11e4225f23 doc fix (#1391)
Documentation fix.
2022-06-21 15:53:33 +09:00
boraarslan
26a0fd1fbe cargo fmt 2022-06-07 10:09:37 +03:00
boraarslan
2981e6c1df First commit 2022-06-07 10:09:37 +03:00
Pascal Seitz
8807bfd13d fast field on string
enables FAST on string fields, which creates a fastfield containing the term ordinals
2022-03-29 12:40:10 +08:00
Antoine G
e37775fe21 iff->if or if and only if (#1298)
* has_xxx is_xxx -> if, these function usualy define equivalence
xxx returns bool -> specify equivalence when appropriate

* fix doc
2022-03-02 11:00:00 +09:00
Paul Masurel
5004290daa Return an error on certain type of corruption. (#1296) 2022-03-01 11:35:56 +09:00
Paul Masurel
2ead010c83 Tantivy quickwit (#1293)
* Added sstable and enabling it by default, and parallel boolean query.
* Added async API for FileSlice.
* Added async get_doc
* Reduce blocksize to 32_000
* Added debug logs

Quickwit specific feature a hidden behind the quickwit feature flag.
2022-02-25 17:32:49 +09:00
Paul Masurel
d7b46d2137 Added JSON Type (#1270)
- Removed useless copy when ingesting JSON.
- Bugfix in phrase query with a missing field norms.
- Disabled range query on default fields

Closes #1251
2022-02-24 16:25:22 +09:00
Paul Masurel
d37633e034 Minor changes in indexing. (#1285) 2022-02-21 17:16:52 +09:00
Paul Masurel
4dc80cfa25 Removes TokenStream chain. (#1283)
This change is mostly motivated by the introduction of json object.

We need to be able to inject a position object to make the position
shift.
2022-02-21 09:51:27 +09:00
Paul Masurel
e028515caf Simplified expull code. (#1281) 2022-02-18 18:57:10 +09:00
Paul Masurel
bdedefe07d Adding an IndexingContext object (#1268) 2022-02-04 15:08:01 +09:00
Paul Masurel
eca6628b3c Minor refactoring (#1266) 2022-01-28 15:55:55 +09:00
Paul Masurel
732f6847c0 Field type with codes (#1255)
* Term are now typed.

This change is backward compatible:
While the Term has a byte representation that is modified, a Term itself
is a transient object that is not serialized as is in the index.

Its .field() and .value_bytes() on the other hand are unchanged.
This change offers better Debug information for terms.

While not necessary it also will help in the support for JSON types.

* Renamed Hierarchical Facet -> Facet
2022-01-07 20:49:00 +09:00
Paul Masurel
3ea6800ac5 Pleasing clippy (#1253) 2022-01-06 16:41:24 +09:00
Paul Masurel
c81b3030fa Issue/922b (#1233)
* Add a NORMED options on field

Make fieldnorm indexation optional:

* for all types except text => added a NORMED options
* for text field
** if STRING, field has not fieldnorm retained
** if TEXT, field has fieldnorm computed

* Finalize making fieldnorm optional for all field types.

- Using Option for fieldnorm readers.
2021-12-10 21:12:29 +09:00
Paul Masurel
ebdbb6bd2e Fixing compilation warnings & clippy comments. 2021-12-10 16:47:59 +09:00
Paul Masurel
dde49ac8e2 Closes #1195 (#1222)
Removes the indexed option for facets.
Facets are now always indexed.

Closes #1195
2021-12-02 14:37:19 +09:00
Kanji Yomoda
bd0f9211da Remove unused sort for segmenta meta list (#1218)
* Remove unused sort for segment meta list
* Fix segment meta order dependent test
2021-12-01 11:18:17 +09:00
Paul Masurel
7234bef0eb Issue/1198 (#1201)
* Unit test reproducing #1198
* Fixing unit test to handle the error from add_document.
* Bump project version
2021-11-11 16:42:19 +09:00
Paul Masurel
d18ac136c0 Search simplified (#1175) 2021-10-18 12:52:43 +09:00
Paul Masurel
02cffa4dea Code simplification. (#1169)
Code simplification and Clippy
2021-10-07 14:11:44 +09:00
Paul Masurel
0855649986 Leaning more on the alive (vs delete) semantics. (#1164) 2021-10-05 18:53:29 +09:00
Pascal Seitz
8d8315f8d0 prealloc vec in postinglist 2021-09-29 09:02:38 +08:00
Pascal Seitz
d7a6a409a1 renames 2021-09-23 20:33:11 +08:00
Pascal Seitz
a1f5cead96 AliveBitSet instead of DeleteBitSet 2021-09-23 20:03:57 +08:00
sigaloid
096ce7488e Resolve some clippys, format (#1144)
* cargo +nightly clippy --fix -Z unstable-options
2021-08-26 08:46:00 +09:00
Pascal Seitz
3265f7bec3 dissolve common module 2021-08-19 23:26:34 +01:00
Pascal Seitz
ee0881712a move bitset to common crate, move composite file to directory 2021-08-19 17:45:09 +01:00
Pascal Seitz
f379a80233 test doc_freq and term_freq in sorted index 2021-08-03 11:38:05 +01:00
Paul Masurel
44e8cf98a5 Cargo fmt 2021-07-30 15:30:01 +09:00
Paul Masurel
f0ee69d9e9 Remove the complicated block search logic for a simpler branchless (#1124)
binary search

The code is simpler and faster.

Before
test postings::bench::bench_segment_intersection                                                                         ... bench:   2,093,697 ns/iter (+/- 115,509)
test postings::bench::bench_skip_next_p01                                                                                ... bench:      58,585 ns/iter (+/- 796)
test postings::bench::bench_skip_next_p1                                                                                 ... bench:     160,872 ns/iter (+/- 5,164)
test postings::bench::bench_skip_next_p10                                                                                ... bench:     615,229 ns/iter (+/- 25,108)
test postings::bench::bench_skip_next_p90                                                                                ... bench:   1,120,509 ns/iter (+/- 22,271)

After
test postings::bench::bench_segment_intersection                                                                         ... bench:   1,747,726 ns/iter (+/- 52,867)
test postings::bench::bench_skip_next_p01                                                                                ... bench:      55,205 ns/iter (+/- 714)
test postings::bench::bench_skip_next_p1                                                                                 ... bench:     131,433 ns/iter (+/- 2,814)
test postings::bench::bench_skip_next_p10                                                                                ... bench:     478,830 ns/iter (+/- 12,794)
test postings::bench::bench_skip_next_p90                                                                                ... bench:     931,082 ns/iter (+/- 31,468)
2021-07-30 14:38:42 +09:00
Pascal Seitz
0062fe705d cargo fmt 2021-07-01 18:17:08 +02:00
Pascal Seitz
9b3e508753 fix clippy 2021-07-01 18:06:09 +02:00
Pascal Seitz
e496ae0470 clippy fixes 2021-07-01 17:43:50 +02:00
Pascal Seitz
1e4df54ab3 fix clippy 2021-07-01 17:41:53 +02:00
Pascal Seitz
2de249af74 clippy fixes 2021-07-01 17:37:37 +02:00
Pascal Seitz
10f056fbb4 apply clippy fixes 2021-07-01 17:08:44 +02:00
Pascal Seitz
167d88b449 fix tests behind unstable feature flag 2021-06-14 15:31:12 +02:00
PSeitz
d523543dc7 Sort Index/Docids By Field (#1026)
* sort index by field

add sort info to IndexSettings
generate docid mapping for sorted field (only fastfield)
remap singlevalue fastfield

* support docid mapping in multivalue fastfield

move docid mapping to serialization step (less intermediate data for mapping)
add support for docid mapping in multivalue fastfield

* handle docid map in bytes fastfield

* forward docid mapping, remap postings

* fix merge conflicts

* move test to index_sorter

* add docid index mapping old->new

add docid mapping for both directions old->new (used in postings) and new->old (used in fast field)
handle mapping in postings recorder
warn instead of info for MAX_TOKEN_LEN

* remap docid in fielnorm

* resort docids in recorder, more extensive tests

* handle index sorting in docstore

handle index sort in docstore, by saving all the docs in a temp docstore file (SegmentComponent::TempStore). On serialization the docid mapping is used to create a docstore in the correct order by reader the old docstore.

add docstore sort tests
refactor tests

* refactor

rename docid doc_id
rename docid_map doc_id_map
rename DocidMapping DocIdMapping
fix  typo

* u32 to DocId

* better doc_id_map creation

remove unstable sort

* add non mut method to FastFieldWriters

add _mut prefix to &mut methods

* remove sort_index

* fix clippy issues

* fix SegmentComponent iterator

use std::mem::replace

* fix test

* fmt

* handle indexsettings deserialize

* add reading, writing bytes to doc store

get bytes of document in doc store
add store_bytes method doc writer to accept serialized document
add serialization index settings test

* rename index_sorter to doc_id_mapping

use bufferlender in recorder

* fix compile issue, make sort_by_field optional

* fix test compile

* validate index settings on merge

validate index settings on merge
forward merge info to SegmentSerializer (for TempStore)

* fix doctest

* add itertools, use kmerge

add itertools, use kmerge
push because rustfmt fails

* implement/test merge for fastfield

implement/test merge for fastfield
rename len to num_deleted in DeleteBitSet

* Use precalculated docid mapping in merger

Use precalculated docid mapping in merger for sorted indices instead of on the fly calculation 
Add index creation macro benchmark, but commented out for now, since it is not really usable due to long runtimes, and extreme fluctuations. May be better suited in criterion or an external bench bin

* fix fast field reader docs

fix fast field reader docs, Error instead of None returned
add u64s_lenient to fastreader
add create docid mapping benchmark

* add test for multifast field merge

refactor test 
add test for multifast field merge

* add num_bytes to BytesFastFieldReader

equivalent to num_vals in MultiValuedFastFieldReader

* add MultiValueLength trait

add MultiValueLength trait in order to unify index creation for BytesFastFieldReader and MultiValuedFastFieldReader in merger

* Add ReaderWithOrdinal, fix 

Add ReaderWithOrdinal to associate data to a reader in merger
Fix bytes offset index creation in merger

* add test for merging bytes with sorted docids

* Merge fieldnorm for sorted index

* handle posting list in merge in sorted index

handle posting list in merge in sorted index by using doc id mapping for sorting
reuse SegmentOrdinal type

* handle doc store order in merge in sorted index

* fix typo, cleanup

* make IndexSetting non-optional

* fix type, rename test file

fix type
rename test file
add  type

* remove SegmentReaderWithOrdinal accessors

* cargo fmt

* add index sort & merge test to include deletes

* Fix posting list merge issue

Fix posting list merge issue - ensure serializer always gets monotonically increasing doc ids
handle sorting and merging for facets field

* performance: cache field readers, use bytes for doc store merge

* change facet merge test to cover index sorting

* add RawDocument abstraction to access bytes in doc store

* fix deserialization, update changelog

fix deserialization
update changelog
forward error on merge failed

* cache store readers to utilize lru cache (4x performance)

cache store readers, to utilize lru cache (4x faster performance, due to less decompress calls on the block)

* add include_temp_doc_store flag in InnerSegmentMeta

unset flag on deserialization and after finalize of a segment
set flag when creating new instances
2021-05-17 22:20:57 +09:00