47 Commits

Author SHA1 Message Date
Bruce Mitchener
cb252a42af docs: "associated to" -> "associated with" (#1557)
This reads better this way.
2022-09-26 20:23:37 +09:00
Paul Masurel
2069e3e52b Fixing clippy comments 2022-02-01 10:24:05 +09:00
Paul Masurel
eca6628b3c Minor refactoring (#1266) 2022-01-28 15:55:55 +09:00
Paul Masurel
de92f094aa Closes #1101 fix delete documents with sort by field
Closes #1101

* fix delete documents with sort by field

Co-authored-by: Andre-Philippe Paquet <appaquet@gmail.com>
2021-06-30 15:51:32 +09:00
PSeitz
d523543dc7 Sort Index/Docids By Field (#1026)
* sort index by field

add sort info to IndexSettings
generate docid mapping for sorted field (only fastfield)
remap singlevalue fastfield

* support docid mapping in multivalue fastfield

move docid mapping to serialization step (less intermediate data for mapping)
add support for docid mapping in multivalue fastfield

* handle docid map in bytes fastfield

* forward docid mapping, remap postings

* fix merge conflicts

* move test to index_sorter

* add docid index mapping old->new

add docid mapping for both directions old->new (used in postings) and new->old (used in fast field)
handle mapping in postings recorder
warn instead of info for MAX_TOKEN_LEN

* remap docid in fieldnorm

* resort docids in recorder, more extensive tests

* handle index sorting in docstore

handle index sort in docstore by saving all the docs in a temp docstore file (SegmentComponent::TempStore). On serialization, the docid mapping is used to create a docstore in the correct order by reading the old docstore.

add docstore sort tests
refactor tests

* refactor

rename docid doc_id
rename docid_map doc_id_map
rename DocidMapping DocIdMapping
fix  typo

* u32 to DocId

* better doc_id_map creation

remove unstable sort

* add non mut method to FastFieldWriters

add _mut prefix to &mut methods

* remove sort_index

* fix clippy issues

* fix SegmentComponent iterator

use std::mem::replace

* fix test

* fmt

* handle indexsettings deserialize

* add reading, writing bytes to doc store

get bytes of document in doc store
add store_bytes method doc writer to accept serialized document
add serialization index settings test

* rename index_sorter to doc_id_mapping

use bufferlender in recorder

* fix compile issue, make sort_by_field optional

* fix test compile

* validate index settings on merge

validate index settings on merge
forward merge info to SegmentSerializer (for TempStore)

* fix doctest

* add itertools, use kmerge

add itertools, use kmerge
push because rustfmt fails

* implement/test merge for fastfield

implement/test merge for fastfield
rename len to num_deleted in DeleteBitSet

* Use precalculated docid mapping in merger

Use precalculated docid mapping in merger for sorted indices instead of on the fly calculation 
Add index creation macro benchmark, but commented out for now, since it is not really usable due to long runtimes, and extreme fluctuations. May be better suited in criterion or an external bench bin

* fix fast field reader docs

fix fast field reader docs, Error instead of None returned
add u64s_lenient to fastreader
add create docid mapping benchmark

* add test for multifast field merge

refactor test 
add test for multifast field merge

* add num_bytes to BytesFastFieldReader

equivalent to num_vals in MultiValuedFastFieldReader

* add MultiValueLength trait

add MultiValueLength trait in order to unify index creation for BytesFastFieldReader and MultiValuedFastFieldReader in merger

* Add ReaderWithOrdinal, fix 

Add ReaderWithOrdinal to associate data with a reader in merger
Fix bytes offset index creation in merger

* add test for merging bytes with sorted docids

* Merge fieldnorm for sorted index

* handle posting list in merge in sorted index

handle posting list in merge in sorted index by using doc id mapping for sorting
reuse SegmentOrdinal type

* handle doc store order in merge in sorted index

* fix typo, cleanup

* make IndexSetting non-optional

* fix type, rename test file

fix type
rename test file
add  type

* remove SegmentReaderWithOrdinal accessors

* cargo fmt

* add index sort & merge test to include deletes

* Fix posting list merge issue

Fix posting list merge issue - ensure serializer always gets monotonically increasing doc ids
handle sorting and merging for facets field

* performance: cache field readers, use bytes for doc store merge

* change facet merge test to cover index sorting

* add RawDocument abstraction to access bytes in doc store

* fix deserialization, update changelog

fix deserialization
update changelog
forward error on merge failed

* cache store readers to utilize lru cache (4x performance)

cache store readers to utilize lru cache (4x faster, due to fewer decompress calls on the block)

* add include_temp_doc_store flag in InnerSegmentMeta

unset flag on deserialization and after finalize of a segment
set flag when creating new instances
2021-05-17 22:20:57 +09:00
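
The two mapping directions named in the commit above (old->new for postings, new->old for fast fields) can be illustrated with a minimal sketch. This is not the code from the commit: the `DocIdMapping` struct below is a simplified, hypothetical version, and `from_sort_values`, `remap_column`, and `remap_postings` are assumed names chosen for illustration. It assumes the index is sorted in ascending order by a single u64 value per document.

```rust
/// Document id, matching the "u32 to DocId" step in the commit message.
type DocId = u32;

/// Hypothetical, simplified mapping between pre-sort and post-sort doc ids.
struct DocIdMapping {
    /// new_to_old[new_doc_id] = old_doc_id (direction used for fast fields).
    new_to_old: Vec<DocId>,
    /// old_to_new[old_doc_id] = new_doc_id (direction used for postings).
    old_to_new: Vec<DocId>,
}

impl DocIdMapping {
    /// Builds both directions from the per-document values of the sort field.
    /// `sort_by_key` is stable, so documents with equal values keep their
    /// original relative order.
    fn from_sort_values(values: &[u64]) -> DocIdMapping {
        let mut new_to_old: Vec<DocId> = (0..values.len() as DocId).collect();
        new_to_old.sort_by_key(|&old| values[old as usize]);
        let mut old_to_new = vec![0; values.len()];
        for (new, &old) in new_to_old.iter().enumerate() {
            old_to_new[old as usize] = new as DocId;
        }
        DocIdMapping { new_to_old, old_to_new }
    }

    /// Reorders a single-valued column into the new doc id order.
    fn remap_column(&self, column: &[u64]) -> Vec<u64> {
        self.new_to_old
            .iter()
            .map(|&old| column[old as usize])
            .collect()
    }

    /// Rewrites a posting list into new doc ids, then sorts it again so the
    /// ids handed to a serializer are increasing.
    fn remap_postings(&self, postings: &[DocId]) -> Vec<DocId> {
        let mut remapped: Vec<DocId> = postings
            .iter()
            .map(|&old| self.old_to_new[old as usize])
            .collect();
        remapped.sort();
        remapped
    }
}
```

Keeping both directions avoids inverting the permutation repeatedly: a column is rebuilt by walking new ids and looking up the old position, while a posting list is rewritten by translating each old id to its new one.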
Paul Masurel
c23a03ad81 Large API Change in the Directory API. (#901)
Tantivy used to assume that all files could somehow be memory mapped. After this change, Directory returns a `FileSlice` that can be reduced and eventually read into an `OwnedBytes` object. Long, blocking IO operations are still required, but they do not span the entire file.
2020-10-08 16:36:51 +09:00
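
The idea of narrowing a slice first and reading bytes only at the end can be sketched as follows. This is an illustration under assumptions, not the API introduced by the commit: the `FileSlice` below is a hypothetical stand-in backed by an in-memory buffer, the method names `slice` and `read_bytes` are assumed, and a plain `Vec<u8>` takes the place of `OwnedBytes`.

```rust
use std::ops::Range;
use std::sync::Arc;

/// Hypothetical stand-in for a lazily read region of a file.
#[derive(Clone)]
struct FileSlice {
    /// Shared backing storage; a real directory could hold a file handle here.
    data: Arc<Vec<u8>>,
    /// The part of `data` that this slice covers.
    range: Range<usize>,
}

impl FileSlice {
    fn new(data: Vec<u8>) -> FileSlice {
        let range = 0..data.len();
        FileSlice { data: Arc::new(data), range }
    }

    /// Number of bytes covered by this slice.
    fn len(&self) -> usize {
        self.range.len()
    }

    /// Narrows the slice without reading any bytes.
    /// Offsets are relative to the start of this slice.
    fn slice(&self, start: usize, end: usize) -> FileSlice {
        assert!(start <= end && end <= self.len());
        FileSlice {
            data: Arc::clone(&self.data),
            range: (self.range.start + start)..(self.range.start + end),
        }
    }

    /// Performs the read, limited to the covered range only.
    fn read_bytes(&self) -> Vec<u8> {
        self.data[self.range.clone()].to_vec()
    }
}
```

In this sketch, narrowing is cheap because it only adjusts a range, so the cost of the eventual read is proportional to the part of the file that is actually needed.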
Paul Masurel
e0499118e2 Minor refactoring 2020-03-07 15:56:03 +09:00
Paul Masurel
ae14022bf0 Removed use::Result. (#771) 2020-01-31 18:47:02 +09:00
Paul Masurel
67bce6cbf2 Fixing the construction of the DeleteBitset. (#683)
Closes #681
2019-11-04 15:39:11 +09:00
Paul Masurel
462774b15c Tiqb feature/2018 (#583)
* rust 2018

* Added CHANGELOG comment
2019-07-01 10:01:46 +09:00
petr-tik
8ffae47854 Addressed code review
moved Opstamp to top-level namespace, added a docstring

Corrected minor typos/whitespace
2019-04-29 21:23:28 +01:00
petr-tik
8e50921363 Tidied up the Stamper module and upgraded to a 1.34 dependency
Added stamper.revert method to be used for rollback - rolling back to a previous
commit in case of deleting all documents or rolling operations back should reset
the stamper as well

Added type alias for Opstamp - helps code readability instead of seeing u64
returned by functions.

Moved to AtomicU64 on stable rust (since 1.34) - where possible use standard
library interfaces.
2019-04-24 20:46:28 +01:00
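
A stamper with a revert method, built on the standard library's `AtomicU64` as the commit above describes, might look roughly like this. It is a minimal sketch under assumptions rather than the code from the commit: the struct layout, the `new` and `stamp` method names, and the choice of `Ordering::SeqCst` are all illustrative.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Type alias so that signatures read as `Opstamp` rather than a bare `u64`.
type Opstamp = u64;

/// Hypothetical simplified stamper: hands out increasing operation stamps.
/// Clones share the same underlying counter.
#[derive(Clone)]
struct Stamper(Arc<AtomicU64>);

impl Stamper {
    fn new(first_opstamp: Opstamp) -> Stamper {
        Stamper(Arc::new(AtomicU64::new(first_opstamp)))
    }

    /// Returns the current stamp and advances the counter by one.
    fn stamp(&self) -> Opstamp {
        self.0.fetch_add(1, Ordering::SeqCst)
    }

    /// Resets the counter, for example when rolling back to a previous commit.
    fn revert(&self, to_opstamp: Opstamp) {
        self.0.store(to_opstamp, Ordering::SeqCst);
    }
}
```

For instance, a stamper created with `Stamper::new(7)` hands out 7 and then 8; after `revert(7)` on any clone, the next stamp is 7 again.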
Paul Masurel
8ebbf6b336 Issue/325 (#330)
* Introducing a SegmentMeta inventory.
* Depending on census=0.1
* Cargo fmt
2018-06-30 13:11:41 +09:00
Paul Masurel
78673172d0 Cargo fmt 2018-04-21 20:05:36 +09:00
Paul Masurel
1e55189db1 NOBUG rustfmt 2017-12-14 19:30:31 +09:00
Paul Masurel
f24e5f405e NOBUG intellij misc lint 2017-12-14 18:23:35 +09:00
Paul Masurel
974c321153 cargo fmt 2017-11-26 11:02:02 +09:00
Paul Masurel
ac4d433fad Renamed analyzer to tokenizer 2017-11-24 16:50:32 +09:00
Paul Masurel
3588ca0561 Integrated with the merge branch 2017-09-09 15:27:19 +09:00
Paul Masurel
f8710bd4b0 Format 2017-08-28 18:22:41 +09:00
Paul Masurel
80ae136646 issue/198 Getting living_file after getting the list of managed files. 2017-07-24 18:46:41 +09:00
Paul Masurel
b05b5f5487 issue/191 Added an analyzer manager. 2017-06-20 10:02:26 +09:00
Paul Masurel
4c8f9742f8 format 2017-05-15 22:30:18 +09:00
Paul Masurel
dc43135fe0 NOBUG Remove .info 2017-04-08 18:49:37 +09:00
Paul Masurel
a84871468b issue/96 Rename FileError -> OpenReadError 2017-04-05 10:01:49 +09:00
Paul Masurel
e0a39fb273 issue/96 Added unit test, documentation and various tiny improvements. 2017-04-04 22:43:35 +09:00
Paul Masurel
17631ed866 issue/96 Added functionality to protect files from deletion
Hopefully fixed the race condition happening when merging files.
2017-04-02 18:48:20 +09:00
Paul Masurel
9eb2d3e8c5 issue/96 avoid removing the bitset from segment_entry. 2017-04-02 16:26:28 +09:00
Paul Masurel
c59507444f issue/77 ManagedDirectory working
Closes #77
2017-03-06 12:18:36 +09:00
Paul Masurel
4b7afa2ae7 issue/77 Added managed directory 2017-03-03 22:41:37 +09:00
Paul Masurel
a7f10f055d Nobug hiding doc, filling doc 2017-02-26 00:11:32 +09:00
Paul Masurel
597dac9cb6 NOBUG Adding doc. 2017-02-25 23:39:02 +09:00
Paul Masurel
d007cf3435 issue/43 simplification. removed the notion of delete cursor. 2017-02-19 22:39:04 +09:00
Paul Masurel
2fc3a505bc issue/43 refactoring segment meta 2017-02-19 22:39:04 +09:00
Paul Masurel
e337c35721 issue/43 SegmentMeta refactoring 2017-02-19 22:39:04 +09:00
Paul Masurel
0c318339b0 issue/43 Path logic in segment. 2017-02-19 22:39:04 +09:00
Paul Masurel
e12fc4bb09 issue/43 deletes
merge not working
only updating uncommitted
2017-02-19 22:39:04 +09:00
Paul Masurel
0820992141 issue/43 docstamp -> opstamp 2017-02-19 22:38:39 +09:00
Paul Masurel
09782858da issue/43 Segment have a commit opstamp 2017-02-19 22:38:39 +09:00
Paul Masurel
bacaabf857 issue/43 fixed on unit test. need big refactoring of segment updater 2017-02-19 22:38:38 +09:00
Paul Masurel
fba44b78b6 issue/43 Added delete doc file 2017-02-19 22:38:15 +09:00
Paul Masurel
2b7444b11a bug/4 Removed race condition in SegmentUpdater 2016-10-16 17:04:45 +09:00
Paul Masurel
0f246ba908 bug/4 Introduce segment_updater 2016-10-15 12:16:30 +09:00
Paul Masurel
9298a6ad9e bug/4 2016-10-01 19:03:36 +09:00
Paul Masurel
734866a77c issue/4 First tab, added segment register. Unit tests broken. Need to seed the random generator 2016-09-26 08:56:36 +09:00
Paul Masurel
5e806c88ef Issue 20 Searcher pool implemented using a channel.
Operational but not really ready for merge.
2016-08-27 16:15:02 +09:00
Paul Masurel
59150ad802 superficial refactoring 2016-08-26 09:30:09 +09:00