Commit Graph

183 Commits

Author SHA1 Message Date
PSeitz
9e2faecf5b add memory limit for aggregations (#1942)
* add memory limit for aggregations

introduce AggregationLimits to set memory consumption limit and bucket limits
memory limit is checked during aggregation, bucket limit is checked before returning the aggregation request.

* Apply suggestions from code review

Co-authored-by: Paul Masurel <paul@quickwit.io>

* add ByteCount with human readable format

---------

Co-authored-by: Paul Masurel <paul@quickwit.io>
2023-03-16 06:21:07 +01:00
PSeitz
cbcafae04c fix: doc store for files larger 4GB (#1856)
Fixes an issue in the skip list deserialization, which deserialized the byte start offset incorrectly as u32.
`get_doc` will fail for any docs that live in a block with start offset larger than u32::MAX (~4GB).
Causes index corruption, if a segment with a doc store larger 4GB is merged.

tantivy version 0.19 is affected
2023-02-10 14:29:43 +01:00
Paul Masurel
405e2cf4d9 Merge with main 2023-02-09 14:28:57 +01:00
Paul Masurel
bd5eea9852 Integrated columnar work. 2023-02-09 13:14:31 +01:00
PSeitz
0f20787917 fix doc store cache docs (#1821)
* fix doc store cache docs

addresses an issue reported in #1820

* rename doc_store_cache_size
2023-01-23 07:06:49 +01:00
Adam Reichold
82a183bc2d Bump dependency on lru to from version 0.7.5 to version 0.9.0. (#1755) 2023-01-10 13:35:37 +09:00
Paul Masurel
f39165e1e7 Moving FileSlice to tantivy-common (#1729) 2022-12-21 16:35:11 +09:00
Paul Masurel
32cb1d22da Removed AsyncIoResult. (#1728) 2022-12-21 16:01:17 +09:00
PSeitz
f9171a3981 fix clippy (#1725)
* fix clippy

* fix clippy fastfield codecs

* fix clippy bitpacker

* fix clippy common

* fix clippy stacker

* fix clippy sstable

* fmt
2022-12-20 07:30:06 +01:00
PSeitz
509a265659 add docstore version (#1652)
* add docstore version

closes #1589

* assert for docstore version
2022-11-04 10:19:16 +09:00
Pascal Seitz
a4485f7611 faster skipindex deserialization, larger blocksize on sort 2022-10-18 19:32:23 +08:00
Pascal Seitz
129f7422f5 remove unused buffer 2022-10-14 20:01:10 +08:00
PSeitz
8b69aab0fc avoid prepare_doc allocation (#1610)
avoid prepare_doc allocation, ~10% more thoughput best case
2022-10-11 14:15:55 +09:00
Bruce Mitchener
b3bf9a5716 Documentation improvements. 2022-10-05 14:18:10 +07:00
trinity-1686a
5945dbf0bd change format for store to make it faster with small documents (#1569)
* use new format for docstore blocks

* move index to end of block

it makes writing the block faster due to one less memcopy
2022-10-04 09:58:55 +02:00
PSeitz
fadd784a25 log improvements (#1564) 2022-09-30 09:39:26 +09:00
Bruce Mitchener
cf02e32578 Improvements to doc linking, grammar, etc. 2022-09-19 18:10:22 +07:00
Bruce Mitchener
6a88ac3fe3 Documentation improvements.
Fix some linking, some grammar, some typos, etc.
2022-09-18 18:05:37 +07:00
Paul Masurel
817225edfb Allow for a same-thread doc compressor. (#1510)
In addition, it isolates the doc compressor logic,
better reports io::Result.

In the case of the same-thread doc compressor,
the blocks are also not copied.
2022-09-13 15:32:48 +09:00
Paul Masurel
8e775b6c3d Refactoring dyn Column (#1502) 2022-09-02 17:26:30 +09:00
Kian-Meng Ang
014b1adc3e cargo +nightly fmt 2022-08-17 22:33:44 +08:00
Kian-Meng Ang
84295d5b35 cargo fmt 2022-08-15 21:07:01 +08:00
Kian-Meng Ang
625bcb4877 Fix typos and markdowns
Found via these commands:

    codespell -L crate,ser,panting,beauti,hart,ue,atleast,childs,ond,pris,hel,mot
    markdownlint *.md doc/src/*.md --disable MD013 MD025 MD033 MD001 MD024 MD036 MD041 MD003
2022-08-13 18:25:47 +08:00
Pascal Seitz
5750224d4c set docstore cache size at construction 2022-07-04 14:27:55 +08:00
Pascal Seitz
9db2f0e82b expose doc store cache size
expose lru doc store cache size
optimize doc store cache size
2022-07-04 13:54:41 +08:00
PSeitz
9baefbe2ab Update src/store/writer.rs
Co-authored-by: Paul Masurel <paul@quickwit.io>
2022-06-23 15:34:21 +08:00
PSeitz
ad76d11008 Update src/store/writer.rs
Co-authored-by: Paul Masurel <paul@quickwit.io>
2022-06-23 15:34:21 +08:00
PSeitz
c3220bece0 Update src/store/writer.rs
Co-authored-by: Paul Masurel <paul@quickwit.io>
2022-06-23 15:34:21 +08:00
PSeitz
2b713f0977 Update src/store/writer.rs
Co-authored-by: Paul Masurel <paul@quickwit.io>
2022-06-23 15:34:21 +08:00
Pascal Seitz
0bc6b4a117 renames and refactoring 2022-06-23 15:34:21 +08:00
PSeitz
79e42d4a6d Update src/store/writer.rs
Co-authored-by: Paul Masurel <paul@quickwit.io>
2022-06-23 15:34:21 +08:00
PSeitz
0135fbc4c8 Update src/store/writer.rs
Co-authored-by: Paul Masurel <paul@quickwit.io>
2022-06-23 15:34:21 +08:00
PSeitz
449594f67a Update src/store/writer.rs
Co-authored-by: Paul Masurel <paul@quickwit.io>
2022-06-23 15:34:21 +08:00
Pascal Seitz
8b6647e908 move writer to compressor thread 2022-06-23 15:34:21 +08:00
PSeitz
efabcbcdf5 Update src/store/writer.rs
Co-authored-by: Paul Masurel <paul@quickwit.io>
2022-06-23 15:34:21 +08:00
Pascal Seitz
7bf5962554 merge match, explicit type 2022-06-23 15:34:21 +08:00
Pascal Seitz
4c7dedef29 use seperate thread to compress block store
Use seperate thread to compress block store for increased indexing performance. This allows to use slower compressors with higher compression ratio, with less or no perfomance impact (with enough cores).

A seperate thread is spawned to compress the docstore, which handles single blocks and stacking from other docstores.
The spawned compressor thread does not write, instead it sends back the compressed data. This is done in order to avoid writing multithreaded on the same file.
2022-06-23 15:34:21 +08:00
Antoine G
11e4225f23 doc fix (#1391)
Documentation fix.
2022-06-21 15:53:33 +09:00
Pascal Seitz
4d9d2b6db0 split into compressor/decompressor
use custom de/serializer for compressor
accept parameters like zstd(compression_level=5) as compressor
2022-06-02 23:29:24 +08:00
Pascal Seitz
ed868f93a3 enable setting compression level 2022-06-02 16:47:29 +08:00
Pascal Seitz
314ae43a45 fix fmt 2022-06-02 14:54:23 +08:00
Pascal Seitz
fce91b2f3a vec without capacity 2022-06-02 13:50:18 +08:00
Pascal Seitz
9bcd2b8104 fix read_block_async 2022-06-02 13:37:52 +08:00
Pascal Seitz
0c9c257150 move cache handling into single function 2022-06-02 13:25:29 +08:00
Pascal Seitz
1af85a2956 accept usize instead &usize 2022-06-02 11:23:36 +08:00
Pascal Seitz
bc4c3d0c6b add peek_lru test 2022-06-02 11:13:17 +08:00
Pascal Seitz
6937c75f05 hide advanced doc store api 2022-06-02 11:13:17 +08:00
Pascal Seitz
e54429e827 expose doc store functions
expose doc store functions for advanced usage
refactor cache
expose cache statistics
remove unnecessary arc
unduplicate code
2022-06-02 11:13:17 +08:00
Kryesh
fc045e6bf9 Cleanup imports, remove unneeded error mapping 2022-05-19 10:34:02 +10:00
Kryesh
6837a4d468 Fix bench 2022-05-18 20:35:29 +10:00