PSeitz
9e2faecf5b
add memory limit for aggregations ( #1942 )
...
* add memory limit for aggregations
introduce AggregationLimits to set memory consumption limit and bucket limits
memory limit is checked during aggregation, bucket limit is checked before returning the aggregation request.
* Apply suggestions from code review
Co-authored-by: Paul Masurel <paul@quickwit.io>
* add ByteCount with human readable format
---------
Co-authored-by: Paul Masurel <paul@quickwit.io>
2023-03-16 06:21:07 +01:00
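The two checks described in 9e2faecf5b (memory during aggregation, bucket count before returning) can be pictured with a minimal sketch. The type `LimitsSketch` and its methods are hypothetical names chosen for illustration; they are not the `AggregationLimits` API introduced by the commit.

```rust
/// Hypothetical stand-in for a limits guard; not the actual tantivy type.
struct LimitsSketch {
    memory_limit: u64,
    bucket_limit: u32,
    memory_used: u64,
}

impl LimitsSketch {
    fn new(memory_limit: u64, bucket_limit: u32) -> Self {
        LimitsSketch { memory_limit, bucket_limit, memory_used: 0 }
    }

    /// Called while aggregating: record newly consumed bytes and fail
    /// as soon as the running total exceeds the memory limit.
    fn add_memory(&mut self, bytes: u64) -> Result<(), String> {
        self.memory_used = self.memory_used.saturating_add(bytes);
        if self.memory_used > self.memory_limit {
            Err(format!(
                "memory limit exceeded: used {} bytes, limit {} bytes",
                self.memory_used, self.memory_limit
            ))
        } else {
            Ok(())
        }
    }

    /// Called once before returning: validate the final bucket count.
    fn check_buckets(&self, num_buckets: u32) -> Result<(), String> {
        if num_buckets > self.bucket_limit {
            Err(format!(
                "bucket limit exceeded: {} buckets, limit {}",
                num_buckets, self.bucket_limit
            ))
        } else {
            Ok(())
        }
    }
}
```

Under these assumptions, a collector would call `add_memory` each time it grows a bucket structure and `check_buckets` once on the finished result.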
PSeitz
cbcafae04c
fix: doc store for files larger than 4GB ( #1856 )
...
Fixes an issue in the skip list deserialization, which incorrectly deserialized the byte start offset as u32.
`get_doc` will fail for any doc that lives in a block with a start offset larger than u32::MAX (~4GB).
Causes index corruption if a segment with a doc store larger than 4GB is merged.
tantivy version 0.19 is affected.
2023-02-10 14:29:43 +01:00
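The failure mode described in cbcafae04c can be illustrated with a small sketch of what happens when a 64-bit byte offset passes through a `u32`. The function name is hypothetical and this is not the skip list code itself; it only shows the arithmetic of the truncation.

```rust
/// Hypothetical illustration: a byte offset that is narrowed to u32
/// and widened again keeps only its low 32 bits.
fn roundtrip_through_u32(offset: u64) -> u64 {
    (offset as u32) as u64
}

/// Returns true if the offset survives the u32 round trip unchanged.
fn offset_is_safe(offset: u64) -> bool {
    roundtrip_through_u32(offset) == offset
}
```

Any offset up to `u32::MAX` is unaffected, while a block starting at byte 5,000,000,000 would be looked up at byte 705,032,704 (the original value minus 2^32), which points into an unrelated block.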
Paul Masurel
405e2cf4d9
Merge with main
2023-02-09 14:28:57 +01:00
Paul Masurel
bd5eea9852
Integrated columnar work.
2023-02-09 13:14:31 +01:00
PSeitz
0f20787917
fix doc store cache docs ( #1821 )
...
* fix doc store cache docs
addresses an issue reported in #1820
* rename doc_store_cache_size
2023-01-23 07:06:49 +01:00
Adam Reichold
82a183bc2d
Bump dependency on lru from version 0.7.5 to version 0.9.0. ( #1755 )
2023-01-10 13:35:37 +09:00
Paul Masurel
f39165e1e7
Moving FileSlice to tantivy-common ( #1729 )
2022-12-21 16:35:11 +09:00
Paul Masurel
32cb1d22da
Removed AsyncIoResult. ( #1728 )
2022-12-21 16:01:17 +09:00
PSeitz
f9171a3981
fix clippy ( #1725 )
...
* fix clippy
* fix clippy fastfield codecs
* fix clippy bitpacker
* fix clippy common
* fix clippy stacker
* fix clippy sstable
* fmt
2022-12-20 07:30:06 +01:00
PSeitz
509a265659
add docstore version ( #1652 )
...
* add docstore version
closes #1589
* assert for docstore version
2022-11-04 10:19:16 +09:00
Pascal Seitz
a4485f7611
faster skipindex deserialization, larger blocksize on sort
2022-10-18 19:32:23 +08:00
Pascal Seitz
129f7422f5
remove unused buffer
2022-10-14 20:01:10 +08:00
PSeitz
8b69aab0fc
avoid prepare_doc allocation ( #1610 )
...
avoid prepare_doc allocation, ~10% more throughput best case
2022-10-11 14:15:55 +09:00
Bruce Mitchener
b3bf9a5716
Documentation improvements.
2022-10-05 14:18:10 +07:00
trinity-1686a
5945dbf0bd
change format for store to make it faster with small documents ( #1569 )
...
* use new format for docstore blocks
* move index to end of block
it makes writing the block faster due to one less memcopy
2022-10-04 09:58:55 +02:00
PSeitz
fadd784a25
log improvements ( #1564 )
2022-09-30 09:39:26 +09:00
Bruce Mitchener
cf02e32578
Improvements to doc linking, grammar, etc.
2022-09-19 18:10:22 +07:00
Bruce Mitchener
6a88ac3fe3
Documentation improvements.
...
Fix some linking, some grammar, some typos, etc.
2022-09-18 18:05:37 +07:00
Paul Masurel
817225edfb
Allow for a same-thread doc compressor. ( #1510 )
...
In addition, it isolates the doc compressor logic,
better reports io::Result.
In the case of the same-thread doc compressor,
the blocks are also not copied.
2022-09-13 15:32:48 +09:00
Paul Masurel
8e775b6c3d
Refactoring dyn Column ( #1502 )
2022-09-02 17:26:30 +09:00
Kian-Meng Ang
014b1adc3e
cargo +nightly fmt
2022-08-17 22:33:44 +08:00
Kian-Meng Ang
84295d5b35
cargo fmt
2022-08-15 21:07:01 +08:00
Kian-Meng Ang
625bcb4877
Fix typos and markdowns
...
Found via these commands:
codespell -L crate,ser,panting,beauti,hart,ue,atleast,childs,ond,pris,hel,mot
markdownlint *.md doc/src/*.md --disable MD013 MD025 MD033 MD001 MD024 MD036 MD041 MD003
2022-08-13 18:25:47 +08:00
Pascal Seitz
5750224d4c
set docstore cache size at construction
2022-07-04 14:27:55 +08:00
Pascal Seitz
9db2f0e82b
expose doc store cache size
...
expose lru doc store cache size
optimize doc store cache size
2022-07-04 13:54:41 +08:00
PSeitz
9baefbe2ab
Update src/store/writer.rs
...
Co-authored-by: Paul Masurel <paul@quickwit.io>
2022-06-23 15:34:21 +08:00
PSeitz
ad76d11008
Update src/store/writer.rs
...
Co-authored-by: Paul Masurel <paul@quickwit.io>
2022-06-23 15:34:21 +08:00
PSeitz
c3220bece0
Update src/store/writer.rs
...
Co-authored-by: Paul Masurel <paul@quickwit.io>
2022-06-23 15:34:21 +08:00
PSeitz
2b713f0977
Update src/store/writer.rs
...
Co-authored-by: Paul Masurel <paul@quickwit.io>
2022-06-23 15:34:21 +08:00
Pascal Seitz
0bc6b4a117
renames and refactoring
2022-06-23 15:34:21 +08:00
PSeitz
79e42d4a6d
Update src/store/writer.rs
...
Co-authored-by: Paul Masurel <paul@quickwit.io>
2022-06-23 15:34:21 +08:00
PSeitz
0135fbc4c8
Update src/store/writer.rs
...
Co-authored-by: Paul Masurel <paul@quickwit.io>
2022-06-23 15:34:21 +08:00
PSeitz
449594f67a
Update src/store/writer.rs
...
Co-authored-by: Paul Masurel <paul@quickwit.io>
2022-06-23 15:34:21 +08:00
Pascal Seitz
8b6647e908
move writer to compressor thread
2022-06-23 15:34:21 +08:00
PSeitz
efabcbcdf5
Update src/store/writer.rs
...
Co-authored-by: Paul Masurel <paul@quickwit.io>
2022-06-23 15:34:21 +08:00
Pascal Seitz
7bf5962554
merge match, explicit type
2022-06-23 15:34:21 +08:00
Pascal Seitz
4c7dedef29
use separate thread to compress block store
...
Use a separate thread to compress the block store for increased indexing performance. This allows slower compressors with a higher compression ratio to be used with little or no performance impact (given enough cores).
A separate thread is spawned to compress the docstore; it handles single blocks and stacking from other docstores.
The spawned compressor thread does not write; instead it sends back the compressed data. This avoids multithreaded writes to the same file.
2022-06-23 15:34:21 +08:00
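The design described in 4c7dedef29 (a compressor thread that returns compressed blocks instead of writing them) can be sketched with standard-library channels. Everything here is an assumption made for illustration: `toy_compress` is a trivial run-length encoder standing in for a real codec, a `Vec<u8>` stands in for the output file, and none of the names come from `src/store/writer.rs`.

```rust
use std::sync::mpsc;
use std::thread;

/// Stand-in for a real compressor: run-length encodes as (count, byte) pairs.
fn toy_compress(block: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut iter = block.iter().copied().peekable();
    while let Some(byte) = iter.next() {
        let mut run: u8 = 1;
        while run < u8::MAX && iter.peek() == Some(&byte) {
            iter.next();
            run += 1;
        }
        out.push(run);
        out.push(byte);
    }
    out
}

/// Sends blocks to a compressor thread and writes the results on the
/// calling thread, so only one thread ever touches the output.
fn compress_blocks_on_thread(blocks: Vec<Vec<u8>>) -> Vec<u8> {
    let (block_tx, block_rx) = mpsc::channel::<Vec<u8>>();
    let (out_tx, out_rx) = mpsc::channel::<Vec<u8>>();

    // The compressor thread only compresses and sends data back.
    let handle = thread::spawn(move || {
        for block in block_rx {
            if out_tx.send(toy_compress(&block)).is_err() {
                break; // receiver is gone, nothing left to do
            }
        }
    });

    for block in blocks {
        block_tx.send(block).expect("compressor thread hung up");
    }
    drop(block_tx); // closing the channel lets the thread finish

    // The "file" is written here, in submission order.
    let mut file = Vec::new();
    for compressed in out_rx {
        file.extend_from_slice(&compressed);
    }
    handle.join().expect("compressor thread panicked");
    file
}
```

Because a single worker drains a FIFO channel, the compressed blocks come back in the order they were submitted, which is what keeps the single-writer side simple in this sketch.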
Antoine G
11e4225f23
doc fix ( #1391 )
...
Documentation fix.
2022-06-21 15:53:33 +09:00
Pascal Seitz
4d9d2b6db0
split into compressor/decompressor
...
use custom de/serializer for compressor
accept parameters like zstd(compression_level=5) as compressor
2022-06-02 23:29:24 +08:00
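The parameter syntax mentioned in 4d9d2b6db0, `zstd(compression_level=5)`, can be parsed along the following lines. This is a minimal sketch, not the custom de/serializer from the commit: the enum `CompressorSpec`, the function `parse_compressor`, and the accepted names `none` and `lz4` are assumptions made for the example.

```rust
/// Hypothetical description of a compressor choice.
#[derive(Debug, PartialEq)]
enum CompressorSpec {
    NoCompression,
    Lz4,
    Zstd { compression_level: Option<i32> },
}

/// Parses strings such as "lz4", "zstd" or "zstd(compression_level=5)".
fn parse_compressor(input: &str) -> Result<CompressorSpec, String> {
    let input = input.trim();
    match input {
        "none" => return Ok(CompressorSpec::NoCompression),
        "lz4" => return Ok(CompressorSpec::Lz4),
        "zstd" => return Ok(CompressorSpec::Zstd { compression_level: None }),
        _ => {}
    }

    // Anything else must look like zstd(key=value).
    let inner = input
        .strip_prefix("zstd(")
        .and_then(|rest| rest.strip_suffix(')'))
        .ok_or_else(|| format!("unknown compressor: {}", input))?;
    let (key, value) = inner
        .split_once('=')
        .ok_or_else(|| format!("expected key=value, got: {}", inner))?;
    if key.trim() != "compression_level" {
        return Err(format!("unknown zstd parameter: {}", key.trim()));
    }
    let level = value
        .trim()
        .parse::<i32>()
        .map_err(|err| format!("invalid compression level: {}", err))?;
    Ok(CompressorSpec::Zstd { compression_level: Some(level) })
}
```

Keeping the level optional lets a plain `zstd` fall back to whatever default the codec uses, while unknown parameters are rejected rather than silently ignored.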
Pascal Seitz
ed868f93a3
enable setting compression level
2022-06-02 16:47:29 +08:00
Pascal Seitz
314ae43a45
fix fmt
2022-06-02 14:54:23 +08:00
Pascal Seitz
fce91b2f3a
vec without capacity
2022-06-02 13:50:18 +08:00
Pascal Seitz
9bcd2b8104
fix read_block_async
2022-06-02 13:37:52 +08:00
Pascal Seitz
0c9c257150
move cache handling into single function
2022-06-02 13:25:29 +08:00
Pascal Seitz
1af85a2956
accept usize instead of &usize
2022-06-02 11:23:36 +08:00
Pascal Seitz
bc4c3d0c6b
add peek_lru test
2022-06-02 11:13:17 +08:00
Pascal Seitz
6937c75f05
hide advanced doc store api
2022-06-02 11:13:17 +08:00
Pascal Seitz
e54429e827
expose doc store functions
...
expose doc store functions for advanced usage
refactor cache
expose cache statistics
remove unnecessary arc
unduplicate code
2022-06-02 11:13:17 +08:00
Kryesh
fc045e6bf9
Cleanup imports, remove unneeded error mapping
2022-05-19 10:34:02 +10:00
Kryesh
6837a4d468
Fix bench
2022-05-18 20:35:29 +10:00