* read path for new fst based index
* implement BlockAddrStoreWriter
* extract slop/derivation computation
* use better linear approximator and allow negative correction to approximator
* document format and reorder some fields
* optimize single block sstable size
* plug backward compat
* add fields_metadata to SegmentReader, add columnar docs
* use schema to resolve field, add test
* normalize paths
* merge for FieldsMetadata, add fields_metadata on Index
* Update src/core/segment_reader.rs
Co-authored-by: Paul Masurel <paul@quickwit.io>
* merge code paths
* add Hash
* move function oustide
---------
Co-authored-by: Paul Masurel <paul@quickwit.io>
* add missing aggregation part 2
Add missing support for:
- Mixed types columns
- Key of type string on numerical fields
The special aggregation is slower than the integrated one in TermsAggregation and therefore not
chosen by default, although it can cover all use cases.
* simplify, add num_docs to empty
* lazy columnar merge
This is the first part of addressing #3633
Instead of loading all Column into memory for the merge, only the current column_name
group is loaded. This can be done since the sstable streams the columns lexicographically.
* refactor
* add rustdoc
* replace iterator with BTreeMap
* switch to ms in histogram for date type
switch to ms in histogram, by adding a normalization step that converts
to nanoseconds precision when creating the collector.
closes#2028
related to #2026
* add missing unit long variants
* use single thread to avoid handling test case
* fix docs
* revert CI
* cleanup
* improve docs
* Update src/aggregation/bucket/histogram/histogram.rs
Co-authored-by: Paul Masurel <paul@quickwit.io>
---------
Co-authored-by: Paul Masurel <paul@quickwit.io>
Applied this command to the code, making it a bit shorter and slightly
more readable.
```
cargo +nightly clippy --all-features --benches --tests --workspace --fix -- -A clippy::all -W clippy::uninlined_format_args
cargo +nightly fmt --all
```
* compress sstable with zstd
* add some details to sstable readme
* compress only block which benefit from it
* multiple changes to sstable
make compression optional
use OwnedBytes instead of impl Read in sstable, required for next point
use zstd bulk api, which is much faster on small records
* cleanup and use bulk api for compression
* use dedicated byte for compression
* switch block len and compression flag
* change default zstd level in sstable
* Added proptest on columnar merge with a shuffle
Made column serialization more explicit.
Bugfix when a bytes column is missing, and with a shuffle.
Improved the cardinality detection logic / column detection.
* Code review
* CR comments
* Following CR
* Faster range queries
This PR does several changes
- ip compact space now uses u32
- the bitunpacker now gets a get_batch function
- we push down range filtering, removing GCD / shift in the bitpacking
codec.
- we rely on AVX2 routine to do the filtering.
* Apply suggestions from code review
* Apply suggestions from code review
* CR comments
* document a new sstable format
* add support for changing target block size
* use new format for sstable index
* handle sstable version errror
* use very small blocks for proptests
* add a footer structure
* add memory limit for aggregations
introduce AggregationLimits to set memory consumption limit and bucket limits
memory limit is checked during aggregation, bucket limit is checked before returning the aggregation request.
* Apply suggestions from code review
Co-authored-by: Paul Masurel <paul@quickwit.io>
* add ByteCount with human readable format
---------
Co-authored-by: Paul Masurel <paul@quickwit.io>
dynamic dispatch seems to be really expensive, move the buffer in front of the dynamic dispatch, to reduce the number of calls into the dynamic dispatched collector.
* Adding a write schema to columnar's merge operations.
* Added unit test checking min/max when columns are empty.
* CR comment
* Rename to value_type_to_column_type
`ColumnValues` wrongly located in column_values/column.rs due to
historical reason moves to column_values/mod.rs
u128 stuff gets its own directory like u64 stuff.
The new fast field code, based on columnar, had a larger minimum memory
footprint, causing the first docuemnt to trigger a flush of the asegment
in this unit test.
This PR prevents the allocation of a large capacity for the different hashmap tables
using in the columnar writer.
Closes#1859