* Add filter_vec benchmarks (dense, sparse, full coverage)
Uses get_ids_for_value_range to exercise both the bitpacking decode and
the filter_vec SIMD path together under realistic cache conditions.
* Add NEON and SVE implementations for filter_vec
Adds aarch64-specific SIMD paths (NEON always available on aarch64;
SVE gated on nightly + non-Apple target) with routing logic in mod.rs
that selects the best available instruction set at runtime.
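
A minimal sketch of what such runtime routing could look like. The names `InstructionSet`, `detect_instruction_set`, and `filter_vec_scalar`, and the signature of `filter_vec_in_place`, are assumptions made for illustration and may not match mod.rs. The sketch uses only the standard library's runtime feature detection and does not reproduce the nightly / non-Apple compile-time gating; every arm falls back to a scalar kernel because the SIMD kernels are not shown here.

```rust
use std::ops::RangeInclusive;

/// Hypothetical enum naming the available code paths.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum InstructionSet {
    Scalar,
    Avx2,
    Neon,
    Sve,
}

/// Picks the best instruction set the current CPU reports at runtime.
#[allow(unreachable_code)]
fn detect_instruction_set() -> InstructionSet {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx2") {
            return InstructionSet::Avx2;
        }
    }
    #[cfg(target_arch = "aarch64")]
    {
        // NEON is part of the aarch64 baseline; SVE is optional.
        if std::arch::is_aarch64_feature_detected!("sve") {
            return InstructionSet::Sve;
        }
        return InstructionSet::Neon;
    }
    InstructionSet::Scalar
}

/// Scalar reference kernel: replaces `output` with `offset + i` for every
/// position `i` whose value lies in `range`.
fn filter_vec_scalar(range: RangeInclusive<u32>, offset: u32, output: &mut Vec<u32>) {
    let mut cursor = 0;
    for i in 0..output.len() {
        let val = output[i];
        // Safe to overwrite: `cursor <= i`, and `output[i]` was already read.
        output[cursor] = offset + i as u32;
        cursor += usize::from(range.contains(&val));
    }
    output.truncate(cursor);
}

/// Routes to a kernel based on the detected instruction set.
fn filter_vec_in_place(range: RangeInclusive<u32>, offset: u32, output: &mut Vec<u32>) {
    match detect_instruction_set() {
        // Stand-ins: a real implementation would call a dedicated kernel per arm.
        InstructionSet::Avx2 => filter_vec_scalar(range, offset, output),
        InstructionSet::Neon => filter_vec_scalar(range, offset, output),
        InstructionSet::Sve => filter_vec_scalar(range, offset, output),
        InstructionSet::Scalar => filter_vec_scalar(range, offset, output),
    }
}
```

Keeping a scalar kernel with the same contract also gives property tests a reference to compare each SIMD path against.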
* Use asm! to work around the lack of stabilized SVE intrinsics
* Show instruction set
* Improve proptesting
* Remove build.rs
---------
Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>
* Faster range queries
This PR makes several changes:
- the IP compact space now uses u32
- the bitunpacker gains a get_batch function
- we push down range filtering, removing the GCD / shift step in the
bitpacking codec
- we rely on an AVX2 routine to do the filtering
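
One way to read the range push-down, sketched under the assumption that a decoded value equals `min_value + raw * gcd`: rather than decoding every value and comparing it to the query range, the query range is mapped once into the raw (bitpacked) space and the filter runs on raw values directly. `transform_range` and `filter_raw` are hypothetical names and are not taken from the PR.

```rust
use std::ops::RangeInclusive;

/// Maps a range over decoded values to the equivalent range over raw values,
/// assuming `decoded = min_value + raw * gcd` and `raw <= max_raw`.
/// Returns `None` when no raw value can match.
fn transform_range(
    range: RangeInclusive<u64>,
    min_value: u64,
    gcd: u64,
    max_raw: u32,
) -> Option<RangeInclusive<u32>> {
    let (lo, hi) = (*range.start(), *range.end());
    if lo > hi || hi < min_value {
        return None;
    }
    // Lower bound rounds up: the first raw value whose decoded value is >= lo.
    let lo_shifted = lo.saturating_sub(min_value);
    let raw_lo = lo_shifted / gcd + u64::from(lo_shifted % gcd != 0);
    // Upper bound rounds down and is clamped to the largest stored raw value.
    let raw_hi = ((hi - min_value) / gcd).min(u64::from(max_raw));
    if raw_lo > raw_hi {
        return None;
    }
    Some(raw_lo as u32..=raw_hi as u32)
}

/// Appends `offset + i` for every raw value at position `i` inside `raw_range`.
/// No multiplication or shift is needed per value.
fn filter_raw(raw: &[u32], raw_range: &RangeInclusive<u32>, offset: u32, out: &mut Vec<u32>) {
    for (i, v) in raw.iter().enumerate() {
        if raw_range.contains(v) {
            out.push(offset + i as u32);
        }
    }
}
```

With `min_value = 100` and `gcd = 10`, the query `115..=140` becomes the raw range `2..=4`, so the inner loop is a plain u32 comparison that a SIMD routine can take over.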
* Apply suggestions from code review
* Apply suggestions from code review
* CR comments
* Improve the scalar / random bitpacker code
- added proptesting
- added a simple benchmark
- added an assert and comments on the very non-trivial hidden contract
- removed the need for extra padding
The last point introduces a small performance regression (~10%).
* Fixing unit tests