* Add filter_vec benchmarks (dense, sparse, full coverage)
Uses get_ids_for_value_range to exercise both the bitpacking decode and
the filter_vec SIMD path together under realistic cache conditions.
* Add NEON and SVE implementations for filter_vec
Adds aarch64-specific SIMD paths (NEON always available on aarch64;
SVE gated on nightly + non-Apple target) with routing logic in mod.rs
that selects the best available instruction set at runtime.
* Using asm! to workaround the lack of stabilized SVE intrinsics
* showing instruction set
* improved proptesting
* removing build.rs
---------
Co-authored-by: Paul Masurel <paul.masurel@datadoghq.com>
* Improvement on the scalar / random bitpacker code.
Added proptesting
Added simple benchmark
Added assert and comments on the very non trivial hidden contract
Remove the need for an extra padding.
The last point introduces a small performance regression (~10%).
* Fixing unit tests