mirror of https://github.com/quickwit-oss/tantivy.git (synced 2026-05-25 20:50:43 +00:00)
Fast Field Codecs
This crate contains various fast field codecs, used to compress/decompress fast field data in tantivy.
Contributing
Contributing is straightforward. Since bitpacking is the simplest compressor, you can use it as a reference.
A codec needs to implement two traits:
- A reader implementing FastFieldCodecReader to read the codec.
- A serializer implementing FastFieldCodecSerializer for compression estimation and the codec name and id.
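The split between the two roles can be illustrated with a small, self-contained sketch. The traits SketchCodecSerializer and SketchCodecReader, their methods, and the byte layout are hypothetical simplifications, not the real FastFieldCodecSerializer / FastFieldCodecReader signatures; the codec itself is a plain bitpacker in the spirit of the reference one, storing every value as (value - min) in a fixed bit width.

```rust
// Hypothetical, simplified stand-ins for the two roles; these are NOT
// tantivy's real FastFieldCodecSerializer / FastFieldCodecReader signatures.
trait SketchCodecSerializer {
    /// Codec name, e.g. as shown in a comparison table.
    const NAME: &'static str;
    /// Codec id, written first so a reader can tell codecs apart.
    const ID: u8;
    /// Predicted compressed size as a fraction of 8 bytes per value.
    fn estimate(data: &[u64]) -> f32;
    /// Compresses `data` and appends the bytes to `out`.
    fn serialize(out: &mut Vec<u8>, data: &[u64]);
}

trait SketchCodecReader: Sized {
    /// Parses bytes written by the matching serializer.
    fn open(bytes: &[u8]) -> Self;
    /// Returns the value stored at position `idx`.
    fn get(&self, idx: usize) -> u64;
}

/// Number of bits needed to represent any value in 0..=max_delta.
fn bits_needed(max_delta: u64) -> u8 {
    (64 - max_delta.leading_zeros()) as u8
}

/// Stores every value as (value - min) using a fixed bit width.
struct SketchBitpacked;

impl SketchCodecSerializer for SketchBitpacked {
    const NAME: &'static str = "SketchBitpacked";
    const ID: u8 = 1;

    fn estimate(data: &[u64]) -> f32 {
        let min = data.iter().copied().min().unwrap_or(0);
        let max = data.iter().copied().max().unwrap_or(0);
        // num_bits per value instead of 64; the 10-byte header is ignored.
        bits_needed(max - min) as f32 / 64.0
    }

    fn serialize(out: &mut Vec<u8>, data: &[u64]) {
        let min = data.iter().copied().min().unwrap_or(0);
        let max = data.iter().copied().max().unwrap_or(0);
        let num_bits = bits_needed(max - min) as usize;
        // Header: codec id (1 byte), min (8 bytes LE), bit width (1 byte).
        out.push(Self::ID);
        out.extend_from_slice(&min.to_le_bytes());
        out.push(num_bits as u8);
        // Body: deltas packed LSB-first, value `idx` starts at bit idx * num_bits.
        let start = out.len();
        out.resize(start + (data.len() * num_bits + 7) / 8, 0);
        for (idx, &val) in data.iter().enumerate() {
            let delta = val - min;
            for bit in 0..num_bits {
                if (delta >> bit) & 1 == 1 {
                    let pos = idx * num_bits + bit;
                    out[start + pos / 8] |= 1u8 << (pos % 8);
                }
            }
        }
    }
}

struct SketchBitpackedReader {
    min: u64,
    num_bits: usize,
    packed: Vec<u8>,
}

impl SketchCodecReader for SketchBitpackedReader {
    fn open(bytes: &[u8]) -> Self {
        assert_eq!(bytes[0], SketchBitpacked::ID, "not a SketchBitpacked payload");
        let mut min_bytes = [0u8; 8];
        min_bytes.copy_from_slice(&bytes[1..9]);
        SketchBitpackedReader {
            min: u64::from_le_bytes(min_bytes),
            num_bits: bytes[9] as usize,
            packed: bytes[10..].to_vec(),
        }
    }

    fn get(&self, idx: usize) -> u64 {
        // Reassemble the delta bit by bit, then add the minimum back.
        let mut delta = 0u64;
        for bit in 0..self.num_bits {
            let pos = idx * self.num_bits + bit;
            if (self.packed[pos / 8] >> (pos % 8)) & 1 == 1 {
                delta |= 1u64 << bit;
            }
        }
        self.min + delta
    }
}
```

For example, serializing [1000, 1003, 1001, 1255, 1000] needs 8 bits per value (the spread is 255), so estimate returns 8/64 = 0.125, while the actual output is 15 bytes because of the header. A real codec would read values with word-level shifts rather than one bit at a time; the per-bit loop is kept here only because it is easy to verify.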
Tests
Once the traits are implemented, integrating the codec into the tests and benchmarks is straightforward (see test_with_codec_data_sets and bench.rs).
Make sure to also add the codec to main.rs, which measures the compression ratio and the compression estimation on different data sets. You can run it with:
cargo run --features bin
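The exact signatures of test_with_codec_data_sets and of the helpers in bench.rs are not given here, so the following is a hypothetical, self-contained harness showing the same idea: round-trip every data set through a codec, fail on any mismatch, and report the ratio of compressed bytes to the uncompressed 8 bytes per value. The data set generators, check_codec, and the trivial "raw" codec used to exercise it are all illustrative assumptions.

```rust
/// Hypothetical data sets, loosely modeled on the comparison table's rows;
/// the crate's real generators may differ.
fn data_sets() -> Vec<(&'static str, Vec<u64>)> {
    vec![
        ("autoincrement", (0..10_000u64).collect()),
        (
            "almost monotonically increasing",
            (0..10_000u64).map(|i| if i % 100 == 0 { i / 2 } else { i }).collect(),
        ),
        ("constant", vec![42; 1_000]),
        ("empty", Vec::new()),
    ]
}

/// Round-trips every data set through one codec, panicking on any mismatch,
/// and returns (data set, ratio) with ratio = compressed bytes / (8 * values).
fn check_codec(
    serialize: fn(&[u64]) -> Vec<u8>,
    get: fn(&[u8], usize) -> u64,
) -> Vec<(&'static str, f32)> {
    let mut ratios = Vec::new();
    for (name, data) in data_sets() {
        let bytes = serialize(&data);
        for (idx, &expected) in data.iter().enumerate() {
            assert_eq!(get(&bytes, idx), expected, "{}: mismatch at {}", name, idx);
        }
        // A ratio is undefined for an empty input, so skip it.
        if !data.is_empty() {
            ratios.push((name, bytes.len() as f32 / (data.len() * 8) as f32));
        }
    }
    ratios
}

/// Trivial "raw" codec used only to exercise the harness: 8 LE bytes per value.
fn raw_serialize(data: &[u64]) -> Vec<u8> {
    data.iter().flat_map(|v| v.to_le_bytes()).collect()
}

fn raw_get(bytes: &[u8], idx: usize) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&bytes[idx * 8..idx * 8 + 8]);
    u64::from_le_bytes(buf)
}
```

With the raw codec every ratio is exactly 1.0, which makes it a convenient baseline: a real codec plugged into the same harness should report values below 1.0, as in the comparison table.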
TODO
- Add real-world data sets to the comparison
- Add a codec that covers sparse data sets
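Codec Comparison
Both columns give the compressed size as a fraction of the uncompressed size (8 bytes per value), so lower is better. Compression Ratio is measured on the serialized output; Compression Estimation is the ratio predicted by the codec's serializer. Rows without values name the data set for the codec rows beneath them.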
+----------------------------------+-------------------+------------------------+
| Data set / Codec                 | Compression Ratio | Compression Estimation |
+----------------------------------+-------------------+------------------------+
| Autoincrement                    |                   |                        |
+----------------------------------+-------------------+------------------------+
| LinearInterpol                   | 0.000039572664    | 0.000004396963         |
+----------------------------------+-------------------+------------------------+
| MultiLinearInterpol              | 0.1477348         | 0.17275847             |
+----------------------------------+-------------------+------------------------+
| Bitpacked                        | 0.28126493        | 0.28125                |
+----------------------------------+-------------------+------------------------+
| Monotonically increasing concave |                   |                        |
+----------------------------------+-------------------+------------------------+
| LinearInterpol                   | 0.25003937        | 0.26562938             |
+----------------------------------+-------------------+------------------------+
| MultiLinearInterpol              | 0.190665          | 0.1883836              |
+----------------------------------+-------------------+------------------------+
| Bitpacked                        | 0.31251436        | 0.3125                 |
+----------------------------------+-------------------+------------------------+
| Monotonically increasing convex  |                   |                        |
+----------------------------------+-------------------+------------------------+
| LinearInterpol                   | 0.25003937        | 0.28125438             |
+----------------------------------+-------------------+------------------------+
| MultiLinearInterpol              | 0.18676           | 0.2040086              |
+----------------------------------+-------------------+------------------------+
| Bitpacked                        | 0.31251436        | 0.3125                 |
+----------------------------------+-------------------+------------------------+
| Almost monotonically increasing  |                   |                        |
+----------------------------------+-------------------+------------------------+
| LinearInterpol                   | 0.14066513        | 0.1562544              |
+----------------------------------+-------------------+------------------------+
| MultiLinearInterpol              | 0.16335973        | 0.17275847             |
+----------------------------------+-------------------+------------------------+
| Bitpacked                        | 0.28126493        | 0.28125                |
+----------------------------------+-------------------+------------------------+
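The Bitpacked estimates in the table are consistent with "bit width / 64": 0.28125 = 18/64 and 0.3125 = 20/64. The near-zero LinearInterpol ratio on Autoincrement data follows from the interpolation idea: values that lie on a line leave no residual to store. The sketch below compares the two estimates under a simplified model that is an assumption, not tantivy's exact algorithm: a single line through the first and last value, residuals bitpacked, header ignored. The function names are hypothetical, and values are assumed to fit in i64.

```rust
/// Bits needed to store any value in 0..=max_delta.
fn bits_needed(max_delta: u64) -> u32 {
    64 - max_delta.leading_zeros()
}

/// Bitpacking estimate: every value stored as (value - min) in a fixed width.
fn estimate_bitpacked(data: &[u64]) -> f32 {
    let min = data.iter().copied().min().unwrap_or(0);
    let max = data.iter().copied().max().unwrap_or(0);
    bits_needed(max - min) as f32 / 64.0
}

/// Simplified linear-interpolation estimate: predict each value from the line
/// through the first and last value, then bitpack the (shifted) residuals.
fn estimate_linear_interpol(data: &[u64]) -> f32 {
    if data.len() < 2 {
        return estimate_bitpacked(data);
    }
    let first = data[0] as f64;
    let last = data[data.len() - 1] as f64;
    let slope = (last - first) / (data.len() - 1) as f64;
    // Signed distance between each value and the line's prediction.
    let residuals: Vec<i64> = data
        .iter()
        .enumerate()
        .map(|(i, &v)| v as i64 - (first + slope * i as f64).round() as i64)
        .collect();
    let min = *residuals.iter().min().unwrap();
    let max = *residuals.iter().max().unwrap();
    bits_needed((max - min) as u64) as f32 / 64.0
}
```

On the sequence 0..10_000 the line fits exactly, so the interpolation estimate is 0, while bitpacking still needs 14 bits per value (14/64 = 0.21875). Data that deviates from a straight line, such as the concave and convex sets, leaves larger residuals, which is where fitting several lines (as the MultiLinearInterpol name suggests) can help.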