
Fast Field Codecs

This crate contains various fast field codecs, used to compress/decompress fast field data in tantivy.

Contributing

Contributing is straightforward. Bitpacking is the simplest codec, so you can use it as a reference implementation.

A codec needs to implement 2 traits:

  • A reader implementing FastFieldCodecReader, which reads values encoded by the codec.
  • A serializer implementing FastFieldCodecSerializer, which provides the compression estimation and the codec name and id.
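To make the reader/serializer split concrete, here is a minimal, self-contained sketch of a bitpacking codec. The traits SketchReader and SketchSerializer, their method signatures, and the byte layout (8-byte minimum, 1-byte bit width, packed deltas) are assumptions made for illustration; they are not the crate's actual FastFieldCodecReader and FastFieldCodecSerializer definitions, so check the bitpacking codec in this crate for the real ones.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for the reader trait (NOT the crate's real trait).
trait SketchReader: Sized {
    fn open(bytes: &[u8]) -> io::Result<Self>;
    /// `idx` must be smaller than the number of serialized values.
    fn get(&self, idx: usize) -> u64;
}

/// Hypothetical stand-in for the serializer trait (NOT the crate's real trait).
trait SketchSerializer {
    const NAME: &'static str;
    const ID: u8;
    /// Estimated compressed size divided by raw size (8 bytes per value).
    fn estimate(values: &[u64]) -> f32;
    fn serialize(out: &mut impl Write, values: &[u64]) -> io::Result<()>;
}

/// Returns the minimum value and the bit width needed to store `value - min`.
fn min_and_bits(values: &[u64]) -> (u64, u8) {
    let min = values.iter().copied().min().unwrap_or(0);
    let max = values.iter().copied().max().unwrap_or(0);
    (min, (64 - (max - min).leading_zeros()) as u8)
}

struct SketchBitpackedSerializer;

impl SketchSerializer for SketchBitpackedSerializer {
    const NAME: &'static str = "SketchBitpacked";
    const ID: u8 = 1;

    fn estimate(values: &[u64]) -> f32 {
        // Ignores the 9-byte header, so the real ratio is slightly higher.
        min_and_bits(values).1 as f32 / 64.0
    }

    fn serialize(out: &mut impl Write, values: &[u64]) -> io::Result<()> {
        let (min, bits) = min_and_bits(values);
        out.write_all(&min.to_le_bytes())?;
        out.write_all(&[bits])?;
        let mut acc: u128 = 0; // bit buffer, flushed one byte at a time
        let mut used: u32 = 0; // number of valid bits in `acc`
        for &v in values {
            acc |= ((v - min) as u128) << used;
            used += bits as u32;
            while used >= 8 {
                out.write_all(&[acc as u8])?;
                acc >>= 8;
                used -= 8;
            }
        }
        if used > 0 {
            out.write_all(&[acc as u8])?;
        }
        Ok(())
    }
}

struct SketchBitpackedReader {
    min: u64,
    bits: u8,
    data: Vec<u8>,
}

impl SketchReader for SketchBitpackedReader {
    fn open(bytes: &[u8]) -> io::Result<Self> {
        if bytes.len() < 9 {
            return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "header too short"));
        }
        let mut min_bytes = [0u8; 8];
        min_bytes.copy_from_slice(&bytes[..8]);
        Ok(Self {
            min: u64::from_le_bytes(min_bytes),
            bits: bytes[8],
            data: bytes[9..].to_vec(),
        })
    }

    fn get(&self, idx: usize) -> u64 {
        if self.bits == 0 {
            return self.min; // all values are identical
        }
        let bit_pos = idx * self.bits as usize;
        let (first, shift) = (bit_pos / 8, bit_pos % 8);
        // A value of up to 64 bits starting mid-byte spans at most 9 bytes.
        let mut acc: u128 = 0;
        for (i, b) in self.data[first..].iter().take(9).enumerate() {
            acc |= (*b as u128) << (8 * i);
        }
        let mask = if self.bits == 64 { u64::MAX } else { (1u64 << self.bits) - 1 };
        self.min + (((acc >> shift) as u64) & mask)
    }
}
```

In this sketch the estimate is simply the bit width divided by 64. The Bitpacked estimates in the Codec Comparison table (for example 0.28125 = 18/64 and 0.125 = 8/64) are consistent with that reading, with the measured ratio slightly higher because of header bytes.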

Download real-world datasets for codec comparison

Before comparing codecs, run make download to fetch the real-world datasets hosted on AWS S3. To include the unstable codecs, run cargo run --features unstable.

Tests

Once the traits are implemented, integrating the codec into the tests and benchmarks is easy (see test_with_codec_data_sets and bench.rs).

Make sure to add the codec to main.rs, which tests the compression ratio and its estimation against different data sets. You can run it with:

cargo run --features bin
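As a rough illustration of what such a comparison measures, the sketch below computes a compression ratio (compressed bytes divided by raw bytes, assuming 8 raw bytes per value), an estimated ratio, and the compression time for one data set. The Row struct, the measure function, and the toy compress_as_u32 codec are hypothetical names made up for this example; they are not taken from main.rs.

```rust
use std::time::Instant;

/// One row of a comparison report (field names are illustrative).
#[derive(Debug)]
struct Row {
    ratio: f32,
    estimated_ratio: f32,
    compression_micros: u128,
}

/// Measures one codec, given as a pair of closures, on one non-empty data set.
fn measure(
    values: &[u64],
    estimate: impl Fn(&[u64]) -> f32,
    compress: impl Fn(&[u64]) -> Vec<u8>,
) -> Row {
    // Raw size assumes every value is stored as a full 8-byte u64.
    let raw_bytes = (values.len() * 8) as f32;
    let estimated_ratio = estimate(values);

    let start = Instant::now();
    let compressed = compress(values);
    let compression_micros = start.elapsed().as_micros();

    Row {
        ratio: compressed.len() as f32 / raw_bytes,
        estimated_ratio,
        compression_micros,
    }
}

/// Toy codec: stores each value as a little-endian u32.
/// Only lossless when every value fits in 32 bits.
fn compress_as_u32(values: &[u64]) -> Vec<u8> {
    values
        .iter()
        .flat_map(|&v| (v as u32).to_le_bytes())
        .collect()
}
```

Calling measure(&values, |_| 0.5, compress_as_u32) on values that fit in 32 bits yields a ratio of 0.5. A ratio close to 1 means the codec saves nothing on that data set, and a large gap between ratio and estimated_ratio means the estimation is a poor predictor for it.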

TODO

  • Add codec to cover sparse data sets

Codec Comparison

+----------------------------------+-------------------+------------------------------+--------------------------+----------------------+
|                                  | Compression ratio | Compression ratio estimation | Compression time (µs)    | Reading time (µs)    |
+----------------------------------+-------------------+------------------------------+--------------------------+----------------------+
| Autoincrement                    |                   |                              |                          |                      |
+----------------------------------+-------------------+------------------------------+--------------------------+----------------------+
| PiecewiseLinear                  | 0.0051544965      | 0.17251475                   | 960                      | 211                  |
+----------------------------------+-------------------+------------------------------+--------------------------+----------------------+
| FOR                              | 0.118189104       | 0.14172314                   | 708                      | 212                  |
+----------------------------------+-------------------+------------------------------+--------------------------+----------------------+
| Bitpacked                        | 0.28126493        | 0.28125                      | 474                      | 112                  |
+----------------------------------+-------------------+------------------------------+--------------------------+----------------------+
| Monotonically increasing concave |                   |                              |                          |                      |
+----------------------------------+-------------------+------------------------------+--------------------------+----------------------+
| PiecewiseLinear                  | 0.005955          | 0.18813984                   | 885                      | 211                  |
+----------------------------------+-------------------+------------------------------+--------------------------+----------------------+
| FOR                              | 0.16113           | 0.15734828                   | 704                      | 212                  |
+----------------------------------+-------------------+------------------------------+--------------------------+----------------------+
| Bitpacked                        | 0.31251436        | 0.3125                       | 478                      | 113                  |
+----------------------------------+-------------------+------------------------------+--------------------------+----------------------+
| Monotonically increasing convex  |                   |                              |                          |                      |
+----------------------------------+-------------------+------------------------------+--------------------------+----------------------+
| PiecewiseLinear                  | 0.00613           | 0.20376484                   | 889                      | 211                  |
+----------------------------------+-------------------+------------------------------+--------------------------+----------------------+
| FOR                              | 0.157175          | 0.17297328                   | 706                      | 212                  |
+----------------------------------+-------------------+------------------------------+--------------------------+----------------------+
| Bitpacked                        | 0.31251436        | 0.3125                       | 471                      | 113                  |
+----------------------------------+-------------------+------------------------------+--------------------------+----------------------+
| Almost monotonically increasing  |                   |                              |                          |                      |
+----------------------------------+-------------------+------------------------------+--------------------------+----------------------+
| PiecewiseLinear                  | 0.14549863        | 0.17251475                   | 923                      | 210                  |
+----------------------------------+-------------------+------------------------------+--------------------------+----------------------+
| FOR                              | 0.14943957        | 0.15734814                   | 703                      | 211                  |
+----------------------------------+-------------------+------------------------------+--------------------------+----------------------+
| Bitpacked                        | 0.28126493        | 0.28125                      | 462                      | 112                  |
+----------------------------------+-------------------+------------------------------+--------------------------+----------------------+
| Random                           |                   |                              |                          |                      |
+----------------------------------+-------------------+------------------------------+--------------------------+----------------------+
| PiecewiseLinear                  | 0.14533783        | 0.14126475                   | 924                      | 211                  |
+----------------------------------+-------------------+------------------------------+--------------------------+----------------------+
| FOR                              | 0.13381402        | 0.15734814                   | 695                      | 211                  |
+----------------------------------+-------------------+------------------------------+--------------------------+----------------------+
| Bitpacked                        | 0.12501445        | 0.125                        | 422                      | 112                  |
+----------------------------------+-------------------+------------------------------+--------------------------+----------------------+
| HDFS logs timestamps             |                   |                              |                          |                      |
+----------------------------------+-------------------+------------------------------+--------------------------+----------------------+
| PiecewiseLinear                  | 0.39826187        | 0.4068908                    | 5545                     | 1086                 |
+----------------------------------+-------------------+------------------------------+--------------------------+----------------------+
| FOR                              | 0.39214826        | 0.40734857                   | 5082                     | 1073                 |
+----------------------------------+-------------------+------------------------------+--------------------------+----------------------+
| Bitpacked                        | 0.39062786        | 0.390625                     | 2864                     | 567                  |
+----------------------------------+-------------------+------------------------------+--------------------------+----------------------+
| HDFS logs timestamps SORTED      |                   |                              |                          |                      |
+----------------------------------+-------------------+------------------------------+--------------------------+----------------------+
| PiecewiseLinear                  | 0.032736875       | 0.094390824                  | 4942                     | 1067                 |
+----------------------------------+-------------------+------------------------------+--------------------------+----------------------+
| FOR                              | 0.02667125        | 0.079223566                  | 3626                     | 994                  |
+----------------------------------+-------------------+------------------------------+--------------------------+----------------------+
| Bitpacked                        | 0.39062786        | 0.390625                     | 2493                     | 566                  |
+----------------------------------+-------------------+------------------------------+--------------------------+----------------------+
| HTTP logs timestamps SORTED      |                   |                              |                          |                      |
+----------------------------------+-------------------+------------------------------+--------------------------+----------------------+
| PiecewiseLinear                  | 0.047942877       | 0.20376582                   | 5121                     | 1065                 |
+----------------------------------+-------------------+------------------------------+--------------------------+----------------------+
| FOR                              | 0.06637425        | 0.18859856                   | 3929                     | 1093                 |
+----------------------------------+-------------------+------------------------------+--------------------------+----------------------+
| Bitpacked                        | 0.26562786        | 0.265625                     | 2221                     | 526                  |
+----------------------------------+-------------------+------------------------------+--------------------------+----------------------+
| Amazon review product ids        |                   |                              |                          |                      |
+----------------------------------+-------------------+------------------------------+--------------------------+----------------------+
| PiecewiseLinear                  | 0.41900787        | 0.4225158                    | 5239                     | 1089                 |
+----------------------------------+-------------------+------------------------------+--------------------------+----------------------+
| FOR                              | 0.41504425        | 0.43859857                   | 4158                     | 1052                 |
+----------------------------------+-------------------+------------------------------+--------------------------+----------------------+
| Bitpacked                        | 0.40625286        | 0.40625                      | 2603                     | 513                  |
+----------------------------------+-------------------+------------------------------+--------------------------+----------------------+
| Amazon review product ids SORTED |                   |                              |                          |                      |
+----------------------------------+-------------------+------------------------------+--------------------------+----------------------+
| PiecewiseLinear                  | 0.18364687        | 0.25064084                   | 5036                     | 990                  |
+----------------------------------+-------------------+------------------------------+--------------------------+----------------------+
| FOR                              | 0.21239226        | 0.21984856                   | 4087                     | 1072                 |
+----------------------------------+-------------------+------------------------------+--------------------------+----------------------+
| Bitpacked                        | 0.40625286        | 0.40625                      | 2702                     | 525                  |
+----------------------------------+-------------------+------------------------------+--------------------------+----------------------+
| Temperatures                     |                   |                              |                          |                      |
+----------------------------------+-------------------+------------------------------+--------------------------+----------------------+
| PiecewiseLinear                  |                   | Codec Disabled               | 0                        | 0                    |
+----------------------------------+-------------------+------------------------------+--------------------------+----------------------+
| FOR                              | 1.0088086         | 1.001098                     | 1306                     | 237                  |
+----------------------------------+-------------------+------------------------------+--------------------------+----------------------+
| Bitpacked                        | 1.000012          | 1                            | 950                      | 108                  |
+----------------------------------+-------------------+------------------------------+--------------------------+----------------------+