tantivy

mirror of https://github.com/quickwit-oss/tantivy.git synced 2026-01-09 02:22:54 +00:00


Stu Hood c96d801c68 perf: Lazily load in BitpackedCodec (#56)

We would like to be able to lazily load `BitpackedCodec` columns (similar to what 020bdffd61 did for `BlockwiseLinearCodec`), because in the context of `pg_search`, immediately constructing `OwnedBytes` means copying the entire content of the column into memory.

To do so, we expose some (slightly overlapping) block boundaries from `BitUnpacker` and lazily load each block when it is requested. Only the `get_val` function uses the cache. `get_row_ids_for_value_range` does not (yet): it would require partitioning the row ids by block, and most consumers of it are already loading reasonably large ranges anyway.
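
The lazy, block-wise loading described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `LazyColumn`, `pack`, `block_byte_range`, and the `loads` counter are hypothetical names, the in-memory `storage` vector stands in for whatever backing store the column really lives in, and the bit layout (least-significant bit first) is an assumption. It shows why adjacent blocks can overlap by a byte when a block boundary does not fall on a byte boundary.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;

/// Pack `values` using `num_bits` bits each, least-significant bit first.
/// (Assumed layout, for illustration only.)
fn pack(values: &[u64], num_bits: u32) -> Vec<u8> {
    let total_bits = values.len() * num_bits as usize;
    let mut out = vec![0u8; (total_bits + 7) / 8];
    for (i, &v) in values.iter().enumerate() {
        for b in 0..num_bits as usize {
            if (v >> b) & 1 == 1 {
                let bit = i * num_bits as usize + b;
                out[bit / 8] |= 1 << (bit % 8);
            }
        }
    }
    out
}

/// Hypothetical lazily loaded bitpacked column (not tantivy's actual type).
struct LazyColumn {
    storage: Vec<u8>, // stands in for the real backing store
    num_bits: usize,
    num_vals: usize,
    block_len: usize, // number of values per block
    // block index -> (first byte of the block, copied bytes of the block)
    cache: RefCell<HashMap<usize, (usize, Vec<u8>)>>,
    loads: Cell<usize>, // how many blocks were actually loaded
}

impl LazyColumn {
    fn new(values: &[u64], num_bits: u32, block_len: usize) -> Self {
        LazyColumn {
            storage: pack(values, num_bits),
            num_bits: num_bits as usize,
            num_vals: values.len(),
            block_len,
            cache: RefCell::new(HashMap::new()),
            loads: Cell::new(0),
        }
    }

    /// Byte range `[start, end)` covering every bit of `block`. Ranges of
    /// neighbouring blocks share a byte when a boundary is not byte-aligned.
    fn block_byte_range(&self, block: usize) -> (usize, usize) {
        let first = block * self.block_len;
        let last = ((block + 1) * self.block_len).min(self.num_vals);
        (first * self.num_bits / 8, (last * self.num_bits + 7) / 8)
    }

    /// Read one value, loading (and caching) only the block that holds it.
    fn get_val(&self, idx: usize) -> u64 {
        assert!(idx < self.num_vals);
        let block = idx / self.block_len;
        let mut cache = self.cache.borrow_mut();
        let (start_byte, bytes) = cache.entry(block).or_insert_with(|| {
            self.loads.set(self.loads.get() + 1);
            let (start, end) = self.block_byte_range(block);
            (start, self.storage[start..end].to_vec())
        });
        let mut val = 0u64;
        for b in 0..self.num_bits {
            // Bit position relative to the start of the cached block.
            let bit = idx * self.num_bits + b - *start_byte * 8;
            if (bytes[bit / 8] >> (bit % 8)) & 1 == 1 {
                val |= 1 << b;
            }
        }
        val
    }
}
```

With 7-bit values and 10 values per block, for example, block 0 covers bytes 0..9 and block 1 covers bytes 8..18 in this sketch, so the two share byte 8. Repeated `get_val` calls that hit the same block trigger a single load.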

See https://github.com/paradedb/paradedb/pull/2894 for usage. The benchmark suite shows a few 2x speedups, and a representative customer query runs 1.8x faster. Unfortunately, some aggregates slow down by 13-19%, apparently because aggregates use `get_vals`, whose default implementation just calls `get_val` in a loop.
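
To make the `get_vals` observation concrete, here is a simplified sketch of a batch read whose default implementation delegates to `get_val` once per index. The trait `Values` and the `Counting` type are illustrative stand-ins, not tantivy's actual definitions. The point is only that a batch of n indexes pays n per-value lookups, so any per-call overhead added to `get_val` (such as a cache check) is multiplied across the batch.

```rust
use std::cell::Cell;

/// Simplified stand-in for a column-values trait (names are illustrative).
trait Values {
    fn get_val(&self, idx: u32) -> u64;

    /// Default batch read: one `get_val` call per requested index.
    fn get_vals(&self, indexes: &[u32], output: &mut [u64]) {
        assert_eq!(indexes.len(), output.len());
        for (out, &idx) in output.iter_mut().zip(indexes) {
            *out = self.get_val(idx);
        }
    }
}

/// Hypothetical column that counts how often `get_val` is invoked.
struct Counting {
    data: Vec<u64>,
    lookups: Cell<usize>,
}

impl Values for Counting {
    fn get_val(&self, idx: u32) -> u64 {
        self.lookups.set(self.lookups.get() + 1);
        self.data[idx as usize]
    }
}
```

Calling `get_vals` with three indexes on a `Counting` column increments `lookups` three times, since the type does not override the default.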

2025-12-10 10:17:27 -08:00

benches       fix tests (#1813)                             2023-01-19 23:41:21 +09:00
src           perf: Lazily load in BitpackedCodec (#56)     2025-12-10 10:17:27 -08:00
Cargo.toml    no pgrx, please                               2025-12-10 10:17:24 -08:00