tantivy

mirror of https://github.com/quickwit-oss/tantivy.git synced 2026-05-25 12:40:41 +00:00

Files

Eric Ridge 5b6da9123c feat: introduce a MergeOptimizedInvertedIndexReader (#32)

This is probably a bit of a misnomer as it's really a "PgSearchOptimizedInvertedIndexReaderForMerge".

What we've done here is copy `InvertedIndexReader` and internally adjust it to hold onto the complete `OwnedBytes` of the index's postings and positions. One or two other small touch points were required to make other internal APIs compatible with this, but they don't otherwise change functionality or I/O patterns.

`MergeOptimizedInvertedIndexReader` does change I/O patterns, however, in that the merge process now does two (potentially) very large reads when it obtains the new "merge optimized inverted index reader" for each segment. This changes access patterns such that all the reads happen up front rather than term by term as the merge proceeds.
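The difference in access patterns can be illustrated with a minimal, self-contained sketch. It does not use tantivy's actual types: `CountingStorage`, `sum_lazy`, and `sum_eager` are hypothetical names invented here, and a single byte buffer stands in for one of the two files (postings or positions). The sketch only shows that term-by-term access issues one read per term, while the up-front approach issues a single large read and then slices in memory.

```rust
use std::cell::Cell;
use std::ops::Range;

/// Hypothetical stand-in for block storage; counts how many reads are issued.
struct CountingStorage {
    data: Vec<u8>,
    reads: Cell<usize>,
}

impl CountingStorage {
    fn new(data: Vec<u8>) -> Self {
        CountingStorage { data, reads: Cell::new(0) }
    }

    /// Simulates one I/O request for the given byte range.
    fn read(&self, range: Range<usize>) -> Vec<u8> {
        self.reads.set(self.reads.get() + 1);
        self.data[range].to_vec()
    }

    fn read_count(&self) -> usize {
        self.reads.get()
    }
}

/// Term-by-term access: one storage read per term's byte range.
fn sum_lazy(storage: &CountingStorage, term_ranges: &[Range<usize>]) -> u64 {
    term_ranges
        .iter()
        .map(|r| storage.read(r.clone()).iter().map(|&b| b as u64).sum::<u64>())
        .sum()
}

/// Up-front access: one read of the whole file, then in-memory slicing.
fn sum_eager(storage: &CountingStorage, term_ranges: &[Range<usize>]) -> u64 {
    let all = storage.read(0..storage.data.len());
    term_ranges
        .iter()
        .map(|r| all[r.clone()].iter().map(|&b| b as u64).sum::<u64>())
        .sum()
}
```

Both functions compute the same result; only the number of calls to `read` differs, at the cost of holding the whole buffer in memory in the eager case.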

A likely downside to this approach is that now pg_search will be, indirectly, holding onto a lot of heap-allocated memory that was read from its block storage.  Perhaps in the (near) future we can further optimize the new `MergeOptimizedInvertedIndexReader` such that it pages in blocks of a few megabytes at a time, on demand, rather than the whole file.
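As a rough illustration of the "page in blocks on demand" idea, here is a minimal sketch under stated assumptions. `PagedReader` is a hypothetical type, not part of tantivy or pg_search; a `Vec<u8>` stands in for block storage; and the cache is unbounded, so a real version would need eviction to actually cap memory use.

```rust
use std::collections::HashMap;

/// Hypothetical reader that loads fixed-size blocks only when first touched.
struct PagedReader {
    backing: Vec<u8>, // stands in for the file in block storage
    block_size: usize,
    cache: HashMap<usize, Vec<u8>>, // block index -> block bytes (never evicted here)
    loads: usize, // number of blocks fetched from the backing store
}

impl PagedReader {
    fn new(backing: Vec<u8>, block_size: usize) -> Self {
        assert!(block_size > 0, "block_size must be non-zero");
        PagedReader { backing, block_size, cache: HashMap::new(), loads: 0 }
    }

    /// Returns the block at `idx`, fetching it on first access.
    fn block(&mut self, idx: usize) -> &[u8] {
        if !self.cache.contains_key(&idx) {
            let start = idx * self.block_size;
            let end = (start + self.block_size).min(self.backing.len());
            self.cache.insert(idx, self.backing[start..end].to_vec());
            self.loads += 1;
        }
        &self.cache[&idx]
    }

    /// Reads `len` bytes starting at `start`, touching only the blocks needed.
    fn read(&mut self, start: usize, len: usize) -> Vec<u8> {
        let end = start + len;
        assert!(end <= self.backing.len(), "read past end of file");
        let mut out = Vec::with_capacity(len);
        let mut pos = start;
        while pos < end {
            let idx = pos / self.block_size;
            let offset = pos % self.block_size;
            let block = self.block(idx);
            // Copy up to the end of this block or the end of the request.
            let take = (block.len() - offset).min(end - pos);
            out.extend_from_slice(&block[offset..offset + take]);
            pos += take;
        }
        out
    }
}
```

With a block size of a few megabytes, a read that spans two blocks would fetch both once, and later reads inside those blocks would be served from memory.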

---

Some unit tests were also updated to resolve compilation problems introduced by PR https://github.com/paradedb/tantivy/pull/31 that for some reason didn't show up in CI. #weird

2025-12-10 10:17:26 -08:00

column

perf: push FileSlices down through most of fast fields (#19)

2025-12-10 10:17:25 -08:00

column_index

perf: push FileSlices down through most of fast fields (#19)

2025-12-10 10:17:25 -08:00

column_values

perf: remove some fast fields loading overhead (#22)

2025-12-10 10:17:25 -08:00

columnar

feat: introduce a MergeOptimizedInvertedIndexReader (#32)