feat: tracks index files in another cache and preloads them (#7181)

* feat: divide parquet and puffin index

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: download index files when we open the region

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: use different label for parquet/puffin

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: control parallelism and cache size by env

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: change gauge to counter

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: correct file type labels in file cache

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: move env to config and change cache ratio to percent

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: checks capacity before download and refine metrics

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: change open to return MitoRegionRef

Signed-off-by: evenyag <realevenyag@gmail.com>

* refactor: extract download to FileCache

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: run load cache task in write cache

Signed-off-by: evenyag <realevenyag@gmail.com>

* feat: check region state before downloading files

Signed-off-by: evenyag <realevenyag@gmail.com>

* chore: update config docs and test

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: use file id from index_file_id to compute puffin key

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: skip loading cache in some states

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: evenyag <realevenyag@gmail.com>
This commit is contained in:
Yingwen
2025-11-11 16:37:32 +08:00
committed by GitHub
parent c7fded29ee
commit 24671b60b4
17 changed files with 575 additions and 179 deletions

View File

@@ -499,6 +499,17 @@ write_cache_size = "5GiB"
## @toml2docs:none-default
write_cache_ttl = "8h"
## Preload index (puffin) files into cache on region open (default: true).
## When enabled, index files are loaded into the write cache during region initialization,
## which can improve query performance at the cost of longer startup times.
preload_index_cache = true
## Percentage of write cache capacity allocated for index (puffin) files (default: 20).
## The remaining capacity is used for data (parquet) files.
## Must be between 0 and 100 (exclusive). For example, with a 5GiB write cache and 20% allocation,
## 1GiB is reserved for index files and 4GiB for data files.
index_cache_percent = 20
## Buffer size for SST writing.
sst_write_buffer_size = "8MB"