greptimedb/docs/scanbench.md
Yingwen 5c8ece27e0 feat: improve filter support for scanbench (#7736)
2026-03-03 09:00:41 +00:00


Scanbench Usage

scanbench benchmarks region scans directly from storage. It runs as a subcommand of the greptime binary:

greptime datanode scanbench ...

Build

cargo build -p cmd --bin greptime

Command

./target/debug/greptime datanode scanbench \
  --config <CONFIG_TOML> \
  --region-id <REGION_ID> \
  --table-dir <TABLE_DIR> \
  [--scanner <seq|unordered|series>] \
  [--scan-config <SCAN_CONFIG_JSON>] \
  [--parallelism <N>] \
  [--iterations <N>] \
  [--path-type <bare|data|metadata>] \
  [--force-flat-format] \
  [--enable-wal] \
  [--pprof-file <FLAMEGRAPH_SVG>] \
  [--pprof-after-warmup] \
  [--verbose]

Required Arguments

  • --config: Datanode/standalone TOML config.
  • --region-id: Region ID in one of:
    • <u64> (example: 4398046511104)
    • <table_id>:<region_number> (example: 1024:0)
  • --table-dir: Table directory used in the region open request (example: greptime/public/1024).

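The two --region-id forms above describe the same region. The sketch below assumes the numeric form packs the table ID into the high 32 bits and the region number into the low 32 bits, which is consistent with the two examples given (1024:0 and 4398046511104). The variable names are illustrative only.

```shell
# Assumption: region_id = (table_id << 32) | region_number.
table_id=1024
region_number=0

# Pack <table_id>:<region_number> into the single u64 form.
region_id=$(( (table_id << 32) | region_number ))
echo "$region_id"   # prints 4398046511104

# Unpack the u64 form back into its two parts.
unpacked_table_id=$(( region_id >> 32 ))
unpacked_region_number=$(( region_id & 4294967295 ))
echo "${unpacked_table_id}:${unpacked_region_number}"   # prints 1024:0
```

Either form can be passed to --region-id, so the conversion is only needed when matching region IDs against logs or directory names.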
Optional Arguments

  • --scanner: Scan strategy. Default: seq.
    • seq: sequential scan
    • unordered: time-windowed distribution
    • series: per-series distribution
  • --scan-config: Path to a JSON file that tunes the scan request (see Scan Config JSON).
  • --parallelism: Simulated scan parallelism. Default: 1.
  • --iterations: Benchmark iterations. Default: 1.
  • --path-type: Region path type (bare, data, metadata). Default: bare.
  • --force-flat-format: Force reading the region in flat format. Default: disabled.
  • --enable-wal: Enable WAL replay when opening the region. Default: disabled. When enabled, scanbench uses the log store configured in the [wal] section of the config TOML (raft-engine or Kafka). When disabled or when no WAL is configured, a NoopLogStore is used.
  • --pprof-file: Path of the output flamegraph SVG (Unix only).
  • --pprof-after-warmup: Start profiling after the first iteration, using it as a warmup. Requires --pprof-file. Default: disabled.
  • --verbose / -v: Enable verbose output.

Scan Config JSON

{
  "projection": [0, 1, 2],
  "projection_names": ["host", "cpu"],
  "filters": ["host = 'web-1'", "cpu > 80"],
  "series_row_selector": "last_row"
}

Notes:

  • All fields are optional.
  • Use either projection (indexes) or projection_names (column names), not both. The sample above lists both only to show the shape of each field.
  • projection_names uses exact (case-sensitive) column name matching.
  • filters is a list of SQL expressions (not full SQL statements), e.g. "host = 'web-1'".
  • series_row_selector currently supports only "last_row".
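As a minimal sketch of the notes above, the following writes a scan config that selects columns by name only and reuses the filters from the sample. The output path and the SCAN_CONFIG variable are placeholders, and the JSON check assumes python3 is installed; it is skipped otherwise.

```shell
# Placeholder location; pass the same path to --scan-config.
SCAN_CONFIG="${SCAN_CONFIG:-/tmp/scan-config.json}"

# Select columns by name only (no "projection" key) and keep two filters.
cat > "$SCAN_CONFIG" <<'EOF'
{
  "projection_names": ["host", "cpu"],
  "filters": ["host = 'web-1'", "cpu > 80"]
}
EOF

# Optional sanity check that the file is well-formed JSON.
if command -v python3 >/dev/null 2>&1; then
  python3 -m json.tool "$SCAN_CONFIG" >/dev/null && echo "valid JSON: $SCAN_CONFIG"
fi
```

The quoted heredoc delimiter ('EOF') keeps the shell from expanding anything inside the filter expressions.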

Examples

Default sequential scan:

./target/debug/greptime datanode scanbench \
  --config /path/to/config.toml \
  --region-id 1024:0 \
  --table-dir greptime/public/1024

Unordered scan with parallelism:

./target/debug/greptime datanode scanbench \
  --config /path/to/config.toml \
  --region-id 1024:0 \
  --table-dir greptime/public/1024 \
  --scanner unordered \
  --parallelism 8 \
  --iterations 5

Series scan with scan config and flamegraph:

./target/debug/greptime datanode scanbench \
  --config /path/to/config.toml \
  --region-id 1024:0 \
  --table-dir greptime/public/1024 \
  --scanner series \
  --scan-config /path/to/scan-config.json \
  --pprof-file /tmp/scanbench.svg

Force flat-format read:

./target/debug/greptime datanode scanbench \
  --config /path/to/config.toml \
  --region-id 1024:0 \
  --table-dir greptime/public/1024 \
  --force-flat-format

Scan with WAL replay enabled (uses [wal] config from TOML):

./target/debug/greptime datanode scanbench \
  --config /path/to/config.toml \
  --region-id 1024:0 \
  --table-dir greptime/public/1024 \
  --enable-wal
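
Profiling after a warmup iteration is not covered by the examples above. The sketch below combines --iterations, --pprof-file, and --pprof-after-warmup as described under Optional Arguments, so the first iteration should act as the warmup and profiling should start after it. All paths are placeholders, and GREPTIME_BIN and PPROF_FLAGS are illustrative variable names, not part of scanbench.

```shell
# Placeholder binary location; override with GREPTIME_BIN=/path/to/greptime.
GREPTIME_BIN="${GREPTIME_BIN:-./target/debug/greptime}"

# --pprof-after-warmup requires --pprof-file; use more than one iteration
# so that something remains to profile after the warmup run.
PPROF_FLAGS="--iterations 5 --pprof-file /tmp/scanbench.svg --pprof-after-warmup"

if [ -x "$GREPTIME_BIN" ]; then
  # Word splitting of $PPROF_FLAGS is intentional here.
  "$GREPTIME_BIN" datanode scanbench \
    --config /path/to/config.toml \
    --region-id 1024:0 \
    --table-dir greptime/public/1024 \
    $PPROF_FLAGS
else
  echo "greptime binary not found at $GREPTIME_BIN; build it first" >&2
fi
```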