# Scanbench Usage
`scanbench` benchmarks region scans by reading a region directly from storage. It runs as a datanode subcommand:
```bash
greptime datanode scanbench ...
```
## Build
```bash
cargo build -p cmd --bin greptime
```
## Command
```bash
./target/debug/greptime datanode scanbench \
  --config <CONFIG_TOML> \
  --region-id <REGION_ID> \
  --table-dir <TABLE_DIR> \
  [--scanner <seq|unordered|series>] \
  [--scan-config <SCAN_CONFIG_JSON>] \
  [--parallelism <N>] \
  [--iterations <N>] \
  [--path-type <bare|data|metadata>] \
  [--force-flat-format] \
  [--enable-wal] \
  [--pprof-file <FLAMEGRAPH_SVG>] \
  [--pprof-after-warmup] \
  [--verbose]
```
## Required Arguments
- `--config`: Path to the datanode or standalone TOML config file.
- `--region-id`: Region ID, in either form:
  - `<u64>` (example: `4398046511104`)
  - `<table_id>:<region_number>` (example: `1024:0`)
- `--table-dir`: Table directory used in the region open request (example: `greptime/public/1024`).
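The two `--region-id` forms describe the same region. Judging from the documented examples (`1024:0` and `4398046511104`, which equals `1024 × 2^32`), the `u64` form appears to pack the table ID into the upper 32 bits and the region number into the lower 32 bits. The helpers below are a sketch under that assumption; `to_region_id` and `from_region_id` are hypothetical names and are not part of the `greptime` CLI.

```shell
#!/usr/bin/env bash
# Assumption (inferred from the documented examples, not from the source code):
# u64 region ID = (table_id << 32) | region_number.
set -euo pipefail

# Hypothetical helper: "<table_id> <region_number>" -> u64 region ID.
to_region_id() {
  local table_id=$1 region_number=$2
  echo $(( (table_id << 32) | region_number ))
}

# Hypothetical helper: u64 region ID -> "<table_id>:<region_number>".
from_region_id() {
  local region_id=$1
  echo "$(( region_id >> 32 )):$(( region_id & 0xFFFFFFFF ))"
}

to_region_id 1024 0           # prints 4398046511104
from_region_id 4398046511104  # prints 1024:0
```

Either printed value can be passed to `--region-id`.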
## Optional Arguments
- `--scanner`: Scan strategy. Default: `seq`.
  - `seq`: sequential scan.
  - `unordered`: time-windowed distribution.
  - `series`: per-series distribution.
- `--scan-config`: Path to a JSON file that tunes the scan request (format described under Scan Config JSON).
- `--parallelism`: Simulated scan parallelism. Default: `1`.
- `--iterations`: Benchmark iterations. Default: `1`.
- `--path-type`: Region path type (`bare`, `data`, `metadata`). Default: `bare`.
- `--force-flat-format`: Force reading the region in flat format. Default: disabled.
- `--enable-wal`: Enable WAL replay when opening the region. Default: disabled. When enabled, scanbench uses the log store configured in the `[wal]` section of the config TOML (raft-engine or Kafka). When disabled or when no WAL is configured, a `NoopLogStore` is used.
- `--pprof-file`: Path of the output flamegraph SVG (Unix only).
- `--pprof-after-warmup`: Start profiling after the first iteration, using it as a warmup. Requires `--pprof-file`. Default: disabled.
- `--verbose` / `-v`: Enable verbose output.
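To compare scanner strategies, the documented flags can be driven from a small shell loop. The sketch below is a hypothetical helper, not something shipped with scanbench: `run_sweep`, `DRY_RUN`, and the default paths are illustrative placeholders. By default it only prints the nine command lines so they can be reviewed first.

```shell
#!/usr/bin/env bash
# Hypothetical sweep helper (not shipped with scanbench): runs every scanner
# at several parallelism levels. All paths below are placeholders.
set -euo pipefail

GREPTIME=${GREPTIME:-./target/debug/greptime}
CONFIG=${CONFIG:-/path/to/config.toml}
REGION_ID=${REGION_ID:-1024:0}
TABLE_DIR=${TABLE_DIR:-greptime/public/1024}
ITERATIONS=${ITERATIONS:-5}
DRY_RUN=${DRY_RUN:-1} # 1: print commands only; 0: execute them

run_sweep() {
  local scanner parallelism
  local -a cmd
  for scanner in seq unordered series; do
    for parallelism in 1 4 8; do
      cmd=("$GREPTIME" datanode scanbench
        --config "$CONFIG"
        --region-id "$REGION_ID"
        --table-dir "$TABLE_DIR"
        --scanner "$scanner"
        --parallelism "$parallelism"
        --iterations "$ITERATIONS")
      if [ "$DRY_RUN" = 1 ]; then
        echo "${cmd[*]}"   # dry run: show the command line
      else
        "${cmd[@]}"        # real run: invoke scanbench
      fi
    done
  done
}

run_sweep
```

To actually run the benchmarks, save the script (for example as `sweep.sh`) and invoke it with `DRY_RUN=0 CONFIG=/path/to/config.toml bash sweep.sh`.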
## Scan Config JSON
```json
{
  "projection_names": ["host", "cpu"],
  "filters": ["host = 'web-1'", "cpu > 80"],
  "series_row_selector": "last_row"
}
```
Notes:
- All fields are optional.
- Use either `projection` (indexes) or `projection_names` (column names), not both.
- `projection_names` uses exact (case-sensitive) column name matching.
- `filters` is a list of SQL expressions (not full SQL statements), e.g. `"host = 'web-1'"`.
- `series_row_selector` currently supports only `"last_row"`.
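A scan config file can be written inline with a heredoc before passing it to `--scan-config`. This sketch assumes `/tmp/scan-config.json` as the output path and reuses the placeholder columns `host` and `cpu`. The final check enforces only the documented rule that `projection` and `projection_names` must not both be set, and it uses a plain `grep` rather than a JSON parser.

```shell
#!/usr/bin/env bash
# Write a scan config for --scan-config. Path and column names are placeholders.
set -euo pipefail

SCAN_CONFIG=${SCAN_CONFIG:-/tmp/scan-config.json}

cat > "$SCAN_CONFIG" <<'EOF'
{
  "projection_names": ["host", "cpu"],
  "filters": ["host = 'web-1'", "cpu > 80"],
  "series_row_selector": "last_row"
}
EOF

# Documented rule: use either "projection" or "projection_names", not both.
# '"projection"' (with the closing quote) does not match "projection_names".
if grep -q '"projection"' "$SCAN_CONFIG" && grep -q '"projection_names"' "$SCAN_CONFIG"; then
  echo "error: set either projection or projection_names, not both" >&2
  exit 1
fi

echo "wrote $SCAN_CONFIG"
```

The resulting file is then passed as `--scan-config /tmp/scan-config.json`.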
## Examples
Default sequential scan:
```bash
./target/debug/greptime datanode scanbench \
  --config /path/to/config.toml \
  --region-id 1024:0 \
  --table-dir greptime/public/1024
```
Unordered scan with parallelism:
```bash
./target/debug/greptime datanode scanbench \
  --config /path/to/config.toml \
  --region-id 1024:0 \
  --table-dir greptime/public/1024 \
  --scanner unordered \
  --parallelism 8 \
  --iterations 5
```
Series scan with scan config and flamegraph:
```bash
./target/debug/greptime datanode scanbench \
  --config /path/to/config.toml \
  --region-id 1024:0 \
  --table-dir greptime/public/1024 \
  --scanner series \
  --scan-config /path/to/scan-config.json \
  --pprof-file /tmp/scanbench.svg
```
Force flat-format read:
```bash
./target/debug/greptime datanode scanbench \
  --config /path/to/config.toml \
  --region-id 1024:0 \
  --table-dir greptime/public/1024 \
  --force-flat-format
```
Scan with WAL replay enabled (uses `[wal]` config from TOML):
```bash
./target/debug/greptime datanode scanbench \
  --config /path/to/config.toml \
  --region-id 1024:0 \
  --table-dir greptime/public/1024 \
  --enable-wal
```