* feat: expose region read load through Prometheus metrics and heartbeat
Introduce region-level query load tracking (CPU time and scanned bytes)
collected by `RegionScanExec`, exposed via Prometheus metrics and optionally
reported through heartbeat region stats.
- **Region metrics** (`src/mito2/src/metrics.rs`, `src/store-api/src/metrics.rs`): Add
`greptime_mito_region_query_cpu_time`, `greptime_mito_region_query_scanned_bytes`,
and `greptime_mito_region_written_bytes_since_open` gauge metrics.
- **MitoRegion** (`src/mito2/src/region.rs`, `src/mito2/src/region/opener.rs`,
`src/mito2/src/region_write_ctx.rs`): Replace `AtomicU64` `written_bytes` with
`IntGauge`; add `query_cpu_time`/`query_scanned_bytes` fields with lifecycle
management (init, reset, remove-on-drop).
- **RegionStatistic** (`src/store-api/src/region_engine.rs`,
`src/store-api/src/storage/requests.rs`): Add `query_cpu_time` and
`query_scanned_bytes` fields.
- **Metric-engine** (`src/metric-engine/src/utils.rs`): Aggregate query load from
metadata and data regions.
- **Heartbeat** (`src/datanode/src/heartbeat.rs`,
`src/common/meta/src/datanode.rs`): Relay region query load via heartbeat
`RegionStat`; add test.
- **Query engine** (`src/query/src/options.rs`,
`src/query/src/query_engine/state.rs`, `src/query/src/datafusion.rs`,
`src/query/src/dist_plan/merge_scan.rs`,
`src/query/src/dist_plan/analyzer.rs`,
`src/query/src/dummy_catalog.rs`): Add `enable_region_query_load_report` config;
wire `RegionScanExec` to accumulate CPU time and scanned bytes.
- **Table scan** (`src/table/src/table/scan.rs`,
`src/table/src/table/metrics.rs`): Wire table scan metrics.
- **Config** (`config/standalone.example.toml`, `config/datanode.example.toml`,
`config/frontend.example.toml`, `config/config.md`): Add example config and
documentation for `enable_region_query_load_report`.
- **Tests** (`src/mito2/src/engine/basic_test.rs`,
`src/mito2/src/engine/close_test.rs`,
`src/cmd/tests/load_config_test.rs`,
`src/flow/src/adapter.rs`): Add unit tests for region query load reporting
and metric cleanup on region close; set default config values.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat: move region read load report config from query layer to mito engine
Move the `enable_region_query_load_report` setting from query-level config
(`QueryOptions`/`DistPlannerOptions`) into the mito2 storage engine config
(`MitoConfig`), and expose it through the `RegionScanner` trait instead
of `ScanRequest`/`PrepareRequest`.
- Mito config: `src/mito2/src/config.rs`, `src/mito2/src/engine.rs`
- Scan region plumbing: `src/mito2/src/read/scan_region.rs`
- RegionScanner trait: `src/store-api/src/region_engine.rs`
- Scanner impls: `src/mito2/src/read/seq_scan.rs`, `src/mito2/src/read/series_scan.rs`, `src/mito2/src/read/unordered_scan.rs`
- RegionScanExec: `src/table/src/table/scan.rs`
- Removed from query layer: `src/query/src/options.rs`, `src/query/src/dist_plan/analyzer.rs`, `src/query/src/query_engine/state.rs`, `src/query/src/datafusion.rs`, `src/query/src/dummy_catalog.rs`
- Removed from test/config: `src/query/src/dist_plan/analyzer/test.rs`, `src/flow/src/adapter.rs`, `src/cmd/tests/load_config_test.rs`, `src/store-api/src/storage/requests.rs`
- Config docs: `config/config.md`, `config/datanode.example.toml`, `config/frontend.example.toml`, `config/standalone.example.toml`
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat: move region query load report config from MitoConfig to LoggingOptions
Relocate the `enable_region_query_load_report` setting from
`MitoConfig` to `LoggingOptions` (as `enable_per_region_metrics`),
and thread it into `MitoEngineBuilder` instead of reading from
the engine config directly. This makes the region read-load
reporting a per-node logging/observability concern rather than
a per-engine storage setting.
- `config/config.md`
- `config/datanode.example.toml`
- `config/standalone.example.toml`
- `src/common/telemetry/src/logging.rs`
- `src/datanode/src/datanode.rs`
- `src/mito2/src/config.rs`
- `src/mito2/src/engine.rs`
- `src/mito2/src/region.rs`
Signed-off-by: Lei Huang <lei@huang.to>
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat: report region query load on stream drop instead of stream end
Move `report_region_query_load()` from `StreamWithMetricWrapper::poll_next()`
to `Drop::drop()` so that region query load is reported even when the
stream is dropped prematurely (not just when fully consumed).
Affected files:
- `src/table/src/table/scan.rs`
Signed-off-by: Lei, Huang <huanglei@qiyi.com>
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
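The drop-time reporting described in the commit above can be sketched as follows. This is a minimal, std-only illustration and not the code in `src/table/src/table/scan.rs`: `RegionLoad`, `MeteredScan`, and `scan_partially` are hypothetical names, and a plain iterator stands in for the real record batch stream.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical sink for per-region load; stands in for the real metrics.
#[derive(Default)]
pub struct RegionLoad {
    pub scanned_bytes: AtomicU64,
    pub reports: AtomicU64,
}

/// Hypothetical wrapper around a batch iterator (the real code wraps a stream).
pub struct MeteredScan<I> {
    inner: I,
    scanned_bytes: u64,
    load: Arc<RegionLoad>,
}

impl<I> MeteredScan<I> {
    pub fn new(inner: I, load: Arc<RegionLoad>) -> Self {
        Self { inner, scanned_bytes: 0, load }
    }
}

impl<I: Iterator<Item = Vec<u8>>> Iterator for MeteredScan<I> {
    type Item = Vec<u8>;

    fn next(&mut self) -> Option<Vec<u8>> {
        let batch = self.inner.next()?;
        // Accumulate locally; nothing is reported while batches are flowing.
        self.scanned_bytes += batch.len() as u64;
        Some(batch)
    }
}

impl<I> Drop for MeteredScan<I> {
    fn drop(&mut self) {
        // Runs exactly once, whether the scan was exhausted or abandoned early.
        self.load.scanned_bytes.fetch_add(self.scanned_bytes, Ordering::Relaxed);
        self.load.reports.fetch_add(1, Ordering::Relaxed);
    }
}

/// Reads at most `limit` batches, then drops the scan before it is exhausted.
pub fn scan_partially(batches: Vec<Vec<u8>>, limit: usize, load: Arc<RegionLoad>) -> usize {
    let scan = MeteredScan::new(batches.into_iter(), load);
    scan.take(limit).count()
}
```

Reporting from `drop` means a consumer that stops early (for example, a `LIMIT` query) still contributes the bytes it actually read, and the report cannot be emitted twice.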
* feat: make region query load reporting configurable
Introduce `enable_region_query_load_report` flag to optionally report
per-region `query_cpu_time` and `query_scanned_bytes` metrics instead
of always creating them. When disabled, the Prometheus gauges are not
created (`None`), avoiding metric churn for workloads that do not
need query-level load tracking.
- `src/common/meta/src/datanode.rs` — Placeholder fields for query load
- `src/mito2/src/region.rs` — Make query metrics `Option<IntGauge>`, conditional create/remove/reset
- `src/mito2/src/region/opener.rs` — Thread flag through `RegionOpener`
- `src/mito2/src/worker.rs` — Thread flag through `WorkerGroup`/`WorkerStarter`/`RegionWorkerLoop`
- `src/mito2/src/worker/handle_catchup.rs` — Pass flag on region open
- `src/mito2/src/worker/handle_create.rs` — Pass flag on region create
- `src/mito2/src/worker/handle_open.rs` — Pass flag on region open
- `src/mito2/src/engine.rs` — Pass flag from `MitoEngineBuilder`
- `src/mito2/src/test_util.rs` — Test helpers for both modes
- `src/mito2/src/engine/basic_test.rs` — Cover disabled and preserve cases
- `src/mito2/src/engine/close_test.rs` — Adapt to optional metrics
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* refactor: remove elapsed_compute metric from scan stream
The `elapsed_compute` metric conflated poll-wait time with actual CPU
computation, making it misleading. Remove the metric and its
recording path from `StreamMetrics` and `StreamWithMetricWrapper`.
Add a test asserting that poll duration is not reported as
`elapsed_compute`.
- `src/table/src/table/metrics.rs`: remove the `elapsed_compute` field,
builder, and `record_elapsed_compute` method
- `src/table/src/table/scan.rs`: remove the `record_elapsed_compute`
call; add the `SlowRecordBatchStream` test helper and the
`wrapper_poll_time_is_not_elapsed_compute` test
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat: disable region query load report for compaction scans
Compaction scans are internal operations initiated by the engine,
not user queries. Disable region query load reporting when the
scan input is marked as compaction to avoid misleading load metrics.
- `src/mito2/src/read/scan_region.rs` — set `enable_region_query_load_report`
to `false` when compaction is enabled; add unit test
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* test: add `enable_per_region_metrics` config to HTTP integration test
- Enable per-region metrics config in the HTTP test setup
(`tests-integration/tests/http.rs`)
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* refactor: remove region query load reporting tests and helpers
Remove the region query load reporting feature from the codebase,
including tests, test utilities, and helper infrastructure that were
part of this now-deprecated functionality.
Specifically:
- Remove region query load reporting tests from
`src/mito2/src/engine/basic_test.rs` and
`src/table/src/table/scan.rs`, and the region close metrics test
from `src/mito2/src/engine/close_test.rs`
- Remove region query load report test utilities and simplify engine
construction helpers in `src/mito2/src/test_util.rs`
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* perf: avoid disabled region query load timing
Summary:
- Avoid per-poll `Instant::now` and elapsed-time accumulation when `enable_region_query_load_report` is disabled.
- Keep region query-load CPU accounting active only when reporting is enabled.
Files:
- `src/table/src/table/scan.rs`
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
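One way to skip the clock entirely on the disabled path is sketched below. `PollTimer` is a hypothetical helper, not a type from `src/table/src/table/scan.rs`; it only illustrates the idea of calling `Instant::now` when, and only when, reporting is enabled.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper that accumulates time spent inside poll calls.
pub struct PollTimer {
    enabled: bool,
    elapsed: Duration,
}

impl PollTimer {
    pub fn new(enabled: bool) -> Self {
        Self { enabled, elapsed: Duration::ZERO }
    }

    /// Runs `poll`, timing it only when reporting is enabled.
    pub fn time<T>(&mut self, poll: impl FnOnce() -> T) -> T {
        // `bool::then` calls `Instant::now` only on the enabled path,
        // so the disabled path never reads the clock.
        let start = self.enabled.then(Instant::now);
        let out = poll();
        if let Some(start) = start {
            self.elapsed += start.elapsed();
        }
        out
    }

    pub fn elapsed(&self) -> Duration {
        self.elapsed
    }
}
```

With `PollTimer::new(false)`, `time` reduces to a branch plus the wrapped call, and `elapsed()` stays at zero.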
* feat: move per-region query load reporting from storage to query engine
Move `enable_per_region_metrics` from datanode to frontend config and
migrate query load tracking (CPU time, scanned bytes) from mito2
storage engine to the query engine's distributed scan planner. The
storage-level metrics plumbing and `enable_region_query_load_report`
flag are removed from mito2, `ScanInput`, `ScanRegion`, and
`RegionScanner`. Query-level metrics are now collected in
`merge_scan.rs` via `scan_region_load`.
- `src/mito2/` -- Remove `query_cpu_time`, `query_scanned_bytes`
metrics, `enable_region_query_load_report` plumbing from engine,
region, opener, scanner types, workers
- `src/store-api/` -- Remove `query_cpu_time`, `query_scanned_bytes`
from `RegionStatistic`
- `src/metric-engine/` -- Remove query load fields from
`get_region_statistic`
- `src/query/` -- Add `enable_per_region_metrics` to `QueryOptions`;
wire through planner, optimizer, merge scan with `scan_region_load`
metrics
- `src/frontend/` -- Pass `enable_per_region_metrics` into
`QueryOptions`
- `src/common/meta/` -- Remove TODO for query load fields
- `config/` -- Move `enable_per_region_metrics` from datanode to
frontend and standalone example configs
- `src/cmd/tests/` -- Add `enable_per_region_metrics` to flownode
config test
- `src/flow/` -- Add `enable_per_region_metrics` default to flownode
options
- `src/table/` -- Remove unused query load fields from scan
- `src/datanode/` -- Remove
`with_enable_region_query_load_report` calls
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* refactor: remove obsolete mito write load metric
Remove obsolete mito-side region written-bytes metric plumbing that is not needed by the frontend read-load reporting path.
Related files:
- `src/mito2/src/metrics.rs`
- `src/mito2/src/region.rs`
- `src/mito2/src/region/opener.rs`
- `src/mito2/src/region_write_ctx.rs`
- `src/mito2/src/engine/basic_test.rs`
- `src/mito2/src/worker.rs`
- `src/mito2/src/config.rs`
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat: change region query load metrics from gauge to counter
Change `REGION_QUERY_CPU_TIME` and `REGION_QUERY_SCANNED_BYTES` from
`IntGaugeVec` to `IntCounterVec` since these values are monotonically
increasing and do not need gauge semantics. Update corresponding `add`
calls to `inc_by` in merge scan reporting.
Files:
- `src/store-api/src/metrics.rs` — metric type and label changes
- `src/query/src/dist_plan/merge_scan.rs` — caller adaptation
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* refactor: pass ReadItem directly to report_region_query_load
Move `region_scan_load` call to the caller, so `report_region_query_load`
accepts the already-computed `ReadItem` instead of `RecordBatchMetrics`.
- `src/query/src/dist_plan/merge_scan.rs` — update signature, inline call,
remove stale test
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat: ensure region query load is reported on MergeScanExec drop
Remove the `enable_per_region_metrics` parameter from `report_region_query_load`
so region load metrics are always emitted. Add a `Drop` impl for
`MergeScanExec` that reports sub-stage metrics when the executor is
dropped, covering edge cases where per-region metric emission was
missed. Add a unit test verifying CPU time and scanned bytes are
recorded on drop.
Affected file: `src/query/src/dist_plan/merge_scan.rs`
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* fix: gate region query load reporting
Guard drop-time region query load reporting with the configured per-region metrics flag.
Related files:
- `src/query/src/dist_plan/merge_scan.rs`
Symbols:
- `MergeScanExec::drop`
- `enable_per_region_metrics`
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* fix: clean region query load metrics on drop
Remove per-region query load metric labels when a region is dropped so stale label series do not remain in the registry.
Related files:
- `src/mito2/src/region.rs`
Symbols:
- `MitoRegion::drop`
- `REGION_QUERY_CPU_TIME`
- `REGION_QUERY_SCANNED_BYTES`
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
---------
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
Signed-off-by: Lei Huang <lei@huang.to>
Signed-off-by: Lei, Huang <huanglei@qiyi.com>
* feat: add password verifier formats
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* fix: harden password verifier parsing and auth config errors
- Reject pbkdf2_sha256 verifiers whose hash is not 32 bytes and bound the
salt length, preventing short-hash verifiers from matching on a prefix.
- Verify pbkdf2_sha256 with a stack-allocated buffer.
- Report only the length, not the bytes, when a mysql native password
verifier has an illegal length.
- Map empty frontend_auth credentials to an invalid-config error instead
of an internal error.
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
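The length checks above matter because a comparison that only walks the stored hash would accept a truncated (or empty) hash as a match. The sketch below shows the shape of such a check under stated assumptions: `parse_hash`, `constant_time_eq`, `VerifierError`, and the `MAX_SALT_LEN` bound of 64 are all hypothetical and are not taken from the actual verifier code. Only the 32-byte hash length comes from the commit message.

```rust
/// SHA-256 output size; the commit requires exactly this many hash bytes.
pub const HASH_LEN: usize = 32;
/// Assumed upper bound for illustration only; the real bound is not stated.
pub const MAX_SALT_LEN: usize = 64;

#[derive(Debug, PartialEq, Eq)]
pub enum VerifierError {
    /// Carries only the offending length, never the bytes themselves.
    BadHashLen(usize),
    BadSaltLen(usize),
}

/// Validates the decoded salt and hash of a verifier and returns a
/// fixed-size hash, so later comparisons cannot run on a short prefix.
pub fn parse_hash(salt: &[u8], hash: &[u8]) -> Result<[u8; HASH_LEN], VerifierError> {
    if salt.is_empty() || salt.len() > MAX_SALT_LEN {
        return Err(VerifierError::BadSaltLen(salt.len()));
    }
    if hash.len() != HASH_LEN {
        return Err(VerifierError::BadHashLen(hash.len()));
    }
    // Stack-allocated buffer; no heap allocation is needed for the hash.
    let mut out = [0u8; HASH_LEN];
    out.copy_from_slice(hash);
    Ok(out)
}

/// Compares two full-length hashes without an early exit on mismatch.
pub fn constant_time_eq(a: &[u8; HASH_LEN], b: &[u8; HASH_LEN]) -> bool {
    a.iter().zip(b.iter()).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}
```

Using `[u8; HASH_LEN]` in the signatures moves the length requirement into the type, so a short hash is rejected at parse time instead of at comparison time.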
* chore: update config.md
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* fix: skip non-plain verifiers in get_one_user_pwd
Pick the first plain-text credential instead of failing when the first
user happens to hold a hashed verifier.
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* fix: format
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* chore: remove unused get_one_user_pwd
Internal flownode-to-frontend communication no longer authenticates
(see #8244), so the plain-text credential export path is dead code.
Drop get_one_user_pwd, its now-orphan as_plain_text helper, and the
related tests.
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
---------
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* feat: global switch for creating table automatically
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* chore: comment out `auto_create_table` by default
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* feat: respect global switch for metric engine
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
---------
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
Fixes #8087
FrontendOptions, DatanodeOptions, and FlownodeOptions do not define
a heartbeat field, so the [heartbeat] sections in their example TOML
files were never parsed.
Heartbeat intervals are actually negotiated from metasrv during the
heartbeat handshake:
- Datanode/Flownode: interval = metasrv.heartbeat_interval
- Frontend: interval = metasrv.heartbeat_interval * 6
The unused sections misled operators into thinking they could tune
heartbeat timing locally. Remove the sections to eliminate confusion.
Metasrv's heartbeat_interval remains documented and functional.
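The negotiation rule stated above can be written out as a small function. This is an illustrative sketch only: `Role` and `effective_heartbeat_interval` are hypothetical names, and the actual handshake code is not shown in this commit message.

```rust
use std::time::Duration;

/// Hypothetical node roles, used only for this illustration.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Role {
    Datanode,
    Flownode,
    Frontend,
}

/// Derives a node's heartbeat interval from the value metasrv announces.
pub fn effective_heartbeat_interval(role: Role, metasrv_interval: Duration) -> Duration {
    match role {
        // Datanodes and flownodes use the metasrv interval as-is.
        Role::Datanode | Role::Flownode => metasrv_interval,
        // Frontends heartbeat six times less often.
        Role::Frontend => metasrv_interval * 6,
    }
}
```

For an arbitrary example value of 3 seconds on metasrv, a datanode would use 3 seconds and a frontend 18 seconds, regardless of anything written in the node's own TOML file.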
* chore/enable-flight-encoder:
### Add Flight Compression Support
- **Configuration Updates**:
- Added `grpc.flight_compression` option to `config/config.md`, `config/datanode.example.toml`, and `config/frontend.example.toml` to specify compression modes for Arrow IPC service.
- **Code Enhancements**:
- Updated `FlightEncoder` in `src/common/grpc/src/flight.rs` to support compression modes.
- Modified `RegionServer` and `DatanodeBuilder` in `src/datanode/src/datanode.rs` and `src/datanode/src/region_server.rs` to handle `FlightCompression`.
- Integrated `FlightCompression` in `src/servers/src/grpc.rs` and `src/servers/src/grpc/flight.rs` to manage compression settings.
- **Testing and Integration**:
- Updated test utilities and integration tests in `tests-integration/src/grpc/flight.rs` and `tests-integration/src/test_util.rs` to include `FlightCompression`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* chore/enable-flight-encoder:
### Enable Compression in FlightClient
- **`client.rs`**: Updated `make_flight_client` to accept `send_compression` and `accept_compression` parameters, enabling Zstd compression for sending and receiving messages.
- **`client_manager.rs`**: Modified `datanode` method to pass compression settings from `ChannelConfig` to `RegionRequester`.
- **`database.rs`**: Adjusted calls to `make_flight_client` to include compression parameters.
- **`region.rs`**: Updated `RegionRequester` to store and utilize compression settings.
- **`frontend.rs`**: Configured `ChannelConfig` to enable compression based on options.
- **`channel_manager.rs`**: Added `send_compression` and `accept_compression` fields to `ChannelConfig` with default values and updated tests accordingly.
Signed-off-by: Lei, HUANG <lhuang@greptime.com>
* chore/enable-flight-encoder:
### Update Compression Defaults and Documentation
- **Configuration Files**: Updated `datanode.example.toml` and `frontend.example.toml` to include a default setting comment for `flight_compression`, specifying it defaults to `none`.
- **gRPC Server Code**: Modified `grpc.rs` to set `None` as the default for `FlightCompression` instead of `ArrowIpc`.
Signed-off-by: Lei, HUANG <lhuang@greptime.com>
---------
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
Signed-off-by: Lei, HUANG <lhuang@greptime.com>
* feat/lossy-string-validation-in-prom-remote-write:
#### Refactor Prometheus Validation Mode
- **Replace `is_strict_mode` with `PromValidationMode` Enum:**
- Updated `HttpOptions` and related structures to use `PromValidationMode` enum instead of the boolean `is_strict_mode`.
- Modified functions and tests to accommodate the new enum, ensuring flexible validation modes (`Strict`, `Lossy`, `Unchecked`).
- Affected files: `server.rs`, `prom_decode.rs`, `http.rs`, `prom_store.rs`, `prom_row_builder.rs`, `proto.rs`, `prom_store_test.rs`, `test_util.rs`, `http.rs`.
- **Enhance UTF-8 String Decoding:**
- Introduced `decode_string` function to handle UTF-8 string decoding based on the selected `PromValidationMode`.
- Affected files: `proto.rs`, `prom_row_builder.rs`.
This refactor improves the flexibility and clarity of Prometheus request handling by allowing different validation strategies.
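The three validation modes could behave roughly as in the sketch below. The enum variant names and the `decode_string` name come from the commit message, but the signature, error type, and exact behaviour here are assumptions and may differ from the implementation in `proto.rs`.

```rust
use std::string::FromUtf8Error;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum PromValidationMode {
    /// Reject any byte sequence that is not valid UTF-8.
    Strict,
    /// Replace invalid sequences with U+FFFD.
    Lossy,
    /// Skip validation entirely; the caller vouches for the input.
    Unchecked,
}

/// Assumed signature for illustration; decodes label or metric name bytes.
pub fn decode_string(mode: PromValidationMode, bytes: Vec<u8>) -> Result<String, FromUtf8Error> {
    match mode {
        PromValidationMode::Strict => String::from_utf8(bytes),
        PromValidationMode::Lossy => Ok(String::from_utf8_lossy(&bytes).into_owned()),
        // SAFETY: sound only if `bytes` really is valid UTF-8. This mode
        // trades that guarantee for speed and must only see trusted input.
        PromValidationMode::Unchecked => Ok(unsafe { String::from_utf8_unchecked(bytes) }),
    }
}
```

`Strict` pays for a full validation pass and surfaces bad input as an error, `Lossy` never fails but may alter the data, and `Unchecked` is the cheapest but pushes the correctness burden onto the sender.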
* feat/lossy-string-validation-in-prom-remote-write:
- **Add Prometheus Validation Mode Configuration:**
- Updated `config/config.md`, `config/frontend.example.toml`, and `config/standalone.example.toml` to include `http.prom_validation_mode` setting for Prometheus remote write requests.
- **Enhance Benchmarking for Prometheus Requests:**
- Modified `src/servers/benches/prom_decode.rs` to benchmark different Prometheus validation modes (`Strict`, `Lossy`, `Unchecked`).
- **Implement and Test String Decoding:**
- Added `decode_string` function and comprehensive tests in `src/servers/src/proto.rs` to handle string decoding with different validation modes.
* feat/lossy-string-validation-in-prom-remote-write:
### Add Histogram Buckets to Metrics
- **Files Modified**: `src/servers/src/metrics.rs`
- **Key Changes**:
- Added specific histogram buckets to `METRIC_MYSQL_QUERY_TIMER`, `METRIC_POSTGRES_QUERY_TIMER`, and `METRIC_SERVER_GRPC_PROM_REQUEST_TIMER` to enhance granularity in query elapsed time metrics.
* feat/lossy-string-validation-in-prom-remote-write:
### Update Prometheus Validation Mode Default
- **Config Documentation**: Updated the default description for `http.prom_validation_mode` to indicate that "strict" is the default option in `config.md`, `frontend.example.toml`, and `standalone.example.toml`.
- **HTTP Server Implementation**: Changed the default `prom_validation_mode` to `PromValidationMode::Strict` in `src/servers/src/http.rs`.
* feat/lossy-string-validation-in-prom-remote-write:
Update Prometheus Validation Mode to Strict
- Changed `http.prom_validation_mode` from `unchecked` to `strict` in `config.md`, `frontend.example.toml`, and
`standalone.example.toml` to enforce strict validation of Prometheus remote write requests.
* feat: update to disable http timeout by default
* feat: make http timeout default to 0
* test: correct test case
* chore: generate new config doc
* test: correct tests
* refactor: rename grpc options
* refactor: make the arg clear
* chore: comments on server_addr
* chore: fix test
* chore: remove the store_addr alias
* refactor: cli option rpc_server_addr
* chore: keep store-addr alias
* chore: by comment
* feat: set max log files to 720 by default, info log only
* expose max_log_files in tomls
* include dir info when panicking, limit max_log_files of err_log to 30, and that of slow_queries to opt.max_log_files
* fix clippy
* update config.md
* update expected config str
* limit err_log max file count to `max_log_files` too, include err info when panicking, put `max_log_files` in the right position
* fix typos
* chore: config
Co-authored-by: Lei, HUANG <6406592+v0y4g3r@users.noreply.github.com>
---------
Co-authored-by: dennis zhuang <killme2008@gmail.com>
Co-authored-by: Lei, HUANG <6406592+v0y4g3r@users.noreply.github.com>