
## Overview

| Title | Query | Type | Description | Datasource | Unit | Legend Format |
|---|---|---|---|---|---|---|
| Uptime | `time() - process_start_time_seconds` | stat | The uptime of GreptimeDB. | prometheus | s | `__auto` |
| Version | `SELECT pkg_version FROM information_schema.build_info` | stat | GreptimeDB version. | mysql | -- | -- |
| Total Ingestion Rate | `sum(rate(greptime_table_operator_ingest_rows[$__rate_interval]))` | stat | Total ingestion rate. | prometheus | rowsps | `__auto` |
| Total Storage Size | `select SUM(disk_size) from information_schema.region_statistics;` | stat | Total size of data files. | mysql | decbytes | -- |
| Total Rows | `select SUM(region_rows) from information_schema.region_statistics;` | stat | Total number of data rows in the cluster, calculated as the sum of rows from each region. | mysql | sishort | -- |
| Deployment | `SELECT count(*) as datanode FROM information_schema.cluster_info WHERE peer_type = 'DATANODE';` <br/> `SELECT count(*) as frontend FROM information_schema.cluster_info WHERE peer_type = 'FRONTEND';` <br/> `SELECT count(*) as metasrv FROM information_schema.cluster_info WHERE peer_type = 'METASRV';` <br/> `SELECT count(*) as flownode FROM information_schema.cluster_info WHERE peer_type = 'FLOWNODE';` | stat | The deployment topology of GreptimeDB. | mysql | -- | -- |
| Database Resources | `SELECT COUNT(*) as databases FROM information_schema.schemata WHERE schema_name NOT IN ('greptime_private', 'information_schema')` <br/> `SELECT COUNT(*) as tables FROM information_schema.tables WHERE table_schema != 'information_schema'` <br/> `SELECT COUNT(region_id) as regions FROM information_schema.region_peers` <br/> `SELECT COUNT(*) as flows FROM information_schema.flows` | stat | The number of key resources in GreptimeDB. | mysql | -- | -- |
| Data Size | `SELECT SUM(memtable_size) * 0.42825 as WAL FROM information_schema.region_statistics;` <br/> `SELECT SUM(index_size) as index FROM information_schema.region_statistics;` <br/> `SELECT SUM(manifest_size) as manifest FROM information_schema.region_statistics;` | stat | The data size of WAL/index/manifest in GreptimeDB. | mysql | decbytes | -- |

## Ingestion

| Title | Query | Type | Description | Datasource | Unit | Legend Format |
|---|---|---|---|---|---|---|
| Total Ingestion Rate | `sum(rate(greptime_table_operator_ingest_rows{instance=~"$frontend"}[$__rate_interval]))` | timeseries | Total ingestion rate.<br/><br/>Here we list 3 primary protocols:<br/>- Prometheus remote write<br/>- Greptime's gRPC API (when using our ingest SDK)<br/>- Log ingestion HTTP API | prometheus | rowsps | `ingestion` |
| Ingestion Rate by Type | `sum(rate(greptime_servers_http_logs_ingestion_counter[$__rate_interval]))` <br/> `sum(rate(greptime_servers_prometheus_remote_write_samples[$__rate_interval]))` | timeseries | Ingestion rate broken down by protocol.<br/><br/>Here we list 3 primary protocols:<br/>- Prometheus remote write<br/>- Greptime's gRPC API (when using our ingest SDK)<br/>- Log ingestion HTTP API | prometheus | rowsps | `http-logs` |
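
If you want the per-protocol counters side by side with the frontend total, the gRPC share can be estimated by subtraction. A minimal sketch reusing only the metric names listed above; it is an approximation (the protocol counters are maintained independently of the table-operator total) and is not part of the dashboard itself:

```promql
# Approximate gRPC ingestion rate: frontend total minus the two
# protocol-specific counters from the "Ingestion Rate by Type" panel.
sum(rate(greptime_table_operator_ingest_rows{instance=~"$frontend"}[$__rate_interval]))
  - sum(rate(greptime_servers_http_logs_ingestion_counter[$__rate_interval]))
  - sum(rate(greptime_servers_prometheus_remote_write_samples[$__rate_interval]))
```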

## Queries

| Title | Query | Type | Description | Datasource | Unit | Legend Format |
|---|---|---|---|---|---|---|
| Total Query Rate | `sum (rate(greptime_servers_mysql_query_elapsed_count{instance=~"$frontend"}[$__rate_interval]))` <br/> `sum (rate(greptime_servers_postgres_query_elapsed_count{instance=~"$frontend"}[$__rate_interval]))` <br/> `sum (rate(greptime_servers_http_promql_elapsed_count{instance=~"$frontend"}[$__rate_interval]))` | timeseries | Total rate of query API calls by protocol. This metric is collected from frontends.<br/><br/>Here we list 3 main protocols:<br/>- MySQL<br/>- Postgres<br/>- Prometheus API<br/><br/>Note that some other minor query APIs, such as `/sql`, are not included. | prometheus | reqps | `mysql` |
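
Because the panel draws the three protocols as separate series, a single combined series can be handy for alerting. A minimal sketch that simply adds the three per-protocol rates from the panel above (not part of the dashboard):

```promql
# Combined query rate across MySQL, Postgres and the Prometheus API.
sum(rate(greptime_servers_mysql_query_elapsed_count{instance=~"$frontend"}[$__rate_interval]))
  + sum(rate(greptime_servers_postgres_query_elapsed_count{instance=~"$frontend"}[$__rate_interval]))
  + sum(rate(greptime_servers_http_promql_elapsed_count{instance=~"$frontend"}[$__rate_interval]))
```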

## Resources

| Title | Query | Type | Description | Datasource | Unit | Legend Format |
|---|---|---|---|---|---|---|
| Datanode Memory per Instance | `sum(process_resident_memory_bytes{instance=~"$datanode"}) by (instance, pod)` <br/> `max(greptime_memory_limit_in_bytes{app="greptime-datanode"})` | timeseries | Current memory usage by instance. | prometheus | bytes | `[{{instance}}]-[{{ pod }}]` |
| Datanode CPU Usage per Instance | `sum(rate(process_cpu_seconds_total{instance=~"$datanode"}[$__rate_interval]) * 1000) by (instance, pod)` <br/> `max(greptime_cpu_limit_in_millicores{app="greptime-datanode"})` | timeseries | Current CPU usage by instance, in millicores. | prometheus | none | `[{{ instance }}]-[{{ pod }}]` |
| Frontend Memory per Instance | `sum(process_resident_memory_bytes{instance=~"$frontend"}) by (instance, pod)` <br/> `max(greptime_memory_limit_in_bytes{app="greptime-frontend"})` | timeseries | Current memory usage by instance. | prometheus | bytes | `[{{ instance }}]-[{{ pod }}]` |
| Frontend CPU Usage per Instance | `sum(rate(process_cpu_seconds_total{instance=~"$frontend"}[$__rate_interval]) * 1000) by (instance, pod)` <br/> `max(greptime_cpu_limit_in_millicores{app="greptime-frontend"})` | timeseries | Current CPU usage by instance, in millicores. | prometheus | none | `[{{ instance }}]-[{{ pod }}]-cpu` |
| Metasrv Memory per Instance | `sum(process_resident_memory_bytes{instance=~"$metasrv"}) by (instance, pod)` <br/> `max(greptime_memory_limit_in_bytes{app="greptime-metasrv"})` | timeseries | Current memory usage by instance. | prometheus | bytes | `[{{ instance }}]-[{{ pod }}]-resident` |
| Metasrv CPU Usage per Instance | `sum(rate(process_cpu_seconds_total{instance=~"$metasrv"}[$__rate_interval]) * 1000) by (instance, pod)` <br/> `max(greptime_cpu_limit_in_millicores{app="greptime-metasrv"})` | timeseries | Current CPU usage by instance, in millicores. | prometheus | none | `[{{ instance }}]-[{{ pod }}]` |
| Flownode Memory per Instance | `sum(process_resident_memory_bytes{instance=~"$flownode"}) by (instance, pod)` <br/> `max(greptime_memory_limit_in_bytes{app="greptime-flownode"})` | timeseries | Current memory usage by instance. | prometheus | bytes | `[{{ instance }}]-[{{ pod }}]` |
| Flownode CPU Usage per Instance | `sum(rate(process_cpu_seconds_total{instance=~"$flownode"}[$__rate_interval]) * 1000) by (instance, pod)` <br/> `max(greptime_cpu_limit_in_millicores{app="greptime-flownode"})` | timeseries | Current CPU usage by instance, in millicores. | prometheus | none | `[{{ instance }}]-[{{ pod }}]` |
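
The CPU panels plot usage (in millicores) and the configured limit as separate series. If a single utilization ratio is preferred, the two can be divided; a sketch for the datanode case, reusing the metrics above (values close to 1 mean the limit is being reached):

```promql
# Datanode CPU utilization as a fraction of the configured limit.
sum by (instance, pod) (
  rate(process_cpu_seconds_total{instance=~"$datanode"}[$__rate_interval]) * 1000
)
/ scalar(max(greptime_cpu_limit_in_millicores{app="greptime-datanode"}))
```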

## Frontend Requests

| Title | Query | Type | Description | Datasource | Unit | Legend Format |
|---|---|---|---|---|---|---|
| HTTP QPS per Instance | `sum by(instance, pod, path, method, code) (rate(greptime_servers_http_requests_elapsed_count{instance=~"$frontend",path!~"/health\|/metrics"}[$__rate_interval]))` | timeseries | HTTP QPS per Instance. | prometheus | reqps | `[{{instance}}]-[{{pod}}]-[{{path}}]-[{{method}}]-[{{code}}]` |
| HTTP P99 per Instance | `histogram_quantile(0.99, sum by(instance, pod, le, path, method, code) (rate(greptime_servers_http_requests_elapsed_bucket{instance=~"$frontend",path!~"/health\|/metrics"}[$__rate_interval])))` | timeseries | HTTP P99 per Instance. | prometheus | s | `[{{instance}}]-[{{pod}}]-[{{path}}]-[{{method}}]-[{{code}}]-p99` |
| gRPC QPS per Instance | `sum by(instance, pod, path, code) (rate(greptime_servers_grpc_requests_elapsed_count{instance=~"$frontend"}[$__rate_interval]))` | timeseries | gRPC QPS per Instance. | prometheus | reqps | `[{{instance}}]-[{{pod}}]-[{{path}}]-[{{code}}]` |
| gRPC P99 per Instance | `histogram_quantile(0.99, sum by(instance, pod, le, path, code) (rate(greptime_servers_grpc_requests_elapsed_bucket{instance=~"$frontend"}[$__rate_interval])))` | timeseries | gRPC P99 per Instance. | prometheus | s | `[{{instance}}]-[{{pod}}]-[{{path}}]-[{{method}}]-[{{code}}]-p99` |
| MySQL QPS per Instance | `sum by(pod, instance)(rate(greptime_servers_mysql_query_elapsed_count{instance=~"$frontend"}[$__rate_interval]))` | timeseries | MySQL QPS per Instance. | prometheus | reqps | `[{{instance}}]-[{{pod}}]` |
| MySQL P99 per Instance | `histogram_quantile(0.99, sum by(pod, instance, le) (rate(greptime_servers_mysql_query_elapsed_bucket{instance=~"$frontend"}[$__rate_interval])))` | timeseries | MySQL P99 per Instance. | prometheus | s | `[{{ instance }}]-[{{ pod }}]-p99` |
| PostgreSQL QPS per Instance | `sum by(pod, instance)(rate(greptime_servers_postgres_query_elapsed_count{instance=~"$frontend"}[$__rate_interval]))` | timeseries | PostgreSQL QPS per Instance. | prometheus | reqps | `[{{instance}}]-[{{pod}}]` |
| PostgreSQL P99 per Instance | `histogram_quantile(0.99, sum by(pod,instance,le) (rate(greptime_servers_postgres_query_elapsed_bucket{instance=~"$frontend"}[$__rate_interval])))` | timeseries | PostgreSQL P99 per Instance. | prometheus | s | `[{{instance}}]-[{{pod}}]-p99` |
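
P99 panels can hide shifts in the bulk of the latency distribution, so a mean-latency series is a useful companion. A sketch assuming the `greptime_servers_http_requests_elapsed` histogram also exposes the standard `_sum` series (its `_count` and `_bucket` series are already used above):

```promql
# Mean HTTP request latency per frontend instance.
sum by (instance, pod) (rate(greptime_servers_http_requests_elapsed_sum{instance=~"$frontend"}[$__rate_interval]))
/
sum by (instance, pod) (rate(greptime_servers_http_requests_elapsed_count{instance=~"$frontend"}[$__rate_interval]))
```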

## Frontend to Datanode

| Title | Query | Type | Description | Datasource | Unit | Legend Format |
|---|---|---|---|---|---|---|
| Ingest Rows per Instance | `sum by(instance, pod)(rate(greptime_table_operator_ingest_rows{instance=~"$frontend"}[$__rate_interval]))` | timeseries | Row ingestion rate at each frontend. | prometheus | rowsps | `[{{instance}}]-[{{pod}}]` |
| Region Call QPS per Instance | `sum by(instance, pod, request_type) (rate(greptime_grpc_region_request_count{instance=~"$frontend"}[$__rate_interval]))` | timeseries | Region call QPS per instance. | prometheus | ops | `[{{instance}}]-[{{pod}}]-[{{request_type}}]` |
| Region Call P99 per Instance | `histogram_quantile(0.99, sum by(instance, pod, le, request_type) (rate(greptime_grpc_region_request_bucket{instance=~"$frontend"}[$__rate_interval])))` | timeseries | Region call P99 latency per instance. | prometheus | s | `[{{instance}}]-[{{pod}}]-[{{request_type}}]` |
| Frontend Handle Bulk Insert Elapsed Time | `sum by(instance, pod, stage) (rate(greptime_table_operator_handle_bulk_insert_sum[$__rate_interval]))/sum by(instance, pod, stage) (rate(greptime_table_operator_handle_bulk_insert_count[$__rate_interval]))` <br/> `histogram_quantile(0.99, sum by(instance, pod, stage, le) (rate(greptime_table_operator_handle_bulk_insert_bucket[$__rate_interval])))` | timeseries | Per-stage time for the frontend to handle bulk insert requests. | prometheus | s | `[{{instance}}]-[{{pod}}]-[{{stage}}]-AVG` |

## Mito Engine

| Title | Query | Type | Description | Datasource | Unit | Legend Format |
|---|---|---|---|---|---|---|
| Request OPS per Instance | `sum by(instance, pod, type) (rate(greptime_mito_handle_request_elapsed_count{instance=~"$datanode"}[$__rate_interval]))` | timeseries | Request OPS per instance. | prometheus | ops | `[{{instance}}]-[{{pod}}]-[{{type}}]` |
| Request P99 per Instance | `histogram_quantile(0.99, sum by(instance, pod, le, type) (rate(greptime_mito_handle_request_elapsed_bucket{instance=~"$datanode"}[$__rate_interval])))` | timeseries | Request P99 latency per instance. | prometheus | s | `[{{instance}}]-[{{pod}}]-[{{type}}]` |
| Write Buffer per Instance | `greptime_mito_write_buffer_bytes{instance=~"$datanode"}` | timeseries | Write buffer size per instance. | prometheus | decbytes | `[{{instance}}]-[{{pod}}]` |
| Write Rows per Instance | `sum by (instance, pod) (rate(greptime_mito_write_rows_total{instance=~"$datanode"}[$__rate_interval]))` | timeseries | Ingestion rate by row count. | prometheus | rowsps | `[{{instance}}]-[{{pod}}]` |
| Flush OPS per Instance | `sum by(instance, pod, reason) (rate(greptime_mito_flush_requests_total{instance=~"$datanode"}[$__rate_interval]))` | timeseries | Flush OPS per instance. | prometheus | ops | `[{{instance}}]-[{{pod}}]-[{{reason}}]` |
| Write Stall per Instance | `sum by(instance, pod) (greptime_mito_write_stall_total{instance=~"$datanode"})` | timeseries | Write stalls per instance. | prometheus | -- | `[{{instance}}]-[{{pod}}]` |
| Read Stage OPS per Instance | `sum by(instance, pod) (rate(greptime_mito_read_stage_elapsed_count{instance=~"$datanode", stage="total"}[$__rate_interval]))` | timeseries | Read stage OPS per instance. | prometheus | ops | `[{{instance}}]-[{{pod}}]` |
| Read Stage P99 per Instance | `histogram_quantile(0.99, sum by(instance, pod, le, stage) (rate(greptime_mito_read_stage_elapsed_bucket{instance=~"$datanode"}[$__rate_interval])))` | timeseries | Read stage P99 latency per instance. | prometheus | s | `[{{instance}}]-[{{pod}}]-[{{stage}}]` |
| Write Stage P99 per Instance | `histogram_quantile(0.99, sum by(instance, pod, le, stage) (rate(greptime_mito_write_stage_elapsed_bucket{instance=~"$datanode"}[$__rate_interval])))` | timeseries | Write stage P99 latency per instance. | prometheus | s | `[{{instance}}]-[{{pod}}]-[{{stage}}]` |
| Compaction OPS per Instance | `sum by(instance, pod) (rate(greptime_mito_compaction_total_elapsed_count{instance=~"$datanode"}[$__rate_interval]))` | timeseries | Compaction OPS per instance. | prometheus | ops | `[{{ instance }}]-[{{pod}}]` |
| Compaction Elapsed Time per Instance by Stage | `histogram_quantile(0.99, sum by(instance, pod, le, stage) (rate(greptime_mito_compaction_stage_elapsed_bucket{instance=~"$datanode"}[$__rate_interval])))` <br/> `sum by(instance, pod, stage) (rate(greptime_mito_compaction_stage_elapsed_sum{instance=~"$datanode"}[$__rate_interval]))/sum by(instance, pod, stage) (rate(greptime_mito_compaction_stage_elapsed_count{instance=~"$datanode"}[$__rate_interval]))` | timeseries | Compaction latency by stage. | prometheus | s | `[{{instance}}]-[{{pod}}]-[{{stage}}]-p99` |
| Compaction P99 per Instance | `histogram_quantile(0.99, sum by(instance, pod, le,stage) (rate(greptime_mito_compaction_total_elapsed_bucket{instance=~"$datanode"}[$__rate_interval])))` | timeseries | Compaction P99 latency per instance. | prometheus | s | `[{{instance}}]-[{{pod}}]-[{{stage}}]-compaction` |
| WAL write size | `histogram_quantile(0.95, sum by(le,instance, pod) (rate(raft_engine_write_size_bucket[$__rate_interval])))` <br/> `histogram_quantile(0.99, sum by(le,instance,pod) (rate(raft_engine_write_size_bucket[$__rate_interval])))` <br/> `sum by (instance, pod)(rate(raft_engine_write_size_sum[$__rate_interval]))` | timeseries | Write-ahead log write size in bytes. This chart includes the p95 and p99 sizes by instance as well as the total WAL write rate. | prometheus | bytes | `[{{instance}}]-[{{pod}}]-req-size-p95` |
| Cached Bytes per Instance | `greptime_mito_cache_bytes{instance=~"$datanode"}` | timeseries | Cached bytes per instance. | prometheus | decbytes | `[{{instance}}]-[{{pod}}]-[{{type}}]` |
| Inflight Compaction | `greptime_mito_inflight_compaction_count` | timeseries | Ongoing compaction task count. | prometheus | none | `[{{instance}}]-[{{pod}}]` |
| WAL sync duration seconds | `histogram_quantile(0.99, sum by(le, type, node, instance, pod) (rate(raft_engine_sync_log_duration_seconds_bucket[$__rate_interval])))` | timeseries | Raft engine (local disk) log store sync latency, p99. | prometheus | s | `[{{instance}}]-[{{pod}}]-p99` |
| Log Store op duration seconds | `histogram_quantile(0.99, sum by(le,logstore,optype,instance, pod) (rate(greptime_logstore_op_elapsed_bucket[$__rate_interval])))` | timeseries | Write-ahead log operation latency at p99. | prometheus | s | `[{{instance}}]-[{{pod}}]-[{{logstore}}]-[{{optype}}]-p99` |
| Inflight Flush | `greptime_mito_inflight_flush_count` | timeseries | Ongoing flush task count. | prometheus | none | `[{{instance}}]-[{{pod}}]` |
| Compaction Input/Output Bytes | `sum by(instance, pod) (greptime_mito_compaction_input_bytes)` <br/> `sum by(instance, pod) (greptime_mito_compaction_output_bytes)` | timeseries | Compaction input/output bytes. | prometheus | bytes | `[{{instance}}]-[{{pod}}]-input` |
| Region Worker Handle Bulk Insert Requests | `histogram_quantile(0.95, sum by(le,instance, stage, pod) (rate(greptime_region_worker_handle_write_bucket[$__rate_interval])))` <br/> `sum by(instance, stage, pod) (rate(greptime_region_worker_handle_write_sum[$__rate_interval]))/sum by(instance, stage, pod) (rate(greptime_region_worker_handle_write_count[$__rate_interval]))` | timeseries | Per-stage elapsed time for the region worker to handle bulk insert region requests. | prometheus | s | `[{{instance}}]-[{{pod}}]-[{{stage}}]-P95` |
| Active Series and Field Builders Count | `sum by(instance, pod) (greptime_mito_memtable_active_series_count)` <br/> `sum by(instance, pod) (greptime_mito_memtable_field_builder_count)` | timeseries | Active series and field builder counts in memtables. | prometheus | none | `[{{instance}}]-[{{pod}}]-series` |
| Region Worker Convert Requests | `histogram_quantile(0.95, sum by(le, instance, stage, pod) (rate(greptime_datanode_convert_region_request_bucket[$__rate_interval])))` <br/> `sum by(le,instance, stage, pod) (rate(greptime_datanode_convert_region_request_sum[$__rate_interval]))/sum by(le,instance, stage, pod) (rate(greptime_datanode_convert_region_request_count[$__rate_interval]))` | timeseries | Per-stage elapsed time for the region worker to decode requests. | prometheus | s | `[{{instance}}]-[{{pod}}]-[{{stage}}]-P95` |
| Cache Miss | `sum by (instance,pod, type) (rate(greptime_mito_cache_miss{instance=~"$datanode"}[$__rate_interval]))` | timeseries | Local cache misses on the datanode. | prometheus | -- | `[{{instance}}]-[{{pod}}]-[{{type}}]` |
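
The Cache Miss panel shows the absolute miss rate; a miss ratio is often easier to reason about. A hedged sketch, assuming a companion `greptime_mito_cache_hit` counter is exported with the same labels as `greptime_mito_cache_miss` (verify the metric name against your build before using it):

```promql
# Cache miss ratio per datanode instance and cache type.
sum by (instance, pod, type) (rate(greptime_mito_cache_miss{instance=~"$datanode"}[$__rate_interval]))
/
(
    sum by (instance, pod, type) (rate(greptime_mito_cache_hit{instance=~"$datanode"}[$__rate_interval]))
  + sum by (instance, pod, type) (rate(greptime_mito_cache_miss{instance=~"$datanode"}[$__rate_interval]))
)
```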

## OpenDAL

| Title | Query | Type | Description | Datasource | Unit | Legend Format |
|---|---|---|---|---|---|---|
| QPS per Instance | `sum by(instance, pod, scheme, operation) (rate(opendal_operation_duration_seconds_count{instance=~"$datanode"}[$__rate_interval]))` | timeseries | QPS per instance. | prometheus | ops | `[{{instance}}]-[{{pod}}]-[{{scheme}}]-[{{operation}}]` |
| Read QPS per Instance | `sum by(instance, pod, scheme, operation) (rate(opendal_operation_duration_seconds_count{instance=~"$datanode", operation=~"read\|Reader::read"}[$__rate_interval]))` | timeseries | Read QPS per instance. | prometheus | ops | `[{{instance}}]-[{{pod}}]-[{{scheme}}]-[{{operation}}]` |
| Read P99 per Instance | `histogram_quantile(0.99, sum by(instance, pod, le, scheme, operation) (rate(opendal_operation_duration_seconds_bucket{instance=~"$datanode",operation=~"read\|Reader::read"}[$__rate_interval])))` | timeseries | Read P99 latency per instance. | prometheus | s | `[{{instance}}]-[{{pod}}]-[{{scheme}}]-[{{operation}}]` |
| Write QPS per Instance | `sum by(instance, pod, scheme, operation) (rate(opendal_operation_duration_seconds_count{instance=~"$datanode", operation=~"write\|Writer::write\|Writer::close"}[$__rate_interval]))` | timeseries | Write QPS per instance. | prometheus | ops | `[{{instance}}]-[{{pod}}]-[{{scheme}}]-[{{operation}}]` |
| Write P99 per Instance | `histogram_quantile(0.99, sum by(instance, pod, le, scheme, operation) (rate(opendal_operation_duration_seconds_bucket{instance=~"$datanode", operation =~ "Writer::write\|Writer::close\|write"}[$__rate_interval])))` | timeseries | Write P99 latency per instance. | prometheus | s | `[{{instance}}]-[{{pod}}]-[{{scheme}}]-[{{operation}}]` |
| List QPS per Instance | `sum by(instance, pod, scheme) (rate(opendal_operation_duration_seconds_count{instance=~"$datanode", operation="list"}[$__rate_interval]))` | timeseries | List QPS per instance. | prometheus | ops | `[{{instance}}]-[{{pod}}]-[{{scheme}}]` |
| List P99 per Instance | `histogram_quantile(0.99, sum by(instance, pod, le, scheme) (rate(opendal_operation_duration_seconds_bucket{instance=~"$datanode", operation="list"}[$__rate_interval])))` | timeseries | List P99 latency per instance. | prometheus | s | `[{{instance}}]-[{{pod}}]-[{{scheme}}]` |
| Other Requests per Instance | `sum by(instance, pod, scheme, operation) (rate(opendal_operation_duration_seconds_count{instance=~"$datanode",operation!~"read\|write\|list\|stat"}[$__rate_interval]))` | timeseries | Other requests per instance. | prometheus | ops | `[{{instance}}]-[{{pod}}]-[{{scheme}}]-[{{operation}}]` |
| Other Request P99 per Instance | `histogram_quantile(0.99, sum by(instance, pod, le, scheme, operation) (rate(opendal_operation_duration_seconds_bucket{instance=~"$datanode", operation!~"read\|write\|list\|Writer::write\|Writer::close\|Reader::read"}[$__rate_interval])))` | timeseries | Other request P99 latency per instance. | prometheus | s | `[{{instance}}]-[{{pod}}]-[{{scheme}}]-[{{operation}}]` |
| Opendal traffic | `sum by(instance, pod, scheme, operation) (rate(opendal_operation_bytes_sum{instance=~"$datanode"}[$__rate_interval]))` | timeseries | Total traffic in bytes by instance and operation. | prometheus | decbytes | `[{{instance}}]-[{{pod}}]-[{{scheme}}]-[{{operation}}]` |
| OpenDAL errors per Instance | `sum by(instance, pod, scheme, operation, error) (rate(opendal_operation_errors_total{instance=~"$datanode", error!="NotFound"}[$__rate_interval]))` | timeseries | OpenDAL error counts per instance. | prometheus | -- | `[{{instance}}]-[{{pod}}]-[{{scheme}}]-[{{operation}}]-[{{error}}]` |
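
The error panel reports absolute error rates; relating them to the request volume gives an error ratio. A sketch that combines the two metrics already used above:

```promql
# Fraction of object-store operations that return an error (NotFound excluded).
sum by (instance, pod, scheme, operation) (rate(opendal_operation_errors_total{instance=~"$datanode", error!="NotFound"}[$__rate_interval]))
/
sum by (instance, pod, scheme, operation) (rate(opendal_operation_duration_seconds_count{instance=~"$datanode"}[$__rate_interval]))
```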

## Metasrv

| Title | Query | Type | Description | Datasource | Unit | Legend Format |
|---|---|---|---|---|---|---|
| Region migration datanode | `greptime_meta_region_migration_stat{datanode_type="src"}` <br/> `greptime_meta_region_migration_stat{datanode_type="desc"}` | status-history | Counter of region migration by source and destination. | prometheus | -- | `from-datanode-{{datanode_id}}` |
| Region migration error | `greptime_meta_region_migration_error` | timeseries | Counter of region migration errors. | prometheus | none | `{{pod}}-{{state}}-{{error_type}}` |
| Datanode load | `greptime_datanode_load` | timeseries | Gauge of load information of each datanode, collected via heartbeat between datanode and metasrv. This information is for metasrv to schedule workloads. | prometheus | binBps | `Datanode-{{datanode_id}}-writeload` |
| Rate of SQL Executions (RDS) | `rate(greptime_meta_rds_pg_sql_execute_elapsed_ms_count[$__rate_interval])` | timeseries | Displays the rate of SQL executions processed by the Meta service using the RDS backend. | prometheus | none | `{{pod}} {{op}} {{type}} {{result}}` |
| SQL Execution Latency (RDS) | `histogram_quantile(0.90, sum by(pod, op, type, result, le) (rate(greptime_meta_rds_pg_sql_execute_elapsed_ms_bucket[$__rate_interval])))` | timeseries | Measures the response time of SQL executions via the RDS backend. | prometheus | ms | `{{pod}} {{op}} {{type}} {{result}} p90` |
| Handler Execution Latency | `histogram_quantile(0.90, sum by(pod, le, name) (rate(greptime_meta_handler_execute_bucket[$__rate_interval])))` | timeseries | Shows latency of Meta handlers by pod and handler name, useful for monitoring handler performance and detecting latency spikes. | prometheus | s | `{{pod}} {{name}} p90` |
| Heartbeat Packet Size | `histogram_quantile(0.9, sum by(pod, le) (greptime_meta_heartbeat_stat_memory_size_bucket))` | timeseries | Shows p90 heartbeat message sizes, helping track network usage and identify anomalies in heartbeat payload. | prometheus | bytes | `{{pod}}` |
| Meta Heartbeat Receive Rate | `rate(greptime_meta_heartbeat_rate[$__rate_interval])` | timeseries | Rate of heartbeat messages received by metasrv. | prometheus | s | `{{pod}}` |
| Meta KV Ops Latency | `histogram_quantile(0.99, sum by(pod, le, op, target) (greptime_meta_kv_request_elapsed_bucket))` | timeseries | P99 latency of metasrv KV backend operations. | prometheus | s | `{{pod}}-{{op}} p99` |
| Rate of meta KV Ops | `rate(greptime_meta_kv_request_elapsed_count[$__rate_interval])` | timeseries | Rate of metasrv KV backend operations. | prometheus | none | `{{pod}}-{{op}} p99` |
| DDL Latency | `histogram_quantile(0.9, sum by(le, pod, step) (greptime_meta_procedure_create_tables_bucket))` <br/> `histogram_quantile(0.9, sum by(le, pod, step) (greptime_meta_procedure_create_table))` <br/> `histogram_quantile(0.9, sum by(le, pod, step) (greptime_meta_procedure_create_view))` <br/> `histogram_quantile(0.9, sum by(le, pod, step) (greptime_meta_procedure_create_flow))` <br/> `histogram_quantile(0.9, sum by(le, pod, step) (greptime_meta_procedure_drop_table))` <br/> `histogram_quantile(0.9, sum by(le, pod, step) (greptime_meta_procedure_alter_table))` | timeseries | P90 latency of DDL procedures (create tables/table/view/flow, drop table, alter table) by step. | prometheus | s | `CreateLogicalTables-{{step}} p90` |
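
Beyond the p99 KV latency panel, a mean latency can be derived from the same histogram. A sketch assuming `greptime_meta_kv_request_elapsed` also exports the standard `_sum` series (its `_count` and `_bucket` series appear above):

```promql
# Mean latency of metasrv KV backend operations.
sum by (pod, op, target) (rate(greptime_meta_kv_request_elapsed_sum[$__rate_interval]))
/
sum by (pod, op, target) (rate(greptime_meta_kv_request_elapsed_count[$__rate_interval]))
```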

## Flownode

| Title | Query | Type | Description | Datasource | Unit | Legend Format |
|---|---|---|---|---|---|---|
| Flow Ingest / Output Rate | `sum by(instance, pod, direction) (rate(greptime_flow_processed_rows[$__rate_interval]))` | timeseries | Flow Ingest / Output Rate. | prometheus | -- | `[{{pod}}]-[{{instance}}]-[{{direction}}]` |
| Flow Ingest Latency | `histogram_quantile(0.95, sum(rate(greptime_flow_insert_elapsed_bucket[$__rate_interval])) by (le, instance, pod))` <br/> `histogram_quantile(0.99, sum(rate(greptime_flow_insert_elapsed_bucket[$__rate_interval])) by (le, instance, pod))` | timeseries | Flow Ingest Latency. | prometheus | -- | `[{{instance}}]-[{{pod}}]-p95` |
| Flow Operation Latency | `histogram_quantile(0.95, sum(rate(greptime_flow_processing_time_bucket[$__rate_interval])) by (le,instance,pod,type))` <br/> `histogram_quantile(0.99, sum(rate(greptime_flow_processing_time_bucket[$__rate_interval])) by (le,instance,pod,type))` | timeseries | Flow Operation Latency. | prometheus | -- | `[{{instance}}]-[{{pod}}]-[{{type}}]-p95` |
| Flow Buffer Size per Instance | `greptime_flow_input_buf_size` | timeseries | Flow Buffer Size per Instance. | prometheus | -- | `[{{instance}}]-[{{pod}}]` |
| Flow Processing Error per Instance | `sum by(instance,pod,code) (rate(greptime_flow_errors[$__rate_interval]))` | timeseries | Flow Processing Error per Instance. | prometheus | -- | `[{{instance}}]-[{{pod}}]-[{{code}}]` |
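
When a flow falls behind, its ingest rate exceeds its output rate. A hedged sketch of that difference, assuming the `direction` label of `greptime_flow_processed_rows` takes values such as `ingest` and `output` (check the label values actually exported by your deployment before using it):

```promql
# Net row backlog growth per flownode: ingest rate minus output rate.
  sum by (instance, pod) (rate(greptime_flow_processed_rows{direction="ingest"}[$__rate_interval]))
- sum by (instance, pod) (rate(greptime_flow_processed_rows{direction="output"}[$__rate_interval]))
```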