* feat: expose region read load through Prometheus metrics and heartbeat

Introduce region-level query load tracking (CPU time and scanned bytes) collected by `RegionScanExec`, exposed via Prometheus metrics and optionally reported through heartbeat region stats.

- **Region metrics** (`src/mito2/src/metrics.rs`, `src/store-api/src/metrics.rs`): Add `greptime_mito_region_query_cpu_time`, `greptime_mito_region_query_scanned_bytes`, and `greptime_mito_region_written_bytes_since_open` gauge metrics.
- **MitoRegion** (`src/mito2/src/region.rs`, `src/mito2/src/region/opener.rs`, `src/mito2/src/region_write_ctx.rs`): Replace `AtomicU64` `written_bytes` with `IntGauge`; add `query_cpu_time`/`query_scanned_bytes` fields with lifecycle management (init, reset, remove-on-drop).
- **RegionStatistic** (`src/store-api/src/region_engine.rs`, `src/store-api/src/storage/requests.rs`): Add `query_cpu_time` and `query_scanned_bytes` fields.
- **Metric-engine** (`src/metric-engine/src/utils.rs`): Aggregate query load from metadata and data regions.
- **Heartbeat** (`src/datanode/src/heartbeat.rs`, `src/common/meta/src/datanode.rs`): Relay region query load via heartbeat `RegionStat`; add test.
- **Query engine** (`src/query/src/options.rs`, `src/query/src/query_engine/state.rs`, `src/query/src/datafusion.rs`, `src/query/src/dist_plan/merge_scan.rs`, `src/query/src/dist_plan/analyzer.rs`, `src/query/src/dummy_catalog.rs`): Add `enable_region_query_load_report` config; wire `RegionScanExec` to accumulate CPU time and scanned bytes.
- **Table scan** (`src/table/src/table/scan.rs`, `src/table/src/table/metrics.rs`): Wire table scan metrics.
- **Config** (`config/standalone.example.toml`, `config/datanode.example.toml`, `config/frontend.example.toml`, `config/config.md`): Add example config and documentation for `enable_region_query_load_report`.
- **Tests** (`src/mito2/src/engine/basic_test.rs`, `src/mito2/src/engine/close_test.rs`, `src/cmd/tests/load_config_test.rs`, `src/flow/src/adapter.rs`): Add unit tests for region query load reporting and metric cleanup on region close; set default config values.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat: move region read load report config from query layer to mito engine

Move the `enable_region_query_load_report` setting from query-level config (`QueryOptions`/`DistPlannerOptions`) into the mito2 storage engine config (`MitoConfig`), and expose it through the `RegionScanner` trait instead of `ScanRequest`/`PrepareRequest`.

- Mito config: `src/mito2/src/config.rs`, `src/mito2/src/engine.rs`
- Scan region plumbing: `src/mito2/src/read/scan_region.rs`
- RegionScanner trait: `src/store-api/src/region_engine.rs`
- Scanner impls: `src/mito2/src/read/seq_scan.rs`, `src/mito2/src/read/series_scan.rs`, `src/mito2/src/read/unordered_scan.rs`
- RegionScanExec: `src/table/src/table/scan.rs`
- Removed from query layer: `src/query/src/options.rs`, `src/query/src/dist_plan/analyzer.rs`, `src/query/src/query_engine/state.rs`, `src/query/src/datafusion.rs`, `src/query/src/dummy_catalog.rs`
- Removed from test/config: `src/query/src/dist_plan/analyzer/test.rs`, `src/flow/src/adapter.rs`, `src/cmd/tests/load_config_test.rs`, `src/store-api/src/storage/requests.rs`
- Config docs: `config/config.md`, `config/datanode.example.toml`, `config/frontend.example.toml`, `config/standalone.example.toml`

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat: move region query load report config from MitoConfig to LoggingOptions

Relocate the `enable_region_query_load_report` setting from `MitoConfig` to `LoggingOptions` (as `enable_per_region_metrics`), and thread it into `MitoEngineBuilder` instead of reading from the engine config directly. This makes the region read-load reporting a per-node logging/observability concern rather than a per-engine storage setting.

- `config/config.md`
- `config/datanode.example.toml`
- `config/standalone.example.toml`
- `src/common/telemetry/src/logging.rs`
- `src/datanode/src/datanode.rs`
- `src/mito2/src/config.rs`
- `src/mito2/src/engine.rs`
- `src/mito2/src/region.rs`

Signed-off-by: Lei Huang <lei@huang.to>
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat: report region query load on stream drop instead of stream end

Move `report_region_query_load()` from `StreamWithMetricWrapper::poll_next()` to `Drop::drop()` so that region query load is reported even when the stream is dropped prematurely (not just when fully consumed).

Affected files:
- `src/table/src/table/scan.rs`

Signed-off-by: Lei, Huang <huanglei@qiyi.com>
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat: make region query load reporting configurable

Introduce `enable_region_query_load_report` flag to optionally report per-region `query_cpu_time` and `query_scanned_bytes` metrics instead of always creating them. When disabled, the Prometheus gauges are not created (`None`), avoiding metric churn for workloads that do not need query-level load tracking.

- `src/common/meta/src/datanode.rs` — Placeholder fields for query load
- `src/mito2/src/region.rs` — Make query metrics `Option<IntGauge>`, conditional create/remove/reset
- `src/mito2/src/region/opener.rs` — Thread flag through `RegionOpener`
- `src/mito2/src/worker.rs` — Thread flag through `WorkerGroup`/`WorkerStarter`/`RegionWorkerLoop`
- `src/mito2/src/worker/handle_catchup.rs` — Pass flag on region open
- `src/mito2/src/worker/handle_create.rs` — Pass flag on region create
- `src/mito2/src/worker/handle_open.rs` — Pass flag on region open
- `src/mito2/src/engine.rs` — Pass flag from `MitoEngineBuilder`
- `src/mito2/src/test_util.rs` — Test helpers for both modes
- `src/mito2/src/engine/basic_test.rs` — Cover disabled and preserve cases
- `src/mito2/src/engine/close_test.rs` — Adapt to optional metrics

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* refactor: remove elapsed_compute metric from scan stream

The elapsed_compute metric conflated poll-wait time with actual CPU computation, making it misleading. Removed the metric and its recording path from StreamMetrics and StreamWithMetricWrapper. Added a test asserting that poll duration is not reported as elapsed_compute.

- `src/table/src/table/metrics.rs` — removed elapsed_compute field, builder, and record_elapsed_compute method
- `src/table/src/table/scan.rs` — removed record_elapsed_compute call; added SlowRecordBatchStream test helper and wrapper_poll_time_is_not_elapsed_compute test

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat: disable region query load report for compaction scans

Compaction scans are internal operations initiated by the engine, not user queries. Disable region query load reporting when the scan input is marked as compaction to avoid misleading load metrics.

- `src/mito2/src/read/scan_region.rs` — set `enable_region_query_load_report` to `false` when compaction is enabled; add unit test

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* test: add `enable_per_region_metrics` config to HTTP integration test

- Enable per-region metrics config in HTTP test setup `tests-integration/tests/http.rs`

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* refactor: remove region query load reporting tests and helpers

Remove the region query load reporting feature from the codebase, including tests, test utilities, and helper infrastructure that were part of this now-deprecated functionality. Specifically:

- Remove region query load reporting tests from `src/mito2/src/engine/basic_test.rs` and `src/table/src/table/scan.rs`, and the region close metrics test from `src/mito2/src/engine/close_test.rs`
- Remove region query load report test utilities and simplify engine construction helpers in `src/mito2/src/test_util.rs`

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* perf: avoid disabled region query load timing

Summary:
- Avoid per-poll `Instant::now` and elapsed-time accumulation when `enable_region_query_load_report` is disabled.
- Keep region query-load CPU accounting active only when reporting is enabled.

Files:
- `src/table/src/table/scan.rs`

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat: move per-region query load reporting from storage to query engine

Move `enable_per_region_metrics` from datanode to frontend config and migrate query load tracking (CPU time, scanned bytes) from mito2 storage engine to the query engine's distributed scan planner. The storage-level metrics plumbing and `enable_region_query_load_report` flag are removed from mito2, `ScanInput`, `ScanRegion`, and `RegionScanner`. Query-level metrics are now collected in `merge_scan.rs` via `scan_region_load`.

- `src/mito2/` -- Remove `query_cpu_time`, `query_scanned_bytes` metrics, `enable_region_query_load_report` plumbing from engine, region, opener, scanner types, workers
- `src/store-api/` -- Remove `query_cpu_time`, `query_scanned_bytes` from `RegionStatistic`
- `src/metric-engine/` -- Remove query load fields from `get_region_statistic`
- `src/query/` -- Add `enable_per_region_metrics` to `QueryOptions`; wire through planner, optimizer, merge scan with `scan_region_load` metrics
- `src/frontend/` -- Pass `enable_per_region_metrics` into `QueryOptions`
- `src/common/meta/` -- Remove TODO for query load fields
- `config/` -- Move `enable_per_region_metrics` from datanode to frontend and standalone example configs
- `src/cmd/tests/` -- Add `enable_per_region_metrics` to flownode config test
- `src/flow/` -- Add `enable_per_region_metrics` default to flownode options
- `src/table/` -- Remove unused query load fields from scan
- `src/datanode/` -- Remove `with_enable_region_query_load_report` calls

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* refactor: remove obsolete mito write load metric

Remove obsolete mito-side region written-bytes metric plumbing that is not needed by the frontend read-load reporting path.

Related files:
- `src/mito2/src/metrics.rs`
- `src/mito2/src/region.rs`
- `src/mito2/src/region/opener.rs`
- `src/mito2/src/region_write_ctx.rs`
- `src/mito2/src/engine/basic_test.rs`
- `src/mito2/src/worker.rs`
- `src/mito2/src/config.rs`

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat: change region query load metrics from gauge to counter

Change `REGION_QUERY_CPU_TIME` and `REGION_QUERY_SCANNED_BYTES` from `IntGaugeVec` to `IntCounterVec` since these values are monotonically increasing and do not need gauge semantics. Update corresponding `add` calls to `inc_by` in merge scan reporting.

Files:
- `src/store-api/src/metrics.rs` — metric type and label changes
- `src/query/src/dist_plan/merge_scan.rs` — caller adaptation

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* refactor: pass ReadItem directly to report_region_query_load

Move `region_scan_load` call to the caller, so `report_region_query_load` accepts the already-computed `ReadItem` instead of `RecordBatchMetrics`.

- `src/query/src/dist_plan/merge_scan.rs` — update signature, inline call, remove stale test

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat: ensure region query load is reported on MergeScanExec drop

Remove the `enable_per_region_metrics` parameter from `report_region_query_load` so region load metrics are always emitted. Add a `Drop` impl for `MergeScanExec` that reports sub-stage metrics when the executor is dropped, covering edge cases where per-region metric emission was missed. Add a unit test verifying CPU time and scanned bytes are recorded on drop.

Affected file: `src/query/src/dist_plan/merge_scan.rs`

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* fix: gate region query load reporting

Guard drop-time region query load reporting with the configured per-region metrics flag.

Related files:
- `src/query/src/dist_plan/merge_scan.rs`

Symbols:
- `MergeScanExec::drop`
- `enable_per_region_metrics`

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* fix: clean region query load metrics on drop

Remove per-region query load metric labels when a region is dropped so stale label series do not remain in the registry.

Related files:
- `src/mito2/src/region.rs`

Symbols:
- `MitoRegion::drop`
- `REGION_QUERY_CPU_TIME`
- `REGION_QUERY_SCANNED_BYTES`

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

---------

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
Signed-off-by: Lei Huang <lei@huang.to>
Signed-off-by: Lei, Huang <huanglei@qiyi.com>
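The drop-time reporting described in the commits above can be illustrated with a minimal, std-only sketch. `Counter` and `MeteredScan` are hypothetical stand-ins invented for this illustration and are not types from the GreptimeDB codebase: `Counter` plays the role of a monotonically increasing Prometheus counter, and `MeteredScan` plays the role of a scan wrapper gated by a flag such as `enable_per_region_metrics`. The point is only that load accumulated locally is flushed in `Drop`, so a scan abandoned halfway is still accounted for.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a monotonically increasing counter metric.
#[derive(Default)]
pub struct Counter(AtomicU64);

impl Counter {
    /// Counters only ever grow, hence `inc_by` rather than `set`.
    pub fn inc_by(&self, v: u64) {
        self.0.fetch_add(v, Ordering::Relaxed);
    }

    pub fn get(&self) -> u64 {
        self.0.load(Ordering::Relaxed)
    }
}

/// Hypothetical scan wrapper: accumulates scanned bytes locally and
/// reports them once, when the wrapper is dropped.
pub struct MeteredScan<I> {
    inner: I,
    scanned_bytes: u64,
    enabled: bool,
    total: Arc<Counter>,
}

impl<I> MeteredScan<I> {
    pub fn new(inner: I, enabled: bool, total: Arc<Counter>) -> Self {
        Self { inner, scanned_bytes: 0, enabled, total }
    }
}

impl<I: Iterator<Item = Vec<u8>>> Iterator for MeteredScan<I> {
    type Item = Vec<u8>;

    fn next(&mut self) -> Option<Vec<u8>> {
        let batch = self.inner.next()?;
        // Skip the bookkeeping entirely when reporting is disabled.
        if self.enabled {
            self.scanned_bytes += batch.len() as u64;
        }
        Some(batch)
    }
}

impl<I> Drop for MeteredScan<I> {
    fn drop(&mut self) {
        // Runs whether the scan was fully consumed or abandoned early.
        if self.enabled {
            self.total.inc_by(self.scanned_bytes);
        }
    }
}
```

Wrapping an iterator of batches in `MeteredScan::new(batches, true, counter.clone())`, reading one batch, and then letting the wrapper go out of scope adds exactly that batch's length to the shared counter. With the flag set to `false`, the counter is left untouched.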
One database for metrics, logs, and traces
replacing Prometheus, Loki, and Elasticsearch
The unified OpenTelemetry backend — with SQL + PromQL on object storage.
- Introduction
- Overview
- Features
- How GreptimeDB Compares
- Architecture
- Try GreptimeDB
- Getting Started
- Build From Source
- Tools & Extensions
- Project Status
- Community
- License
- Commercial Support
- Contributing
- Acknowledgement
Introduction
GreptimeDB is an open-source observability database built for Observability 2.0 — treating metrics, logs, and traces as one unified data model (wide events) instead of three separate pillars.
Use it as the single OpenTelemetry backend — replacing Prometheus, Loki, and Elasticsearch with one database built on object storage. Query with SQL and PromQL, scale without pain, cut costs up to 50×.
Overview
A quick overview of what GreptimeDB ingests, how it connects to other systems, and what its distributed engine lets you do.
Features
| Feature | Description |
|---|---|
| Observability 2.0 native | Logs, metrics, and traces in one engine with SQL + PromQL. Native OpenTelemetry, Prometheus remote write, and Jaeger. Migrate one signal at a time, or use as a single backend. |
| Elastic compute-storage separation | Scale reads independently with horizontal replicas. Serve high-concurrency workloads from dashboards, alerting, and AI agents — without resharding or data migration. |
| Sub-second on PB–EB-scale data | Columnar engine with fulltext, inverted, and skipping indexes. Written in Rust. Designed for high-concurrency point queries, not just analytical scans. |
| 50× lower cost | Object storage (S3, GCS, Azure Blob) as primary storage, with a tiered cache (memory + local disk) to keep writes and queries fast. |
Perfect for:
- Replacing Prometheus + Loki + Elasticsearch with a single observability backend
- Scaling past Prometheus — high cardinality, long-term storage, no Thanos/Mimir overhead
- AI/agent workloads — store GenAI telemetry (OTel GenAI conventions), and serve high-concurrency reads from SRE/developer agents via horizontal read replicas
- Cutting observability costs with object storage (up to 50× savings on traces, 30% on logs)
- Edge-to-cloud observability with unified APIs on resource-constrained devices
Why Observability 2.0? Three separate databases for metrics, logs, and traces means three storage layers, three query languages, and three sets of dashboards. GreptimeDB stores all three as timestamped wide events in one columnar engine — JOIN across signals in SQL, run one stack instead of three, and ingest AI agent telemetry the same way. Read more: Observability 2.0 and the Database for It.
Learn more in Why GreptimeDB.
How GreptimeDB Compares
| Capability | GreptimeDB | Prometheus / Thanos / Mimir | Grafana Loki | Elasticsearch |
|---|---|---|---|---|
| Data types | Metrics, logs, traces | Metrics only | Logs only | Logs, traces |
| Query language | SQL + PromQL | PromQL | LogQL | Query DSL |
| Storage | Native object storage (S3, etc.) | Local disk + object storage (Thanos/Mimir) | Object storage (chunks) | Local disk |
| Scaling | Compute-storage separation, stateless nodes | Federation / Thanos / Mimir — multi-component, ops heavy | Stateless + object storage | Shard-based, ops heavy |
| Cost efficiency | Up to 50× lower storage cost | High at scale | Moderate | High (inverted index overhead) |
| OpenTelemetry | Native (metrics + logs + traces) | Partial (metrics only) | Partial (logs only) | Via instrumentation |
Benchmarks:
Architecture
GreptimeDB can run in two modes:
- Standalone — single binary for development and small deployments.
- Distributed — four components, each independently scalable:
- Frontend — protocol entry (OTel, Prometheus, MySQL/PostgreSQL, gRPC, ingestion APIs for Elasticsearch/InfluxDB/Loki) and the distributed query engine. Stateless, scales horizontally.
- Datanode — region engine with WAL, memtable, SST, cache, compaction, and indexes. Persists data to object storage. Elastic.
- Metasrv — metadata, routing, repartitioning, autopilot, and security. Backed by a pluggable KV layer (etcd or RDS).
- Flownode (optional) — continuous flow computation (streaming and materialized views).
For deeper coverage, see the architecture doc or DeepWiki.
Try GreptimeDB
For AI agents — paste this prompt into your agent:
Read https://docs.greptime.com/SKILL.md and follow the instructions
to deploy, configure, ingest, and query GreptimeDB.
To start a standalone instance with Docker:
docker run -p 127.0.0.1:4000-4003:4000-4003 \
  -v "$(pwd)/greptimedb_data:/greptimedb_data" \
  --name greptime --rm \
  greptime/greptimedb:latest standalone start \
  --http-addr 0.0.0.0:4000 \
  --rpc-bind-addr 0.0.0.0:4001 \
  --mysql-addr 0.0.0.0:4002 \
  --postgres-addr 0.0.0.0:4003
Dashboard: http://localhost:4000/dashboard
Read more in the full Install Guide.
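Once the container is running, a quick way to confirm the HTTP port answers is to send it a SQL statement. The sketch below uses only the Rust standard library and assumes the HTTP SQL endpoint is `POST /v1/sql` with a form-encoded `sql` field; that path is not stated in this README, so treat it as an assumption and check the HTTP API documentation. The helper names `form_encode`, `build_sql_request`, and `run_sql` are made up for this example.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Percent-encode a string for an `application/x-www-form-urlencoded` body.
pub fn form_encode(s: &str) -> String {
    let mut out = String::new();
    for b in s.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'_' | b'.' | b'*' => {
                out.push(b as char)
            }
            b' ' => out.push('+'),
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}

/// Build a raw HTTP/1.1 request carrying `sql` as a form field.
/// The `/v1/sql` path is an assumption, not taken from this README.
pub fn build_sql_request(host: &str, sql: &str) -> String {
    let body = format!("sql={}", form_encode(sql));
    format!(
        "POST /v1/sql HTTP/1.1\r\nHost: {}\r\nContent-Type: application/x-www-form-urlencoded\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{}",
        host,
        body.len(),
        body
    )
}

/// Send the request to `addr` (for example "127.0.0.1:4000") and return
/// the raw response, headers included.
pub fn run_sql(addr: &str, sql: &str) -> std::io::Result<String> {
    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(build_sql_request(addr, sql).as_bytes())?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(response)
}
```

Calling `run_sql("127.0.0.1:4000", "SELECT 1")` returns the raw HTTP response as text, or an `io::Error` if nothing is listening on the port. For real applications, a proper HTTP client or the MySQL/PostgreSQL ports exposed by the command above are the better choice.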
Troubleshooting:
- Cannot connect to the database? Ensure that ports `4000`, `4001`, `4002`, and `4003` are not blocked by a firewall or used by other services.
- Failed to start? Check the container logs with `docker logs greptime` for further details.
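To check whether another service already occupies one of these ports, a small std-only sketch like the following can be run before starting the container (once GreptimeDB is up, the ports are expected to be busy). `port_is_free` and `busy_ports` are hypothetical helper names, and the check only covers the loopback address that the `docker run` command above publishes on.

```rust
use std::net::TcpListener;

/// Returns true if a listener can be bound on 127.0.0.1:`port`,
/// i.e. nothing else is currently listening there.
pub fn port_is_free(port: u16) -> bool {
    // The listener is dropped immediately, releasing the port again.
    TcpListener::bind(("127.0.0.1", port)).is_ok()
}

/// Returns the subset of `ports` that are already in use.
pub fn busy_ports(ports: &[u16]) -> Vec<u16> {
    ports
        .iter()
        .copied()
        .filter(|p| !port_is_free(*p))
        .collect()
}
```

Calling `busy_ports(&[4000, 4001, 4002, 4003])` returns an empty vector when all four ports are available. Any port it lists needs to be freed, or the `-p` mapping changed, before the container can start.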
Getting Started
Build From Source
Prerequisites:
- Rust toolchain — nightly, pinned by `rust-toolchain.toml`
- Protobuf compiler (>= 3.15)
- C/C++ building essentials: `gcc`/`g++`/`autoconf` and the glibc dev package (`libc6-dev` on Ubuntu, `glibc-devel` on Fedora)
- Python toolchain (optional, only for some test scripts)
Build and run:
make # build greptime binary
cargo run -- standalone start # start in standalone mode
Common dev commands:
make fmt # format Rust code
make clippy # lint (fails on warnings)
make test # unit + integration tests (uses cargo-nextest)
make sqlness-test # SQL regression tests
See the Contribution Guidelines for the full developer workflow.
Tools & Extensions
- Kubernetes: GreptimeDB Operator
- Helm Charts: Greptime Helm Charts
- Dashboard: Web UI
- gRPC Ingester: Go, Java, C++, Erlang, Rust, .NET
- Grafana Data Source: GreptimeDB Grafana data source plugin
- Grafana Dashboard: Official Dashboard for monitoring
Project Status
GreptimeDB is at v1.0 GA with stable APIs and regular releases. It runs in production at scale — OceanBase Cloud operates 80+ GreptimeDB clusters managing 300 TB of logs, cutting log storage cost by 60% after migrating from Grafana Loki. See more in case studies.
Read the v1.0 highlights and 2026 roadmap, or browse the version reference.
If GreptimeDB is useful to you, please star the repo.
Community
We invite you to engage and contribute!
License
GreptimeDB is licensed under the Apache License 2.0.
Commercial Support
Running GreptimeDB in your organization? We offer enterprise add-ons, services, training, and consulting. Contact us for details.
Contributing
- Read our Contribution Guidelines.
- Explore Internal Concepts and DeepWiki.
- Pick up a good first issue and join the #contributors Slack channel.
Acknowledgement
Special thanks to all contributors! See AUTHOR.md.
- Uses Apache Arrow™ (memory model)
- Apache Parquet™ (file storage)
- Apache DataFusion™ (query engine)
- Apache OpenDAL™ (data access abstraction)
All trademarks, logos, and brand names referenced in this README and in the Overview diagram are the property of their respective owners. Their use is for identification purposes only and does not imply endorsement or affiliation.
