greptimedb

mirror of https://github.com/GreptimeTeam/greptimedb.git synced 2026-05-17 05:20:37 +00:00


Yvan Wang d1873ca31d fix(metric-engine): validate column types and require time index in verify_rows (#8018)

* fix(metric-engine): validate column types and require time index in verify_rows

The remote-write path into the metric engine previously bypassed schema
validation. When a row's time index column carried a non-timestamp
datatype (e.g. a string), the request reached mito's ValueBuilder::push
for the timestamp builder and panicked instead of surfacing a typed
error.

Cache the (column_id, data_type, semantic_type) tuple for each physical
column on PhysicalRegionState and use it in verify_rows to:

- reject columns whose datatype or semantic type disagrees with the
  physical region's schema (mirrors mito's WriteRequest::check_schema)
- reject requests that omit the time index column entirely

Field columns stay optional; tag completeness needs per-logical-region
metadata that verify_rows doesn't have and is left to a follow-up.
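
The check described above can be sketched roughly as follows. This is a simplified illustration, not the engine's actual code: `DataType`, `SemanticType`, `RequestColumn`, `VerifyError`, and the plain `HashMap` standing in for `PhysicalRegionState` are all hypothetical stand-ins, and the real `verify_rows` signature may differ.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-ins for the engine's real types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DataType { TimestampMillisecond, Float64, String }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SemanticType { Tag, Field, Timestamp }

/// The cached (column_id, data_type, semantic_type) tuple for one physical column.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct PhysicalColumnInfo {
    column_id: u32,
    data_type: DataType,
    semantic_type: SemanticType,
}

#[derive(Debug, PartialEq, Eq)]
enum VerifyError {
    ColumnNotFound(String),
    TypeMismatch(String),
    MissingTimeIndex,
}

/// Column description carried by an incoming write request.
struct RequestColumn {
    name: String,
    data_type: DataType,
    semantic_type: SemanticType,
}

/// Rejects unknown columns, columns whose types disagree with the cached
/// physical schema, and requests that carry no time index column at all.
fn verify_rows(
    physical: &HashMap<String, PhysicalColumnInfo>,
    request: &[RequestColumn],
) -> Result<(), VerifyError> {
    let mut has_time_index = false;
    for col in request {
        let info = physical
            .get(&col.name)
            .ok_or_else(|| VerifyError::ColumnNotFound(col.name.clone()))?;
        if info.data_type != col.data_type || info.semantic_type != col.semantic_type {
            return Err(VerifyError::TypeMismatch(col.name.clone()));
        }
        has_time_index |= info.semantic_type == SemanticType::Timestamp;
    }
    if has_time_index {
        Ok(())
    } else {
        Err(VerifyError::MissingTimeIndex)
    }
}
```

In this sketch a string-typed time index is turned into a `TypeMismatch` error before it can reach any builder, which is the behaviour the commit aims for in place of a panic.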

Fixes #7990.

Signed-off-by: BootstrapperSBL <yvanwww01@gmail.com>

* refactor(metric-engine): simplify PhysicalColumnInfo construction

- Add From<ColumnMetadata> and From<&ColumnMetadata> for PhysicalColumnInfo
  so call sites can use metadata.into() instead of repeating the field list.
- Replace the four struct-literal constructions in create.rs, open.rs and
  alter.rs with the conversion.
- In verify_rows, pass &col.column_name to ColumnNotFoundSnafu instead of
  cloning it explicitly (snafu's context handles the conversion).
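
As a rough illustration of the conversion described above, the two `From` impls might look like this. The field sets of `ColumnMetadata` and `PhysicalColumnInfo` here are assumptions chosen for the example; the real structs in the repository carry richer types.

```rust
// Hypothetical, simplified versions of the two structs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SemanticType { Tag, Field, Timestamp }

#[derive(Debug, Clone, PartialEq, Eq)]
struct ColumnMetadata {
    name: String,
    column_id: u32,
    data_type: String, // the real type is an enum; a String keeps the sketch short
    semantic_type: SemanticType,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct PhysicalColumnInfo {
    column_id: u32,
    data_type: String,
    semantic_type: SemanticType,
}

// Borrowing conversion: clones only the fields it needs.
impl From<&ColumnMetadata> for PhysicalColumnInfo {
    fn from(m: &ColumnMetadata) -> Self {
        Self {
            column_id: m.column_id,
            data_type: m.data_type.clone(),
            semantic_type: m.semantic_type,
        }
    }
}

// Owning conversion: moves the fields out without cloning.
impl From<ColumnMetadata> for PhysicalColumnInfo {
    fn from(m: ColumnMetadata) -> Self {
        Self {
            column_id: m.column_id,
            data_type: m.data_type,
            semantic_type: m.semantic_type,
        }
    }
}
```

With both impls in place, a call site can write `metadata.into()` or `(&metadata).into()` instead of repeating the struct literal.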

Signed-off-by: BootstrapperSBL <yvanwww01@gmail.com>

* perf(metric-engine): cache time index column name in PhysicalRegionState

verify_rows previously scanned every physical column on each row batch to
find the timestamp column. Since the time index is fixed at region
creation and never changes, stash its name on PhysicalRegionState when
the region is first registered and read it directly from there.

add_physical_columns carries a debug_assert to document the invariant
that alter never introduces a new time index.
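
A minimal sketch of this caching pattern, under the assumption that the state holds a name-to-semantic-type map (the real `PhysicalRegionState` layout and method signatures are not shown in this commit message and are hypothetical here):

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SemanticType { Tag, Field, Timestamp }

/// Hypothetical, simplified region state.
struct PhysicalRegionState {
    columns: HashMap<String, SemanticType>,
    /// Cached once at registration; the time index never changes afterwards.
    time_index: String,
}

impl PhysicalRegionState {
    /// Scans the columns once to locate the time index.
    /// Returns `None` if the region has no timestamp column.
    fn new(columns: HashMap<String, SemanticType>) -> Option<Self> {
        let time_index = columns
            .iter()
            .find(|(_, semantic)| **semantic == SemanticType::Timestamp)
            .map(|(name, _)| name.clone())?;
        Some(Self { columns, time_index })
    }

    /// Adds columns introduced by an alter. The assertion documents the
    /// invariant that an alter never introduces a new time index.
    fn add_physical_columns(
        &mut self,
        new_columns: impl IntoIterator<Item = (String, SemanticType)>,
    ) {
        for (name, semantic_type) in new_columns {
            debug_assert!(
                semantic_type != SemanticType::Timestamp,
                "alter must not add a time index column"
            );
            self.columns.insert(name, semantic_type);
        }
    }

    /// Constant-time lookup, replacing a scan on every row batch.
    fn time_index(&self) -> &str {
        &self.time_index
    }
}
```

Because `debug_assert!` compiles away in release builds, it acts as documentation and a test-time guard rather than a runtime check.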

Signed-off-by: BootstrapperSBL <yvanwww01@gmail.com>

* perf(metric-engine): borrow physical column names when building name_to_id

On the row-write path we built a HashMap<String, ColumnId> by cloning
every column name out of the physical region's cached state. The map is
scoped to the block that holds the state's read guard, so there's no
need to own the keys.

Switch the map to HashMap<&str, ColumnId> and widen RowsIter::new /
IterIndex::new to accept any key type that borrows as str. Existing
test helpers that pass HashMap<String, ColumnId> keep working through
the Borrow<str> bound.
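
The `Borrow<str>` widening can be illustrated with a small generic helper. `resolve_ids` is a hypothetical function invented for this example; it is not `RowsIter::new` or `IterIndex::new`, whose real signatures are not reproduced here.

```rust
use std::borrow::Borrow;
use std::collections::HashMap;
use std::hash::Hash;

type ColumnId = u32;

/// Looks up column ids by name in a map whose keys may be owned (`String`)
/// or borrowed (`&str`): anything that borrows as `str` is accepted.
fn resolve_ids<K>(name_to_id: &HashMap<K, ColumnId>, names: &[&str]) -> Vec<Option<ColumnId>>
where
    K: Borrow<str> + Hash + Eq,
{
    names
        .iter()
        .map(|name| name_to_id.get(*name).copied())
        .collect()
}
```

A `HashMap<&str, ColumnId>` built from names that outlive the map avoids one `String` allocation per column, while callers that already hold a `HashMap<String, ColumnId>` can pass it unchanged.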

Signed-off-by: BootstrapperSBL <yvanwww01@gmail.com>

* fix: validate metric rows against physical schema

Cache physical column metadata in the metric engine state so row validation and row modification can use the same source of truth for column IDs, data types, and semantic types.

Validate incoming metric rows against the physical schema before writes. Put requests now require the time index and the expected field column, while delete requests keep accepting primary-key-plus-timestamp payloads by skipping the field completeness check.
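
The put/delete distinction could be expressed roughly like this. The enum and function names are hypothetical, and the sketch reflects only the rule stated in this commit; later commits in the same pull request (filling defaults for fields) may relax the field requirement.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WriteKind { Put, Delete }

#[derive(Debug, PartialEq, Eq)]
enum ValidationError { MissingTimeIndex, MissingField }

/// Both request kinds need the time index; only puts need the field column,
/// so deletes can carry just primary key plus timestamp.
fn check_completeness(
    kind: WriteKind,
    has_time_index: bool,
    has_field: bool,
) -> Result<(), ValidationError> {
    if !has_time_index {
        return Err(ValidationError::MissingTimeIndex);
    }
    if kind == WriteKind::Put && !has_field {
        return Err(ValidationError::MissingField);
    }
    Ok(())
}
```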

Pass physical column metadata directly into RowsIter instead of rebuilding a name-to-column-id map at each call site, and cover the new validation paths with tests for missing time indexes, missing fields, and duplicate field columns.

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: do not allow adding a new field

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: fill default value for fields

Signed-off-by: evenyag <realevenyag@gmail.com>

* fix: fill default for nullable fields

Signed-off-by: evenyag <realevenyag@gmail.com>

---------

Signed-off-by: BootstrapperSBL <yvanwww01@gmail.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
Co-authored-by: BootstrapperSBL <yvanwww01@gmail.com>
Co-authored-by: evenyag <realevenyag@gmail.com>

2026-05-07 12:41:07 +00:00

Name	Last commit	Updated
.cargo	feat: put sqlness into a separated dir (#6911)	2025-09-05 01:39:29 +00:00
.config	build: on windows (#2054)	2023-08-10 08:08:37 +00:00
.github	chore: update the opendal to 0.56 rc2 (#8003)	2026-04-26 09:59:48 +00:00
config	refactor: unify frontend discovery with active peer discovery (#8031)	2026-04-27 06:40:37 +00:00
cyborg	ci: handle prerelease version (#7492)	2025-12-29 08:21:05 +00:00
docker	fix!: align gRPC CLI option names with config naming (#8021)	2026-04-24 09:51:01 +00:00
docs	docs: rfc for remote dyn filter (#7931)	2026-04-15 11:19:48 +00:00
grafana	chore: add grafana dashboard about trigger (#7536)	2026-01-08 06:47:46 +00:00
scripts	feat: add TLS support for mysql backend (#6979)	2025-09-16 13:46:37 +00:00
src	fix(metric-engine): validate column types and require time index in verify_rows (#8018)	2026-05-07 12:41:07 +00:00
tests	fix(metric-engine): validate column types and require time index in verify_rows (#8018)	2026-05-07 12:41:07 +00:00
tests-fuzz	chore: update the opendal to 0.56 rc2 (#8003)	2026-04-26 09:59:48 +00:00
tests-integration	feat(operator): allow last_row merge mode with append mode (#8065)	2026-05-07 07:21:37 +00:00
.dockerignore	fix: docker build (#1822)	2023-06-25 11:05:46 +08:00
.editorconfig	feat: to_timezone function (#3470)	2024-03-12 01:46:19 +00:00
.env.example	feat: add GcsConfig credential field (#4568)	2024-08-16 03:11:20 +00:00
.gitignore	feat(meta): add dropped table tombstone metadata helpers (#8040)	2026-04-28 08:44:02 +00:00
.pre-commit-config.yaml	chore: check for redundant pre-commit hooks (#7506)	2026-01-07 13:46:42 +00:00
AUTHOR.md	chore: members and committers update (#7341)	2025-12-04 04:08:43 +00:00
Cargo.lock	chore(deps): bump rustls-webpki from 0.103.10 to 0.103.13 (#8077)	2026-05-07 09:38:33 +00:00
Cargo.toml	chore: update opendal's version to official 0.56 (#8069)	2026-05-07 03:17:16 +00:00
cliff.toml	chore: fix git cliff errors in latest version (#7947)	2026-04-13 09:11:38 +00:00
codecov.yml	refactor: refactor TableRouteManager (#3392)	2024-02-28 06:18:09 +00:00
CONTRIBUTING.md	fix: typo in AI-assisted contributions policy (#7472)	2025-12-25 03:03:14 +00:00
Cross.toml	fix: cross compiling for aarch64 targets and allow customizing page size (#5487)	2025-02-07 11:21:16 +00:00
flake.lock	chore: update rust toolchain to 2026-03-21 (#7849)	2026-03-30 12:13:14 +00:00
flake.nix	chore: update rust toolchain to 2026-03-21 (#7849)	2026-03-30 12:13:14 +00:00
LICENSE	chore: multiple licenses fixes (#2714)	2023-11-09 10:38:12 +00:00
licenserc.toml	feat: trigger alter parse (#6553)	2025-07-29 11:07:31 +00:00
Makefile	ci: update dev-builder image tag (#7894)	2026-04-01 02:43:25 +00:00
README.md	fix!: align gRPC CLI option names with config naming (#8021)	2026-04-24 09:51:01 +00:00
rust-toolchain.toml	chore: update rust toolchain to 2026-03-21 (#7849)	2026-03-30 12:13:14 +00:00
rustfmt.toml	chore: specify import style in rustfmt (#460)	2022-11-15 15:58:54 +08:00
SECURITY.md	feat: Create SECURITY.md (#1270)	2023-03-28 19:14:29 +08:00
taplo.toml	chore: skip reorder workspace tables in taplo (#3388)	2024-02-26 08:57:49 +00:00
typos.toml	chore: adjust manifest cache log level (#7655)	2026-02-03 07:08:52 +00:00

README.md

One database for metrics, logs, and traces
replacing Prometheus, Loki, and Elasticsearch

The unified OpenTelemetry backend — with SQL + PromQL on object storage.

User Guide | API Docs | Roadmap 2026

Introduction
⭐ Key Features
How GreptimeDB Compares
Architecture
Try GreptimeDB
Getting Started
Build From Source
Tools & Extensions
Project Status
Community
License
Commercial Support
Contributing
Acknowledgement

Introduction

GreptimeDB is an open-source observability database built for Observability 2.0 — treating metrics, logs, and traces as one unified data model (wide events) instead of three separate pillars.

Use it as the single OpenTelemetry backend, replacing Prometheus, Loki, and Elasticsearch with one database built on object storage. Query with SQL and PromQL, scale without pain, and cut costs by up to 50x.

Features

Feature	Description
Drop-in replacement	PromQL, Prometheus remote write, Jaeger, and OpenTelemetry native. Use as your single backend for all three signals, or migrate one at a time.
50x lower cost	Object storage (S3, GCS, Azure Blob etc.) as primary storage. Compute-storage separation scales without pain.
SQL + PromQL	Monitor with PromQL, analyze with SQL. One database replaces Prometheus + your data warehouse.
Sub-second at PB-EB scale	Columnar engine with fulltext, inverted, and skipping indexes. Written in Rust.

✅ Perfect for:

Replacing Prometheus + Loki + Elasticsearch with one database
Scaling past Prometheus — high cardinality, long-term storage, no Thanos/Mimir overhead
Cutting observability costs with object storage (up to 50x savings on traces, 30% on logs)
AI/LLM observability — store and analyze high-volume conversation data, agent traces, and token metrics via OpenTelemetry GenAI conventions
Edge-to-cloud observability with unified APIs on resource-constrained devices

Why Observability 2.0? The three-pillar model (separate databases for metrics, logs, traces) creates data silos and operational complexity. GreptimeDB treats all observability data as timestamped wide events in a single columnar engine — enabling cross-signal SQL JOINs, eliminating redundant infrastructure, and naturally supporting emerging workloads like AI agent observability. Read more: Observability 2.0 and the Database for It.

Learn more in Why GreptimeDB.

How GreptimeDB Compares

Feature	GreptimeDB	Prometheus / Thanos / Mimir	Grafana Loki	Elasticsearch
Data types	Metrics, logs, traces	Metrics only	Logs only	Logs, traces
Query language	SQL + PromQL	PromQL	LogQL	Query DSL
Storage	Native object storage (S3, etc.)	Local disk + object storage (Thanos/Mimir)	Object storage (chunks)	Local disk
Scaling	Compute-storage separation, stateless nodes	Federation / Thanos / Mimir — multi-component, ops heavy	Stateless + object storage	Shard-based, ops heavy
Cost efficiency	Up to 50x lower storage	High at scale	Moderate	High (inverted index overhead)
OpenTelemetry	Native (metrics + logs + traces)	Partial (metrics only)	Partial (logs only)	Via instrumentation

Benchmarks:

Architecture

GreptimeDB can run in two modes:

Standalone Mode - Single binary for development and small deployments
Distributed Mode - Separate components for production scale:
- Frontend: Query processing and protocol handling
- Datanode: Data storage and retrieval
- Metasrv: Metadata management and coordination

Read the architecture document. DeepWiki provides an in-depth look at GreptimeDB.

Try GreptimeDB

docker pull greptime/greptimedb

docker run -p 127.0.0.1:4000-4003:4000-4003 \
  -v "$(pwd)/greptimedb_data:/greptimedb_data" \
  --name greptime --rm \
  greptime/greptimedb:latest standalone start \
  --http-addr 0.0.0.0:4000 \
  --grpc-bind-addr 0.0.0.0:4001 \
  --mysql-addr 0.0.0.0:4002 \
  --postgres-addr 0.0.0.0:4003

Dashboard: http://localhost:4000/dashboard
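
Once the container is up, a quick way to confirm that the four published ports accept connections is a plain TCP probe. The sketch below uses only the Rust standard library; the helper names `endpoints` and `is_listening` are made up for this example, and the port-to-protocol mapping is taken from the flags in the `docker run` command above. It checks reachability only, not protocol health.

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// The loopback endpoints published by the `docker run` command above.
fn endpoints() -> Vec<(&'static str, SocketAddr)> {
    [("http", 4000u16), ("grpc", 4001), ("mysql", 4002), ("postgres", 4003)]
        .iter()
        .map(|&(name, port)| (name, SocketAddr::from(([127, 0, 0, 1], port))))
        .collect()
}

/// Returns true if a TCP connection to `addr` succeeds within `timeout`.
fn is_listening(addr: &SocketAddr, timeout: Duration) -> bool {
    TcpStream::connect_timeout(addr, timeout).is_ok()
}
```

Calling `is_listening(&addr, Duration::from_millis(500))` for each entry returned by `endpoints()` gives a per-protocol up/down view without needing any database client.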

Getting Started

Build From Source

Prerequisites:

Rust toolchain (nightly)
Protobuf compiler (>= 3.15)
C/C++ build essentials, including gcc/g++/autoconf and the glibc library (e.g., libc6-dev on Ubuntu and glibc-devel on Fedora)
Python toolchain (optional): Required only if using some test scripts.

Build and Run:

make
cargo run -- standalone start

Tools & Extensions

Kubernetes: GreptimeDB Operator
Helm Charts: Greptime Helm Charts
Dashboard: Web UI
gRPC Ingester: Go, Java, C++, Erlang, Rust, .NET
Grafana Data Source: GreptimeDB Grafana data source plugin
Grafana Dashboard: Official Dashboard for monitoring

Project Status

Status: v1.0 GA — generally available and production-ready! 🎉

Deployed in production handling billions of data points daily
Stable APIs, actively maintained, with regular releases (version info)

GreptimeDB v1.0 marks a major milestone — stable APIs, production readiness, and proven performance at scale.

Learn more: v1.0 highlights and 2026 roadmap.

For production use, we recommend v1.0 or later.

If you find this project useful, a ⭐ would mean a lot to us!

Community

We invite you to engage and contribute!

License

GreptimeDB is licensed under the Apache License 2.0.

Commercial Support

Running GreptimeDB in your organization? We offer enterprise add-ons, services, training, and consulting. Contact us for details.

Contributing

Read our Contribution Guidelines.
Explore Internal Concepts and DeepWiki.
Pick up a good first issue and join the #contributors Slack channel.

Acknowledgement

Special thanks to all contributors! See AUTHOR.md.

GreptimeDB uses:
Apache Arrow™ (memory model)
Apache Parquet™ (file storage)
Apache DataFusion™ (query engine)
Apache OpenDAL™ (data access abstraction)

Description

Open-source, cloud-native, unified observability database for metrics, logs and traces, supporting SQL/PromQL/Streaming.

analytics cloud-native database distributed greptimedb logs metrics monitoring observability observability-database observability-datalake promql rust rust-database sql time-series traces tsdb

Readme Apache-2.0 853 MiB
