rust/greptimedb: Open-source, cloud-native, unified observability database for metrics, logs and traces, supporting SQL/PromQL/Streaming.

rust/greptimedb

mirror of https://github.com/GreptimeTeam/greptimedb.git synced 2026-07-07 22:40:38 +00:00

Lei, HUANG 9f1aefe98f feat: allow one to many VRL pipeline (#7342)

* feat/allow-one-to-many-pipeline:
### Enhance Pipeline Processing for One-to-Many Transformations

- **Support One-to-Many Transformations**:
- Updated `processor.rs`, `etl.rs`, `vrl_processor.rs`, and `greptime.rs` to handle one-to-many transformations by allowing VRL processors to return arrays, expanding each element into separate rows.
- Introduced `transform_array_elements` and `values_to_rows` functions to facilitate this transformation.

- **Error Handling Enhancements**:
- Added new error types in `error.rs` to handle cases where array elements are not objects and for transformation failures.

- **Testing Enhancements**:
- Added tests in `pipeline.rs` to verify one-to-many transformations, single object processing, and error handling for non-object array elements.

- **Context Management**:
- Modified `ctx_req.rs` to clone `ContextOpt` when adding rows, ensuring correct context management during transformations.

- **Server Pipeline Adjustments**:
- Updated `pipeline.rs` in `servers` to handle transformed outputs with one-to-many row expansions, ensuring correct row padding and request formation.
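
As a rough illustration of the one-to-many behaviour described above, the sketch below expands a processor output into rows: an object yields one row, an array yields one row per element, and a non-object element is rejected. `Value`, `Row`, `ExpandError`, `expand_to_rows` and `obj` are hypothetical stand-ins written for this example; they are not the actual types or functions in `etl.rs` or `greptime.rs`.

```rust
use std::collections::BTreeMap;

/// Simplified, hypothetical stand-in for a VRL output value.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Integer(i64),
    Text(String),
    Object(BTreeMap<String, Value>),
    Array(Vec<Value>),
}

/// One output row; here it is simply the object's fields.
type Row = BTreeMap<String, Value>;

/// Hypothetical error type for this sketch.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum ExpandError {
    /// The array element at `index` was not an object.
    ArrayElementMustBeObject { index: usize },
    /// The top-level value was neither an object nor an array.
    UnsupportedTopLevel,
}

/// Expands a processor output into rows.
/// An object becomes one row; an array becomes one row per element.
#[allow(dead_code)]
fn expand_to_rows(output: Value) -> Result<Vec<Row>, ExpandError> {
    match output {
        Value::Object(fields) => Ok(vec![fields]),
        Value::Array(items) => items
            .into_iter()
            .enumerate()
            .map(|(index, item)| match item {
                Value::Object(fields) => Ok(fields),
                _ => Err(ExpandError::ArrayElementMustBeObject { index }),
            })
            .collect(),
        _ => Err(ExpandError::UnsupportedTopLevel),
    }
}

/// Small helper to build an object value from key/value pairs.
#[allow(dead_code)]
fn obj(pairs: &[(&str, Value)]) -> Value {
    Value::Object(
        pairs
            .iter()
            .map(|(k, v)| (k.to_string(), v.clone()))
            .collect(),
    )
}
```

Under these assumptions, an array of two objects yields two rows, a single object yields one row, and an empty array yields zero rows, which matches the cases the commit says are covered by tests.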

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/allow-one-to-many-pipeline:
Add one-to-many VRL pipeline test in `http.rs`

- Introduced `test_pipeline_one_to_many_vrl` to verify VRL processor's ability to expand a single input row into multiple output rows.
- Updated `http_tests!` macro to include the new test.
- Implemented test scenarios for single and multiple input rows, ensuring correct data transformation and row count validation.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/allow-one-to-many-pipeline:
### Add Tests for VRL Pipeline Transformations

- **File:** `src/pipeline/src/etl.rs`
- Added tests for one-to-many VRL pipeline expansion to ensure multiple output rows from a single input.
- Introduced tests to verify backward compatibility for single object output.
- Implemented tests to confirm zero rows are produced from empty arrays.
- Added validation tests to ensure array elements must be objects.
- Developed tests for one-to-many transformations with table suffix hints from VRL.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/allow-one-to-many-pipeline:
### Enhance Pipeline Transformation with Per-Row Table Suffixes

- **`src/pipeline/src/etl.rs`**: Updated `TransformedOutput` to include per-row table suffixes, allowing for more flexible routing of transformed data. Modified `PipelineExecOutput` and related methods to handle the new structure.
- **`src/pipeline/src/etl/transform/transformer/greptime.rs`**: Enhanced `values_to_rows` to support per-row table suffix extraction and application.
- **`src/pipeline/tests/common.rs`** and **`src/pipeline/tests/pipeline.rs`**: Adjusted tests to validate the new per-row table suffix functionality, ensuring backward compatibility and correct behavior in one-to-many transformations.
- **`src/servers/src/pipeline.rs`**: Modified `run_custom_pipeline` to process transformed outputs with per-row table suffixes, grouping rows by `(opt, table_name)` for insertion.
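
The grouping by `(opt, table_name)` mentioned above can be sketched roughly as follows. This is a minimal illustration, not the code in `run_custom_pipeline`: the `ContextOpt` fields, the `RowAndSuffix` alias, the `group_rows` function, and the rule that the suffix is appended directly to the base table name are all assumptions made for the example.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for per-row context options (for example a TTL hint).
#[derive(Debug, Clone, PartialEq, Eq, Hash, Default)]
struct ContextOpt {
    ttl: Option<String>,
}

/// A row (simplified to a list of strings) paired with an optional table suffix.
type RowAndSuffix = (Vec<String>, Option<String>);

/// Groups rows by `(ContextOpt, table name)`.
/// Assumption: the table name is the base name with the row's suffix appended, if any.
#[allow(dead_code)]
fn group_rows(
    base_table: &str,
    rows: Vec<(ContextOpt, RowAndSuffix)>,
) -> HashMap<(ContextOpt, String), Vec<Vec<String>>> {
    let mut groups: HashMap<(ContextOpt, String), Vec<Vec<String>>> = HashMap::new();
    for (opt, (row, suffix)) in rows {
        let table = match suffix {
            Some(s) => format!("{base_table}{s}"),
            None => base_table.to_string(),
        };
        // Each distinct (options, table) pair collects its own batch of rows.
        groups.entry((opt, table)).or_default().push(row);
    }
    groups
}
```

Each resulting group could then be turned into one insert request, so rows that share both options and destination table are written together.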

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/allow-one-to-many-pipeline:
### Update VRL Processor Type Checks

- **File:** `vrl_processor.rs`
- **Changes:** Updated type checking logic to use `contains_object()` and `contains_array()` methods instead of `is_object()` and `is_array()`. This change ensures compatibility with VRL type inference that may return multiple possible types.
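
To illustrate why the `contains_*` checks are more permissive than the `is_*` checks, the sketch below models an inferred type as a set of possible runtime types. The `Kind` struct here is a hypothetical bitset written for this example, not VRL's own type; only the four method names are taken from the commit message.

```rust
/// Hypothetical stand-in for an inferred type: a set of possible runtime types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Kind(u8);

#[allow(dead_code)]
impl Kind {
    const OBJECT: Kind = Kind(0b001);
    const ARRAY: Kind = Kind(0b010);
    const NULL: Kind = Kind(0b100);

    /// Union of two sets of possible types.
    fn or(self, other: Kind) -> Kind {
        Kind(self.0 | other.0)
    }

    /// True only when the value is known to be exactly an object.
    fn is_object(self) -> bool {
        self == Kind::OBJECT
    }

    /// True only when the value is known to be exactly an array.
    fn is_array(self) -> bool {
        self == Kind::ARRAY
    }

    /// True when an object is among the possible types.
    fn contains_object(self) -> bool {
        (self.0 & Kind::OBJECT.0) != 0
    }

    /// True when an array is among the possible types.
    fn contains_array(self) -> bool {
        (self.0 & Kind::ARRAY.0) != 0
    }
}

/// Strict check: rejects any type that is not exactly one of the two.
#[allow(dead_code)]
fn accepts_strict(kind: Kind) -> bool {
    kind.is_object() || kind.is_array()
}

/// Lenient check: accepts a type that may be an object or an array.
#[allow(dead_code)]
fn accepts_lenient(kind: Kind) -> bool {
    kind.contains_object() || kind.contains_array()
}
```

With this model, a program whose result is inferred as "object or array" fails the strict check but passes the lenient one, which is the situation the change is described as addressing.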

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/allow-one-to-many-pipeline:
- **Enhance Error Handling**: Added new error types `ArrayElementMustBeObjectSnafu` and `TransformArrayElementSnafu` to improve error handling in `etl.rs` and `greptime.rs`.
- **Refactor Error Usage**: Moved error usage declarations in `transform_array_elements` and `values_to_rows` functions to the top of the file for better organization in `etl.rs` and `greptime.rs`.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/allow-one-to-many-pipeline:
### Update `greptime.rs` to Enhance Error Handling

- **Error Handling**: Modified the `values_to_rows` function to handle invalid array elements based on the `skip_error` parameter. If `skip_error` is true, invalid elements are skipped; otherwise, an error is returned.
- **Testing**: Added unit tests in `greptime.rs` to verify the behavior of `values_to_rows` with different `skip_error` settings, ensuring correct processing of valid objects and appropriate error handling for invalid elements.
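
The `skip_error` behaviour described above might look roughly like this. The `Element` enum, the `values_to_rows_sketch` function and the error message are hypothetical and only mirror the described logic; the real `values_to_rows` signature is not shown in this commit message.

```rust
/// Hypothetical array element: either an object (key/value pairs) or something else.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Element {
    Object(Vec<(String, String)>),
    Other(String),
}

/// Converts array elements to rows.
/// When `skip_error` is true, non-object elements are skipped;
/// otherwise the first non-object element aborts with an error.
#[allow(dead_code)]
fn values_to_rows_sketch(
    elements: Vec<Element>,
    skip_error: bool,
) -> Result<Vec<Vec<(String, String)>>, String> {
    let mut rows = Vec::new();
    for (index, element) in elements.into_iter().enumerate() {
        match element {
            Element::Object(fields) => rows.push(fields),
            Element::Other(raw) => {
                if skip_error {
                    continue;
                }
                return Err(format!(
                    "array element {index} must be an object, got {raw}"
                ));
            }
        }
    }
    Ok(rows)
}
```

Skipping keeps the valid rows of a partially malformed batch, while the strict mode surfaces the first bad element to the caller.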

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/allow-one-to-many-pipeline:
### Commit Summary

- **Enhance `TransformedOutput` Structure**: Refactored `TransformedOutput` to use a `HashMap` for grouping rows by `ContextOpt`, allowing for per-row configuration options. Updated methods in `PipelineExecOutput` to support the new structure (`src/pipeline/src/etl.rs`).

- **Add New Transformation Method**: Introduced `transform_array_elements_to_hashmap` to handle array inputs with per-row `ContextOpt` in `HashMap` format (`src/pipeline/src/etl.rs`).

- **Update Pipeline Execution**: Modified `run_custom_pipeline` to process `TransformedOutput` using the new `HashMap` structure, ensuring rows are grouped by `ContextOpt` and table name (`src/servers/src/pipeline.rs`).

- **Add Tests for New Structure**: Implemented tests to verify the functionality of the new `HashMap` structure in `TransformedOutput`, including scenarios for one-to-many mapping, single object input, and empty arrays (`src/pipeline/src/etl.rs`).

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/allow-one-to-many-pipeline:
### Refactor `values_to_rows` to Return `HashMap` Grouped by `ContextOpt`

- **`etl.rs`**:
- Updated `values_to_rows` to return a `HashMap` grouped by `ContextOpt` instead of a vector.
- Adjusted logic to handle single object and array inputs, ensuring rows are grouped by their `ContextOpt`.
- Modified functions to extract rows from default `ContextOpt` and apply table suffixes accordingly.

- **`greptime.rs`**:
- Enhanced `values_to_rows` to handle errors gracefully with `skip_error` logic.
- Added logic to group rows by `ContextOpt` for array inputs.

- **Tests**:
- Updated existing tests to validate the new `HashMap` return structure.
- Added a new test to verify correct grouping of rows by per-element `ContextOpt`.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/allow-one-to-many-pipeline:
### Refactor and Enhance Error Handling in ETL Pipeline

- **Refactored Functionality**:
- Replaced `transform_array_elements` with `transform_array_elements_by_ctx` in `etl.rs` to streamline transformation logic and improve error handling.
- Updated `values_to_rows` in `greptime.rs` to use `or_default` for cleaner code.

- **Enhanced Error Handling**:
- Introduced `unwrap_or_continue_if_err` macro in `etl.rs` to allow skipping errors based on pipeline context, improving robustness in data processing.

These changes enhance the maintainability and error resilience of the ETL pipeline.
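
A macro in the spirit of `unwrap_or_continue_if_err` could be sketched as below. The actual macro in `etl.rs` is not shown here, so the argument list (a result plus a boolean flag) and the `parse_all` function that uses it are assumptions for illustration only.

```rust
/// Hypothetical sketch: evaluates to the `Ok` value; on `Err` it either
/// `continue`s the enclosing loop (when `$skip` is true) or returns the error.
#[allow(unused_macros)]
macro_rules! unwrap_or_continue_if_err {
    ($result:expr, $skip:expr) => {
        match $result {
            Ok(value) => value,
            Err(err) => {
                if $skip {
                    continue;
                }
                return Err(err)
            }
        }
    };
}

/// Example use: parse every input, optionally skipping the ones that fail.
#[allow(dead_code)]
fn parse_all(inputs: &[&str], skip_error: bool) -> Result<Vec<i64>, std::num::ParseIntError> {
    let mut out = Vec::new();
    for input in inputs {
        let n = unwrap_or_continue_if_err!(input.parse::<i64>(), skip_error);
        out.push(n);
    }
    Ok(out)
}
```

Because the macro expands to `continue` and `return`, it only makes sense inside a loop within a function that returns a compatible `Result`.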

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

* feat/allow-one-to-many-pipeline:
### Update `Row` Handling in ETL Pipeline

- **Refactor `Row` Type**: Introduced `RowWithTableSuffix` type alias to simplify handling of rows with optional table suffixes across the ETL pipeline.
- **Modify Function Signatures**: Updated function signatures in `etl.rs` and `greptime.rs` to use `RowWithTableSuffix` for better clarity and consistency.
- **Enhance Test Coverage**: Adjusted test logic in `greptime.rs` to align with the new `RowWithTableSuffix` type, ensuring correct grouping and processing of rows by TTL.

Files affected: `etl.rs`, `greptime.rs`.

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

---------

Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>

2025-12-10 06:38:44 +00:00

feat: put sqlness into a separated dir (#6911)

2025-09-05 01:39:29 +00:00

build: on windows (#2054)

2023-08-10 08:08:37 +00:00

feat: allow publishing new nightly release when some platforms are absent (#7354)

2025-12-09 04:59:50 +00:00

fix: configure HTTP/2 keep-alive for heartbeat client to detect network failures faster (#7344)

2025-12-04 08:07:45 +00:00

fix: doc issue assignee (#6406)

2025-06-26 09:18:47 +00:00

feat: add building option to build images base on distroless image (#7240)

2025-11-26 05:13:05 +00:00

feat: gc worker only local regions&test (#7203)

2025-11-18 02:45:09 +00:00

fix: use instance lables to fetch greptime_memory_limit_in_bytes and greptime_cpu_limit_in_millicores metrics (#7043)

2025-09-29 11:43:35 +00:00

feat: add TLS support for mysql backend (#6979)

2025-09-16 13:46:37 +00:00

feat: allow one to many VRL pipeline (#7342)

2025-12-10 06:38:44 +00:00

refactor: use versioned index for index file (#7309)

2025-12-09 07:31:12 +00:00

feat: allow fuzz input override through env var (#7208)

2025-11-10 14:02:23 +00:00

tests-integration

feat: allow one to many VRL pipeline (#7342)

2025-12-10 06:38:44 +00:00

.dockerignore

fix: docker build (#1822)

2023-06-25 11:05:46 +08:00

.editorconfig

feat: to_timezone function (#3470)

2024-03-12 01:46:19 +00:00

.env.example

feat: add GcsConfig credential field (#4568)

2024-08-16 03:11:20 +00:00

.gitignore

feat: resolve unused dependencies with cargo-udeps (#6578) (#6619)

2025-08-26 10:22:53 +00:00

.pre-commit-config.yaml

feat: Loki remote write (#4941)

2024-11-18 08:39:17 +00:00

AUTHOR.md

chore: members and committers update (#7341)

2025-12-04 04:08:43 +00:00

Cargo.lock

refactor: extract file watcher to common-config (#7357)

2025-12-09 11:23:26 +00:00

Cargo.toml

feat: update pg-catalog for describe table (#7321)

2025-12-02 01:38:36 +00:00

cliff.toml

chore: improve contributor click in git-cliff (#3672)

2024-04-08 18:15:00 +00:00

codecov.yml

refactor: refactor TableRouteManager (#3392)

2024-02-28 06:18:09 +00:00

CONTRIBUTING.md

feat: resolve unused dependencies with cargo-udeps (#6578) (#6619)

2025-08-26 10:22:53 +00:00

Cross.toml

fix: cross compiling for aarch64 targets and allow customizing page size (#5487)

2025-02-07 11:21:16 +00:00

flake.lock

chore: update rust to nightly 2025-10-01 (#7069)

2025-10-11 07:30:52 +00:00

flake.nix

chore: update rust to nightly 2025-10-01 (#7069)

2025-10-11 07:30:52 +00:00

LICENSE

chore: multiple licenses fixes (#2714)

2023-11-09 10:38:12 +00:00

licenserc.toml

feat: trigger alter parse (#6553)

2025-07-29 11:07:31 +00:00

Makefile

ci: dev-build with large page size (#7228)

2025-11-17 02:38:16 +00:00

README.md

docs: update project status and tweak readme (#7216)

2025-11-12 15:06:56 +00:00

rust-toolchain.toml

chore: update rust to nightly 2025-10-01 (#7069)

2025-10-11 07:30:52 +00:00

rustfmt.toml

chore: specify import style in rustfmt (#460)

2022-11-15 15:58:54 +08:00

SECURITY.md

feat: Create SECURITY.md (#1270)

2023-03-28 19:14:29 +08:00

taplo.toml

chore: skip reorder workspace tables in taplo (#3388)

2024-02-26 08:57:49 +00:00

typos.toml

feat: node excluder (#5964)

2025-04-23 10:48:46 +00:00

README.md

Real-Time & Cloud-Native Observability Database
for metrics, logs, and traces

Delivers sub-second querying at PB scale and exceptional cost efficiency from edge to cloud.

User Guide | API Docs | Roadmap 2025

Introduction
⭐ Key Features
Quick Comparison
Architecture
Try GreptimeDB
Getting Started
Build From Source
Tools & Extensions
Project Status
Community
License
Commercial Support
Contributing
Acknowledgement

Introduction

GreptimeDB is an open-source, cloud-native database that unifies metrics, logs, and traces, enabling real-time observability at any scale — across edge, cloud, and hybrid environments.

Features

| Feature | Description |
| --- | --- |
| All-in-One Observability | OpenTelemetry-native platform unifying metrics, logs, and traces. Query via SQL, PromQL, and Flow. |
| High Performance | Written in Rust with rich indexing (inverted, fulltext, skipping, vector), delivering sub-second responses at PB scale. |
| Cost Efficiency | 50x lower operational and storage costs with compute-storage separation and native object storage (S3, Azure Blob, etc.). |
| Cloud-Native & Scalable | Purpose-built for Kubernetes with unlimited cross-cloud scaling, handling hundreds of thousands of concurrent requests. |
| Developer-Friendly | SQL/PromQL interfaces, built-in web dashboard, REST API, MySQL/PostgreSQL protocol compatibility, and native OpenTelemetry support. |
| Flexible Deployment | Deploy anywhere from ARM-based edge devices (including Android) to cloud, with unified APIs and efficient data sync. |

✅ Perfect for:

Unified observability stack replacing Prometheus + Loki + Tempo
Large-scale metrics with high cardinality (millions to billions of time series)
Large-scale observability platform requiring cost efficiency and scalability
IoT and edge computing with resource and bandwidth constraints

Learn more in Why GreptimeDB and Observability 2.0 and the Database for It.

Quick Comparison

| Feature | GreptimeDB | Traditional TSDB | Log Stores |
| --- | --- | --- | --- |
| Data Types | Metrics, Logs, Traces | Metrics only | Logs only |
| Query Language | SQL, PromQL | Custom/PromQL | Custom/DSL |
| Deployment | Edge + Cloud | Cloud/On-prem | Mostly central |
| Indexing & Performance | PB-Scale, Sub-second | Varies | Varies |
| Integration | REST API, SQL, Common protocols | Varies | Varies |

Performance: read more in the benchmark reports.

Architecture

GreptimeDB can run in two modes:

Standalone Mode - Single binary for development and small deployments
Distributed Mode - Separate components for production scale:
- Frontend: Query processing and protocol handling
- Datanode: Data storage and retrieval
- Metasrv: Metadata management and coordination

Read the architecture document. DeepWiki provides an in-depth look at GreptimeDB.

Try GreptimeDB

docker pull greptime/greptimedb

docker run -p 127.0.0.1:4000-4003:4000-4003 \
  -v "$(pwd)/greptimedb_data:/greptimedb_data" \
  --name greptime --rm \
  greptime/greptimedb:latest standalone start \
  --http-addr 0.0.0.0:4000 \
  --rpc-bind-addr 0.0.0.0:4001 \
  --mysql-addr 0.0.0.0:4002 \
  --postgres-addr 0.0.0.0:4003

Dashboard: http://localhost:4000/dashboard

Read more in the full Install Guide.

Troubleshooting:

Cannot connect to the database? Ensure that ports 4000, 4001, 4002, and 4003 are not blocked by a firewall or used by other services.
Failed to start? Check the container logs with `docker logs greptime` for further details.

Getting Started

Build From Source

Prerequisites:

Rust toolchain (nightly)
Protobuf compiler (>= 3.15)
C/C++ build essentials, including gcc/g++/autoconf and the glibc library (e.g., libc6-dev on Ubuntu and glibc-devel on Fedora)
Python toolchain (optional): Required only if using some test scripts.

Build and Run:

make
cargo run -- standalone start

Tools & Extensions

Kubernetes: GreptimeDB Operator
Helm Charts: Greptime Helm Charts
Dashboard: Web UI
gRPC Ingester: Go, Java, C++, Erlang, Rust
Grafana Data Source: GreptimeDB Grafana data source plugin
Grafana Dashboard: Official Dashboard for monitoring

Project Status

Status: Beta — marching toward v1.0 GA! GA (v1.0): January 10, 2026

Deployed in production by open-source projects and commercial users
Stable, actively maintained, with regular releases (version info)
Suitable for evaluation and pilot deployments

GreptimeDB v1.0 represents a major milestone toward maturity — marking stable APIs, production readiness, and proven performance.

Roadmap: Beta1 (Nov 10) → Beta2 (Nov 24) → RC1 (Dec 8) → GA (Jan 10, 2026). Please read the v1.0 highlights and release plan for details.

For production use, we recommend using the latest stable release.

If you find this project useful, a ⭐ would mean a lot to us!

Community

We invite you to engage and contribute!

License

GreptimeDB is licensed under the Apache License 2.0.

Commercial Support

Running GreptimeDB in your organization? We offer enterprise add-ons, services, training, and consulting. Contact us for details.

Contributing

Read our Contribution Guidelines.
Explore Internal Concepts and DeepWiki.
Pick up a good first issue and join the #contributors Slack channel.

Acknowledgement

Special thanks to all contributors! See AUTHOR.md.

Uses Apache Arrow™ (memory model)
Apache Parquet™ (file storage)
Apache DataFusion™ (query engine)
Apache OpenDAL™ (data access abstraction)
