greptimedb

mirror of https://github.com/GreptimeTeam/greptimedb.git synced 2026-07-05 05:20:38 +00:00


Lei, HUANG 6700c0762d feat: Column-wise partition rule implementation (#5804)

* wip: naive impl

* feat/column-partition:
 ### Add support for DataFusion physical expressions

 - **`Cargo.lock` & `Cargo.toml`**: Added `datafusion-physical-expr` as a dependency to support physical expression creation.
 - **`expr.rs`**: Implemented conversion methods `try_as_logical_expr` and `try_as_physical_expr` for `Operand` and `PartitionExpr` to facilitate logical and physical expression handling.
 - **`multi_dim.rs`**: Enhanced `MultiDimPartitionRule` to utilize physical expressions for partitioning logic, including new methods for evaluating record batches.
 - **Tests**: Added unit tests for logical and physical expression conversions and partitioning logic in `expr.rs` and `multi_dim.rs`.

* feat/column-partition:
 ### Refactor and Enhance Partition Handling

 - **Refactor Partition Parsing Logic**: Moved partition parsing logic from `src/operator/src/statement/ddl.rs` to a new utility module `src/partition/src/utils.rs`. This includes functions like `parse_partitions`, `find_partition_bounds`, and `convert_one_expr`.
 - **Error Handling Improvements**: Added new error variants `ColumnNotFound`, `InvalidPartitionRule`, and `ParseSqlValue` in `src/partition/src/error.rs` to improve error reporting for partition-related operations.
 - **Dependency Updates**: Updated `Cargo.lock` and `Cargo.toml` to include new dependencies `common-time` and `session`.
 - **Code Cleanup**: Removed redundant partition parsing functions from `src/operator/src/error.rs` and `src/operator/src/statement/ddl.rs`.

* feat/column-partition:
 ## Refactor and Enhance SQL and Table Handling

 - **Refactor Column Definitions and Error Handling**
   - Made `FULLTEXT_GRPC_KEY`, `INVERTED_INDEX_GRPC_KEY`, and `SKIPPING_INDEX_GRPC_KEY` public in `column_def.rs`.
   - Removed `IllegalPrimaryKeysDef` error from `error.rs` and moved it to `sql/src/error.rs`.
   - Updated error handling in `fill_impure_default.rs` and `expr_helper.rs`.

 - **Enhance SQL Utility Functions**
   - Moved and refactored functions like `create_to_expr`, `find_primary_keys`, and `validate_create_expr` to `sql/src/util.rs`.
   - Added new utility functions for SQL parsing and validation in `sql/src/util.rs`.

 - **Improve Partition Handling**
   - Added `parse_partition_columns_and_exprs` function in `partition/src/utils.rs`.
   - Updated partition rule tests in `partition/src/multi_dim.rs` to use SQL-based partitioning.

 - **Simplify Table Name Handling**
   - Re-exported `table_idents_to_full_name` from `sql::util` in `session/src/table_name.rs`.

 - **Test Enhancements**
   - Updated tests in `partition/src/multi_dim.rs` to use SQL for partition rule creation.

* feat/column-partition:
 **Add Benchmarking and Enhance Partitioning Logic**

 - **Benchmarking**: Introduced a new benchmark for `split_record_batch` in `bench_split_record_batch.rs` using `criterion` and `rand` as development dependencies in `Cargo.toml`.
 - **Partitioning Logic**: Enhanced `MultiDimPartitionRule` in `multi_dim.rs` to include a default region for unmatched partition expressions and optimized the `split_record_batch` method.
 - **Refactoring**: Moved `sql_to_partition_rule` function to a public scope for reuse in `multi_dim.rs`.
 - **Testing**: Added new test module `test_split_record_batch` to validate the partitioning logic.
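To make the column-wise split with a default region concrete, here is a minimal, std-only Rust sketch. It is not the code from `multi_dim.rs`: `Batch`, `Expr`, and `split` are hypothetical stand-ins (the actual rule works with DataFusion physical expressions and record batches). It illustrates only the routing idea: evaluate each partition expression over whole columns to get a boolean mask, then send rows that match no expression to the default region.

```rust
use std::collections::HashMap;

/// A toy column-oriented batch: one `Vec<i64>` per column, all of equal length.
/// (Hypothetical type for illustration only.)
struct Batch {
    columns: HashMap<&'static str, Vec<i64>>,
    num_rows: usize,
}

/// A tiny stand-in for a partition expression (hypothetical, not the crate's type).
enum Expr {
    Lt(&'static str, i64),
    GtEq(&'static str, i64),
    And(Box<Expr>, Box<Expr>),
}

impl Expr {
    /// Evaluate over whole columns, producing one boolean per row.
    fn eval(&self, batch: &Batch) -> Vec<bool> {
        match self {
            Expr::Lt(col, v) => batch.columns[col].iter().map(|x| x < v).collect(),
            Expr::GtEq(col, v) => batch.columns[col].iter().map(|x| x >= v).collect(),
            Expr::And(l, r) => {
                let (l, r) = (l.eval(batch), r.eval(batch));
                l.iter().zip(&r).map(|(a, b)| *a && *b).collect()
            }
        }
    }
}

/// Returns, per region id, the indices of the rows routed to that region.
/// Assumes the partition expressions do not overlap; rows that match none
/// of them are routed to `default_region`.
fn split(batch: &Batch, rules: &[(u32, Expr)], default_region: u32) -> HashMap<u32, Vec<usize>> {
    let mut out: HashMap<u32, Vec<usize>> = HashMap::new();
    let mut unmatched = vec![true; batch.num_rows];
    for (region, expr) in rules {
        let mask = expr.eval(batch);
        let rows: Vec<usize> = (0..batch.num_rows).filter(|&i| mask[i]).collect();
        for &i in &rows {
            unmatched[i] = false;
        }
        if !rows.is_empty() {
            out.entry(*region).or_default().extend(rows);
        }
    }
    let rest: Vec<usize> = (0..batch.num_rows).filter(|&i| unmatched[i]).collect();
    if !rest.is_empty() {
        out.entry(default_region).or_default().extend(rest);
    }
    out
}
```

Each expression is evaluated once per batch rather than once per row, which is the kind of difference the benchmark mentioned above is meant to measure.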

* Revert "feat/column-partition:  ### Refactor and Enhance Partition Handling"

This reverts commit 183fa19f

* fix: revert refactoring parse_partition

* revert some refactor

* feat/column-partition:
 ### Enhance Partitioning and Error Handling

 - **Benchmark Enhancements**: Added new benchmark `bench_split_record_batch_vs_row` in `bench_split_record_batch.rs` to compare row and column-based splitting.
 - **Error Handling Improvements**: Introduced new error variants in `error.rs` for better error reporting related to record batch evaluation and arrow kernel computation.
 - **Expression Handling**: Updated `expr.rs` to improve error context when converting schemas and creating physical expressions.
 - **Partition Rule Enhancements**: Made `row_at` and `record_batch_to_cols` methods public in `multi_dim.rs` and improved error handling for physical expression evaluation and boolean operations.

* feat/column-partition:
 ### Add `eq` Method and Optimize Expression Caching

 - **`expr.rs`**: Added a new `eq` method to the `Operand` struct for equality comparisons.
 - **`multi_dim.rs`**: Introduced a caching mechanism for physical expressions using `RwLock` to improve performance in `MultiDimPartitionRule`.
 - **`lib.rs`**: Enabled the `let_chains` feature for more concise code.
 - **`multi_dim.rs` Tests**: Enhanced test coverage with new test cases for multi-dimensional partitioning, including random record batch generation and default region handling.
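The caching idea can be sketched with nothing but the standard library. The example below is an assumption-laden illustration, not the implementation in `multi_dim.rs`: `Compiled`, `ExprCache`, and `get_or_compile` are hypothetical names, and "compiling" is reduced to resolving column indices. It shows the usual `RwLock` pattern: a cheap shared read on a cache hit, and a write only when the schema changes.

```rust
use std::sync::{Arc, RwLock};

/// Stand-in for an expensive-to-build physical expression (hypothetical type).
struct Compiled {
    column_index: usize,
}

/// Caches the expressions compiled for the most recently seen schema.
/// The partition columns are assumed to be fixed for a given cache.
struct ExprCache {
    slot: RwLock<Option<(Vec<String>, Arc<Vec<Compiled>>)>>,
}

impl ExprCache {
    fn new() -> Self {
        Self { slot: RwLock::new(None) }
    }

    fn get_or_compile(&self, schema: &[String], partition_columns: &[&str]) -> Arc<Vec<Compiled>> {
        // Fast path: shared read lock, no compilation.
        {
            let guard = self.slot.read().unwrap();
            if let Some((cached_schema, exprs)) = guard.as_ref() {
                if cached_schema.as_slice() == schema {
                    return Arc::clone(exprs);
                }
            }
        } // read guard is released here

        // Slow path: "compile" without holding a lock, then publish the result.
        let compiled: Vec<Compiled> = partition_columns
            .iter()
            .filter_map(|name| schema.iter().position(|c| c.as_str() == *name))
            .map(|column_index| Compiled { column_index })
            .collect();
        let exprs = Arc::new(compiled);
        *self.slot.write().unwrap() = Some((schema.to_vec(), Arc::clone(&exprs)));
        exprs
    }
}
```

Two threads that miss at the same time may both compile; the later write simply replaces the earlier one. That trade-off keeps the read path free of exclusive locking.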

* feat/column-partition:
 ### Add `split_record_batch` Method to `PartitionRule` Trait

 - **Files Modified**:
   - `src/partition/src/multi_dim.rs`
   - `src/partition/src/partition.rs`
   - `src/partition/src/splitter.rs`

 Added a new method `split_record_batch` to the `PartitionRule` trait, allowing record batches to be split into multiple regions based on partition values. Implemented this method in `MultiDimPartitionRule` and provided unimplemented stubs in test modules.

 ### Dependency Update

 - **File Modified**:
   - `src/operator/src/expr_helper.rs`

 Removed unused import `ColumnDataType` and `Timezone` from the test module.

 ### Miscellaneous

 - **File Modified**:
   - `src/partition/Cargo.toml`

 No functional changes; only minor formatting adjustments.

* chore: add license header

* chore: remove useless files

* feat/column-partition:
 Add support for handling unsupported partition expression values

 - **`error.rs`**: Introduced a new error variant `UnsupportedPartitionExprValue` to handle unsupported partition expression values, and updated `ErrorExt` to map this error to `StatusCode::InvalidArguments`.
 - **`expr.rs`**: Modified the `Operand` implementation to return the new error when encountering unsupported partition expression values.
 - **`multi_dim.rs`**: Added a fast path to optimize the selection process when all rows are selected.
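The fast path for fully selected batches can be illustrated as follows. This is a simplified sketch over a plain `i64` slice, not the logic in `multi_dim.rs`; `filter_column` is a hypothetical helper. When every mask bit is set, the input is handed back without copying; otherwise the selected values are gathered into a new vector.

```rust
use std::borrow::Cow;

/// Keep the values whose mask bit is set.
/// If every row is selected, return the input as-is (borrowed) and skip the copy.
fn filter_column<'a>(column: &'a [i64], mask: &[bool]) -> Cow<'a, [i64]> {
    assert_eq!(column.len(), mask.len(), "mask must have one entry per row");
    // Fast path: nothing to filter out.
    if mask.iter().all(|&selected| selected) {
        return Cow::Borrowed(column);
    }
    // Slow path: copy only the selected values.
    Cow::Owned(
        column
            .iter()
            .zip(mask)
            .filter(|(_, keep)| **keep)
            .map(|(value, _)| *value)
            .collect(),
    )
}
```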

* feat/column-partition: Add validation for expression and region length in MultiDimPartitionRule constructor

 • Ensure the lengths of exprs and regions match to prevent mismatches.
 • Introduce error handling for length discrepancies with a descriptive error message.

* chore: add debug log

* feat/column-partition: Removed the validation check for matching lengths between exprs and regions in MultiDimPartitionRule constructor, simplifying the initialization process.

* fix: unit tests

2025-04-15 10:42:07 +00:00

| Name | Last commit | Date |
|---|---|---|
| .cargo | fix: speed up cargo build using shallow clone (#5620) | 2025-03-03 08:02:12 +00:00 |
| .config | build: on windows (#2054) | 2023-08-10 08:08:37 +00:00 |
| .github | ci: not push latest image when schedule release (#5883) | 2025-04-14 01:22:40 +00:00 |
| config | feat: add query engine options (#5895) | 2025-04-14 13:12:37 +00:00 |
| cyborg | ci: automatically bump doc version when release GreptimeDB (#5343) | 2025-01-14 08:20:58 +00:00 |
| docker | feat: move default data path from /tmp to current directory (#5719) | 2025-03-16 09:57:46 +00:00 |
| docs | chore: upgrade some dependencies (#5777) | 2025-03-27 02:48:44 +00:00 |
| grafana | chore: add datanode write rows to grafana dashboard (#5745) | 2025-03-20 03:39:40 +00:00 |
| scripts | refactor: check and fix super import (#5846) | 2025-04-08 11:48:52 +00:00 |
| src | feat: Column-wise partition rule implementation (#5804) | 2025-04-15 10:42:07 +00:00 |
| tests | feat: optimize matches_term with constant term pre-compilation (#5886) | 2025-04-15 06:45:56 +00:00 |
| tests-fuzz | chore: update datafusion family (#5814) | 2025-04-09 02:20:55 +00:00 |
| tests-integration | feat(flow): dual engine (#5881) | 2025-04-15 07:03:12 +00:00 |
| .coderabbit.yaml | ci: disable auto review (#4387) | 2024-07-18 08:03:37 +00:00 |
| .dockerignore | fix: docker build (#1822) | 2023-06-25 11:05:46 +08:00 |
| .editorconfig | feat: to_timezone function (#3470) | 2024-03-12 01:46:19 +00:00 |
| .env.example | feat: add GcsConfig credential field (#4568) | 2024-08-16 03:11:20 +00:00 |
| .gitignore | feat: move default data path from /tmp to current directory (#5719) | 2025-03-16 09:57:46 +00:00 |
| .pre-commit-config.yaml | feat: Loki remote write (#4941) | 2024-11-18 08:39:17 +00:00 |
| AUTHOR.md | fix: broken link in AUTHOR.md (#5581) | 2025-02-21 07:01:41 +00:00 |
| Cargo.lock | feat: Column-wise partition rule implementation (#5804) | 2025-04-15 10:42:07 +00:00 |
| Cargo.toml | feat(flow): dual engine (#5881) | 2025-04-15 07:03:12 +00:00 |
| cliff.toml | chore: improve contributor click in git-cliff (#3672) | 2024-04-08 18:15:00 +00:00 |
| codecov.yml | refactor: refactor TableRouteManager (#3392) | 2024-02-28 06:18:09 +00:00 |
| CONTRIBUTING.md | docs(contributing): replace expired links (#4468) | 2024-07-31 06:11:30 +00:00 |
| Cross.toml | fix: cross compiling for aarch64 targets and allow customizing page size (#5487) | 2025-02-07 11:21:16 +00:00 |
| flake.lock | chore: update toolchain to 2024-12-25 (#5430) | 2025-01-24 09:30:54 +00:00 |
| flake.nix | ci: move components to flakes so it won't affect builders (#5464) | 2025-01-31 08:55:59 +00:00 |
| LICENSE | chore: multiple licenses fixes (#2714) | 2023-11-09 10:38:12 +00:00 |
| licenserc.toml | ci: replace pull-request actions with cyborg (#3854) | 2024-05-04 03:12:26 +00:00 |
| Makefile | refactor: check and fix super import (#5846) | 2025-04-08 11:48:52 +00:00 |
| README.md | docs: update readme (#5891) | 2025-04-14 06:45:24 +00:00 |
| rust-toolchain.toml | ci: move components to flakes so it won't affect builders (#5464) | 2025-01-31 08:55:59 +00:00 |
| rustfmt.toml | chore: specify import style in rustfmt (#460) | 2022-11-15 15:58:54 +08:00 |
| SECURITY.md | feat: Create SECURITY.md (#1270) | 2023-03-28 19:14:29 +08:00 |
| taplo.toml | chore: skip reorder workspace tables in taplo (#3388) | 2024-02-26 08:57:49 +00:00 |
| typos.toml | fix: update typos rules to fix ci (#5621) | 2025-03-01 09:21:36 +00:00 |

README.md

Unified & Cost-Effective Observability Database for Metrics, Logs, and Events

GreptimeCloud | User Guide | API Docs | Roadmap 2025

Introduction
Features: Why GreptimeDB
Architecture
Try it for free
Getting Started
Project Status
Join the community
- Contributing
Tools & Extensions
License
Acknowledgement

Introduction

GreptimeDB is an open-source, cloud-native, unified & cost-effective observability database for Metrics, Logs, and Traces. You can gain real-time insights from Edge to Cloud at Any Scale.

News

GreptimeDB tops JSONBench's billion-record cold run test!

Why GreptimeDB

Our core developers have been building observability data platforms for years. Based on our best practices, GreptimeDB was born to give you:

Unified Processing of Observability Data

A unified database that treats metrics, logs, and traces as timestamped wide events with context, supporting SQL/PromQL queries and stream processing to simplify complex data stacks.

High Performance and Cost-effective

Written in Rust, GreptimeDB combines a distributed query engine with rich indexing (inverted, fulltext, skip data, and vector) and optimized columnar storage to deliver sub-second responses on petabyte-scale data with high cost efficiency.

Cloud-native Distributed Database

Built for Kubernetes. GreptimeDB achieves seamless scalability with its cloud-native architecture of separated compute and storage, built on object storage (AWS S3, Azure Blob Storage, etc.) while enabling cross-cloud deployment through a unified data access layer.

Developer-Friendly

Access standardized SQL/PromQL interfaces through the built-in web dashboard, REST API, and MySQL/PostgreSQL protocols. GreptimeDB supports widely adopted data ingestion protocols for seamless migration and integration.

Flexible Deployment Options

Deploy GreptimeDB anywhere from ARM-based edge devices to cloud environments with unified APIs and bandwidth-efficient data synchronization. Query edge and cloud data seamlessly through identical APIs. Learn how to run on Android.

For more details, please read Why GreptimeDB.

Try GreptimeDB

1. Live Demo

Try out the features of GreptimeDB right from your browser.

2. GreptimeCloud

Start instantly with a free cluster.

3. Docker Image

To install GreptimeDB locally, the recommended way is via Docker:

docker pull greptime/greptimedb

Start a GreptimeDB container with:

docker run -p 127.0.0.1:4000-4003:4000-4003 \
  -v "$(pwd)/greptimedb:/greptimedb_data" \
  --name greptime --rm \
  greptime/greptimedb:latest standalone start \
  --http-addr 0.0.0.0:4000 \
  --rpc-bind-addr 0.0.0.0:4001 \
  --mysql-addr 0.0.0.0:4002 \
  --postgres-addr 0.0.0.0:4003

Access the dashboard via http://localhost:4000/dashboard.

Getting Started

Build

Check the prerequisites:

Rust toolchain (nightly)
Protobuf compiler (>= 3.15)
C/C++ build essentials, including gcc/g++/autoconf and the glibc library (e.g. libc6-dev on Ubuntu and glibc-devel on Fedora)
Python toolchain (optional): required only for some test scripts.

Build GreptimeDB binary:

make

Run a standalone server:

cargo run -- standalone start

Tools & Extensions

Kubernetes

GreptimeDB Operator

Dashboard

The dashboard UI for GreptimeDB

SDK

Grafana Dashboard

Our official Grafana dashboard for monitoring GreptimeDB is available in the grafana directory.

Project Status

GreptimeDB is currently in Beta. We are targeting GA (General Availability) with the v1.0 release by early 2025.

While in Beta, GreptimeDB is already:

Being used in production by early adopters
Actively maintained with regular releases (see the notes about version numbers)
Suitable for testing and evaluation

For production use, we recommend using the latest stable release.

Community

Our core team is thrilled to see you participate in any way you like. When you are stuck, ask for help by filing an issue with a detailed description of what you were trying to do and what went wrong. If you have any questions or would like to get involved in our community, please check out:

GreptimeDB Community on Slack
GreptimeDB GitHub Discussions forum
Greptime official website

In addition, you may:

View our official Blog
Connect with us on LinkedIn
Follow us on Twitter

Commercial Support

If you are running GreptimeDB OSS in your organization, we offer additional enterprise add-ons, installation services, training, and consulting. Contact us and we will reach out to you with more details about our commercial license.

License

GreptimeDB uses the Apache License 2.0 to strike a balance between open contributions and allowing you to use the software however you want.

Contributing

Please refer to the contribution guidelines and the internal concepts docs for more information.

Acknowledgement

Special thanks to all the contributors who have propelled GreptimeDB forward. For a complete list of contributors, please refer to AUTHOR.md.

GreptimeDB uses Apache Arrow™ as the memory model and Apache Parquet™ as the persistent file format.
GreptimeDB's query engine is powered by Apache Arrow DataFusion™.
Apache OpenDAL™ gives GreptimeDB a very general and elegant data access abstraction layer.
GreptimeDB's meta service is based on etcd.

Description

Open-source, cloud-native, unified observability database for metrics, logs and traces, supporting SQL/PromQL/Streaming.

analytics cloud-native database distributed greptimedb logs metrics monitoring observability observability-database observability-datalake promql rust rust-database sql time-series traces tsdb

Readme Apache-2.0 772 MiB