Lei, HUANG 6700c0762d feat: Column-wise partition rule implementation (#5804)
* wip: naive impl

* feat/column-partition:
 ### Add support for DataFusion physical expressions

 - **`Cargo.lock` & `Cargo.toml`**: Added `datafusion-physical-expr` as a dependency to support physical expression creation.
 - **`expr.rs`**: Implemented conversion methods `try_as_logical_expr` and `try_as_physical_expr` for `Operand` and `PartitionExpr` to facilitate logical and physical expression handling.
 - **`multi_dim.rs`**: Enhanced `MultiDimPartitionRule` to utilize physical expressions for partitioning logic, including new methods for evaluating record batches.
 - **Tests**: Added unit tests for logical and physical expression conversions and partitioning logic in `expr.rs` and `multi_dim.rs`.
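
 To make the idea of evaluating a partition expression against whole columns concrete, here is a minimal sketch in plain Rust. It does not use DataFusion: `Operand`, `Op`, `PartitionExpr`, `Evaluated`, and `Batch` below are simplified, hypothetical stand-ins, and the real `try_as_physical_expr` conversion may look quite different.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-ins for the crate's `Operand` and `PartitionExpr`.
#[derive(Debug, Clone)]
enum Operand {
    Column(String),
    Value(i64),
    Expr(Box<PartitionExpr>),
}

#[derive(Debug, Clone, Copy)]
enum Op {
    Lt,
    GtEq,
    And,
}

#[derive(Debug, Clone)]
struct PartitionExpr {
    lhs: Operand,
    op: Op,
    rhs: Operand,
}

// Result of evaluating one operand over every row of a batch.
enum Evaluated {
    Ints(Vec<i64>),
    Bools(Vec<bool>),
}

// Assumption: a batch is just named i64 columns of equal length.
type Batch = HashMap<String, Vec<i64>>;

// Applies `f` pairwise over two equally long columns.
fn zip_with<T: Copy>(l: &[T], r: &[T], f: impl Fn(T, T) -> bool) -> Vec<bool> {
    l.iter().zip(r).map(|(a, b)| f(*a, *b)).collect()
}

impl Operand {
    fn eval(&self, batch: &Batch, rows: usize) -> Result<Evaluated, String> {
        match self {
            Operand::Column(name) => batch
                .get(name)
                .cloned()
                .map(Evaluated::Ints)
                .ok_or_else(|| format!("column not found: {}", name)),
            // A literal is broadcast to one value per row.
            Operand::Value(v) => Ok(Evaluated::Ints(vec![*v; rows])),
            Operand::Expr(expr) => expr.eval(batch, rows).map(Evaluated::Bools),
        }
    }
}

impl PartitionExpr {
    // Returns one boolean per row: does the row satisfy this expression?
    fn eval(&self, batch: &Batch, rows: usize) -> Result<Vec<bool>, String> {
        let lhs = self.lhs.eval(batch, rows)?;
        let rhs = self.rhs.eval(batch, rows)?;
        match (lhs, rhs, self.op) {
            (Evaluated::Ints(l), Evaluated::Ints(r), Op::Lt) => Ok(zip_with(&l, &r, |a, b| a < b)),
            (Evaluated::Ints(l), Evaluated::Ints(r), Op::GtEq) => Ok(zip_with(&l, &r, |a, b| a >= b)),
            (Evaluated::Bools(l), Evaluated::Bools(r), Op::And) => Ok(zip_with(&l, &r, |a, b| a && b)),
            _ => Err(format!("operator {:?} does not fit its operands", self.op)),
        }
    }
}
```

 Evaluating `a < 10 AND b >= 20` this way yields one boolean mask for the whole batch instead of one decision per row, which is the property the column-wise rule relies on.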

* feat/column-partition:
 ### Refactor and Enhance Partition Handling

 - **Refactor Partition Parsing Logic**: Moved partition parsing logic from `src/operator/src/statement/ddl.rs` to a new utility module `src/partition/src/utils.rs`. This includes functions like `parse_partitions`, `find_partition_bounds`, and `convert_one_expr`.
 - **Error Handling Improvements**: Added new error variants `ColumnNotFound`, `InvalidPartitionRule`, and `ParseSqlValue` in `src/partition/src/error.rs` to improve error reporting for partition-related operations.
 - **Dependency Updates**: Updated `Cargo.lock` and `Cargo.toml` to include new dependencies `common-time` and `session`.
 - **Code Cleanup**: Removed redundant partition parsing functions from `src/operator/src/error.rs` and `src/operator/src/statement/ddl.rs`.

* feat/column-partition:
 ## Refactor and Enhance SQL and Table Handling

 - **Refactor Column Definitions and Error Handling**
   - Made `FULLTEXT_GRPC_KEY`, `INVERTED_INDEX_GRPC_KEY`, and `SKIPPING_INDEX_GRPC_KEY` public in `column_def.rs`.
   - Removed `IllegalPrimaryKeysDef` error from `error.rs` and moved it to `sql/src/error.rs`.
   - Updated error handling in `fill_impure_default.rs` and `expr_helper.rs`.

 - **Enhance SQL Utility Functions**
   - Moved and refactored functions like `create_to_expr`, `find_primary_keys`, and `validate_create_expr` to `sql/src/util.rs`.
   - Added new utility functions for SQL parsing and validation in `sql/src/util.rs`.

 - **Improve Partition Handling**
   - Added `parse_partition_columns_and_exprs` function in `partition/src/utils.rs`.
   - Updated partition rule tests in `partition/src/multi_dim.rs` to use SQL-based partitioning.

 - **Simplify Table Name Handling**
   - Re-exported `table_idents_to_full_name` from `sql::util` in `session/src/table_name.rs`.

 - **Test Enhancements**
   - Updated tests in `partition/src/multi_dim.rs` to use SQL for partition rule creation.

* feat/column-partition:
 **Add Benchmarking and Enhance Partitioning Logic**

 - **Benchmarking**: Introduced a new benchmark for `split_record_batch` in `bench_split_record_batch.rs` using `criterion` and `rand` as development dependencies in `Cargo.toml`.
 - **Partitioning Logic**: Enhanced `MultiDimPartitionRule` in `multi_dim.rs` to include a default region for unmatched partition expressions and optimized the `split_record_batch` method.
 - **Refactoring**: Moved `sql_to_partition_rule` function to a public scope for reuse in `multi_dim.rs`.
 - **Testing**: Added new test module `test_split_record_batch` to validate the partitioning logic.
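
 The default-region behaviour can be sketched as follows. This is an illustration only, not the code in `multi_dim.rs`: the function name `split_column`, the use of integer ranges as rules, and the `Vec<bool>` masks are all assumptions made to keep the example self-contained.

```rust
use std::collections::HashMap;
use std::ops::Range;

type RegionNumber = u32;

// Splits one column into per-region row masks. Rows matched by no rule are
// assigned to `default_region`. Assumes each region appears in at most one rule.
fn split_column(
    values: &[i64],
    rules: &[(RegionNumber, Range<i64>)],
    default_region: RegionNumber,
) -> HashMap<RegionNumber, Vec<bool>> {
    let mut masks: HashMap<RegionNumber, Vec<bool>> = HashMap::new();
    // Tracks which rows have been claimed by at least one rule.
    let mut matched = vec![false; values.len()];

    for (region, range) in rules {
        // Evaluate the rule over the whole column at once.
        let mask: Vec<bool> = values.iter().map(|v| range.contains(v)).collect();
        if mask.iter().any(|hit| *hit) {
            for (seen, hit) in matched.iter_mut().zip(&mask) {
                *seen |= *hit;
            }
            masks.insert(*region, mask);
        }
    }

    // Anything left over goes to the default region.
    if matched.iter().any(|seen| !*seen) {
        let fallback = masks
            .entry(default_region)
            .or_insert_with(|| vec![false; values.len()]);
        for (slot, seen) in fallback.iter_mut().zip(&matched) {
            *slot |= !*seen;
        }
    }
    masks
}
```

 Regions that receive no rows are omitted from the result, so a caller only iterates over regions that actually need a write.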

* Revert "feat/column-partition:  ### Refactor and Enhance Partition Handling"

This reverts commit 183fa19f

* fix: revert refactoring parse_partition

* revert some refactor

* feat/column-partition:
 ### Enhance Partitioning and Error Handling

 - **Benchmark Enhancements**: Added new benchmark `bench_split_record_batch_vs_row` in `bench_split_record_batch.rs` to compare row and column-based splitting.
 - **Error Handling Improvements**: Introduced new error variants in `error.rs` for better error reporting related to record batch evaluation and arrow kernel computation.
 - **Expression Handling**: Updated `expr.rs` to improve error context when converting schemas and creating physical expressions.
 - **Partition Rule Enhancements**: Made `row_at` and `record_batch_to_cols` methods public in `multi_dim.rs` and improved error handling for physical expression evaluation and boolean operations.

* feat/column-partition:
 ### Add `eq` Method and Optimize Expression Caching

 - **`expr.rs`**: Added a new `eq` method to the `Operand` struct for equality comparisons.
 - **`multi_dim.rs`**: Introduced a caching mechanism for physical expressions using `RwLock` to improve performance in `MultiDimPartitionRule`.
 - **`lib.rs`**: Enabled the `let_chains` feature for more concise code.
 - **`multi_dim.rs` Tests**: Enhanced test coverage with new test cases for multi-dimensional partitioning, including random record batch generation and default region handling.
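
 A read-mostly cache guarded by `RwLock` might look like the sketch below. The type `ExprCache`, its string key, and the `get_or_build` method are hypothetical; the commit does not describe what the real cache is keyed on or what it stores.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

// Hypothetical cache: maps a key (for example a schema fingerprint) to a
// shared, already-built value such as a compiled expression.
struct ExprCache<T> {
    inner: RwLock<HashMap<String, Arc<T>>>,
}

impl<T> ExprCache<T> {
    fn new() -> Self {
        ExprCache {
            inner: RwLock::new(HashMap::new()),
        }
    }

    // Returns the cached value for `key`, building it at most once per key.
    fn get_or_build(&self, key: &str, build: impl FnOnce() -> T) -> Arc<T> {
        {
            // Common case: many readers can hold the read lock concurrently.
            let cache = self.inner.read().unwrap();
            if let Some(hit) = cache.get(key) {
                return Arc::clone(hit);
            }
        } // read lock released here, before taking the write lock

        // Slow path: the entry API re-checks the key, so a value inserted by
        // another thread in the meantime is reused rather than rebuilt.
        let mut cache = self.inner.write().unwrap();
        let entry = cache
            .entry(key.to_string())
            .or_insert_with(|| Arc::new(build()));
        Arc::clone(entry)
    }
}
```

 The write lock is taken only on a miss, so repeated splits with an unchanged key pay for a read lock and an `Arc` clone.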

* feat/column-partition:
 ### Add `split_record_batch` Method to `PartitionRule` Trait

 - **Files Modified**:
   - `src/partition/src/multi_dim.rs`
   - `src/partition/src/partition.rs`
   - `src/partition/src/splitter.rs`

 Added a new method `split_record_batch` to the `PartitionRule` trait, allowing record batches to be split into multiple regions based on partition values. Implemented this method in `MultiDimPartitionRule` and provided unimplemented stubs in test modules.

 ### Dependency Update

 - **File Modified**:
   - `src/operator/src/expr_helper.rs`

 Removed unused imports `ColumnDataType` and `Timezone` from the test module.

 ### Miscellaneous

 - **File Modified**:
   - `src/partition/Cargo.toml`

 No functional changes; only minor formatting adjustments.

* chore: add license header

* chore: remove useless files

* feat/column-partition:
 Add support for handling unsupported partition expression values

 - **`error.rs`**: Introduced a new error variant `UnsupportedPartitionExprValue` to handle unsupported partition expression values, and updated `ErrorExt` to map this error to `StatusCode::InvalidArguments`.
 - **`expr.rs`**: Modified the `Operand` implementation to return the new error when encountering unsupported partition expression values.
 - **`multi_dim.rs`**: Added a fast path to optimize the selection process when all rows are selected.
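
 The all-rows-selected fast path can be illustrated with a small sketch. The function `filter_rows` and the use of `Cow` are assumptions chosen for the example; the real code operates on record batches rather than slices.

```rust
use std::borrow::Cow;

// Keeps the rows whose mask entry is true. When every row is selected, the
// input is returned as-is without copying (the fast path).
fn filter_rows<'a>(values: &'a [i64], mask: &[bool]) -> Cow<'a, [i64]> {
    debug_assert_eq!(values.len(), mask.len());

    // Fast path: nothing to filter out, so borrow the original column.
    if mask.iter().all(|selected| *selected) {
        return Cow::Borrowed(values);
    }

    // Slow path: copy only the selected rows into a new column.
    Cow::Owned(
        values
            .iter()
            .zip(mask)
            .filter(|(_, keep)| **keep)
            .map(|(value, _)| *value)
            .collect(),
    )
}
```

 This case is common when a batch falls entirely inside one partition, so skipping the copy there avoids the filter cost for the whole batch.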

* feat/column-partition: Add validation for expression and region length in MultiDimPartitionRule constructor

 • Ensure the lengths of exprs and regions match to prevent mismatches.
 • Introduce error handling for length discrepancies with a descriptive error message.

* chore: add debug log

* feat/column-partition: Removed the validation check for matching lengths between exprs and regions in MultiDimPartitionRule constructor, simplifying the initialization process.

* fix: unit tests
2025-04-15 10:42:07 +00:00

Unified & Cost-Effective Observability Database for Metrics, Logs, and Events

Introduction

GreptimeDB is an open-source, cloud-native, unified & cost-effective observability database for Metrics, Logs, and Traces. You can gain real-time insights from Edge to Cloud at Any Scale.

News

GreptimeDB tops JSONBench's billion-record cold run test!

Why GreptimeDB

Our core developers have been building observability data platforms for years. Based on our best practices, GreptimeDB was born to give you:

  • Unified Processing of Observability Data

    A unified database that treats metrics, logs, and traces as timestamped wide events with context, supporting SQL/PromQL queries and stream processing to simplify complex data stacks.

  • High Performance and Cost-effective

    Written in Rust, GreptimeDB combines a distributed query engine with rich indexing (inverted, fulltext, skip data, and vector) and optimized columnar storage to deliver sub-second responses on petabyte-scale data at high cost efficiency.

  • Cloud-native Distributed Database

    Built for Kubernetes. GreptimeDB achieves seamless scalability with its cloud-native architecture of separated compute and storage, built on object storage (AWS S3, Azure Blob Storage, etc.) while enabling cross-cloud deployment through a unified data access layer.

  • Developer-Friendly

    Access standardized SQL/PromQL interfaces through built-in web dashboard, REST API, and MySQL/PostgreSQL protocols. Supports widely adopted data ingestion protocols for seamless migration and integration.

  • Flexible Deployment Options

    Deploy GreptimeDB anywhere from ARM-based edge devices to cloud environments with unified APIs and bandwidth-efficient data synchronization. Query edge and cloud data seamlessly through identical APIs. Learn how to run on Android.

For more detailed info please read Why GreptimeDB.

Try GreptimeDB

1. Live Demo

Try out the features of GreptimeDB right from your browser.

2. GreptimeCloud

Start instantly with a free cluster.

3. Docker Image

To install GreptimeDB locally, the recommended way is via Docker:

docker pull greptime/greptimedb

Start a GreptimeDB container with:

docker run -p 127.0.0.1:4000-4003:4000-4003 \
  -v "$(pwd)/greptimedb:/greptimedb_data" \
  --name greptime --rm \
  greptime/greptimedb:latest standalone start \
  --http-addr 0.0.0.0:4000 \
  --rpc-bind-addr 0.0.0.0:4001 \
  --mysql-addr 0.0.0.0:4002 \
  --postgres-addr 0.0.0.0:4003

Access the dashboard via http://localhost:4000/dashboard.

Read more about Installation on docs.

Getting Started

Build

Check the prerequisites:

  • Rust toolchain (nightly)
  • Protobuf compiler (>= 3.15)
  • C/C++ build essentials, including gcc/g++/autoconf and the glibc library (e.g. libc6-dev on Ubuntu and glibc-devel on Fedora)
  • Python toolchain (optional): Required only if using some test scripts.

Build GreptimeDB binary:

make

Run a standalone server:

cargo run -- standalone start

Tools & Extensions

Kubernetes

Dashboard

SDK

Grafana Dashboard

Our official Grafana dashboard for monitoring GreptimeDB is available at grafana directory.

Project Status

GreptimeDB is currently in Beta. We are targeting GA (General Availability) with the v1.0 release by early 2025.

While in Beta, GreptimeDB is already:

  • Being used in production by early adopters
  • Actively maintained with regular releases (see the notes about version numbers)
  • Suitable for testing and evaluation

For production use, we recommend using the latest stable release.

Community

Our core team is thrilled to see you participate in any way you like. When you are stuck, try asking for help by filing an issue with a detailed description of what you were trying to do and what went wrong. If you have any questions or would like to get involved in our community, please check out:

In addition, you may:

Commercial Support

If you are running GreptimeDB OSS in your organization, we offer additional enterprise add-ons, installation services, training, and consulting. Contact us and we will reach out to you with more details about our commercial license.

License

GreptimeDB uses the Apache License 2.0 to strike a balance between open contributions and allowing you to use the software however you want.

Contributing

Please refer to contribution guidelines and internal concepts docs for more information.

Acknowledgement

Special thanks to all the contributors who have propelled GreptimeDB forward. For a complete list of contributors, please refer to AUTHOR.md.

Known Users