greptimedb

mirror of https://github.com/GreptimeTeam/greptimedb.git synced 2026-01-05 21:02:58 +00:00


Lei, HUANG 5a9023d6b3 feat(bulk): write to multiple time partitions (#6086 )

* add benchmark for splitting according to time partition

* feat/write-to-multiple-time-partitions:
**Enhancements to Bulk Processing and Time Partitioning**

- **`part.rs`**: Added `Snafu` to imports and introduced `timestamp_index` in `BulkPart` struct. Implemented `timestamps` method for accessing timestamp columns.
- **`simple_bulk_memtable.rs`**: Updated tests to include `timestamp_index` initialization.
- **`time_partition.rs`**: Enhanced `TimePartition` to support partial writes with `write_record_batch_partial`. Implemented `split_record_batch` for filtering records by timestamp range. Added comprehensive tests for `split_record_batch`.
- **`handle_bulk_insert.rs`**: Modified to retrieve timestamp index and column together, updating `BulkPart` initialization with `timestamp_index`.

* feat/write-to-multiple-time-partitions:
### Enhance Time Partitioning Logic

- **`time_partition.rs`**:
  - Introduced `HashSet` for efficient partition management.
  - Refactored `write_bulk` to handle multiple partitions and added `find_partitions_by_time_range` for identifying existing and missing partitions.
  - Updated `get_or_create_time_partition` to manage partition creation.
  - Added comprehensive tests for partition finding logic, covering various scenarios including overlapping and non-overlapping time ranges.

- **Tests**:
  - Added `test_find_partitions_by_time_range` to validate new partitioning logic.
  - Updated `test_split_record_batch` to ensure correct record batch splitting behavior.

* feat/write-to-multiple-time-partitions:
### Enhance Time Partitioning and Testing in `time_partition.rs`

- **Time Partitioning Enhancements**:
  - Updated `split_record_batch` to handle multiple timestamp units (`Second`, `Millisecond`, `Microsecond`, `Nanosecond`) by matching on `DataType`.
  - Improved filtering logic for timestamp arrays to support various time units.

- **Testing Enhancements**:
  - Added `test_write_bulk` to verify writing across multiple partitions and scenarios in `time_partition.rs`.
  - Updated `test_split_record_batch` to use `TimestampMillisecondArray` for testing timestamp partitioning.

- **Imports and Dependencies**:
  - Added necessary imports for new timestamp array types and testing utilities.

* feat/write-to-multiple-time-partitions:
### Refactor and Enhance Time Partition Filtering

- **Refactor Filtering Logic**: Consolidated the filtering logic for timestamp arrays using macros in `time_partition.rs` and `bench_filter_time_partition.rs`. This reduces code duplication and improves maintainability.
- **Enhance `BulkPart` Struct**: Made fields in `BulkPart` public to facilitate easier access and manipulation in `memtable.rs` and `part.rs`.
- **Rename Function**: Renamed `split_record_batch` to `filter_record_batch` for clarity in `time_partition.rs` and `bench_filter_time_partition.rs`.
- **Add Feature Flag**: Introduced `int_roundings` feature in `lib.rs` to support new functionality.

* refactor tests

* feat/write-to-multiple-time-partitions:
Improve timestamp handling in `time_partition.rs`

- Enhanced safety comments for timestamp conversion to ensure clarity.
- Modified logic to prevent overflow by using `div_euclid` for `bulk_start_sec` and `bulk_end_sec` calculations.
- Adjusted the `filter_map` logic to correctly compute timestamps using `start_sec` and `part_duration_sec`.

* feat/write-to-multiple-time-partitions:
**Refactor timestamp handling and add utility function**

- **Refactor `time_partition.rs`:** Simplified timestamp handling by replacing direct type access with a utility function to retrieve the timestamp unit. Improved error handling for timestamp conversion.
- **Enhance `metadata.rs`:** Added `time_index_type` function to `RegionMetadata` to retrieve the timestamp type of the time index column, ensuring safer and more readable code.

* feat/write-to-multiple-time-partitions:
Refactor time partition variable names in `time_partition.rs`

- Renamed variables for clarity: `bulk_start_sec` to `start_bucket` and `bulk_end_sec` to `end_bucket`.
- Updated related logic to use new variable names for improved readability and maintainability.

* feat/write-to-multiple-time-partitions:
**Refactor variable names in `time_partition.rs`**

- Updated variable names from `matching` and `missing` to `matchings` and `missings` for clarity and consistency.
- Modified function calls and loop iterations to align with the new variable names.
- Affected file: `src/mito2/src/memtable/time_partition.rs`

* feat/write-to-multiple-time-partitions:
### Refactor variable names in `time_partition.rs`

- Updated variable names for clarity in `time_partition.rs`:
  - Renamed `matchings` to `matching_parts`
  - Renamed `missings` to `missing_parts`
- Adjusted logic to use new variable names in methods `find_partitions_by_time_range` and `write_record_batch`.

* feat/write-to-multiple-time-partitions:
### Enhance Time Partition Handling

- **`time_partition.rs`**:
  - Added `ArrayRef` to handle timestamp arrays, improving the partitioning logic by allowing more efficient timestamp range checks.
  - Enhanced `find_partitions_by_time_range` to support sparse data and handle different timestamp units (`Second`, `Millisecond`, `Microsecond`, `Nanosecond`).
  - Updated test cases to cover new scenarios, including sparse data and edge cases, ensuring robustness of partition handling.
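
The bucket arithmetic described in this commit message can be illustrated with a minimal standalone sketch. It is not the code in `time_partition.rs`: the function names `bucket_start` and `split_by_bucket` are hypothetical, timestamps are assumed to be already converted to seconds, and plain slices stand in for Arrow timestamp arrays. It only shows why `div_euclid` is a natural fit for assigning rows to fixed-width time partitions.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: start (in seconds) of the bucket containing `ts_sec`.
/// `div_euclid` rounds toward negative infinity for a positive divisor, so
/// pre-epoch timestamps land in the bucket that actually contains them.
/// Assumes `part_duration_sec > 0` and inputs far from the `i64` limits.
fn bucket_start(ts_sec: i64, part_duration_sec: i64) -> i64 {
    ts_sec.div_euclid(part_duration_sec) * part_duration_sec
}

/// Hypothetical helper: groups row indices by the bucket their timestamp falls into.
fn split_by_bucket(timestamps_sec: &[i64], part_duration_sec: i64) -> BTreeMap<i64, Vec<usize>> {
    let mut parts: BTreeMap<i64, Vec<usize>> = BTreeMap::new();
    for (row, &ts) in timestamps_sec.iter().enumerate() {
        parts
            .entry(bucket_start(ts, part_duration_sec))
            .or_default()
            .push(row);
    }
    parts
}

fn main() {
    // One-hour partitions; the first timestamp is one second before the epoch.
    let ts = [-1, 0, 3599, 3600, 7300];
    for (start, rows) in &split_by_bucket(&ts, 3600) {
        println!("bucket [{start}, {}) -> rows {rows:?}", start + 3600);
    }
}
```

With truncating division (`/`), the timestamp `-1` would map to bucket `0` instead of `-3600`, which is the kind of boundary error that rounding toward negative infinity avoids.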

---------

Co-authored-by: Lei <lei@Leis-MacBook-Pro.local>

2025-05-14 05:09:59 +00:00

.cargo

fix: speed up cargo build using shallow clone (#5620 )

2025-03-03 08:02:12 +00:00

.config

build: on windows (#2054 )

2023-08-10 08:08:37 +00:00

.github

ci: update homebrew greptime version when release (#6082 )

2025-05-12 07:13:09 +00:00

config

feat: allow forced region failover for local WAL (#5972 )

2025-04-24 08:11:45 +00:00

cyborg

ci: update website greptimedb version when releasing automatically (#6037 )

2025-05-03 22:07:32 +00:00

docker

feat: move default data path from /tmp to current directory (#5719 )

2025-03-16 09:57:46 +00:00

docs

fix: typos (#6084 )

2025-05-12 12:12:47 +00:00

grafana

chore: add logs dashboard (#6028 )

2025-05-03 08:30:28 +00:00

scripts

refactor: check and fix super import (#5846 )

2025-04-08 11:48:52 +00:00

src

feat(bulk): write to multiple time partitions (#6086 )

2025-05-14 05:09:59 +00:00

tests

fix: promql regex escape behavior (#6094 )

2025-05-13 18:19:17 +00:00

tests-fuzz

fix: alter table modify type should also modify default value (#6049 )

2025-05-09 03:40:59 +00:00

tests-integration

chore: bump rskafka version (#6090 )

2025-05-13 11:57:31 +00:00

.dockerignore

fix: docker build (#1822 )

2023-06-25 11:05:46 +08:00

.editorconfig

feat: to_timezone function (#3470 )

2024-03-12 01:46:19 +00:00

.env.example

feat: add GcsConfig credential field (#4568 )

2024-08-16 03:11:20 +00:00

.gitignore

chore: add logs dashboard (#6028 )

2025-05-03 08:30:28 +00:00

.pre-commit-config.yaml

feat: Loki remote write (#4941 )

2024-11-18 08:39:17 +00:00

AUTHOR.md

fix: broken link in AUTHOR.md (#5581 )

2025-02-21 07:01:41 +00:00

Cargo.lock

fix: promql regex escape behavior (#6094 )

2025-05-13 18:19:17 +00:00

Cargo.toml

fix: promql regex escape behavior (#6094 )

2025-05-13 18:19:17 +00:00

cliff.toml

chore: improve contributor click in git-cliff (#3672 )

2024-04-08 18:15:00 +00:00

codecov.yml

refactor: refactor TableRouteManager (#3392 )

2024-02-28 06:18:09 +00:00

CONTRIBUTING.md

docs(contributing): replace expired links (#4468 )

2024-07-31 06:11:30 +00:00

Cross.toml

fix: cross compiling for aarch64 targets and allow customizing page size (#5487 )

2025-02-07 11:21:16 +00:00

flake.lock

chore: update nix for new toolchain (#5991 )

2025-04-27 11:40:44 +00:00

flake.nix

chore: update nix for new toolchain (#5991 )

2025-04-27 11:40:44 +00:00

LICENSE

chore: multiple licenses fixes (#2714 )

2023-11-09 10:38:12 +00:00

licenserc.toml

ci: replace pull-request actions with cyborg (#3854 )

2024-05-04 03:12:26 +00:00

Makefile

ci: update dev-builder image version to 2025-04-15-1a517ec8-202504280… (#6003 )

2025-04-28 03:34:31 +00:00

README.md

ci: update website greptimedb version when releasing automatically (#6037 )

2025-05-03 22:07:32 +00:00

rust-toolchain.toml

chore: update rust toolchain (#5818 )

2025-04-27 09:02:36 +00:00

rustfmt.toml

chore: specify import style in rustfmt (#460 )

2022-11-15 15:58:54 +08:00

SECURITY.md

feat: Create SECURITY.md (#1270 )

2023-03-28 19:14:29 +08:00

taplo.toml

chore: skip reorder workspace tables in taplo (#3388 )

2024-02-26 08:57:49 +00:00

typos.toml

feat: node excluder (#5964 )

2025-04-23 10:48:46 +00:00

README.md

Real-Time & Cloud-Native Observability Database
for metrics, logs, and traces

Delivers sub-second querying at PB scale and exceptional cost efficiency from edge to cloud.

GreptimeCloud | User Guide | API Docs | Roadmap 2025

Introduction
⭐ Key Features
Quick Comparison
Architecture
Try GreptimeDB
Getting Started
Build From Source
Tools & Extensions
Project Status
Community
License
Commercial Support
Contributing
Acknowledgement

Introduction

GreptimeDB is an open-source, cloud-native database purpose-built for the unified collection and analysis of observability data (metrics, logs, and traces). Whether you’re operating on the edge, in the cloud, or across hybrid environments, GreptimeDB empowers real-time insights at massive scale — all in one system.

Features

Feature	Description
Unified Observability Data	Store metrics, logs, and traces as timestamped, contextual wide events. Query via SQL, PromQL, and streaming.
High Performance & Cost Effective	Written in Rust, with a distributed query engine, rich indexing, and optimized columnar storage, delivering sub-second responses at PB scale.
Cloud-Native Architecture	Designed for Kubernetes, with compute/storage separation, native object storage (AWS S3, Azure Blob, etc.) and seamless cross-cloud access.
Developer-Friendly	Access via SQL/PromQL interfaces, REST API, MySQL/PostgreSQL protocols, and popular ingestion protocols.
Flexible Deployment	Deploy anywhere: edge (including ARM/Android) or cloud, with unified APIs and efficient data sync.

Learn more in Why GreptimeDB and Observability 2.0 and the Database for It.

Quick Comparison

Feature	GreptimeDB	Traditional TSDB	Log Stores
Data Types	Metrics, Logs, Traces	Metrics only	Logs only
Query Language	SQL, PromQL, Streaming	Custom/PromQL	Custom/DSL
Deployment	Edge + Cloud	Cloud/On-prem	Mostly central
Indexing & Performance	PB-Scale, Sub-second	Varies	Varies
Integration	REST, SQL, Common protocols	Varies	Varies

Performance:

Architecture

Read the architecture document.
DeepWiki provides an in-depth look at GreptimeDB:

Try GreptimeDB

1. Live Demo

Experience GreptimeDB directly in your browser.

2. GreptimeCloud

Start instantly with a free cluster.

3. Docker (Local Quickstart)

docker pull greptime/greptimedb

docker run -p 127.0.0.1:4000-4003:4000-4003 \
  -v "$(pwd)/greptimedb:/greptimedb_data" \
  --name greptime --rm \
  greptime/greptimedb:latest standalone start \
  --http-addr 0.0.0.0:4000 \
  --rpc-bind-addr 0.0.0.0:4001 \
  --mysql-addr 0.0.0.0:4002 \
  --postgres-addr 0.0.0.0:4003

Dashboard: http://localhost:4000/dashboard
Full Install Guide

Troubleshooting:

Cannot connect to the database? Ensure that ports 4000, 4001, 4002, and 4003 are not blocked by a firewall or used by other services.
Failed to start? Check the container logs with `docker logs greptime` for further details.
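
As a quick way to act on the first troubleshooting hint, the following standalone sketch probes the four ports published by the `docker run` command above. It is not part of GreptimeDB. The helper name `port_is_open` and the 500 ms timeout are assumptions, and the service labels are inferred from the flag names (`--http-addr`, `--rpc-bind-addr`, `--mysql-addr`, `--postgres-addr`).

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Hypothetical helper: true if something accepts TCP connections on 127.0.0.1:`port`.
fn port_is_open(port: u16) -> bool {
    let addr = SocketAddr::from(([127, 0, 0, 1], port));
    // A short timeout keeps the check fast when a firewall silently drops packets.
    TcpStream::connect_timeout(&addr, Duration::from_millis(500)).is_ok()
}

fn main() {
    // Port-to-service mapping inferred from the `docker run` flags.
    for (port, service) in [(4000, "HTTP"), (4001, "RPC"), (4002, "MySQL"), (4003, "PostgreSQL")] {
        let state = if port_is_open(port) { "open" } else { "closed" };
        println!("{port} ({service}): {state}");
    }
}
```

Run before starting the container, an "open" port means another service already occupies it. Run after starting it, a "closed" port suggests the mapping or a firewall is blocking access.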

Getting Started

Build From Source

Prerequisites:

Rust toolchain (nightly)
Protobuf compiler (>= 3.15)
C/C++ build essentials, including gcc/g++/autoconf and the glibc library (e.g., libc6-dev on Ubuntu and glibc-devel on Fedora)
Python toolchain (optional): Required only if using some test scripts.

Build and Run:

make
cargo run -- standalone start

Tools & Extensions

Kubernetes: GreptimeDB Operator
Helm Charts: Greptime Helm Charts
Dashboard: Web UI
SDKs/Ingester: Go, Java, C++, Erlang, Rust, JS
Grafana: Official Dashboard

Project Status

Status: Beta.
GA (v1.0): Targeted for mid 2025.

Being used in production by early adopters
Stable, actively maintained, with regular releases (version info)
Suitable for evaluation and pilot deployments

For production use, we recommend using the latest stable release.

If you find this project useful, a ⭐ would mean a lot to us!

Community

We invite you to engage and contribute!

License

GreptimeDB is licensed under the Apache License 2.0.

Commercial Support

Running GreptimeDB in your organization?
We offer enterprise add-ons, services, training, and consulting.
Contact us for details.

Contributing

Read our Contribution Guidelines.
Explore Internal Concepts and DeepWiki.
Pick up a good first issue and join the #contributors Slack channel.

Acknowledgement

Special thanks to all contributors! See AUTHOR.md.

Uses Apache Arrow™ (memory model)
Apache Parquet™ (file storage)
Apache Arrow DataFusion™ (query engine)
Apache OpenDAL™ (data access abstraction)
etcd (meta service)

Description

Open-source, cloud-native, unified observability database for metrics, logs and traces, supporting SQL/PromQL/Streaming.

analytics cloud-native database distributed greptimedb logs metrics monitoring observability observability-database observability-datalake promql rust rust-database sql time-series traces tsdb
