greptimedb

mirror of https://github.com/GreptimeTeam/greptimedb.git synced 2026-01-05 12:52:57 +00:00


Lei, HUANG 31cfab81ad feat(mito): parquet memtable reader (#4967)

* wip: row group reader base

* wip: memtable row group reader

* Refactor MemtableRowGroupReader to streamline data fetching

 - Added early return when fetch_ranges is empty to optimize performance.
 - Replaced inline chunk data assignment with a call to `assign_dense_chunk` for cleaner code.

* wip: row group reader

* wip: reuse RowGroupReader

* wip: bulk part reader

* Enhance BulkPart Iteration with Filtering

 - Introduced `RangeBase` to `BulkIterContext` for improved filter handling.
 - Implemented filter application in `BulkPartIter` to prune batches based on predicates.
 - Updated `SimpleFilterContext::new_opt` to be public for broader access.

* chore: add prune test

* fix: clippy

* fix: introduce prune reader for memtable and add more prune test

* Enhance BulkPart read method to return Option<BoxedBatchIterator>

 - Modified `BulkPart::read` to return `Option<BoxedBatchIterator>` to handle cases where no row groups are selected.
 - Added logic to return `None` when all row groups are filtered out.
 - Updated tests to handle the new return type and added a test case to verify behavior when no row groups match the pr

* refactor/separate-paraquet-reader: Add helper function to parse parquet metadata and integrate it into BulkPartEncoder

* refactor/separate-paraquet-reader:
 Change BulkPartEncoder row_group_size from Option to usize and update tests

* refactor/separate-paraquet-reader: Add context module for bulk memtable iteration and refactor part reading

 • Introduce context module to encapsulate context for bulk memtable iteration.
 • Refactor BulkPart to use BulkIterContextRef for reading operations.
 • Remove redundant code in BulkPart by centralizing context creation and row group pruning logic in the new context module.
 • Create new file context.rs with structures and logic for handling iteration context.
 • Adjust part_reader.rs and row_group_reader.rs to reference the new BulkIterContextRef.

* refactor/separate-paraquet-reader: Refactor RowGroupReader traits and implementations in memtable and parquet reader modules

 • Rename RowGroupReaderVirtual to RowGroupReaderContext for clarity.
 • Replace BulkPartVirt with direct usage of BulkIterContextRef in MemtableRowGroupReader.
 • Simplify MemtableRowGroupReaderBuilder by directly passing context instead of creating a BulkPartVirt instance.
 • Update RowGroupReaderBase to use context field instead of virt, reflecting the trait renaming and usage.
 • Modify FileRangeVirt to FileRangeContextRef and adjust implementations accordingly.

* refactor/separate-paraquet-reader: Refactor column page reader creation and remove unused code

 • Centralize creation of SerializedPageReader in RowGroupBase::column_reader method.
 • Remove unused RowGroupCachedReader and related code from MemtableRowGroupPageFetcher.
 • Eliminate redundant error handling for invalid column index in multiple places.

* chore: rebase main and resolve conflicts

* fix: some comments

* chore: resolve conflicts

* chore: resolve conflicts

2025-01-04 02:12:27 +08:00

Name | Last commit | Date
.cargo | chore: use workspace-wide lints (#3352) | 2024-02-22 01:01:10 +00:00
.config | build: on windows (#2054) | 2023-08-10 08:08:37 +00:00
.github | ci: fix nightly ci task on nix build (#5198) | 2025-01-04 02:12:27 +08:00
config | refactor: cache inverted index with fixed-size page (#5114) | 2024-12-20 14:12:19 +08:00
cyborg | ci: fixup strings in check ci status (#3987) | 2024-05-20 03:59:05 +00:00
docker | ci: install latest protobuf in dev-builder image (#5196) | 2024-12-20 14:12:19 +08:00
docs | docs: remove lg_prof_interval from env (#5103) | 2024-12-06 02:59:16 +00:00
grafana | docs: fix grafana dashboard row (#5192) | 2024-12-20 14:12:19 +08:00
scripts | feat: do not remove time filters in ScanRegion (#5180) | 2024-12-20 14:12:19 +08:00
src | feat(mito): parquet memtable reader (#4967) | 2025-01-04 02:12:27 +08:00
tests | fix(flow): batch builder with type (#5195) | 2024-12-20 14:12:19 +08:00
tests-fuzz | chore: adjust fuzz tests cfg (#5207) | 2025-01-04 02:12:27 +08:00
tests-integration | fix: auto created table ttl check (#5203) | 2024-12-20 14:12:19 +08:00
.coderabbit.yaml | ci: disable auto review (#4387) | 2024-07-18 08:03:37 +00:00
.dockerignore | fix: docker build (#1822) | 2023-06-25 11:05:46 +08:00
.editorconfig | feat: to_timezone function (#3470) | 2024-03-12 01:46:19 +00:00
.env.example | feat: add GcsConfig credential field (#4568) | 2024-08-16 03:11:20 +00:00
.gitignore | chore: add nix-shell configure for a minimal environment for development (#5175) | 2024-12-20 14:12:19 +08:00
.pre-commit-config.yaml | feat: Loki remote write (#4941) | 2024-11-18 08:39:17 +00:00
AUTHOR.md | docs: tweak readme and AUTHOR (#5069) | 2024-11-29 04:08:53 +00:00
Cargo.lock | chore: bump opendal to fork version to fix prometheus layer (#5223) | 2025-01-04 02:12:27 +08:00
Cargo.toml | feat: logs query endpoint (#5202) | 2025-01-04 02:12:27 +08:00
cliff.toml | chore: improve contributor click in git-cliff (#3672) | 2024-04-08 18:15:00 +00:00
codecov.yml | refactor: refactor TableRouteManager (#3392) | 2024-02-28 06:18:09 +00:00
CONTRIBUTING.md | docs(contributing): replace expired links (#4468) | 2024-07-31 06:11:30 +00:00
Cross.toml | fix: libz dependency (#1867) | 2023-07-03 10:08:53 +00:00
LICENSE | chore: multiple licenses fixes (#2714) | 2023-11-09 10:38:12 +00:00
licenserc.toml | ci: replace pull-request actions with cyborg (#3854) | 2024-05-04 03:12:26 +00:00
Makefile | chore: udapte Rust toolchain to 2024-10-19 (#4857) | 2024-10-25 00:23:32 +00:00
README.md | docs: add greptimedb-operator project link in 'Tools & Extensions' and other small improvements (#5216) | 2025-01-04 02:12:27 +08:00
rust-toolchain.toml | chore: make nix compilation environment config more robust (#5183) | 2024-12-20 14:12:19 +08:00
rustfmt.toml | chore: specify import style in rustfmt (#460) | 2022-11-15 15:58:54 +08:00
SECURITY.md | feat: Create SECURITY.md (#1270) | 2023-03-28 19:14:29 +08:00
shell.nix | chore: make nix compilation environment config more robust (#5183) | 2024-12-20 14:12:19 +08:00
taplo.toml | chore: skip reorder workspace tables in taplo (#3388) | 2024-02-26 08:57:49 +00:00
typos.toml | fix(metric-engine): set ttl also on opening metadata regions (#5051) | 2024-11-26 08:14:41 +00:00

README.md

Unified & Cost-Effective Time Series Database for Metrics, Logs, and Events

GreptimeCloud | User Guide | API Docs | Roadmap 2024

Introduction
Features: Why GreptimeDB
Architecture
Try it for free
Getting Started
Project Status
Join the community
- Contributing
Tools & Extensions
License
Acknowledgement

Introduction

GreptimeDB is an open-source, unified, and cost-effective time-series database for Metrics, Logs, and Events (Traces are planned). It lets you gain real-time insights from Edge to Cloud at any scale.

Why GreptimeDB

Our core developers have been building time-series data platforms for years. Based on our best practices, GreptimeDB was born to give you:

Unified Processing of Metrics, Logs, and Events

GreptimeDB unifies time series data processing by treating all data, whether metrics, logs, or events, as timestamped events with context. Users can analyze this data using either SQL or PromQL and leverage stream processing (Flow) to enable continuous aggregation. Read more.

Cloud-native Distributed Database

Built for Kubernetes. GreptimeDB achieves seamless scalability with its cloud-native architecture of separated compute and storage, built on object storage (AWS S3, Azure Blob Storage, etc.) while enabling cross-cloud deployment through a unified data access layer.

Performant and Cost-effective

Written in pure Rust for superior performance and reliability. GreptimeDB features a distributed query engine with intelligent indexing to handle high cardinality data efficiently. Its optimized columnar storage achieves 50x cost efficiency on cloud object storage through advanced compression. Benchmark reports.

Cloud-Edge Collaboration

GreptimeDB seamlessly operates across cloud and edge (ARM/Android/Linux), providing consistent APIs and control plane for unified data management and efficient synchronization. Learn how to run on Android.

Multi-protocol Ingestion, SQL & PromQL Ready

GreptimeDB supports widely adopted database protocols and APIs, including MySQL, PostgreSQL, InfluxDB, OpenTelemetry, Loki, and Prometheus, for effortless adoption and seamless migration. Supported Protocols Overview.

For more details, please read Why GreptimeDB.

Try GreptimeDB

1. Live Demo

Try out the features of GreptimeDB right from your browser.

2. GreptimeCloud

Start instantly with a free cluster.

3. Docker Image

To install GreptimeDB locally, the recommended way is via Docker:

docker pull greptime/greptimedb

Start a GreptimeDB container with:

docker run -p 127.0.0.1:4000-4003:4000-4003 \
  -v "$(pwd)/greptimedb:/tmp/greptimedb" \
  --name greptime --rm \
  greptime/greptimedb:latest standalone start \
  --http-addr 0.0.0.0:4000 \
  --rpc-addr 0.0.0.0:4001 \
  --mysql-addr 0.0.0.0:4002 \
  --postgres-addr 0.0.0.0:4003

Access the dashboard via http://localhost:4000/dashboard.
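
Once the container is up, each published port corresponds to one protocol. The following is a minimal sketch, assuming the port layout from the `docker run` command above. The helper names (`greptime_port`, `greptime_http_url`, `greptime_sql`) are hypothetical, and the `/v1/sql` path with a `sql` form field is an assumption about the HTTP API that should be confirmed against the API docs.

```shell
#!/usr/bin/env bash

# Map a protocol name to the host port published by the `docker run` command above.
greptime_port() {
  case "$1" in
    http)     echo 4000 ;;
    rpc)      echo 4001 ;;
    mysql)    echo 4002 ;;
    postgres) echo 4003 ;;
    *)        echo "unknown protocol: $1" >&2; return 1 ;;
  esac
}

# Base URL of the HTTP endpoint (the dashboard lives under /dashboard).
greptime_http_url() {
  echo "http://localhost:$(greptime_port http)"
}

# Send one SQL statement over HTTP.
# Assumption: the server accepts POST /v1/sql with a form field named `sql`.
greptime_sql() {
  curl -sS -X POST --data-urlencode "sql=$1" "$(greptime_http_url)/v1/sql"
}
```

With the container running, `greptime_sql 'SELECT 1'` should return a JSON response if the assumed endpoint is correct. A MySQL or PostgreSQL client can be pointed at the ports printed by `greptime_port mysql` and `greptime_port postgres` on 127.0.0.1.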

Getting Started

Build

Check the prerequisites:

Rust toolchain (nightly)
Protobuf compiler (>= 3.15)
Python toolchain (optional): Required only when building with the PyO3 backend. More details on compiling with PyO3 can be found in its documentation.
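
The required prerequisites above can be checked before building. This is a rough sketch, not part of the project's tooling: `version_ge` and `check_prereqs` are hypothetical helpers, the comparison relies on `sort -V` (available in GNU coreutils), and it assumes `protoc --version` prints a line of the form `libprotoc X.Y.Z`. It does not verify that the Rust toolchain is a nightly build.

```shell
#!/usr/bin/env bash

# version_ge A B: succeeds when version A is greater than or equal to version B.
# The smaller of the two versions sorts first, so A >= B exactly when B comes out first.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# check_prereqs: report missing or outdated build tools on stderr.
# Returns 0 when rustc is present and protoc is at least 3.15.
check_prereqs() {
  local status=0 protoc_version
  if ! command -v rustc >/dev/null 2>&1; then
    echo "rustc not found" >&2
    status=1
  fi
  if command -v protoc >/dev/null 2>&1; then
    # Assumption: output looks like "libprotoc 3.21.12".
    protoc_version="$(protoc --version | awk '{print $2}')"
    if ! version_ge "$protoc_version" 3.15; then
      echo "protoc $protoc_version is older than 3.15" >&2
      status=1
    fi
  else
    echo "protoc not found" >&2
    status=1
  fi
  return "$status"
}
```

A possible use is `check_prereqs && make`, which only starts the build when both tools pass the check.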

Build GreptimeDB binary:

make

Run a standalone server:

cargo run -- standalone start

Tools & Extensions

Kubernetes

GreptimeDB Operator

Dashboard

The dashboard UI for GreptimeDB

SDK

Grafana Dashboard

Our official Grafana dashboard for monitoring GreptimeDB is available in the grafana directory.

Project Status

GreptimeDB is currently in Beta. We are targeting GA (General Availability) with the v1.0 release by early 2025.

While in Beta, GreptimeDB is already:

Being used in production by early adopters
Actively maintained with regular releases (see the note about version numbers)
Suitable for testing and evaluation

For production use, we recommend using the latest stable release.

Community

Our core team is thrilled to see you participate in any way you like. When you are stuck, ask for help by filing an issue with a detailed description of what you were trying to do and what went wrong. If you have any questions or would like to get involved in our community, please check out:

GreptimeDB Community on Slack
GreptimeDB GitHub Discussions forum
Greptime official website

In addition, you may:

View our official Blog
Connect with us on LinkedIn
Follow us on Twitter

Commercial Support

If you are running GreptimeDB OSS in your organization, we offer additional enterprise add-ons, installation services, training, and consulting. Contact us and we will reach out to you with more details about our commercial license.

License

GreptimeDB uses the Apache License 2.0 to strike a balance between open contributions and allowing you to use the software however you want.

Contributing

Please refer to contribution guidelines and internal concepts docs for more information.

Acknowledgement

Special thanks to all the contributors who have propelled GreptimeDB forward. For a complete list of contributors, please refer to AUTHOR.md.

GreptimeDB uses Apache Arrow™ as the memory model and Apache Parquet™ as the persistent file format.
GreptimeDB's query engine is powered by Apache Arrow DataFusion™.
Apache OpenDAL™ gives GreptimeDB a very general and elegant data access abstraction layer.
GreptimeDB's meta service is based on etcd.
GreptimeDB uses RustPython for experimental embedded Python scripting.

Description

Open-source, cloud-native, unified observability database for metrics, logs and traces, supporting SQL/PromQL/Streaming.

analytics cloud-native database distributed greptimedb logs metrics monitoring observability observability-database observability-datalake promql rust rust-database sql time-series traces tsdb
