Lei, HUANG 31cfab81ad feat(mito): parquet memtable reader (#4967)
* wip: row group reader base

* wip: memtable row group reader

* Refactor MemtableRowGroupReader to streamline data fetching

 - Added early return when fetch_ranges is empty to optimize performance.
 - Replaced inline chunk data assignment with a call to `assign_dense_chunk` for cleaner code.

* wip: row group reader

* wip: reuse RowGroupReader

* wip: bulk part reader

* Enhance BulkPart Iteration with Filtering

 - Introduced `RangeBase` to `BulkIterContext` for improved filter handling.
 - Implemented filter application in `BulkPartIter` to prune batches based on predicates.
 - Updated `SimpleFilterContext::new_opt` to be public for broader access.

* chore: add prune test

* fix: clippy

* fix: introduce prune reader for memtable and add more prune test

* Enhance BulkPart read method to return Option<BoxedBatchIterator>

 - Modified `BulkPart::read` to return `Option<BoxedBatchIterator>` to handle cases where no row groups are selected.
 - Added logic to return `None` when all row groups are filtered out.
 - Updated tests to handle the new return type and added a test case to verify behavior when no row groups match the pr

* refactor/separate-paraquet-reader: Add helper function to parse parquet metadata and integrate it into BulkPartEncoder

* refactor/separate-paraquet-reader:
 Change BulkPartEncoder row_group_size from Option to usize and update tests

* refactor/separate-paraquet-reader: Add context module for bulk memtable iteration and refactor part reading

 • Introduce context module to encapsulate context for bulk memtable iteration.
 • Refactor BulkPart to use BulkIterContextRef for reading operations.
 • Remove redundant code in BulkPart by centralizing context creation and row group pruning logic in the new context module.
 • Create new file context.rs with structures and logic for handling iteration context.
 • Adjust part_reader.rs and row_group_reader.rs to reference the new BulkIterContextRef.

* refactor/separate-paraquet-reader: Refactor RowGroupReader traits and implementations in memtable and parquet reader modules

 • Rename RowGroupReaderVirtual to RowGroupReaderContext for clarity.
 • Replace BulkPartVirt with direct usage of BulkIterContextRef in MemtableRowGroupReader.
 • Simplify MemtableRowGroupReaderBuilder by directly passing context instead of creating a BulkPartVirt instance.
 • Update RowGroupReaderBase to use context field instead of virt, reflecting the trait renaming and usage.
 • Modify FileRangeVirt to FileRangeContextRef and adjust implementations accordingly.

* refactor/separate-paraquet-reader: Refactor column page reader creation and remove unused code

 • Centralize creation of SerializedPageReader in RowGroupBase::column_reader method.
 • Remove unused RowGroupCachedReader and related code from MemtableRowGroupPageFetcher.
 • Eliminate redundant error handling for invalid column index in multiple places.

* chore: rebase main and resolve conflicts

* fix: some comments

* chore: resolve conflicts

* chore: resolve conflicts
2025-01-04 02:12:27 +08:00
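This commit describes two behaviours: row groups are pruned using predicates, and `BulkPart::read` returns `None` when every row group is filtered out. The minimal Rust sketch below illustrates both ideas. It is not the mito code. `RowGroup`, its min/max timestamp fields, and `read` are hypothetical stand-ins, and the sketch assumes a single timestamp column and an inclusive time-range predicate.

```rust
/// Hypothetical row group with precomputed min/max statistics,
/// similar in spirit to Parquet column-chunk statistics.
pub struct RowGroup {
    pub min_ts: i64,
    pub max_ts: i64,
    pub timestamps: Vec<i64>,
}

impl RowGroup {
    pub fn new(timestamps: Vec<i64>) -> Self {
        // An empty group gets an inverted range, so it matches almost nothing.
        let min_ts = timestamps.iter().copied().min().unwrap_or(i64::MAX);
        let max_ts = timestamps.iter().copied().max().unwrap_or(i64::MIN);
        RowGroup { min_ts, max_ts, timestamps }
    }

    /// True if the statistics say this group *may* hold rows in [start, end].
    fn may_match(&self, start: i64, end: i64) -> bool {
        self.max_ts >= start && self.min_ts <= end
    }
}

/// Prunes whole row groups by statistics, then filters rows inside the
/// survivors. Returns `None` when no row group can match (hypothetical API).
pub fn read(row_groups: &[RowGroup], start: i64, end: i64) -> Option<Box<dyn Iterator<Item = i64> + '_>> {
    let selected: Vec<&RowGroup> = row_groups
        .iter()
        .filter(|rg| rg.may_match(start, end))
        .collect();
    if selected.is_empty() {
        return None; // every row group was pruned
    }
    Some(Box::new(
        selected
            .into_iter()
            .flat_map(|rg| rg.timestamps.iter().copied())
            .filter(move |ts| *ts >= start && *ts <= end),
    ))
}
```

Returning `Option` lets a caller skip building an iterator entirely when statistics already rule out every row group.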


Unified & Cost-Effective Time Series Database for Metrics, Logs, and Events

Introduction

GreptimeDB is an open-source, unified, and cost-effective time-series database for Metrics, Logs, and Events (with Traces planned). You can gain real-time insights from Edge to Cloud at any scale.

Why GreptimeDB

Our core developers have been building time-series data platforms for years. Based on our best practices, GreptimeDB was born to give you:

  • Unified Processing of Metrics, Logs, and Events

    GreptimeDB unifies time-series data processing by treating all data (metrics, logs, or events) as timestamped events with context. Users can analyze this data with either SQL or PromQL and use stream processing (Flow) for continuous aggregation. Read more.

  • Cloud-native Distributed Database

    Built for Kubernetes. GreptimeDB achieves seamless scalability with its cloud-native architecture of separated compute and storage, built on object storage (AWS S3, Azure Blob Storage, etc.) while enabling cross-cloud deployment through a unified data access layer.

  • High Performance and Cost-effectiveness

    Written in pure Rust for superior performance and reliability. GreptimeDB features a distributed query engine with intelligent indexing to handle high cardinality data efficiently. Its optimized columnar storage achieves 50x cost efficiency on cloud object storage through advanced compression. Benchmark reports.

  • Cloud-Edge Collaboration

    GreptimeDB operates seamlessly across cloud and edge (ARM/Android/Linux), providing consistent APIs and a control plane for unified data management and efficient synchronization. Learn how to run on Android.

  • Multi-protocol Ingestion, SQL & PromQL Ready

    Supports widely adopted database protocols and APIs, including MySQL, PostgreSQL, InfluxDB, OpenTelemetry, Loki, and Prometheus, for effortless adoption and seamless migration. Supported Protocols Overview.
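As an illustration of the InfluxDB protocol support listed above, the Rust sketch below builds one record in InfluxDB line protocol (`measurement,tag=value field=value timestamp`). It only constructs the payload: the write endpoint, authentication, and precision settings are left out. The helpers `escape` and `to_line_protocol` are hypothetical, and the sketch assumes float-only fields and a nanosecond timestamp.

```rust
/// Backslash-escapes the given special characters (hypothetical helper).
fn escape(s: &str, special: &[char]) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        if special.contains(&c) {
            out.push('\\');
        }
        out.push(c);
    }
    out
}

/// Formats one line-protocol record with float fields only.
pub fn to_line_protocol(
    measurement: &str,
    tags: &[(&str, &str)],
    fields: &[(&str, f64)],
    ts_ns: i64,
) -> String {
    const KEY_SPECIAL: [char; 3] = [',', '=', ' '];
    // Measurement names escape commas and spaces.
    let mut line = escape(measurement, &[',', ' ']);
    for (k, v) in tags {
        line.push(',');
        line.push_str(&escape(k, &KEY_SPECIAL));
        line.push('=');
        line.push_str(&escape(v, &KEY_SPECIAL));
    }
    // A single space separates tags from fields, and fields from the timestamp.
    line.push(' ');
    let fields: Vec<String> = fields
        .iter()
        .map(|(k, v)| format!("{}={}", escape(k, &KEY_SPECIAL), v))
        .collect();
    line.push_str(&fields.join(","));
    line.push(' ');
    line.push_str(&ts_ns.to_string());
    line
}
```

Integer fields would need an `i` suffix and string fields double quotes, which this sketch does not handle.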

For more details, please read Why GreptimeDB.
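Continuous aggregation, mentioned under Unified Processing, means maintaining aggregates incrementally as events arrive instead of rescanning raw data. The Rust sketch below shows the idea with a tumbling-window average per host. It is a conceptual illustration, not how Flow is implemented. `Event` and `TumblingAvg` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// A timestamped event with one context tag (hypothetical shape).
pub struct Event {
    pub ts_ms: i64,
    pub host: String,
    pub value: f64,
}

/// Hypothetical incrementally maintained average per (window start, host).
pub struct TumblingAvg {
    window_ms: i64,
    // (window_start_ms, host) -> (running sum, count)
    state: BTreeMap<(i64, String), (f64, u64)>,
}

impl TumblingAvg {
    pub fn new(window_ms: i64) -> Self {
        assert!(window_ms > 0, "window must be positive");
        TumblingAvg { window_ms, state: BTreeMap::new() }
    }

    /// Folds one event into its window. Raw events are not retained.
    pub fn push(&mut self, e: &Event) {
        // div_euclid keeps negative timestamps in the correct window.
        let start = e.ts_ms.div_euclid(self.window_ms) * self.window_ms;
        let acc = self.state.entry((start, e.host.clone())).or_insert((0.0, 0));
        acc.0 += e.value;
        acc.1 += 1;
    }

    /// Current average for a window and host, if any event has arrived.
    pub fn avg(&self, window_start: i64, host: &str) -> Option<f64> {
        self.state
            .get(&(window_start, host.to_string()))
            .map(|&(sum, n)| sum / n as f64)
    }
}
```

Because only a running sum and count are stored per window, each new event costs one map update regardless of how much data has already been ingested.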

Try GreptimeDB

1. Live Demo

Try out the features of GreptimeDB right from your browser.

2. GreptimeCloud

Start instantly with a free cluster.

3. Docker Image

To install GreptimeDB locally, the recommended way is via Docker:

docker pull greptime/greptimedb

Start a GreptimeDB container with:

docker run -p 127.0.0.1:4000-4003:4000-4003 \
  -v "$(pwd)/greptimedb:/tmp/greptimedb" \
  --name greptime --rm \
  greptime/greptimedb:latest standalone start \
  --http-addr 0.0.0.0:4000 \
  --rpc-addr 0.0.0.0:4001 \
  --mysql-addr 0.0.0.0:4002 \
  --postgres-addr 0.0.0.0:4003

Access the dashboard via http://localhost:4000/dashboard.
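The `docker run` command above publishes four ports, one per `--*-addr` flag. A small Rust sketch can record that mapping and check whether a local server is listening. It assumes the container was started exactly as shown and treats `--rpc-addr` as the gRPC endpoint. `Protocol` and `is_listening` are hypothetical helpers, not part of any GreptimeDB SDK.

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Endpoints published by the example `docker run` command (hypothetical enum).
#[derive(Debug, Clone, Copy)]
pub enum Protocol {
    Http,     // --http-addr, also serves /dashboard
    Grpc,     // --rpc-addr (assumed to be gRPC)
    Mysql,    // --mysql-addr
    Postgres, // --postgres-addr
}

impl Protocol {
    /// Host port mapped by `-p 127.0.0.1:4000-4003:4000-4003`.
    pub fn port(self) -> u16 {
        match self {
            Protocol::Http => 4000,
            Protocol::Grpc => 4001,
            Protocol::Mysql => 4002,
            Protocol::Postgres => 4003,
        }
    }

    /// Loopback address where the port is published.
    pub fn addr(self) -> SocketAddr {
        SocketAddr::from(([127, 0, 0, 1], self.port()))
    }
}

/// True if a TCP connection to the protocol's port succeeds within 200 ms.
pub fn is_listening(p: Protocol) -> bool {
    TcpStream::connect_timeout(&p.addr(), Duration::from_millis(200)).is_ok()
}
```

A plain TCP connect only confirms that the port is open; an HTTP request to the dashboard URL above would be a stronger check.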

Read more about installation in the docs.

Getting Started

Build

Check the prerequisites:

Build GreptimeDB binary:

make

Run a standalone server:

cargo run -- standalone start

Tools & Extensions

Kubernetes

Dashboard

SDK

Grafana Dashboard

Our official Grafana dashboard for monitoring GreptimeDB is available in the grafana directory.

Project Status

GreptimeDB is currently in Beta. We are targeting GA (General Availability) with the v1.0 release in early 2025.

While in Beta, GreptimeDB is already:

  • Being used in production by early adopters
  • Actively maintained with regular releases (read more about version numbers)
  • Suitable for testing and evaluation

For production use, we recommend using the latest stable release.

Community

Our core team is thrilled to see you participate in any way you like. If you are stuck, ask for help by filing an issue with a detailed description of what you were trying to do and what went wrong. If you have any questions or would like to get involved in our community, please check out:

In addition, you may:

Commercial Support

If you are running GreptimeDB OSS in your organization, we offer additional enterprise add-ons, installation services, training, and consulting. Contact us and we will reach out with more details about our commercial license.

License

GreptimeDB uses the Apache License 2.0 to strike a balance between open contributions and allowing you to use the software however you want.

Contributing

Please refer to the contribution guidelines and the internal concepts docs for more information.

Acknowledgement

Special thanks to all the contributors who have propelled GreptimeDB forward. For a complete list of contributors, please refer to AUTHOR.md.
