mirror of https://github.com/GreptimeTeam/greptimedb.git synced 2026-01-09 14:52:58 +00:00

Go to file

evenyag 7c779a9861 feat: Add region schema for storage engine (#171 )

* refactor: Merge RowKeyMetadata into ColumnsMetadata

Now RowKeyMetadata and ColumnsMetadata are almost always being used together, no need
to separate them into two structs. Now they are combined into the single
ColumnsMetadata struct.

chore: Make some fields of metadata private

feat: Replace schema in RegionMetadata by RegionSchema

The internal schema of a region should have the knownledge about all
internal columns that are reserved and used by the storage engine, such as
sequence, value type. So we introduce the `RegionSchema`, and it would
holds a `SchemaRef` that only contains the columns that user could see.

feat: Value derives Serialize and supports converting into json value

feat: Add version to schema

The schema version has an initial value 0 and would bump each time the
schema being altered.

feat: Adds internal columns to region metadata

Introduce the concept of reserved columns and internal columns.
Reserved columns are columns that their names, ids are reserved by the storage
engine, and could not be used by the user. Reserved columns usually have
special usage. Reserved columns expect the version columns are also
called internal columns (though the version could also be thought as a
special kind of internal column), are not visible to user, such as our
internal sequence, value_type columns.

The RegionMetadataBuilder always push internal columns used by the
engine to the columns in metadata. Internal columns are all stored
behind all user columns in the columns vector.

To avoid column id collision, the id reserved for columns has the most
significant bit set to 1. And the RegionMetadataBuilder would check the
uniqueness of the column id.

chore: Rebase develop and fix compile error

feat: add internal schema to region schema

feat: Add SchemaBuilder to build Schema

feat: Store row key end in region schema metadata

Also move the arrow schema construction to region::schema mod

feat: Add SstSchema

refactor: Replace MemtableSchema by RegionSchema

Now when writing sst files, we could use the arrow schema from our sst
schema, which contains the internal columns.

feat: Use SstSchema to read parquet

Adds user_column_end to metadata. When reading parquet file,
converts the arrow schema into SstSchema, then uses the row_key_end
and user_column_end to find out row key parts, value parts and internal
columns, instead of using the timestamp index, which may yields
incorrect index if we don't put the timestamp at the end of row key.

Move conversion from Batch to arrow Chunk to SstSchema, so SST mod doesn't
need to care the order of key, value and internal columns.

test: Add test for Value to serde_json::Value

feat: Add RawRegionMetadata to persist RegionMetadata

test: Add test to RegionSchema

fix: Fix clippy

To fix clippy::enum_clike_unportable_variant lint, define the column id
offset in ReservedColumnType and compute the final column id in
ReservedColumnId's const method

refactor: Move batch/chunk conversion to SstSchema

The parquet ChunkStream now holds the SstSchema and use its method to
convert Chunk into Batch.

chore: Address CR comment

Also add a test for pushing internal column to RegionMetadataBuilder

chore: Address CR comment

chore: Use bitwise or to compute column id

* chore: Address CR comment

2022-08-17 15:28:38 +08:00

.github

ci: ignore error.rs in coverage (#162 )

2022-08-11 15:01:45 +08:00

config

feat: unify servers and mysql server in datanode (#172 )

2022-08-17 14:29:12 +08:00

docs/how-to

feat: UDAF made generically (#91 )

2022-07-25 10:35:36 +08:00

src

feat: Add region schema for storage engine (#171 )

2022-08-17 15:28:38 +08:00

test-util

feat: MySQL protocol server (#158 )

2022-08-12 11:41:45 +08:00

.gitignore

feat: Prototype of the storage engine (#107 )

2022-07-25 15:26:00 +08:00

Cargo.lock

feat: unify servers and mysql server in datanode (#172 )

2022-08-17 14:29:12 +08:00

Cargo.toml

feat: unify servers and mysql server in datanode (#172 )

2022-08-17 14:29:12 +08:00

CONTRIBUTING.md

doc: contributing.md (#67 )

2022-07-07 15:54:34 +08:00

README.md

feat: adds datanode config file supporting, close #156 (#167 )

2022-08-15 16:17:56 +08:00

rust-toolchain

build: Specific rust-toolchain (nightly-2022-04-03)

2022-04-20 10:43:02 +08:00

rustfmt.toml

style: add rustfmt.toml, group imports

2022-04-21 18:01:31 +08:00

README.md

GreptimeDB

GreptimeDB: the next-generation hybrid timeseries/analytics processing database in the cloud.

Getting Started

Prerequisites

To compile GreptimeDB from source, you'll need the following:

Rust
Protobuf
OpenSSL

Rust

The easiest way to install Rust is to use rustup, which will check our rust-toolchain file and install correct Rust version for you.

Protobuf

protoc is required for compiling .proto files. protobuf is available from major package manager on macos and linux distributions. You can find an installation instructions here.

OpenSSL

For Ubuntu:

sudo apt install libssl-dev

For RedHat-based: Fedora, Oracle Linux, etc:

sudo dnf install openssl-devel

For macOS:

brew install openssl

Usage

// Start datanode with default options.
cargo run -- datanode start

OR

// Start datanode with `http-addr` option.
cargo run -- datanode start --http-addr=0.0.0.0:9999

OR

// Start datanode with `log-dir` and `log-level` options.
cargo run -- --log-dir=logs --log-level=debug datanode start

Start datanode with config file:

cargo run -- --log-dir=logs --log-level=debug datanode start -c ./config/datanode.example.toml

Description

Open-source, cloud-native, unified observability database for metrics, logs and traces, supporting SQL/PromQL/Streaming.

analytics cloud-native database distributed greptimedb logs metrics monitoring observability observability-database observability-datalake promql rust rust-database sql time-series traces tsdb

Readme Apache-2.0 433 MiB