Commit Graph

56 Commits

Author SHA1 Message Date
LFC
04df80e640 fix: further ease the restriction of executing SQLs in new GRPC interface (#797)
* fix: carry not recordbatch result in FlightData, to allow executing SQLs other than selection in new GRPC interface

* Update src/datanode/src/instance/flight/stream.rs

Co-authored-by: Jiachun Feng <jiachun_feng@proton.me>
2022-12-28 16:43:21 +08:00
Ruihang Xia
26a3e93ca7 chore: util workspace deps in more places (#792)
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2022-12-27 16:26:59 +08:00
Yingwen
f8500e54c1 refactor: Remove PutOperation and Simplify WriteRequest API (#775)
* chore: Remove unused MutationExtra

* refactor(storage): Refactor Mutation and Payload

Change Mutation from enum to a struct that holds op type and record
batches so the encoder don't need to convert the mutation into record
batch. Now The Payload is no more an enum, it just holds the data, to
be serialized to the WAL, of the WriteBatch. The encoder and decoder
now deal with the Payload instead of the WriteBatch, so we could hold
more information not necessary to be stored to the WAL in the
WriteBatch.

This commit also merge variants in write_batch::Error to storage::Error
as some variants of them denote the same error.

* test(storage): Pass all tests in storage

* chore: Remove unused codes then format codes

* test(storage): Fix test_put_unknown_column test

* style(storage): Fix clippy

* chore: Remove some unused codes

* chore: Rebase upstream and fix clippy

* chore(storage): Remove unused codes

* chore(storage): Update comments

* feat: Remove PayloadType from wal.proto

* chore: Address CR comments

* chore: Remove unused write_batch.proto
2022-12-26 13:11:24 +08:00
LFC
dc52a51576 chore: upgrade to Arrow 29.0 and use workspace package and dependencies (#782)
* chore: upgrade to Arrow 29.0 and use workspace package and dependencies

* fix: resolve PR comments

Co-authored-by: luofucong <luofucong@greptime.com>
2022-12-23 14:28:37 +08:00
Lei, HUANG
0653301754 feat: replace arrow2 with official implementation 🎉 (#753)
* chore: kick off. change datafusion/arrow/parquet to target version

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* chore: replace one last datafusion dep

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* feat: arrow_array switch to arrow

* chore: update dep of binary vector

* chore: fix wrong merge commit

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* feat: Switch to datatypes2

* feat: Make recordbatch compile

* chore: sort Cargo.toml

* feat: Fix common::recordbatch compiler errors

* feat: Fix recordbatch test compiling issue

* fix: api crate (#708)

* fix: rename ConcreteDataType::timestamp_millis_type to ConcreteDataType::timestamp_millisecond_type. fix other warnings regarding timestamp

* fix: revert changes in datatypes2

* fix: helper

* chore: delete datatypes based on arrow2

* feat: Fix some compiler errors in common::query (#710)

* feat: Fix some compiler errors in common::query

* feat: test_collect use vectors api

* fix: common-query subcrate (#712)

* fix: record batch adapter

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix error enum

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix: Fix common::query compiler errors (#713)

* feat: Move conversion to ScalarValue to value.rs

* fix: Fix common::query compiler errors

This commit also make InnerError pub(crate)

* feat: Implements diff accumulator using WrapperType (#715)

* feat: Remove usage of opaque error from common::recordbatch

* feat: Remove opaque error from common::query

* feat: Fix diff compiler errors

Now common_function just use common_query's Error and Result. Adds
a LargestType associated type to LogicalPrimitiveType to get the largest
type a logical primitive type can cast to.

* feat: Remove LargestType from NativeType trait

* chore: Update comments

* feat: Restrict Scalar::RefType of WrapperType to itself

Add trait bound `for<'a> Scalar<RefType<'a> = Self>` to WrapperType

* chore: Address CR comments

* chore: Format codes

* fix: fix compile error for mean/polyval/pow/interp ops

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* Revert "fix: fix compile error for mean/polyval/pow/interp ops"

This reverts commit fb0b4eb826.

* fix: Fix compiler errors in argmax/rate/median/norm_cdf (#716)

* fix: Fix compiler errors in argmax/rate/median/norm_cdf

* chore: Address CR comments

* fix: fix compile error for mean/polyval/pow/interp ops (#717)

* fix: fix compile error for mean/polyval/pow/interp ops

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* simplify type bounds

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix: fix argmin/percentile/clip/interp/scipy_stats_norm_pdf errors (#718)

fix: fix argmin/percentile/clip/interp/scipy_stats_norm_pdf compiler errors

* fix: fix other compile error in common-function (#719)

* further fixing

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix all compile errors in common function

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix: Fix tests and clippy for common-function subcrate (#726)

* further fixing

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix all compile errors in common function

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix tests

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix clippy

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* revert test changes

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix: row group pruning (#725)

* fix: row group pruning

* chore: use macro to simplify stats implemetation

* fxi: CR comments

* fix: row group metadata length mismatch

* fix: simplify code

* fix: Fix common::grpc compiler errors (#722)

* fix: Fix common::grpc compiler errors

This commit refactors RecordBatch and holds vectors in the RecordBatch
struct, so we don't need to cast the array to vector when doing
serialization or iterating the batch.

Now we use the vector API instead of the arrow API in grpc crate.

* chore: Address CR comments

* fix common record batch

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix: Fix compile error in server subcrate (#727)

* fix: Fix compile error in server subcrate

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* remove unused type alias

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* explicitly panic

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* Update src/storage/src/sst/parquet.rs

Co-authored-by: Yingwen <realevenyag@gmail.com>

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: Yingwen <realevenyag@gmail.com>

* fix: Fix common grpc expr (#730)

* fix compile errors

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* rename fn names

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix styles

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix wranings in common-time

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix: pre-cast to avoid tremendous match arms (#734)

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* feat: upgrade storage crate to arrow and parquet offcial impl (#738)

* fix: compile erros

* fix: parquet reader and writer

* fix: parquet reader and writer

* fix: WriteBatch IPC encode/decode

* fix: clippy errors in storage subcrate

* chore: remove suspicious unwrap

* fix: some cr comments

* fix: CR comments

* fix: CR comments

* fix: Fix compiler errors in catalog and mito crates (#742)

* fix: Fix compiler errors in mito

* fix: Fix compiler errors in catalog crate

* style: Fix clippy

* chore: Fix use

* Merge pull request #745

* fix nyc-taxi and util

* Merge branch 'replace-arrow2' into fix-others

* fix substrait

* fix warnings and error in test

* fix: Fix imports in optimizer.rs

* fix: errors in optimzer

* fix: remove unwrap

* fix: Fix compiler errors in query crate (#746)

* fix: Fix compiler errors in state.rs

* fix: fix compiler errors in state

* feat: upgrade sqlparser to 0.26

* fix: fix datafusion engine compiler errors

* fix: Fix some tests in query crate

* fix: Fix all warnings in tests

* feat: Remove `Type` from timestamp's type name

* fix: fix query tests

Now datafusion already supports median, so this commit also remove the
median function

* style: Fix clippy

* feat: Remove RecordBatch::pretty_print

* chore: Address CR comments

* Update src/query/src/query_engine/state.rs

Co-authored-by: Ruihang Xia <waynestxia@gmail.com>

* fix: frontend compile errors (#747)

fix: fix compile errors in frontend

* fix: Fix compiler errors in script crate (#749)

* fix: Fix compiler errors in state.rs

* fix: fix compiler errors in state

* feat: upgrade sqlparser to 0.26

* fix: fix datafusion engine compiler errors

* fix: Fix some tests in query crate

* fix: Fix all warnings in tests

* feat: Remove `Type` from timestamp's type name

* fix: fix query tests

Now datafusion already supports median, so this commit also remove the
median function

* style: Fix clippy

* feat: Remove RecordBatch::pretty_print

* chore: Address CR comments

* feat: Add column_by_name to RecordBatch

* feat: modify select_from_rb

* feat: Fix some compiler errors in vector.rs

* feat: Fix more compiler errors in vector.rs

* fix: fix table.rs

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix: Fix compiler errors in coprocessor

* fix: Fix some compiler errors

* fix: Fix compiler errors in script

* chore: Remove unused imports and format code

* test: disable interval tests

* test: Fix test_compile_execute test

* style: Fix clippy

* feat: Support interval

* feat: Add RecordBatch::columns and fix clippy

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: Ruihang Xia <waynestxia@gmail.com>

* fix: Fix All The Tests! (#752)

* fix: Fix several tests compile errors

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix: some compile errors in tests

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix: compile errors in frontend tests

* fix: compile errors in frontend tests

* test: Fix tests in api and common-query

* test: Fix test in sql crate

* fix: resolve substrait error

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* chore: add more test

* test: Fix tests in servers

* fix instance_test

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* test: Fix tests in tests-integration

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: Lei, HUANG <mrsatangel@gmail.com>
Co-authored-by: evenyag <realevenyag@gmail.com>

* fix: clippy errors

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: Ruihang Xia <waynestxia@gmail.com>
Co-authored-by: evenyag <realevenyag@gmail.com>
2022-12-15 18:49:12 +08:00
Lei, HUANG
756c068166 feat: logstore compaction (#740)
* feat: add benchmark for wal

* add bin

* feat: impl wal compaction

* chore: This reverts commit ef9f2326

* chore: This reverts commit 9142ec0e

* fix: remove empty files

* fix: failing tests

* fix: CR comments

* fix: Mark log as stable after writer applies manifest

* fix: some cr comments and namings

* chore: rename all stable_xxx to obsolete_xxx

* chore: error message
2022-12-14 16:15:29 +08:00
Ruihang Xia
7ba512980a chore: add APACHE-2.0 license header (#518)
* feat: add license checker workflow

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix existing header

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* specify license for internal sub-crate

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix rustfmt

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2022-11-15 18:05:46 +08:00
Ruihang Xia
1565c8d236 chore: specify import style in rustfmt (#460)
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2022-11-15 15:58:54 +08:00
dennis zhuang
74ea529d1a feat: move time index metadata from schema into field (#444)
* feat: move time index metadata from schema into field

* chore: remove useless code

* test: test select with column alias

* fix: conflicts with develop branch

* test: add test

* test: order by timestamp to ensure query results order

* fix: comment
2022-11-11 15:36:27 +08:00
Lei, Huang
6288fdb6bc feat: frontend catalog (#437)
* feat: add frontend catalog
2022-11-10 11:52:57 +08:00
Ruihang Xia
af1df2066c perf: enlarge write row group size (#413)
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2022-11-08 11:23:10 +08:00
Yingwen
64dac51e83 feat: Holds ColumnMetadata in StoreSchema (#333)
* chore: Update StoreSchema comment

* feat: Add metadata to ColumnSchema

* feat: Impl conversion between ColumnMetadata and ColumnSchema

We could use this feature to store the ColumnMetadata as arrow's
Schema, since the ColumnSchema could be further converted to an arrow
schema. Then we could use ColumnMetadata in StoreSchema, which contains
more information, especially the column id.

* feat(storage): Merge schema::Error to metadata::Error

To avoid cyclic dependency of two Errors

* feat(storage): Store ColumnMetadata in StoreSchema

* feat(storage): Use StoreSchemaRef to avoid cloning the whole StoreSchema struct

* test(storage): Fix test_store_schema

* feat(datatypes): Return error on duplicate meta key

* chore: Address CR comments
2022-10-25 11:06:22 +08:00
Ruihang Xia
fbea07ea83 chore: remove unused dependencies (#319)
* chore: remove unused dependences

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

* fix: recover some dev-deps

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
2022-10-19 14:08:54 +08:00
Yingwen
cdf3280fcf feat: Region supports write requests with old schema (#297)
* feat: Adds ColumnDefaultConstraint::create_default_vector

ColumnDefaultConstraint::create_default_vector is ported from
MitoTable::try_get_column_default_constraint_vector.

* refactor: Replace try_get_column_default_constraint_vector by create_default_vector

* style: Remove unnecessary map_err in MitoTable::insert

* feat: Adds compat_write

For column in `dest_schema` but not in `write_batch`, this method would insert a
vector with default value to the `write_batch`. If there are columns not in
`dest_schema`, an error would be returned.

* chore: Add info log to RegionInner::alter

* feat(storage): RegionImpl::write support request with old version

* feat: Add nullable check when creating default value

* feat: Validate nullable and default value

* chore: Modify PutOperation comments

* chore: Make ColumnDescriptor::is_nullable readonly and validate name

* feat: Use CompatWrite trait to replace campat::compat_write method

Adds a CompactWrite trait to support padding columns to WriteBatch:
- The WriteBatch and PutData implements this trait
- Fix the issue that WriteBatch::schema is not updated to the
  schema after compat
- Also validate the created column when adding to PutData

The WriteBatch would also pad default value to missing columns in
PutData, so the memtable inserter don't need to manually check whether
the column is nullable and then insert a NullVector. All WriteBatch is
ensured to have all columns defined by the schema in its PutData.

* feat: Validate constraint by ColumnDefaultConstraint::validate()

The ColumnDefaultConstraint::validate() would also ensure the default
value has the same data type as the column's.

* feat: Use NullVector for null columns

* fix: Fix BinaryType returns wrong logical_type_id

* fix: Fix tests and revert NullVector for null columns

NullVector doesn't support custom logical type make it hard to
encode/decode, which also cause the arrow/protobuf codec of write batch
fail.

* fix: create_default_vector use replicate to create vector with default value

This would fix the test_codec_with_none_column_protobuf test, as we need
to downcast the vector to construct the protobuf values.

* test: add tests for column default constraints

* test: Add tests for CompatWrite trait impl

* test: Test write region with old schema

* fix(storage): Fix replay() applies metadata too early

The committed sequence of the RegionChange action is the sequence of the
last entry that use the old metadata (schema). During replay, we should
apply the new metadata after we see an entry that has sequence greater
than (not equals to) the `RegionChange::committed_sequence`

Also remove duplicate `set_committed_sequence()` call in
persist_manifest_version()

* chore: Removes some unreachable codes

Also add more comments to document codes in these files

* refactor: Refactor MitoTable::insert

Return error if we could not create a default vector for given column,
instead of ignoring the error

* chore: Fix incorrect comments

* chore: Fix typo in error message
2022-10-18 10:47:24 +08:00
dennis zhuang
494a93c4f2 feat: manifest improvements (#303)
* feat: adds commited_sequence to RegionChange action, #281

* refactor: saving protocol action when writer version is changed

* feat: recover all region medata in manifest and replay them when replaying WAL, #282

* refactor: minor change and test recovering metadata after altering table schema

* fix: write wrong min_reader_version into manifest for region

* refactor: move up DataRow

* refactor: by CR comments

* test: assert recovered metadata

* refactor: by CR comments

* fix: comment
2022-10-13 15:43:35 +08:00
evenyag
ed89cc3e21 feat: Change signature of the Region::alter method (#287)
* feat: Change signature of the Region::alter method

* refactor: Add builders for ColumnsMetadata and ColumnFamiliesMetadata

* feat: Support altering the region metadata

Altering the region metadata is done in a copy-write fashion:
1. Convert the `RegionMetadata` into `RegionDescriptor` which is more
   convenient to mutate
2. Apply the `AlterOperation` to the `RegionDescriptor`. This would
   mutate the descriptor in-place
3. Create a `RegionMetadataBuilder` from the descriptor, bump the
   version and then build the new metadata

* feat: Implement altering table using the new Region::alter api

* refactor: Replaced wal name by region id

Region id is cheaper to clone than name

* chore: Remove pub(crate) of build_xxxx in engine mod

* style: fix clippy

* test: Add tests for AlterOperation and RegionMetadata::alter

* chore: ColumnsMetadataBuilder methods return &mut Self
2022-09-28 13:56:25 +08:00
dennis zhuang
5f322ba16e feat: impl default constraint for column (#273)
* feat: impl default value for column in schema

* test: add test for column's default value

* refactor: rename ColumnDefaultValue to ColumnDefaultConstraint

* fix: timestamp column may be a constant vector

* fix: test_shutdown_pg_server

* fix: typo

Co-authored-by: LFC <bayinamine@gmail.com>

* fix: typo

Co-authored-by: LFC <bayinamine@gmail.com>

* fix: typo

Co-authored-by: LFC <bayinamine@gmail.com>

* chore: use table_info directly

Co-authored-by: LFC <bayinamine@gmail.com>

* refactor: by CR comments

Co-authored-by: LFC <bayinamine@gmail.com>
2022-09-22 10:43:21 +08:00
Lei, Huang
35ba0868b5 feat: impl filter push down to parquet reader (#262)
* wip add predicate definition

* fix value move

* implement predicate and prune

* impl filter push down in chunk reader

* add more expr tests

* chore: rebase develop

* fix: unit test

* fix: field name/index lookup when building pruning stats

* chore: add some meaningless test

* fix: remove unnecessary extern crate

* fix: use datatypes::schema::SchemaRef
2022-09-21 11:47:55 +08:00
dennis zhuang
c8cb705d9e ci: pre-commit configuration and hooks (#261)
* feat: adds pre-commit config and hooks

* refactor: sort all Cargo.toml by cargo-sort

* ci: adds conventional-pre-commit hook to pre-commit

* fix: remove .pre-commit-hooks.yaml

* fix: readme

* Update .pre-commit-config.yaml

Co-authored-by: Lei, Huang <6406592+v0y4g3r@users.noreply.github.com>

* ci: move clippy hook to push stage

* docs: install pre-push github hook

Co-authored-by: Lei, Huang <6406592+v0y4g3r@users.noreply.github.com>
2022-09-15 11:30:08 +08:00
LFC
5e67301c00 feat: implement alter table (#218)
* feat: implement alter table

* Currently we have no plans to support altering the primary keys (maybe never), so removed the related codes.

* make `alter` a trait function in table

* address other CR comments

* cleanup

* rebase develop

* resolve code review comments

Co-authored-by: luofucong <luofucong@greptime.com>
2022-09-06 13:44:34 +08:00
evenyag
d71ae7934e feat: Upgrade rust to nightly-2022-07-14 (#217)
* feat: upgrade rust to nightly-2022-07-14

* style: Fix some clippy warnings

* style: clippy fix

* style: fix clippy

* style: Fix clippy

Some PartialEq warnings have been work around using cfg_attr test

* feat: Implement Eq and PartialEq for PrimitiveType

* chore: Remove unnecessary allow

* chore: Remove usage of cfg_attr for PartialEq
2022-09-01 17:50:48 +08:00
dennis zhuang
1caa94cd3e feat: save create table schema (#211)
* feat: save create table schema and respect user defined columns order when querying, close #179

* fix: address CR problems

* refactor: use with_context with ProjectedColumnNotFoundSnafu
2022-08-26 19:22:55 +08:00
evenyag
53637c90fd feat: Support projection (#192)
* feat: Add projected schema

* feat: Use projected schema to read sst

* feat: Use vector of column to implement Batch

* feat: Use projected schema to convert batch to chunk

* feat: Add no_projection() to build ProjectedSchema

* feat: Memtable supports projection

The btree memtable use `is_needed()` to filter unneeded value columns,
then use `ProjectedSchema::batch_from_parts()` to construct
batch, so it don't need to known the layout of internal columns.

* test: Add tests for ProjectedSchema

* test: Add tests for ProjectedSchema

Also returns error if the `projected_columns` used to build the
`ProjectedSchema` is empty.

* test: Add test for memtable projection

* feat: Table pass projection to storage engine

* fix: Use timestamp column name as schema metadata

This fix the issue that the metadata refer to the wrong timestamp column
if datafusion reorder the fields of the arrow schema.

* fix: Fix projected schema not passed to memtable

* feat: Add tests for region projection

* chore: fix clippy

* test: Add test for unordered projection

* chore: Move projected_schema to ReadOptions

Also fix some typo
2022-08-25 15:27:47 +08:00
evenyag
5c9b46fbf8 refactor: Rename value_type to op_type (#185) 2022-08-18 16:07:45 +08:00
evenyag
7c779a9861 feat: Add region schema for storage engine (#171)
* refactor: Merge RowKeyMetadata into ColumnsMetadata

Now RowKeyMetadata and ColumnsMetadata are almost always being used together, no need
to separate them into two structs. Now they are combined into the single
ColumnsMetadata struct.

chore: Make some fields of metadata private

feat: Replace schema in RegionMetadata by RegionSchema

The internal schema of a region should have the knownledge about all
internal columns that are reserved and used by the storage engine, such as
sequence, value type. So we introduce the `RegionSchema`, and it would
holds a `SchemaRef` that only contains the columns that user could see.

feat: Value derives Serialize and supports converting into json value

feat: Add version to schema

The schema version has an initial value 0 and would bump each time the
schema being altered.

feat: Adds internal columns to region metadata

Introduce the concept of reserved columns and internal columns.
Reserved columns are columns that their names, ids are reserved by the storage
engine, and could not be used by the user. Reserved columns usually have
special usage. Reserved columns expect the version columns are also
called internal columns (though the version could also be thought as a
special kind of internal column), are not visible to user, such as our
internal sequence, value_type columns.

The RegionMetadataBuilder always push internal columns used by the
engine to the columns in metadata. Internal columns are all stored
behind all user columns in the columns vector.

To avoid column id collision, the id reserved for columns has the most
significant bit set to 1. And the RegionMetadataBuilder would check the
uniqueness of the column id.

chore: Rebase develop and fix compile error

feat: add internal schema to region schema

feat: Add SchemaBuilder to build Schema

feat: Store row key end in region schema metadata

Also move the arrow schema construction to region::schema mod

feat: Add SstSchema

refactor: Replace MemtableSchema by RegionSchema

Now when writing sst files, we could use the arrow schema from our sst
schema, which contains the internal columns.

feat: Use SstSchema to read parquet

Adds user_column_end to metadata. When reading parquet file,
converts the arrow schema into SstSchema, then uses the row_key_end
and user_column_end to find out row key parts, value parts and internal
columns, instead of using the timestamp index, which may yields
incorrect index if we don't put the timestamp at the end of row key.

Move conversion from Batch to arrow Chunk to SstSchema, so SST mod doesn't
need to care the order of key, value and internal columns.

test: Add test for Value to serde_json::Value

feat: Add RawRegionMetadata to persist RegionMetadata

test: Add test to RegionSchema

fix: Fix clippy

To fix clippy::enum_clike_unportable_variant lint, define the column id
offset in ReservedColumnType and compute the final column id in
ReservedColumnId's const method

refactor: Move batch/chunk conversion to SstSchema

The parquet ChunkStream now holds the SstSchema and use its method to
convert Chunk into Batch.

chore: Address CR comment

Also add a test for pushing internal column to RegionMetadataBuilder

chore: Address CR comment

chore: Use bitwise or to compute column id

* chore: Address CR comment
2022-08-17 15:28:38 +08:00
Lei, Huang
a1c4921933 feat: impl create table sql execution (#168)
* catalog manager allocates table id

* rebase develop

* add some tests

* add some more test

* fix some cr comments

* insert into system catalog

* use slice pattern to simplify code

* add optional dependencies

* add sql-to-request test

* successfully recover

* fix unit tests

* rebase develop

* add some tests

* fix some cr comments

* fix some cr comments

* add a lock to CatalogManager

* feat: add gmt_created and gmt_modified columns to system catalog table
2022-08-17 10:53:19 +08:00
Lei, Huang
b695881c6a fix: logstore read supports namespace isolation (#163)
* logstore read supports namespace isolation

* add namespace isolation test

* update

* revert unexpected changes

* Update log.rs

remove unnecessary info log

* reformat code
2022-08-15 11:43:48 +08:00
dennis zhuang
41ffbe82f8 feat: impl table manifest (#157)
* feat: impl TableManifest and refactor table engine, object store etc.

* feat: persist table metadata when creating it

* fix: remove unused file src/storage/src/manifest/impl.rs

* feat: impl recover table info from manifest

* test: add open table test and table manifest test

* fix: resolve CR problems

* fix: compile error and remove region id

* doc: describe parent_dir

* fix: address CR problems

* fix: typo

* Revert "fix: compile error and remove region id"

This reverts commit c14c250f8a.

* fix: compile error and generate region id by table_id and region number
2022-08-12 10:47:33 +08:00
Lei, Huang
d141fbc674 fix: log store write and read (#97)
* add pwrite

* write

* fix write

* error handling in write thread

* wrap some LogFile field to state field

* remove some unwraps

* reStructure some code

* implement file chunk

* composite chunk decode

* add test for chunk stream

* fix buffer test

* remove some useless code

* add test for read_at and file_chunk_stream

* use bounded channel to implement back pressure

* reimplement entry read and decoding

* add some doc

* clean some code

* use Sender::blocking_send to replace manually spawn

* support synchronous file chunk stream

* remove useless clone

* remove set_offset from Entry trait

* cr: fix some comments

* fix: add peek methods for Buffer

* add test for read at the middle of file

* fix some minor issues on comments

* rebase on to develop

* add peek_to_slice and read_to_slice

* initialize file chunk on heap

* fix some comments in CR

* respect entry id set outside LogStore

* fix unit test

* Update src/log-store/src/fs/file.rs

Co-authored-by: evenyag <realevenyag@gmail.com>

* fix some cr comments

Co-authored-by: evenyag <realevenyag@gmail.com>
2022-08-10 11:16:04 +08:00
Lei, Huang
80372720bb refactor: open_region return None if region does not exist (#145)
* refactor: open_region return None if region does not exist

* fix some unit tests

* fix some CR comments
2022-08-08 16:53:52 +08:00
evenyag
f98d406580 refactor(storage): Add region id and name to metadata (#140)
* refactor(storage): Add region id and name to metadata

Add region id and name to `RegionMetadata`, simplify input arguments of
`RegionImpl::create()` and `RegionImpl::new()` method, since id and name
are already in metadata/version.

To avoid an atomic load of `Version` each time we access the region
id/name, we still store a copy of id/name in `SharedData`.

* chore: Remove todo in OpenOptions

Create region if missing when opening the region would be hard to
implement, since sometimes we may don't known the exact region schema user
would like to have.

* refactor: Make id and name of region readonly

By making `id` and `name` fields of `SharedData` and `RegionMetadata`
private and only exposing a pub getter.
2022-08-08 16:46:51 +08:00
dennis zhuang
e9d6546c12 feat: impl create_table for MitoEngine, #125 (#142)
* feat: impl create_table for MitoEngine, #125

* fix: typo

* fix: address CR problems

* fix: address CR problems

* fix: address CR problems

* fix: format

* refactor: minor change
2022-08-08 15:36:00 +08:00
evenyag
fb4495eb46 feat: Adds TableEngine::open_table() (#132)
* feat: Add `open_table()` method to `TableEngine`

* feat: Implements MitoEngine::open_table()

For simplicity, this implementation just use the table name as region
name, and using that name to open a region for that table. It also
introduce a mutex to avoid opening the same table simultaneously.

* refactor: Shorten generic param name

Use `S` instead of `Store` for `MitoEngine`.

* test: Mock storage engine for table engine test

Add a `MockEngine` to mock the storage engine, so that testing the mito
table engine can sometimes use the mocked storage.

* test: Add open table test

Also remove `storage::gen_region_name` method, and always use table name
as default region name, so the table engine can open the table created
by `create_table()`.

* chore: Add open table log
2022-08-04 17:35:17 +08:00
dennis zhuang
6db6106829 feat: impl recovering version from manifest for region (#127)
* feat: impl recovering version from manifest for region

* refactor: rename try_apply_edit to replay_edit

* fix: remove println

* fix: address CR problems

* feat: remove Metadata in manifest trait and update region manifest state after recovering
2022-08-03 11:05:52 +08:00
Ning Sun
e3267673a9 refactor: use auto generated collection build function (#128)
* refactor: use auto generated collection build function

* refactor: change functions of ColumeFamilyDescriptorBuilder to be owned
2022-08-03 09:21:28 +08:00
Ning Sun
cd42f308a8 refactor: remove constructors from trait (#121)
* refactor: remove constructors from trait

* refactor: move PutOp into its parent type

* refactor: move put constructor to write request

* refactor: change visibility of PutData constructors

call from WriteRequest instead

* refactor: consistent naming for entry constructor

* refactor: fix constructor form Namespace trait

* refactor: remove comment code

* doc: fix doc comments
2022-08-02 16:25:03 +08:00
Lei, Huang
b5fcdae01d LogStore::read takes a reference to namespace (#126) 2022-08-02 12:59:08 +08:00
Lei, Huang
868098d2b7 feat: impl Logstore::read by LogFile::create_stream (#124)
* feat: bridge LogStore::read to LogFile::create_stream

* fix some CR comments
2022-08-02 11:14:28 +08:00
evenyag
f06968f4f5 feat: Engine::open_region code skeleton (#120)
* refactor: Move fields in SharedData to EngineInner

Since `SharedData` isn't shared now, we move all its fields to
EngineInner, and remove the `SharedData` struct, also remove the
unused config field.

* feat: Store RegionSlot in engine's region map

A `RegionSlot` has three possible state:
- Opening
- Creating
- Ready (Holds the `RegionImpl`)

Also use the `RegionSlot` as a placeholder in the region map to indicate
the region is opening/creating, so another open/create request will
fail immediately. The `SlotGuard` is used to clean the slot if we failed
to create/open the region.

* feat: Add a blank method `RegionImpl::open`

* feat: Remove MetadataId from Manifest

Now metadata id of manifest is unused, also unnecessary as we have
manifest dir to build the manifest, but constructing the manifest
still needs a passing region id as argument, which is unavailable
during opening region. So we remove the metadata id from manifest so
`region_store_config()` don't need region id as input anymore

* feat: Remove region id from logstore::Namespace and Wal

This is necessary for implementing open, since we don't have region
id this time, but we need to build Wal and its logstore namespace. Now
this is ok as id is not actually used by logstore.

* feat: Setup `open_region` code skeleton
2022-07-29 17:52:33 +08:00
Ning Sun
62cb649389 refactor: use derive_builder for boilerplate builders (#116)
* refactor: remove boilerplate builder code with derive_builder macro

* refactor: better build creation using Default::default()

* refactor: resolve api change issues in benchmark code

* refactor: address some review issues

* refactor: address clippy issues

* chore: doc and todo update

* refactor: add builder for RegionDescriptor
2022-07-29 14:31:12 +08:00
evenyag
03e965954a feat: implement read framework (#108)
* feat: implement read framework

feat: chunk reader builder

refactor: rename BatchIteratorPtr to BoxedBatchIterator

feat: BatchReader to read batch from ssts

feat: Add a ConcatReader to concat sst readers

test: Add tests for concat reader

chore: Fix clippy

* feat: implement SST parquet reader (#109)

* feat: implement parquet sst reader

* chores: fix some CR comments

* gst

* fix sst writer flush issue

* feat: Implement FsAccessLayer::read_sst

* fix: remove lifetime from ChunkStream

* refactor: Store file name in FileMeta

- Store file name instead of path (`region-name/file-name`) in FileMeta.
- `AccessLayer::read()` takes file name instead of path, so the read/write api are consistent

Co-authored-by: Lei, Huang <6406592+v0y4g3r@users.noreply.github.com>
Co-authored-by: Lei, HUANG <mrsatangel@gmail.com>
2022-07-28 11:46:51 +08:00
Ning Sun
f81dfc9bed feat: add fmt::Debug for RegionImpl 2022-07-27 15:04:51 +08:00
evenyag
c9db093af7 feat: Cherry picks lost commits of flush (#111)
* fix: Fix write stall blocks flush applying version

refactor: Use store config to help constructing Region

chore: Address CR comments

* feat: adds manifest protocol supporting and refactor region metadata protocol

feat: ignore sqlparser log

refactor: PREV_VERSION_KEY constant

refactor: minor change for checking readable/writable

fix: address CR problems

refactor: use binary literal

Co-authored-by: Dennis Zhuang <killme2008@gmail.com>
2022-07-26 15:52:39 +08:00
evenyag
bf5975ca3e feat: Prototype of the storage engine (#107)
* feat: memtable flush (#63)

* wip: memtable flush

* optimize schema conversion

* remove unnecessary import

* add parquet file verfication

* add backtrace to error

* chore: upgrade opendal to 0.9 and fixed some problems

* rename error

* fix: error description

Co-authored-by: Dennis Zhuang <killme2008@gmail.com>

* feat: region manifest service (#57)

* feat: adds Manifest API

* feat: impl region manifest service

* refactor: by CR comments

* fix: storage error mod test

* fix: tweak storage cargo

* fix: tweak storage cargo

* refactor: by CR comments

* refactor: rename current_version

* feat: add wal writer (#60)

* feat: add Wal

* upgrade engine for wal

* fix: unit test for wal

* feat: wal into region

* fix: unix test

* fix clippy

* chore: by cr

* chore: by cr

* chore: prevent test data polution

* chore: by cr

* minor fix

* chore: by cr

* feat: Implement flush (#65)

* feat: Flush framework

- feat: Add id to memtable
- refactor: Rename MemtableSet/MutableMemtables to MemtableVersion/MemtableSet
- feat: Freeze memtable
- feat: Trigger flush
- feat: Background job pool
- feat: flush job
- feat: Sst access layer
- feat: Custom Deserialize for StringBytes
- feat: Use RegionWriter to apply file metas
- feat: Apply version edit
- chore: Remove unused imports

refactor: Use ParquetWriter to replace FlushTask

refactor: FsAccessLayer takes object store as param

chore: Remove todo from doc comments

feat: Move wal to WriterContext

chore: Fix clippy

chore: Add backtrace to WriteWal error

* feat: adds manifest to region and refactor sst/manifest dir config (#72)

* feat: adds manifest to region and refactor sst/manifest dir with EngineConfig

* refactor: ensure path ends with '/' in ManifestLogStorage

* fix: style

* refactor: normalize storage directory path and minor changes by CR

* refactor: doesn't need slash any more

* feat: Implement apply_edit() and add timestamp index to schema (#73)

* feat: Implement VersionControl::apply_edit()

* feat: Add timestamp index to schema

* feat: Implement Schema::timestamp_column()

* feat: persist region metadata to manifest (#74)

* feat: persist metadata when creating region or sst files

* fix: revert FileMeta comment

* feat: resolve todo

* fix: clippy warning

* fix: revert files_to_remove type in RegionEdit

* feat: impl SizeBasedStrategy for flush (#76)

* feat: impl SizeBasedStrategy for flush

* doc: get_mutable_limitation

* fix: code style and comment

* feat: align timestamp (#75)

* feat: align timestamps in write batch

* fix cr comments

* fix timestamp overflow

* simplify overflow check

* fix cr comments

* fix clippy issues

* test: Fix region tests (comment out some unsupported tests) (#82)

* feat: flush job (#80)

* feat: flush job

* fix cr comments

* move file name instead of clone

* comment log file test (#84)

* feat: improve MemtableVersion (#78)

* feat: improve MemtableVersion

* feat: remove flushed immutable memtables and test MemtableVersion

* refactor: by CR comments

* refactor: clone kv in iterator

* fix: clippy warning

* refactor: Make BatchIterator supertrait of Iterator (#85)

* refactor: rename Version to ManifestVersion and move out manifest from ShareData (#83)

* feat: Insert multiple memtables by time range (#77)

* feat: memtable::Inserter supports insert multiple memtables by time range

* chore: Update timestamp comment

* test: Add tests for Inserter

* test: Fix region tests (comment out some unsupported tests)

* refactor: align_timestamp() use TimestampMillis::aligned_by_bucket()

* chore: rename aligned_by_bucket to align_by_bucket

* fix: Fix compile errors

* fix: sst and manifest dir (#86)

* Set RowKeyDescriptor::enable_version_column to false by default

* feat: Implement write stall (#90)

* feat: Implement write stall

* chore: Update comments

* feat: Support reading multiple memtables (#93)

* feat: Support reading multiple memtables

* test: uncomment tests rely on snapshot read

* feat: wal format (#70)

* feat: wal codec

* chore: minor fix

* chore: comment

* chore: by cr

* chore: write_batch_codec mod

* chore: by cr

* chore: upgrade proto

* chore: by cr

* fix failing test

* fix failing test

* feat: manifest to wal (#100)

* feat: write manifest to wal

* chore: sequence into wal

* chore: by cr

* chore: by cr

* refactor: create log store (#104)

Co-authored-by: dennis zhuang <killme2008@gmail.com>
Co-authored-by: Lei, Huang <6406592+v0y4g3r@users.noreply.github.com>
Co-authored-by: fariygirl <clickmetoday@163.com>
Co-authored-by: Jiachun Feng <jiachun_feng@proton.me>
Co-authored-by: Lei, HUANG <mrsatangel@gmail.com>

* chore: Fix clippy

Co-authored-by: Lei, Huang <6406592+v0y4g3r@users.noreply.github.com>
Co-authored-by: Dennis Zhuang <killme2008@gmail.com>
Co-authored-by: Jiachun Feng <jiachun_feng@proton.me>
Co-authored-by: fariygirl <clickmetoday@163.com>
Co-authored-by: Lei, HUANG <mrsatangel@gmail.com>
2022-07-25 15:26:00 +08:00
Lei, Huang
008f62afc1 feat: buffer abstraction (#51)
* feat: add buffer abstraction and rewrite entry encode/decode process

* add some tests

* remove pad.rst

* fix some comments

* fix comments

* remove mmap mod

* feat: Bytes type implementation switch to bytes::Bytes

* fix: use Bytes::from(String) and Bytes::from(Vec<u8>)

* feat: add new method to Entry trait
2022-07-04 14:08:23 +08:00
evenyag
056185eb24 feat(storage): Implement snapshot scan for region (#46)
* feat: Maintain last sequence in VersionControl

* refactor(recordbatch): Replace `Arc<Schema>` by SchemaRef

* feat: Memtable support filter rows with invisible sequence

* feat: snapshot wip

* feat: Implement scan for SnapshotImpl

* test: Add a test that simply puts and scans a region

* chore: Fix clippy

* fix(memtable): Fix memtable returning duplicate keys

* test(memtable): Add sequence visibility test

* test: Add ValueType test

* chore: Address cr comments

* fix: Fix value is not storing but adding to committed sequence
2022-06-20 14:09:31 +08:00
dennis zhuang
e78c015fc0 TableEngine and SqlHandler impl (#45)
* Impl TableEngine, bridge to storage

* Impl sql handler to process insert sql

* fix: minor changes and typo

* test: add datanode test

* test: add table-engine test

* fix: code style

* refactor: split out insert mod from sql and minor changes by CR

* refactor: replace with_context with context
2022-06-17 11:36:49 +08:00
Lei, Huang
e03ac2fc2b Implement log store append and file set management (#43)
* add log store impl

* add some test

* delete failing test

* fix: concurrent close issue

* feat: use arcswap to replace unsafe AtomicPtr

* fix: use lock to protect rolling procedure.
fix: use try_recv to replace poll_recv on appender task.

* chores: 1. use direct tmp dir instead of creating TempDir instance; 2. inline some short function; 3. rename some structs; 4. optimize namespace to arc wrapper inner struct.
2022-06-16 19:09:09 +08:00
evenyag
7a55d988fb test: Add simple write/iter test for memtable 2022-06-10 11:53:27 +08:00
evenyag
727bdb8b86 feat: Add sequences and value_types to Batch 2022-06-09 17:30:17 +08:00