* feat: save create table schema and respect user-defined column order when querying, close #179
* fix: address CR problems
* refactor: use with_context with ProjectedColumnNotFoundSnafu
* fix: ListVector::get returns Null if index is invalid
* feat: Implement eq for vector
* feat: Derive PartialEq for Batch
Simplify some test code in the schema mod
* refactor: Use macro to simplify vector equality check
* feat: Add projected schema
* feat: Use projected schema to read sst
* feat: Use vector of column to implement Batch
* feat: Use projected schema to convert batch to chunk
* feat: Add no_projection() to build ProjectedSchema
* feat: Memtable supports projection
The btree memtable uses `is_needed()` to filter out unneeded value columns,
then uses `ProjectedSchema::batch_from_parts()` to construct the
batch, so it doesn't need to know the layout of internal columns.
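A minimal sketch of that projection flow (types and signatures are illustrative stand-ins, not the actual storage engine API): the memtable asks the projected schema which value columns are needed, drops the rest, and lets the schema assemble the batch, so it never needs to know the internal column layout.

```rust
struct ProjectedSchema {
    /// Indices of the value columns the query actually selected.
    needed_value_columns: Vec<usize>,
}

struct Batch {
    keys: Vec<Vec<String>>,
    values: Vec<Vec<String>>,
}

impl ProjectedSchema {
    fn is_needed(&self, value_column_idx: usize) -> bool {
        self.needed_value_columns.contains(&value_column_idx)
    }

    /// Builds a batch from already-filtered key and value parts.
    fn batch_from_parts(&self, keys: Vec<Vec<String>>, values: Vec<Vec<String>>) -> Batch {
        Batch { keys, values }
    }
}

/// The memtable only filters value columns; layout concerns stay in the schema.
fn scan_memtable(
    schema: &ProjectedSchema,
    keys: Vec<Vec<String>>,
    value_rows: Vec<Vec<String>>,
) -> Batch {
    let values = value_rows
        .into_iter()
        .map(|row| {
            row.into_iter()
                .enumerate()
                .filter(|(idx, _)| schema.is_needed(*idx))
                .map(|(_, v)| v)
                .collect()
        })
        .collect();
    schema.batch_from_parts(keys, values)
}

fn main() {
    let schema = ProjectedSchema { needed_value_columns: vec![1] };
    let batch = scan_memtable(
        &schema,
        vec![vec!["host1".to_string()]],
        vec![vec!["10".to_string(), "0.5".to_string()]],
    );
    assert_eq!(batch.values, vec![vec!["0.5".to_string()]]);
    assert_eq!(batch.keys.len(), 1);
}
```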
* test: Add tests for ProjectedSchema
Also returns an error if the `projected_columns` used to build the
`ProjectedSchema` is empty.
* test: Add test for memtable projection
* feat: Table passes projection to storage engine
* fix: Use timestamp column name as schema metadata
This fixes the issue that the metadata refers to the wrong timestamp column
if datafusion reorders the fields of the arrow schema.
* fix: Fix projected schema not passed to memtable
* feat: Add tests for region projection
* chore: fix clippy
* test: Add test for unordered projection
* chore: Move projected_schema to ReadOptions
Also fix some typos
* feat: implement DateTime type
* add some tests
* Update src/common/time/src/datetime.rs
Co-authored-by: Ning Sun <sunng@protonmail.com>
* Update src/common/time/src/datetime.rs
Co-authored-by: Ning Sun <sunng@protonmail.com>
* wip: add Date type and value
* fix some cr comments
* impl Date values
* finish date type
* optimize Date value serialization
* add some tests
* fix some cr comments
* add some more test
Use the `error!(e; xxx)` pattern so we can get a backtrace in the error log.
Also use BoxedError as the error source of ExecuteQuery instead of String,
so we can carry a backtrace and other info in it.
* feat: add `BorrowedValue` and DF Array access by index
This `BorrowedValue` can hold data borrowed from a datafusion arrow array without copying.
`arrow_array_access` provides index access to an Arrow array and holds the
result in our `BorrowedValue`, so we don't have to copy string/binary data when
converting to `Value`.
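A rough sketch of the borrowing idea (simplified stand-in types; the real code reads from DataFusion's arrow arrays): a `BorrowedValue` only holds references into the underlying storage, so string/binary payloads are copied only when the caller converts them into an owned `Value`.

```rust
enum BorrowedValue<'a> {
    Null,
    Int64(i64),
    String(&'a str),
    Binary(&'a [u8]),
}

enum Value {
    Null,
    Int64(i64),
    String(String),
    Binary(Vec<u8>),
}

impl<'a> From<BorrowedValue<'a>> for Value {
    fn from(v: BorrowedValue<'a>) -> Value {
        match v {
            BorrowedValue::Null => Value::Null,
            BorrowedValue::Int64(i) => Value::Int64(i),
            // Copies happen only here, when an owned value is requested.
            BorrowedValue::String(s) => Value::String(s.to_string()),
            BorrowedValue::Binary(b) => Value::Binary(b.to_vec()),
        }
    }
}

/// Index access over a string column: returns a borrowed view, no copy.
fn string_at<'a>(column: &'a [Option<String>], idx: usize) -> BorrowedValue<'a> {
    match column.get(idx).and_then(|v| v.as_deref()) {
        Some(s) => BorrowedValue::String(s),
        None => BorrowedValue::Null,
    }
}

fn main() {
    let column = vec![Some("hello".to_string()), None];
    let borrowed = string_at(&column, 0);
    let owned: Value = borrowed.into();
    match owned {
        Value::String(s) => assert_eq!(s, "hello"),
        _ => unreachable!(),
    }
}
```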
* refactor: use borrowed types and iterator for recordbatch access
* fix: return Null with early check
* fix: i64 type error addressed by unit test
* refactor: give arrow_array_access a better name
* refactor: removed borrowed value and use value for now
* refactor: make the iterator return a result of vec
* refactor: lift recordbatch iterator into common module
* fix: address clippy warnings
- Don't run GitHub Actions on draft pull requests
- Now the title checker won't be affected, seemingly because it is triggered by the pull_request_target event, not the pull_request event
This also fixes the dead code warning for `create_test_table()`, as the
files under `datanode/tests` are compiled as individual crates. Moving
them to the src dir makes sharing code much easier.
* refactor: Merge RowKeyMetadata into ColumnsMetadata
RowKeyMetadata and ColumnsMetadata are almost always used together, so there is no need
to separate them into two structs. They are now combined into a single
ColumnsMetadata struct.
chore: Make some fields of metadata private
feat: Replace schema in RegionMetadata with RegionSchema
The internal schema of a region should have knowledge of all
internal columns that are reserved and used by the storage engine, such as
sequence and value type. So we introduce `RegionSchema`, which
holds a `SchemaRef` that only contains the columns the user can see.
feat: Value derives Serialize and supports converting into json value
feat: Add version to schema
The schema version has an initial value of 0 and is bumped each time the
schema is altered.
feat: Adds internal columns to region metadata
Introduce the concept of reserved columns and internal columns.
Reserved columns are columns whose names and ids are reserved by the storage
engine and cannot be used by the user. Reserved columns usually have
special usage. Reserved columns except the version column are also
called internal columns (though the version could also be thought of as a
special kind of internal column); they are not visible to the user, such as our
internal sequence and value_type columns.
The RegionMetadataBuilder always pushes the internal columns used by the
engine to the columns in metadata. Internal columns are all stored
behind all user columns in the columns vector.
To avoid column id collisions, the ids reserved for columns have the most
significant bit set to 1, and the RegionMetadataBuilder checks the
uniqueness of the column ids.
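A hedged sketch of that id-reservation scheme (constants and names are illustrative): reserved column ids occupy the range with the most significant bit set, so they cannot collide with user-assigned ids, and the builder still checks uniqueness defensively.

```rust
use std::collections::HashSet;

type ColumnId = u32;

/// All reserved column ids have the most significant bit set.
const RESERVED_BIT: ColumnId = 1 << 31;

const SEQUENCE_COLUMN_ID: ColumnId = RESERVED_BIT;       // offset 0
const VALUE_TYPE_COLUMN_ID: ColumnId = RESERVED_BIT | 1; // offset 1

fn is_reserved(id: ColumnId) -> bool {
    id & RESERVED_BIT != 0
}

/// The builder still verifies that no id is used twice.
fn check_unique(ids: &[ColumnId]) -> bool {
    let mut seen = HashSet::new();
    ids.iter().all(|id| seen.insert(*id))
}

fn main() {
    let user_columns: Vec<ColumnId> = vec![0, 1, 2];
    assert!(user_columns.iter().all(|id| !is_reserved(*id)));

    // Internal columns are appended after all user columns.
    let all: Vec<ColumnId> = user_columns
        .into_iter()
        .chain([SEQUENCE_COLUMN_ID, VALUE_TYPE_COLUMN_ID])
        .collect();
    assert!(check_unique(&all));
}
```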
chore: Rebase develop and fix compile error
feat: add internal schema to region schema
feat: Add SchemaBuilder to build Schema
feat: Store row key end in region schema metadata
Also move the arrow schema construction to region::schema mod
feat: Add SstSchema
refactor: Replace MemtableSchema with RegionSchema
Now when writing sst files, we can use the arrow schema from our sst
schema, which contains the internal columns.
feat: Use SstSchema to read parquet
Adds user_column_end to metadata. When reading a parquet file,
converts the arrow schema into SstSchema, then uses the row_key_end
and user_column_end to find the row key parts, value parts and internal
columns, instead of using the timestamp index, which may yield an
incorrect index if we don't put the timestamp at the end of the row key.
Move the conversion from Batch to arrow Chunk into SstSchema, so the SST mod doesn't
need to care about the order of key, value and internal columns.
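A simplified sketch of how the two offsets split a column list, assuming the layout described above (row key columns, then value columns, then internal columns); the real SstSchema works on arrow fields rather than plain strings.

```rust
struct SstColumns<'a> {
    row_key: &'a [String],
    value: &'a [String],
    internal: &'a [String],
}

fn split_columns(columns: &[String], row_key_end: usize, user_column_end: usize) -> SstColumns<'_> {
    SstColumns {
        row_key: &columns[..row_key_end],
        value: &columns[row_key_end..user_column_end],
        internal: &columns[user_column_end..],
    }
}

fn main() {
    let columns = vec![
        "host".to_string(),         // row key
        "ts".to_string(),           // row key (timestamp may sit anywhere in the key)
        "cpu".to_string(),          // value column
        "__sequence".to_string(),   // internal column
        "__value_type".to_string(), // internal column
    ];
    let parts = split_columns(&columns, 2, 3);
    assert_eq!(parts.row_key, &columns[..2]);
    assert_eq!(parts.value, &columns[2..3]);
    assert_eq!(parts.internal, &columns[3..]);
}
```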
test: Add test for Value to serde_json::Value
feat: Add RawRegionMetadata to persist RegionMetadata
test: Add test to RegionSchema
fix: Fix clippy
To fix the clippy::enum_clike_unportable_variant lint, define the column id
offset in ReservedColumnType and compute the final column id in
ReservedColumnId's const method.
refactor: Move batch/chunk conversion to SstSchema
The parquet ChunkStream now holds the SstSchema and uses its method to
convert a Chunk into a Batch.
chore: Address CR comment
Also add a test for pushing internal column to RegionMetadataBuilder
chore: Address CR comment
chore: Use bitwise or to compute column id
* chore: Address CR comment
* address PR comments
address PR comments
use 3306 for mysql server's default port
upgrade metric to version 0.20
move crate "servers" out of "common"
make mysql io thread count configurable in the config file
add snafu backtrace for errors with source
use common-server error for mysql server
add test for grpc server
refactor testing code
fix rustfmt check
start mysql server in datanode
move grpc server code from datanode to common-servers
feat: unify servers
* rebase develop and resolve conflicts
* remove an unnecessary todo
Co-authored-by: luofucong <luofucong@greptime.com>
* fix: Rename current_timestamp to current_time_millis, fix resolution
Fix current_timestamp returning seconds resolution; also add a test for
this method
* chore: Use slice of array instead of Vec
Save some heap allocations
* test: Compare std and chrono timestamp
The original test always succeeds even if current_time_millis returns
seconds resolution
* chore: Store current time in gmt_created/gmt_modified
* catalog manager allocates table id
* rebase develop
* add some tests
* add some more test
* fix some cr comments
* insert into system catalog
* use slice pattern to simplify code
* add optional dependencies
* add sql-to-request test
* successfully recover
* fix unit tests
* rebase develop
* add some tests
* fix some cr comments
* fix some cr comments
* add a lock to CatalogManager
* feat: add gmt_created and gmt_modified columns to system catalog table
* SelectExpr: change to oneof expr
* Convert between Vec<u8> and SelectResult
* Chore: use encode_to_vec and decode instead of encode_length_delimited_to_vec and decode_length_delimited
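A hedged sketch of this codec choice: plain `encode_to_vec`/`decode` writes and reads the raw message bytes without a length prefix, unlike the `*_length_delimited_*` variants. The message type below is a stand-in; the real SelectResult fields come from the project's proto definitions.

```rust
use prost::Message;

#[derive(Clone, PartialEq, Message)]
struct DemoResult {
    #[prost(uint32, tag = "1")]
    row_count: u32,
    #[prost(bytes = "vec", tag = "2")]
    payload: Vec<u8>,
}

fn main() -> Result<(), prost::DecodeError> {
    let result = DemoResult { row_count: 3, payload: vec![1, 2, 3] };

    // Vec<u8> <-> message round trip, no length delimiter involved.
    let bytes: Vec<u8> = result.encode_to_vec();
    let decoded = DemoResult::decode(bytes.as_slice())?;

    assert_eq!(result, decoded);
    Ok(())
}
```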
* Chore: move bitset into separate file
* Grpc select impl
* feat: impl TableManifest and refactor table engine, object store etc.
* feat: persist table metadata when creating it
* fix: remove unused file src/storage/src/manifest/impl.rs
* feat: impl recover table info from manifest
* test: add open table test and table manifest test
* fix: resolve CR problems
* fix: compile error and remove region id
* doc: describe parent_dir
* fix: address CR problems
* fix: typo
* Revert "fix: compile error and remove region id"
This reverts commit c14c250f8a.
* fix: compile error and generate region id by table_id and region number
Implement a catalog manager that provides a view of all existing tables when the instance starts. The current implementation is based on the local table engine; all catalog info is stored in a system catalog table.
* add pwrite
* write
* fix write
* error handling in write thread
* wrap some LogFile fields into a state field
* remove some unwraps
* restructure some code
* implement file chunk
* composite chunk decode
* add test for chunk stream
* fix buffer test
* remove some useless code
* add test for read_at and file_chunk_stream
* use bounded channel to implement back pressure
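A small illustration of the back-pressure idea (not the actual log-store code): a bounded tokio channel makes the producing side wait once the reader falls behind, instead of buffering file chunks without limit.

```rust
use tokio::sync::mpsc;

#[tokio::main]
async fn main() {
    // Capacity bounds how many chunks may be in flight at once.
    let (tx, mut rx) = mpsc::channel::<Vec<u8>>(8);

    let producer = tokio::spawn(async move {
        for i in 0..100u8 {
            // `send` suspends here once 8 chunks are queued,
            // applying back pressure to the file reader.
            if tx.send(vec![i; 1024]).await.is_err() {
                break; // receiver dropped
            }
        }
    });

    while let Some(chunk) = rx.recv().await {
        // Consume chunks; a slow consumer throttles the producer above.
        let _ = chunk.len();
    }
    producer.await.unwrap();
}
```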
* reimplement entry read and decoding
* add some doc
* clean some code
* use Sender::blocking_send to replace manually spawn
* support synchronous file chunk stream
* remove useless clone
* remove set_offset from Entry trait
* cr: fix some comments
* fix: add peek methods for Buffer
* add test for read at the middle of file
* fix some minor issues on comments
* rebase onto develop
* add peek_to_slice and read_to_slice
* initialize file chunk on heap
* fix some comments in CR
* respect entry id set outside LogStore
* fix unit test
* Update src/log-store/src/fs/file.rs
Co-authored-by: evenyag <realevenyag@gmail.com>
* fix some cr comments
Co-authored-by: evenyag <realevenyag@gmail.com>
* feat: protobuf codec
* chore: minor fix
* chore: beautify the macro code
* chore: minor fix
* chore: address CR comments
* chore: address CR comments and impl wal with proto
* bugfix: invalid num_rows for multi put_data in mutations
Co-authored-by: jiachun <jiachun_fjc@163.com>
* refactor(storage): Add region id and name to metadata
Add region id and name to `RegionMetadata` and simplify the input arguments of
the `RegionImpl::create()` and `RegionImpl::new()` methods, since id and name
are already in the metadata/version.
To avoid an atomic load of `Version` each time we access the region
id/name, we still store a copy of the id/name in `SharedData`.
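A hedged sketch of that layout, using the arc_swap crate purely for illustration (the real SharedData holds more state): the mutable `Version` sits behind an atomically swappable pointer, while the region id and name are plain immutable fields, so reading them never requires an atomic load.

```rust
use arc_swap::ArcSwap;
use std::sync::Arc;

struct Version {
    name: String,
    // ... schema, memtables, SST metadata ...
}

struct SharedData {
    // Copies of the immutable identity, readable without touching `version`.
    id: u64,
    name: String,
    version: ArcSwap<Version>,
}

impl SharedData {
    fn id(&self) -> u64 {
        self.id // no atomic load needed
    }

    fn name(&self) -> &str {
        &self.name
    }

    fn version(&self) -> Arc<Version> {
        self.version.load_full() // atomic load only when the version is needed
    }
}

fn main() {
    let shared = SharedData {
        id: 42,
        name: "region-42".to_string(),
        version: ArcSwap::from_pointee(Version { name: "region-42".to_string() }),
    };
    assert_eq!(shared.id(), 42);
    assert_eq!(shared.name(), "region-42");
    assert_eq!(shared.version().name, shared.name);
}
```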
* chore: Remove todo in OpenOptions
Creating the region if it is missing when opening the region would be hard to
implement, since sometimes we may not know the exact region schema the user
would like to have.
* refactor: Make id and name of region readonly
By making `id` and `name` fields of `SharedData` and `RegionMetadata`
private and only exposing a pub getter.
* feat: memtable backed by DataFusion to ease testing
* move test utility code out of the src folder
* Implement our own MemTable because DataFusion's MemTable does not support limit, and replace the original testing numbers table.
* fix: address PR comments
* fix: "testutil" -> "test-util"
* roll back "NumbersTable"
Co-authored-by: luofucong <luofucong@greptime.com>
* feat: Add `open_table()` method to `TableEngine`
* feat: Implements MitoEngine::open_table()
For simplicity, this implementation just uses the table name as the region
name, and uses that name to open a region for that table. It also
introduces a mutex to avoid opening the same table simultaneously.
* refactor: Shorten generic param name
Use `S` instead of `Store` for `MitoEngine`.
* test: Mock storage engine for table engine test
Add a `MockEngine` to mock the storage engine, so that tests of the mito
table engine can use the mocked storage when needed.
* test: Add open table test
Also remove the `storage::gen_region_name` method and always use the table name
as the default region name, so the table engine can open the table created
by `create_table()`.
* chore: Add open table log
* feat: Implements RegionWriter::replay()
Refactors `preprocess_write()`, wrapping the time range calculation and
memtable creation into `prepare_memtables()` so this logic can be reused
by `WriterInner::replay()`. Then implements `WriterInner::replay()`,
which reads write batches from the WAL and inserts them into memtables.
* feat: Use sequence in request as committed sequence
Also checks that the sequence increases monotonically and returns an
error if the sequence decreases
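A minimal sketch of that monotonic-sequence check (error type and field names are illustrative): the committed sequence comes from the write request and must never move backwards.

```rust
#[derive(Debug)]
struct SequenceDecreased {
    committed: u64,
    incoming: u64,
}

struct Committer {
    committed_sequence: u64,
}

impl Committer {
    fn commit(&mut self, request_sequence: u64) -> Result<(), SequenceDecreased> {
        if request_sequence < self.committed_sequence {
            // Reject out-of-order (decreasing) sequences.
            return Err(SequenceDecreased {
                committed: self.committed_sequence,
                incoming: request_sequence,
            });
        }
        self.committed_sequence = request_sequence;
        Ok(())
    }
}

fn main() {
    let mut committer = Committer { committed_sequence: 0 };
    assert!(committer.commit(1).is_ok());
    assert!(committer.commit(2).is_ok());
    // Going backwards is rejected.
    match committer.commit(1) {
        Err(e) => println!("rejected: {:?}", e),
        Ok(()) => unreachable!("a decreasing sequence must be rejected"),
    }
}
```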
* chore: Remove OpenOptions param from RegionWriter::replay
* test: Add region reopen tests
refactor(storage): Rename read_write test mod to basic
refactor(storage): Move common region test logic to TesterBase
Let read/write Tester and flush Tester share the same TesterBase struct,
which implements common operations like put/full_scan.
* feat: Constructs RegionImpl in open()
Constructs RegionImpl after replay in `RegionImpl::open()`
* feat: Adds RegionImpl::create()
Adds the `RegionImpl::create()` method to persist region metadata to the
manifest and then create the RegionImpl instance, so the storage engine
just invokes `RegionImpl::create()` instead of `RegionImpl::new()` to
create the region instance, and doesn't need to update the manifest after
creating the region instance anymore. Now `RegionImpl::new()` takes a
version instead of metadata as input.
This change is also necessary to pass the region open test, since
to open a region we need to persist something to the manifest first.
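A hedged sketch of that create/open flow (manifest and metadata types are stand-ins): `create()` persists the metadata first and only then builds the in-memory region, so `open()` can always find something in the manifest to recover from.

```rust
struct RegionMetadata {
    name: String,
}

struct Version {
    metadata: RegionMetadata,
}

#[derive(Default)]
struct Manifest {
    persisted: Vec<String>, // simplified: just remember persisted region names
}

impl Manifest {
    fn persist(&mut self, metadata: &RegionMetadata) {
        self.persisted.push(metadata.name.clone());
    }

    fn recover(&self) -> Option<RegionMetadata> {
        self.persisted.last().map(|name| RegionMetadata { name: name.clone() })
    }
}

struct RegionImpl {
    version: Version,
}

impl RegionImpl {
    /// Persist metadata to the manifest, then build the instance from a version.
    fn create(metadata: RegionMetadata, manifest: &mut Manifest) -> RegionImpl {
        manifest.persist(&metadata);
        RegionImpl::new(Version { metadata })
    }

    /// `new()` now takes a version instead of raw metadata.
    fn new(version: Version) -> RegionImpl {
        RegionImpl { version }
    }

    /// Opening relies on the metadata persisted by `create()`.
    fn open(manifest: &Manifest) -> Option<RegionImpl> {
        let metadata = manifest.recover()?;
        Some(RegionImpl::new(Version { metadata }))
    }
}

fn main() {
    let mut manifest = Manifest::default();
    let created = RegionImpl::create(RegionMetadata { name: "r1".into() }, &mut manifest);
    let reopened = RegionImpl::open(&manifest).expect("metadata was persisted by create()");
    assert_eq!(created.version.metadata.name, reopened.version.metadata.name);
}
```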
* feat: Pass region open test
Use LocalFileLogStore for the region test since NoopLogStore won't persist
data to the file system.
Create the dir in `LocalFileLogStore::open` if it does not exist, so we don't
need to create the dir before using the logstore.
To pass the test, we always recover from flushed_sequence and use
`req_sequence + 1` as the last sequence.
* test: Test reopen region multiple times
* chore: Address CR comments
Add more info to the replay log and add an assert to check the committed
sequence after reopening.
* refactor: Add cfg(test) to Version::new()
Remove `VersionControl::new()`, and add `#[cfg(test)]` to
`Version::new()` as it is only used by tests.
* feat: impl recovering version from manifest for region
* refactor: rename try_apply_edit to replay_edit
* fix: remove println
* fix: address CR problems
* feat: remove Metadata in manifest trait and update region manifest state after recovering