* fix: dedup should not mark element as unneeded
It should only mark the element as selected, because some columns of
different rows may have the same value.
* refactor: Rename dedup to find_unique
The original `dedup` method only marks the bitmap as true when it finds
that an element is unique, so `find_unique` is a more appropriate name.
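A minimal sketch of the intended semantics, assuming a sorted input slice and a plain bool slice standing in for the bitmap (all names here are illustrative):

```rust
/// Sets `selected[i]` to true when `values[i]` starts a new unique run.
/// Bits are only ever set, never cleared: a row whose value repeats in this
/// column may still be selected because another column differs.
fn find_unique<T: PartialEq>(values: &[T], selected: &mut [bool]) {
    assert_eq!(values.len(), selected.len());
    for i in 0..values.len() {
        if i == 0 || values[i] != values[i - 1] {
            selected[i] = true;
        }
    }
}

fn main() {
    let keys = [1, 1, 2, 2, 3];
    let mut selected = vec![false; keys.len()];
    find_unique(&keys, &mut selected);
    assert_eq!(selected, [true, false, true, false, true]);
}
```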
* test: Renew bitmap in test_batch_find_unique
* chore: Update comments
* refactor: reverse dependency direction, from frontend depending on datanode to datanode depending on frontend
* wip: start frontend in datanode
* wip: migrate create database to frontend
* wip: impl alter table
* fix: CR comments
* feat: move time index metadata from schema into field
* chore: remove useless code
* test: test select with column alias
* fix: conflicts with develop branch
* test: add test
* test: order by timestamp to ensure query results order
* fix: comment
* feat: supports list array in arrow_array_get
* feat: supports string and list type conversions in python coprocessor
* test: add test cases for returning list in coprocessor
* fix: Fix int64 type not considered in DEFAULT CURRENT_TIMESTAMP() constraint
Also avoid using `ConstantVector` in default constraints, as other users
may try to downcast it to a concrete type and may forget to check
whether it is a constant vector first.
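A tiny illustration of the pitfall, with stand-in types rather than the real vector API: downcasting a trait object to a concrete vector silently fails when a constant wrapper was returned instead.

```rust
use std::any::Any;

trait Vector: Any {
    fn as_any(&self) -> &dyn Any;
}

struct Int64Vector(Vec<i64>);
struct ConstantVector; // wraps a single value plus a length

impl Vector for Int64Vector {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl Vector for ConstantVector {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

fn main() {
    let v: Box<dyn Vector> = Box::new(ConstantVector);
    // A caller expecting the concrete Int64Vector gets None here, which is
    // why the default constraint returns a concrete vector instead.
    assert!(v.as_any().downcast_ref::<Int64Vector>().is_none());
}
```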
* test: Add test for writing default value
* refactor: Serialize Schema/TableMeta/TableInfo to raw structs
* test: Add tests for raw struct conversion
* style: Fix clippy
* refactor: SchemaBuilder::timestamp_index takes Option<usize>
So callers can chain the timestamp_index method call even when there is
no timestamp index.
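A sketch of why the Option parameter helps, with a stripped-down builder (the real SchemaBuilder has many more fields): the call stays in the chain whether or not a timestamp index exists.

```rust
#[derive(Default)]
struct SchemaBuilder {
    timestamp_index: Option<usize>,
}

impl SchemaBuilder {
    // Taking Option<usize> lets callers keep the chain in both cases.
    fn timestamp_index(mut self, index: Option<usize>) -> Self {
        self.timestamp_index = index;
        self
    }
}

fn main() {
    let with_ts = SchemaBuilder::default().timestamp_index(Some(2));
    let without_ts = SchemaBuilder::default().timestamp_index(None);
    assert_eq!(with_ts.timestamp_index, Some(2));
    assert_eq!(without_ts.timestamp_index, None);
}
```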
* style(datatypes): Chains SchemaBuilder method calls
* feat: align_bucket support i64 and timestamp values
* feat: add Int64 to timestamp
* feat: support query i64 timestamp vector
* test: fix failing tests
* refactor: simplify some code
* fix: CR comments and add insert and query test for i64 timestamp column
* chore: Update StoreSchema comment
* feat: Add metadata to ColumnSchema
* feat: Impl conversion between ColumnMetadata and ColumnSchema
We can use this feature to store ColumnMetadata as arrow's Schema, since
a ColumnSchema can be further converted to an arrow schema. Then we can
use ColumnMetadata in StoreSchema, as it contains more information,
especially the column id.
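A rough sketch of the round-trip idea, assuming the extra fields such as the column id travel in the schema's metadata map (the key name and struct fields here are made up):

```rust
use std::collections::HashMap;

const ID_KEY: &str = "column_id"; // illustrative metadata key

struct ColumnSchema {
    name: String,
    metadata: HashMap<String, String>,
}

struct ColumnMetadata {
    name: String,
    column_id: u32,
}

impl From<&ColumnMetadata> for ColumnSchema {
    fn from(meta: &ColumnMetadata) -> Self {
        let mut metadata = HashMap::new();
        metadata.insert(ID_KEY.to_string(), meta.column_id.to_string());
        ColumnSchema { name: meta.name.clone(), metadata }
    }
}

impl TryFrom<&ColumnSchema> for ColumnMetadata {
    type Error = String;

    fn try_from(schema: &ColumnSchema) -> Result<Self, String> {
        let column_id = schema
            .metadata
            .get(ID_KEY)
            .ok_or_else(|| "missing column id".to_string())?
            .parse::<u32>()
            .map_err(|e| e.to_string())?;
        Ok(ColumnMetadata { name: schema.name.clone(), column_id })
    }
}

fn main() {
    let meta = ColumnMetadata { name: "ts".to_string(), column_id: 1 };
    let schema = ColumnSchema::from(&meta);
    let back = ColumnMetadata::try_from(&schema).unwrap();
    assert_eq!(back.column_id, 1);
}
```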
* feat(storage): Merge schema::Error to metadata::Error
To avoid a cyclic dependency between the two Error types
* feat(storage): Store ColumnMetadata in StoreSchema
* feat(storage): Use StoreSchemaRef to avoid cloning the whole StoreSchema struct
* test(storage): Fix test_store_schema
* feat(datatypes): Return error on duplicate meta key
* chore: Address CR comments
* feat: Adds ColumnDefaultConstraint::create_default_vector
ColumnDefaultConstraint::create_default_vector is ported from
MitoTable::try_get_column_default_constraint_vector.
* refactor: Replace try_get_column_default_constraint_vector by create_default_vector
* style: Remove unnecessary map_err in MitoTable::insert
* feat: Adds compat_write
For each column in `dest_schema` that is missing from `write_batch`, this
method inserts a vector filled with the default value into the
`write_batch`. If the `write_batch` has columns not in `dest_schema`, an
error is returned.
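The padding logic boils down to something like this sketch, with columns simplified to a name-to-values map and defaults cloned per missing column (types and names are illustrative):

```rust
use std::collections::HashMap;

fn compat_write(
    dest_schema: &[(&str, i64)], // (column name, default value)
    write_batch: &mut HashMap<String, Vec<i64>>,
    num_rows: usize,
) -> Result<(), String> {
    // Columns in the batch that the destination schema doesn't know are an error.
    for name in write_batch.keys() {
        if !dest_schema.iter().any(|(n, _)| n == name) {
            return Err(format!("column {name} not in dest schema"));
        }
    }
    // Pad every missing column with a vector of its default value.
    for (name, default) in dest_schema {
        write_batch
            .entry((*name).to_string())
            .or_insert_with(|| vec![*default; num_rows]);
    }
    Ok(())
}

fn main() {
    let dest = [("k", 0), ("v", -1)];
    let mut batch = HashMap::from([("k".to_string(), vec![1, 2])]);
    compat_write(&dest, &mut batch, 2).unwrap();
    assert_eq!(batch["v"], vec![-1, -1]);
}
```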
* chore: Add info log to RegionInner::alter
* feat(storage): RegionImpl::write support request with old version
* feat: Add nullable check when creating default value
* feat: Validate nullable and default value
* chore: Modify PutOperation comments
* chore: Make ColumnDescriptor::is_nullable readonly and validate name
* feat: Use CompatWrite trait to replace compat::compat_write method
Adds a CompatWrite trait to support padding columns to WriteBatch:
- The WriteBatch and PutData implement this trait
- Fix the issue that WriteBatch::schema is not updated to the
schema after compat
- Also validate the created column when adding to PutData
The WriteBatch also pads default values for missing columns in PutData,
so the memtable inserter doesn't need to manually check whether a
column is nullable and then insert a NullVector. Every WriteBatch is
guaranteed to have all columns defined by the schema in its PutData.
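One possible shape of the trait, sketched with placeholder types and stub bodies (the real signatures and error type differ):

```rust
struct Schema; // placeholder for the destination schema

trait CompatWrite {
    /// Pads default values for columns in `dest_schema` that are missing
    /// from `self`, then updates `self`'s schema to `dest_schema`.
    fn compat_write(&mut self, dest_schema: &Schema) -> Result<(), String>;
}

struct PutData;

struct WriteBatch {
    puts: Vec<PutData>,
}

impl CompatWrite for WriteBatch {
    fn compat_write(&mut self, dest_schema: &Schema) -> Result<(), String> {
        // Delegate to each PutData, then replace the batch's schema.
        for put in &mut self.puts {
            put.compat_write(dest_schema)?;
        }
        Ok(())
    }
}

impl CompatWrite for PutData {
    fn compat_write(&mut self, _dest_schema: &Schema) -> Result<(), String> {
        // Create and validate a default-valued column for each missing one.
        Ok(())
    }
}
```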
* feat: Validate constraint by ColumnDefaultConstraint::validate()
The ColumnDefaultConstraint::validate() would also ensure the default
value has the same data type as the column's.
* feat: Use NullVector for null columns
* fix: Fix BinaryType returns wrong logical_type_id
* fix: Fix tests and revert NullVector for null columns
NullVector doesn't support a custom logical type, which makes it hard to
encode/decode and causes the arrow/protobuf codec of the write batch
to fail.
* fix: create_default_vector uses replicate to create a vector with the default value
This would fix the test_codec_with_none_column_protobuf test, as we need
to downcast the vector to construct the protobuf values.
* test: add tests for column default constraints
* test: Add tests for CompatWrite trait impl
* test: Test write region with old schema
* fix(storage): Fix replay() applying metadata too early
The committed sequence of the RegionChange action is the sequence of the
last entry that uses the old metadata (schema). During replay, we should
apply the new metadata only after we see an entry whose sequence is
greater than (not equal to) `RegionChange::committed_sequence`.
Also remove the duplicate `set_committed_sequence()` call in
persist_manifest_version()
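A compressed sketch of the replay rule, with actions reduced to two variants (all of this is illustrative, not the real manifest types):

```rust
enum Action {
    Put { sequence: u64 },
    RegionChange { committed_sequence: u64 },
}

#[derive(Default)]
struct Replayer {
    // Metadata waiting to be applied, keyed by the last sequence that
    // still uses the old metadata.
    pending: Option<u64>,
    applied: bool,
}

impl Replayer {
    fn replay(&mut self, action: &Action) {
        match action {
            Action::RegionChange { committed_sequence } => {
                // Entries up to and including this sequence still use the
                // old metadata, so don't apply the new one yet.
                self.pending = Some(*committed_sequence);
            }
            Action::Put { sequence } => {
                // Apply only for sequences strictly greater than the
                // committed sequence of the RegionChange.
                if matches!(self.pending, Some(committed) if *sequence > committed) {
                    self.applied = true;
                    self.pending = None;
                }
            }
        }
    }
}

fn main() {
    let mut r = Replayer::default();
    r.replay(&Action::RegionChange { committed_sequence: 10 });
    r.replay(&Action::Put { sequence: 10 }); // still uses old metadata
    assert!(!r.applied);
    r.replay(&Action::Put { sequence: 11 }); // first entry past the change
    assert!(r.applied);
}
```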
* chore: Removes some unreachable code
Also add more comments to document the code in these files
* refactor: Refactor MitoTable::insert
Return an error if we could not create a default vector for a given
column, instead of ignoring the error
* chore: Fix incorrect comments
* chore: Fix typo in error message
* feat: add type conversion optimizer
* feat: add expr rewrite logical plan optimizer
* chore: add some doc
* fix: unit test
* fix: time zone issue in unit tests
* chore: add more tests
* fix: some CR comments
* chore: rebase develop
* chore: fix unit tests
* fix: unit test use timestamp with time zone
* chore: add more tests
* feat: Handle empty NullVector in replicate_null
* chore: Rename ChunkReaderImpl::sst_reader to batch_reader
* feat: dedup reader wip
* feat: Add BatchOp
Add BatchOp to support dedup/filter operations on Batch and implement
BatchOp for ProjectedSchema.
Moves compare_row_of_batch to BatchOp::compare_row.
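The row comparison behind dedup boils down to comparing the row-key columns only, as in this simplified sketch (plain `Vec<i64>` columns stand in for vectors):

```rust
use std::cmp::Ordering;

/// Compares row `i` of `left` with row `j` of `right` by the first
/// `row_key_end` columns (the row key), not the full key.
fn compare_row(
    left: &[Vec<i64>],
    i: usize,
    right: &[Vec<i64>],
    j: usize,
    row_key_end: usize,
) -> Ordering {
    for col in 0..row_key_end {
        let ord = left[col][i].cmp(&right[col][j]);
        if ord != Ordering::Equal {
            return ord;
        }
    }
    Ordering::Equal
}

fn main() {
    // Two columns; only the first is part of the row key.
    let left = vec![vec![1, 2], vec![100, 200]];
    let right = vec![vec![2], vec![300]];
    assert_eq!(compare_row(&left, 1, &right, 0, 1), Ordering::Equal);
}
```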
* feat: Allow Batch to have empty columns
* feat: Implement DedupReader
Also add From<MutableBitmap> for BooleanVector
* test: Test dedup reader
Fix the issue that compare_row compared by the full key instead of the row key
* chore: Add comments to BatchOp
* feat: Dedup results from merge reader
* test: Test merge read after flush
* test: Test merge read after flush and reopen
* test: Test replicate empty NullVector
* test: Add tests for `ProjectedSchema::dedup/filter`
* feat: Filter empty batches in DedupReader
Also fix clippy warnings and refactor some code
* feat: Dedup vector
* refactor: Re-export Date/DateTime/Timestamp
* refactor: Named field for ListValueRef::Ref
Use a named field val instead of a tuple for variant ListValueRef::Ref
to keep consistency with ListValueRef::Indexed
* feat: Implement ScalarVector for ListVector
Also implements ScalarVectorBuilder for ListVectorBuilder, Scalar for
ListValue and ScalarRef for ListValueRef
* test: Add tests for ScalarVector implementation of ListVector
* feat: Implement dedup using match_scalar_vector
* refactor: Move dedup func to individual mod
* chore: Update ListValueRef comments
* refactor: Move replicate to VectorOp
Move compute operations to the VectorOp trait, which acts as a supertrait
of Vector. This lets us later put dedup/filter methods in VectorOp and
avoid defining too many methods in the Vector trait.
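The supertrait layout looks roughly like this (signatures simplified; the real methods take and return the crate's own types):

```rust
trait VectorOp {
    fn replicate(&self, offsets: &[usize]) -> Box<dyn Vector>;
    fn dedup(&self, selected: &mut [bool], prev_vector: Option<&dyn Vector>);
    fn filter(&self, filter: &[bool]) -> Box<dyn Vector>;
}

// Every Vector gets the compute operations via the supertrait, without
// the Vector trait itself growing one method per operation.
trait Vector: VectorOp {
    fn len(&self) -> usize;
}
```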
* refactor: Move scalar bounds to PrimitiveElement
Move the Scalar and ScalarRef trait bounds to PrimitiveElement, so for
each native type that implements PrimitiveElement, its PrimitiveVector
always implements ScalarVector, and we can use it as a ScalarVector
without adding additional trait bounds
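A skeleton of the bound layout (trait bodies elided): placing the scalar bounds on PrimitiveElement means every `PrimitiveVector<T>` is a ScalarVector with no extra bounds at call sites.

```rust
trait Scalar {}
trait ScalarVector {}

// The scalar-related bounds live here, once.
trait PrimitiveElement: Copy + Scalar {}

struct PrimitiveVector<T: PrimitiveElement> {
    values: Vec<T>,
}

impl<T: PrimitiveElement> ScalarVector for PrimitiveVector<T> {}

impl Scalar for i64 {}
impl PrimitiveElement for i64 {}

// Call sites need only ScalarVector, not a pile of per-type bounds.
fn use_as_scalar_vector<V: ScalarVector>(_v: &V) {}

fn main() {
    let v = PrimitiveVector { values: vec![1i64, 2, 3] };
    use_as_scalar_vector(&v);
    assert_eq!(v.values.len(), 3);
}
```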
* refactor: Move dedup to VectorOp
Remove compute mod and move dedup logic to operations::dedup
* feat: Implement VectorOp::filter
* test: Move replicate test of primitive to replicate.rs
* test: Add more replicate tests
* test: Add tests for dedup and filter
Also fix NullVector::dedup and ConstantVector::dedup
* style: fix clippy
* chore: Remove unused scalar.rs
* test: Add more tests for VectorOp and fix failed tests
Also fix the missing eq implementation for TimestampVector.
* chore: Address CR comments
* chore: mention vector should be sorted in comment
* refactor: slice the vector directly in replicate_primitive_with_type
* feat: initial commit of postgres protocol adapter
* initial commit of postgres server
* feat: use common_io runtime and correct testcase
* fix previous tests
* feat: adopt pgwire api changes and add support for text encoded data
* feat: initial integration with datanode
* test: add feature flag to test
* fix: resolve lint warnings
* feat: add postgres feature flags for datanode
* feat: add support for newly introduced timestamp type
* feat: adopt latest datanode changes
* fix: address clippy warning for the flatten scenario
* fix: make clippy great again
* fix: address issues found in review
* chore: sort dependencies by name
* feat: adopt new Output api
* fix: return error on unsupported data types
* refactor: extract common code dealing with record batches
* fix: resolve clippy warnings
* test: adds some unit tests for postgres handler
* test: correct test for cargo update
* fix: update query module name
* test: add assertion for error content
* fix: forbid using int64 as timestamp column data type
* fix unit test
* fix unit tests
* change gmt_created and gmt_modified data type in system tables to timestamp
* also change data type in readme
* feat: frontend instance
* no need to carry column length in `Column` proto
* add more tests
* rebase develop
* create a new variant with already provisioned RecordBatches in Output
* resolve code review comments
* new frontend instance does not connect datanode grpc
* add more tests
* add more tests
* rebase develop
Co-authored-by: luofucong <luofucong@greptime.com>
LogicalTypeId to ConcreteDataType conversion is only allowed in tests,
since some additional info is not stored in LogicalTypeId now. It is just
an id, or kind, and does not contain full type info.
* wip: impl timestamp data type
* add timestamp vectors
* adapt to recent changes to vector module
* fix all unit test
* rebase develop
* fix slice
* change default time unit to millisecond
* add more tests
* fix some CR comments
* fix some CR comments
* fix clippy
* fix some cr comments
* fix some CR comments
* fix some CR comments
* remove time unit in LogicalTypeId::Timestamp
* feat: Impl cmp_element() for Vector
* chore: Add doc comments to MutableVector
* feat: Add create_mutable() to DataType
Add `create_mutable()` to create a MutableVector for each DataType.
Implement ListVectorBuilder and NullVectorBuilder for ListType and
NullType.
* feat: Add ValueRef
ValueRef is a reference to a value and can be used to avoid some
allocations when getting data from a Vector. To support ValueRef, also
implement a ListValueRef for ListValue, but comparison of ListValueRef
still requires some allocation, due to the complexity of ListValue and
ListVector.
Impl some From traits for ValueRef
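An illustrative subset of the idea: variants borrow from the vector instead of allocating (the real enum covers every data type):

```rust
#[derive(Debug, PartialEq)]
enum ValueRef<'a> {
    Null,
    Int64(i64),
    String(&'a str),  // borrows the data, no String allocation
    Binary(&'a [u8]),
}

impl<'a> From<&'a str> for ValueRef<'a> {
    fn from(s: &'a str) -> Self {
        ValueRef::String(s)
    }
}

fn main() {
    let owned = String::from("greptime");
    let r: ValueRef = owned.as_str().into(); // no copy of the string bytes
    assert_eq!(r, ValueRef::String("greptime"));
}
```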
* feat: Implement get_ref for Vector
* feat: Remove cmp_element from Vector
`cmp_element` can be replaced by calling `get_ref` and then comparing the refs
* feat: Implement push/extend for PrimitiveVectorBuilder
Implement push_value_ref() and extend_slice_of() for
PrimitiveVectorBuilder.
Also refactor the DataTypeBuilder trait for primitive types into the
PrimitiveElement trait and add the necessary cast helper methods to it:
- Cast a reference to Vector to a reference to arrow's primitive array
- Cast a ValueRef to a primitive type
- Also make PrimitiveElement a supertrait of Primitive
* feat: Implement push/extend for all vector builders
Implement push_value_ref() and extend_slice_of() for the remaining vector
builders. Add some helpful cast methods to ValueRef and a method to
cast Value to ValueRef.
Change the behavior of PrimitiveElement::cast_xxx to panic when unable
to cast, since push_value_ref() and extend_slice_of() always panic
when given an invalid input data type.
* feat: MutableVector returns an error if the data type doesn't match
* test: Add tests for ValueRef
* feat: Add tests for Vector::get_ref
* feat: NullVector returns an error if the data type doesn't match
* test: Add tests for vector builders
* fix: Fix compile error in python coprocessor
* refactor: Add lifetime param to IntoValueRef
The Primitive trait just uses the `IntoValueRef<'static>` bound. Also
rename create_mutable to create_mutable_vector.
* chore: Address CR comments
* feat: Customize PartialOrd/Ord for Value/ValueRef
Panics if values/refs have different data type
* style: Fix clippy
* refactor: Use macro to generate body of ValueRef::as_xxx
* feat: upgrade rust to nightly-2022-07-14
* style: Fix some clippy warnings
* style: clippy fix
* style: fix clippy
* style: Fix clippy
Some PartialEq warnings have been worked around using cfg_attr(test)
* feat: Implement Eq and PartialEq for PrimitiveType
* chore: Remove unnecessary allow
* chore: Remove usage of cfg_attr for PartialEq
* feat: save create table schema and respect user-defined column order when querying, closes #179
* fix: address CR problems
* refactor: use with_context with ProjectedColumnNotFoundSnafu
* fix: ListVector::get returns Null if index is invalid
* feat: Implement eq for vector
* feat: Derive PartialEq for Batch
Simplify some test code in the schema mod
* refactor: Use macro to simplify vector equality check
* feat: Add projected schema
* feat: Use projected schema to read sst
* feat: Use vector of column to implement Batch
* feat: Use projected schema to convert batch to chunk
* feat: Add no_projection() to build ProjectedSchema
* feat: Memtable supports projection
The btree memtable uses `is_needed()` to filter out unneeded value
columns, then uses `ProjectedSchema::batch_from_parts()` to construct the
batch, so it doesn't need to know the layout of internal columns.
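The filtering step amounts to something like this sketch, with value columns reduced to plain vectors and a bitset standing in for the projection info:

```rust
struct Projection {
    needed: Vec<bool>, // indexed by value-column position
}

impl Projection {
    fn is_needed(&self, col: usize) -> bool {
        self.needed[col]
    }
}

/// Keeps only the value columns the projection asks for.
fn project_values(projection: &Projection, values: Vec<Vec<i64>>) -> Vec<Vec<i64>> {
    values
        .into_iter()
        .enumerate()
        .filter(|(i, _)| projection.is_needed(*i))
        .map(|(_, v)| v)
        .collect()
}

fn main() {
    let projection = Projection { needed: vec![true, false] };
    let projected = project_values(&projection, vec![vec![1, 2], vec![3, 4]]);
    assert_eq!(projected, vec![vec![1, 2]]);
}
```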
* test: Add tests for ProjectedSchema
* test: Add tests for ProjectedSchema
Also return an error if the `projected_columns` used to build the
`ProjectedSchema` is empty.
* test: Add test for memtable projection
* feat: Table pass projection to storage engine
* fix: Use timestamp column name as schema metadata
This fixes the issue that the metadata refers to the wrong timestamp
column if datafusion reorders the fields of the arrow schema.
* fix: Fix projected schema not passed to memtable
* feat: Add tests for region projection
* chore: fix clippy
* test: Add test for unordered projection
* chore: Move projected_schema to ReadOptions
Also fix some typos
* feat: implement DateTime type
* add some tests
* Update src/common/time/src/datetime.rs
Co-authored-by: Ning Sun <sunng@protonmail.com>
* Update src/common/time/src/datetime.rs
Co-authored-by: Ning Sun <sunng@protonmail.com>
* wip: add Date type and value
* fix some cr comments
* impl Date values
* finish date type
* optimize Date value serialization
* add some tests
* fix some cr comments
* add some more test
* feat: add `BorrowedValue` and DF Array access by index
This `BorrowedValue` can hold data from a datafusion arrow array without
copying. `arrow_array_access` provides index access to an Arrow array and
holds the result in our `BorrowedValue`, so we don't have to copy
strings/binaries when converting to `Value`.
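A sketch of the borrowed access pattern with a plain `Option<String>` column standing in for the Arrow array (the real code indexes into datafusion's arrow arrays):

```rust
#[derive(Debug, PartialEq)]
enum BorrowedValue<'a> {
    Null,
    String(&'a str),
}

fn array_get<'a>(column: &'a [Option<String>], idx: usize) -> BorrowedValue<'a> {
    match column.get(idx).and_then(|v| v.as_deref()) {
        Some(s) => BorrowedValue::String(s), // borrowed, not copied
        None => BorrowedValue::Null,
    }
}

fn main() {
    let column = vec![Some("a".to_string()), None];
    assert_eq!(array_get(&column, 0), BorrowedValue::String("a"));
    assert_eq!(array_get(&column, 1), BorrowedValue::Null);
}
```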
* refactor: use borrowed types and iterator for recordbatch access
* fix: return Null with early check
* fix: i64 type error addressed by unit test
* refactor: give arrow_array_access a better name
* refactor: remove BorrowedValue and use Value for now
* refactor: make the iterator return a Result of Vec
* refactor: lift recordbatch iterator into common module
* fix: address clippy warnings
* refactor: Merge RowKeyMetadata into ColumnsMetadata
RowKeyMetadata and ColumnsMetadata are almost always used together, so
there is no need to separate them into two structs. They are now combined
into a single ColumnsMetadata struct.
* chore: Make some fields of metadata private
* feat: Replace schema in RegionMetadata by RegionSchema
The internal schema of a region should have knowledge about all
internal columns that are reserved and used by the storage engine, such
as the sequence and value type. So we introduce the `RegionSchema`, which
holds a `SchemaRef` that only contains the columns users can see.
* feat: Value derives Serialize and supports converting into json value
* feat: Add version to schema
The schema version has an initial value of 0 and is bumped each time the
schema is altered.
* feat: Adds internal columns to region metadata
Introduce the concept of reserved columns and internal columns.
Reserved columns are columns whose names and ids are reserved by the
storage engine and cannot be used by the user. Reserved columns usually
have special usage. Reserved columns except the version column are also
called internal columns (though the version could also be thought of as a
special kind of internal column) and are not visible to users, such as
our internal sequence and value_type columns.
The RegionMetadataBuilder always pushes the internal columns used by the
engine onto the columns in metadata. Internal columns are all stored
after the user columns in the columns vector.
To avoid column id collisions, ids reserved for columns have the most
significant bit set to 1, and the RegionMetadataBuilder checks the
uniqueness of each column id.
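A sketch of the id scheme, assuming a u32 column id (constant names are illustrative):

```rust
struct ReservedColumnId;

impl ReservedColumnId {
    const RESERVED_BIT: u32 = 1 << 31;

    // Bitwise OR keeps this usable in const context and sidesteps the
    // clippy::enum_clike_unportable_variant lint mentioned below.
    const fn id(offset: u32) -> u32 {
        Self::RESERVED_BIT | offset
    }
}

const SEQUENCE_COLUMN_ID: u32 = ReservedColumnId::id(0);
const VALUE_TYPE_COLUMN_ID: u32 = ReservedColumnId::id(1);

fn main() {
    assert_eq!(SEQUENCE_COLUMN_ID, 0x8000_0000);
    assert_eq!(VALUE_TYPE_COLUMN_ID, 0x8000_0001);
    // User column ids stay below the reserved bit, so no collision.
    assert!(42 < ReservedColumnId::RESERVED_BIT);
}
```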
* chore: Rebase develop and fix compile error
* feat: add internal schema to region schema
* feat: Add SchemaBuilder to build Schema
* feat: Store row key end in region schema metadata
Also move the arrow schema construction to the region::schema mod
* feat: Add SstSchema
* refactor: Replace MemtableSchema by RegionSchema
Now when writing sst files, we can use the arrow schema from our sst
schema, which contains the internal columns.
* feat: Use SstSchema to read parquet
Adds user_column_end to metadata. When reading a parquet file, converts
the arrow schema into SstSchema, then uses row_key_end and
user_column_end to find the row key parts, value parts and internal
columns, instead of using the timestamp index, which may yield an
incorrect index if we don't put the timestamp at the end of the row key.
Move the conversion from Batch to arrow Chunk into SstSchema, so the SST
mod doesn't need to care about the order of key, value and internal
columns.
* test: Add test for Value to serde_json::Value
* feat: Add RawRegionMetadata to persist RegionMetadata
* test: Add test to RegionSchema
* fix: Fix clippy
To fix the clippy::enum_clike_unportable_variant lint, define the column
id offset in ReservedColumnType and compute the final column id in
ReservedColumnId's const method
* refactor: Move batch/chunk conversion to SstSchema
The parquet ChunkStream now holds the SstSchema and uses its methods to
convert a Chunk into a Batch.
* chore: Address CR comment
Also add a test for pushing internal columns to RegionMetadataBuilder
* chore: Address CR comment
* chore: Use bitwise or to compute column id
* chore: Address CR comment
* catalog manager allocates table id
* rebase develop
* add some tests
* add some more test
* fix some cr comments
* insert into system catalog
* use slice pattern to simplify code
* add optional dependencies
* add sql-to-request test
* successfully recover
* fix unit tests
* rebase develop
* add some tests
* fix some cr comments
* fix some cr comments
* add a lock to CatalogManager
* feat: add gmt_created and gmt_modified columns to system catalog table
* feat: protobuf codec
* chore: minor fix
* chore: beautify the macro code
* chore: minor fix
* chore: address CR comments
* chore: address CR comments and impl wal with proto
* bugfix: invalid num_rows for multiple put_data in mutations
Co-authored-by: jiachun <jiachun_fjc@163.com>
* feat: UDAF implementation backed by DataFusion.
Directly transplant DataFusion's UDAF-related structs, traits and functions, like `AggregateUDF`, `Accumulator` or `create_udaf` etc.
Implement a median UDAF on top of it and use it in unit testing.
Refs: #61
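The median accumulator's core is roughly this (simplified to i64 input and plain methods rather than the actual Accumulator trait):

```rust
#[derive(Default)]
struct MedianAccumulator {
    values: Vec<i64>,
}

impl MedianAccumulator {
    fn update(&mut self, value: i64) {
        self.values.push(value);
    }

    fn evaluate(mut self) -> Option<i64> {
        if self.values.is_empty() {
            return None;
        }
        self.values.sort_unstable();
        Some(self.values[self.values.len() / 2])
    }
}

fn main() {
    let mut acc = MedianAccumulator::default();
    for v in [3, 1, 2] {
        acc.update(v);
    }
    assert_eq!(acc.evaluate(), Some(2));
}
```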
* feat: UDAF made generically
Refs: #61
* fix: cargo fmt
* fix: use prelude
* fix: uniform the name
* fix: move maybe commonly used functions together
* fix: make comments more clear
* fix: resolve conversations in CR
* fix: store input types in AccumulatorCreator, and use ScalarVector's iterator
* feat: introducing List value and List datatype
* refactor: use ArcSwap instead of Mutex
* refactor: shorten some names
* refactor: move median UDAF out of tests
* refactor: rename
* feat: aggregate function registry
* fix: make `Value` satisfy ordering again
* fix: clippy warnings
* doc: add "how to write aggregate function"
* fix: address PR comments
* fix: trying to get rid of unwraps
Co-authored-by: luofucong <luofucong@greptime.com>
* feat: add buffer abstraction and rewrite entry encode/decode process
* add some tests
* remove pad.rst
* fix some comments
* fix comments
* remove mmap mod
* feat: Bytes type implementation switch to bytes::Bytes
* fix: use Bytes::from(String) and Bytes::from(Vec<u8>)
* feat: add new method to Entry trait
* feat: improve try_into_vector function
* feat: impl memory_size function for vectors
* fix: forgot memory_size assertion in null vector test
* feat: use LargeUtf8 instead of Utf8 for string, and rename LargeBinaryArray to BinaryArray
* feat: memory_size only calculates heap size
* feat: impl scanning data from storage for MitoTable
* adds test mod to setup table engine test
* fix: comment error
* fix: boyan -> dennis in todo comments
* fix: remove unnecessary Send in BatchIteratorPtr
* feat: Maintain last sequence in VersionControl
* refactor(recordbatch): Replace `Arc<Schema>` by SchemaRef
* feat: Memtable supports filtering rows with invisible sequences
* feat: snapshot wip
* feat: Implement scan for SnapshotImpl
* test: Add a test that simply puts and scans a region
* chore: Fix clippy
* fix(memtable): Fix memtable returning duplicate keys
* test(memtable): Add sequence visibility test
* test: Add ValueType test
* chore: Address cr comments
* fix: Fix value not being stored while still being added to the committed sequence
* Impl TableEngine, bridge to storage
* Impl sql handler to process insert sql
* fix: minor changes and typo
* test: add datanode test
* test: add table-engine test
* fix: code style
* refactor: split out insert mod from sql and minor changes by CR
* refactor: replace with_context with context