mirror of https://github.com/GreptimeTeam/greptimedb.git (synced 2026-01-13 16:52:56 +00:00)
* refactor: Merge RowKeyMetadata into ColumnsMetadata

RowKeyMetadata and ColumnsMetadata are almost always used together, so there is no need to keep them as two separate structs. They are now combined into the single ColumnsMetadata struct.

chore: Make some fields of metadata private

feat: Replace schema in RegionMetadata by RegionSchema

The internal schema of a region should know about all internal columns that are reserved and used by the storage engine, such as sequence and value type. So we introduce `RegionSchema`, which holds a `SchemaRef` that only contains the columns the user can see.

feat: Value derives Serialize and supports converting into json value

feat: Add version to schema

The schema version has an initial value of 0 and is bumped each time the schema is altered.

feat: Adds internal columns to region metadata

Introduce the concept of reserved columns and internal columns.

Reserved columns are columns whose names and ids are reserved by the storage engine and cannot be used by the user. Reserved columns usually have a special purpose. Reserved columns except the version column are also called internal columns (though the version could also be thought of as a special kind of internal column) and are not visible to the user, such as our internal sequence and value_type columns.

The RegionMetadataBuilder always pushes the internal columns used by the engine to the columns in metadata. Internal columns are stored after all user columns in the columns vector.

To avoid column id collisions, the ids of reserved columns have the most significant bit set to 1. The RegionMetadataBuilder also checks the uniqueness of the column ids.

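The reserved column id scheme described above can be sketched as follows. This is a minimal illustration, not the code from this change: it assumes a 32-bit `ColumnId`, and the offsets, the `RESERVED_BASE` constant, and the `is_reserved` and `all_unique` helpers are hypothetical names chosen for the example. Only `ReservedColumnType` and `ReservedColumnId` are names taken from the commit message.

```rust
use std::collections::HashSet;

/// Assumption: column ids are 32-bit unsigned integers.
pub type ColumnId = u32;

/// Hypothetical constant: the most significant bit, set on every reserved id.
const RESERVED_BASE: ColumnId = 1 << (ColumnId::BITS - 1);

/// Kinds of reserved columns. The discriminants are small offsets rather
/// than full ids, so they fit in `isize` on every target.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u32)]
pub enum ReservedColumnType {
    Version = 0,
    Sequence = 1,
    ValueType = 2,
}

impl ReservedColumnType {
    /// Offset of this reserved column inside the reserved id range.
    pub const fn offset(self) -> ColumnId {
        self as ColumnId
    }
}

/// Namespace for the final ids of reserved columns.
pub struct ReservedColumnId;

impl ReservedColumnId {
    /// Combine the reserved bit with the per-type offset using bitwise or.
    const fn of(ty: ReservedColumnType) -> ColumnId {
        RESERVED_BASE | ty.offset()
    }

    pub const fn version() -> ColumnId {
        Self::of(ReservedColumnType::Version)
    }

    pub const fn sequence() -> ColumnId {
        Self::of(ReservedColumnType::Sequence)
    }

    pub const fn value_type() -> ColumnId {
        Self::of(ReservedColumnType::ValueType)
    }
}

/// Returns true if the id lies in the reserved range (top bit set).
pub const fn is_reserved(id: ColumnId) -> bool {
    id & RESERVED_BASE != 0
}

/// Returns true if no column id appears twice, mirroring the kind of
/// uniqueness check a metadata builder could run before building.
pub fn all_unique(ids: &[ColumnId]) -> bool {
    let mut seen = HashSet::new();
    ids.iter().all(|id| seen.insert(*id))
}
```

Under these assumptions a user column id with the top bit clear can never collide with a reserved id, and keeping only small offsets in the enum is one way to avoid discriminants that do not fit in `isize` on 32-bit targets.
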
chore: Rebase develop and fix compile error

feat: add internal schema to region schema

feat: Add SchemaBuilder to build Schema

feat: Store row key end in region schema metadata

Also move the arrow schema construction to region::schema mod

feat: Add SstSchema

refactor: Replace MemtableSchema by RegionSchema

Now when writing sst files, we can use the arrow schema from our sst schema, which contains the internal columns.

feat: Use SstSchema to read parquet

Adds user_column_end to metadata. When reading a parquet file, convert the arrow schema into an SstSchema, then use row_key_end and user_column_end to find the row key parts, value parts and internal columns, instead of using the timestamp index, which may yield an incorrect index if we don't put the timestamp at the end of the row key.

Move the conversion from Batch to arrow Chunk to SstSchema, so the SST mod doesn't need to care about the order of key, value and internal columns.

test: Add test for Value to serde_json::Value

feat: Add RawRegionMetadata to persist RegionMetadata

test: Add test to RegionSchema

fix: Fix clippy

To fix the clippy::enum_clike_unportable_variant lint, define the column id offset in ReservedColumnType and compute the final column id in ReservedColumnId's const method

refactor: Move batch/chunk conversion to SstSchema

The parquet ChunkStream now holds the SstSchema and uses its method to convert a Chunk into a Batch.

chore: Address CR comment

Also add a test for pushing internal column to RegionMetadataBuilder

chore: Address CR comment

chore: Use bitwise or to compute column id

* chore: Address CR comment
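
The way row_key_end and user_column_end partition the columns when reading parquet can be sketched as below. This is a hedged illustration only: the `ColumnLayout` type and its methods are hypothetical and not part of this change. It assumes columns are stored in the order row key, then value columns, then internal columns.

```rust
use std::ops::Range;

/// Hypothetical helper describing where each group of columns ends.
/// Assumed column order: [row key..., values..., internal...].
pub struct ColumnLayout {
    row_key_end: usize,
    user_column_end: usize,
    num_columns: usize,
}

impl ColumnLayout {
    /// Returns None when the boundaries are not ordered as
    /// row_key_end <= user_column_end <= num_columns.
    pub fn new(row_key_end: usize, user_column_end: usize, num_columns: usize) -> Option<Self> {
        if row_key_end <= user_column_end && user_column_end <= num_columns {
            Some(Self {
                row_key_end,
                user_column_end,
                num_columns,
            })
        } else {
            None
        }
    }

    /// Indices of the row key columns (the timestamp may sit anywhere in here).
    pub fn row_key(&self) -> Range<usize> {
        0..self.row_key_end
    }

    /// Indices of the user-visible value columns.
    pub fn values(&self) -> Range<usize> {
        self.row_key_end..self.user_column_end
    }

    /// Indices of the internal columns appended by the engine.
    pub fn internal(&self) -> Range<usize> {
        self.user_column_end..self.num_columns
    }
}
```

For example, with six columns where the first two form the row key and the last two are internal, `ColumnLayout::new(2, 4, 6)` yields the ranges `0..2`, `2..4` and `4..6`. Because the split depends only on the two stored boundaries, it does not rely on the position of the timestamp column.
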