mirror of
https://github.com/GreptimeTeam/greptimedb.git
synced 2026-05-16 13:00:40 +00:00
* feat(metric-engine): support bulk inserts Implement `RegionRequest::BulkInserts` to support efficient columnar data ingestion in the metric engine. Key changes: - Implement `bulk_insert_region` to handle logical-to-physical region mapping and dispatch writes. - Add `batch_modifier` for `RecordBatch` transformations, specifically for `__tsid` generation and sparse primary key encoding. - Integrate `BulkInserts` into the `MetricEngine` request handling logic. - Provide a row-based fallback mechanism if the underlying storage doesn't support bulk writes. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat/metric-engine-bulk-insert: ### Update `bulk_insert.rs` to Support Partition Expression Version - **Enhancements**: - Added support for `partition_expr_version` in `RegionBulkInsertsRequest` and `RegionPutRequest`. - Modified the handling of `partition_expr_version` to be dynamically set from the `request` object. Files affected: - `src/metric-engine/src/engine/bulk_insert.rs` Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * fix: cargo lock revert Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * add doc for conversions Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * chore: simplify test Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat/metric-engine-bulk-insert: ### Refactor `bulk_insert.rs` in `metric-engine` - **Refactor Functionality**: - Replaced `resolve_tag_columns` with `resolve_tag_columns_from_metadata` to streamline tag column resolution. - Moved logic for resolving tag columns directly into `resolve_tag_columns_from_metadata`, removing the need for an external function call. - **Enhancements**: - Improved error handling and context provision for missing physical regions and columns. - Optimized tag column sorting and index management within the batch processing logic. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat/metric-engine-bulk-insert: ### Refactor `record_batch_to_rows` Function in `bulk_insert.rs` - Simplified the `record_batch_to_rows` function by removing the `logical_metadata` parameter and directly validating column types within the function. - Enhanced error handling for timestamp, value, and tag columns by checking their data types and providing detailed error messages. - Replaced the use of `Helper::try_into_vector` with direct downcasting to `TimestampMillisecondArray`, `Float64Array`, and `StringArray` for improved type safety and clarity. - Updated the construction of `api::v1::Rows` to directly handle null values and construct `api::v1::Value` objects accordingly. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat/metric-engine-bulk-insert: ## Commit Message Refactor `bulk_insert.rs` to optimize state access - Moved the state read operation inside a new block to limit its scope and improve code clarity. - Adjusted logic for processing `tag_columns` and `non_tag_indices` to work within the new block structure. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat/metric-engine-bulk-insert: ### Refactor `compute_tsid_array` Function - **Refactored `compute_tsid_array` function**: Modified the function signature to accept `tag_arrays` as a parameter instead of building it internally. This change affects the following files: - `src/metric-engine/src/batch_modifier.rs` - **Updated test cases**: Adjusted test cases to accommodate the new `compute_tsid_array` function signature by passing `tag_arrays` explicitly. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * docs: add doc for bulk_insert_region Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat/metric-engine-bulk-insert: ### Commit Message Refactor `bulk_insert.rs` in `metric-engine`: - Removed error handling for unsupported status codes in `write_data` method. - Eliminated `record_batch_to_rows` function, simplifying the data insertion process. - Streamlined the `write_data` method by removing fallback logic for unsupported operations. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat/metric-engine-bulk-insert: - **Optimize Primary Key Construction**: Refactored `modify_batch_sparse` in `batch_modifier.rs` to use `BinaryBuilder` for more efficient primary key construction. - **Add Fallback for Unsupported Bulk Inserts**: Updated `bulk_insert.rs` to handle unsupported bulk inserts by converting record batches to rows and using `RegionPutRequest`. - **Implement Record Batch to Rows Conversion**: Added `record_batch_to_rows` function in `bulk_insert.rs` to convert `RecordBatch` to `api::v1::Rows` for fallback operations. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat/metric-engine-bulk-insert: Add test for handling null values in `record_batch_to_rows` - Added a new test `test_record_batch_to_rows_with_null_values` in `bulk_insert.rs` to verify the handling of null values in the `record_batch_to_rows` function. - The test checks the conversion of a `RecordBatch` with null values in various fields to ensure correct row creation and schema handling. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat/metric-engine-bulk-insert: Add fallback path for unsupported status and improve error context handling - **`bulk_insert.rs`**: - Added a fallback path for `PartitionTreeMemtable` in case of unsupported status code. - Enhanced error handling by using `with_context` for better error messages when timestamp and value columns are not found in `RecordBatch`. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> --------- Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>