* feat(metric-engine): support bulk inserts
Implement `RegionRequest::BulkInserts` to support efficient columnar data
ingestion in the metric engine.
Key changes:
- Implement `bulk_insert_region` to handle logical-to-physical region mapping
and dispatch writes.
- Add `batch_modifier` for `RecordBatch` transformations, specifically for
`__tsid` generation and sparse primary key encoding.
- Integrate `BulkInserts` into the `MetricEngine` request handling logic.
- Provide a row-based fallback mechanism if the underlying storage doesn't
support bulk writes.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/metric-engine-bulk-insert:
### Update `bulk_insert.rs` to Support Partition Expression Version
- **Enhancements**:
  - Added support for `partition_expr_version` in `RegionBulkInsertsRequest` and `RegionPutRequest`.
  - Changed `partition_expr_version` to be set dynamically from the `request` object.
Files affected:
- `src/metric-engine/src/engine/bulk_insert.rs`
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* fix: cargo lock revert
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* add doc for conversions
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* chore: simplify test
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/metric-engine-bulk-insert:
### Refactor `bulk_insert.rs` in `metric-engine`
- **Refactor Functionality**:
  - Replaced `resolve_tag_columns` with `resolve_tag_columns_from_metadata` to streamline tag column resolution.
  - Moved logic for resolving tag columns directly into `resolve_tag_columns_from_metadata`, removing the need for an external function call.
- **Enhancements**:
  - Improved error handling and context provision for missing physical regions and columns.
  - Optimized tag column sorting and index management within the batch processing logic.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/metric-engine-bulk-insert:
### Refactor `record_batch_to_rows` Function in `bulk_insert.rs`
- Simplified the `record_batch_to_rows` function by removing the `logical_metadata` parameter and directly validating column types within the function.
- Enhanced error handling for timestamp, value, and tag columns by checking their data types and providing detailed error messages.
- Replaced the use of `Helper::try_into_vector` with direct downcasting to `TimestampMillisecondArray`, `Float64Array`, and `StringArray` for improved type safety and clarity.
- Updated the construction of `api::v1::Rows` to directly handle null values and construct `api::v1::Value` objects accordingly.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/metric-engine-bulk-insert:
### Refactor `bulk_insert.rs` to optimize state access
- Moved the state read operation inside a new block to limit its scope and improve code clarity.
- Adjusted logic for processing `tag_columns` and `non_tag_indices` to work within the new block structure.
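The scoping change above can be sketched as follows. This is a minimal illustration using `std::sync::RwLock`, not the engine's actual state type; `State`, its `tag_columns` map, and `resolve_sorted_tags` are hypothetical names introduced only for this example.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical stand-in for the engine state guarded by a lock.
struct State {
    tag_columns: HashMap<String, Vec<String>>,
}

/// Copies what is needed out of the state inside an inner block, so the read
/// guard is dropped before any further processing happens.
fn resolve_sorted_tags(state: &RwLock<State>, region: &str) -> Option<Vec<String>> {
    let tags = {
        let guard = state.read().unwrap();
        let found = guard.tag_columns.get(region).cloned();
        found
    }; // `guard` goes out of scope here and the read lock is released.

    // Work on the owned copy without holding the lock.
    tags.map(|mut t| {
        t.sort();
        t
    })
}
```

Keeping the guard inside the block makes the lock's lifetime visible in the code's shape, at the cost of cloning the data that outlives it.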
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/metric-engine-bulk-insert:
### Refactor `compute_tsid_array` Function
- **Refactored `compute_tsid_array` function**: Modified the function signature to accept `tag_arrays` as a parameter instead of building it internally. Files affected:
  - `src/metric-engine/src/batch_modifier.rs`
- **Updated test cases**: Adjusted test cases to accommodate the new `compute_tsid_array` function signature by passing `tag_arrays` explicitly.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* docs: add doc for bulk_insert_region
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/metric-engine-bulk-insert:
### Refactor `bulk_insert.rs` in `metric-engine`
- Removed error handling for unsupported status codes in `write_data` method.
- Eliminated `record_batch_to_rows` function, simplifying the data insertion process.
- Streamlined the `write_data` method by removing fallback logic for unsupported operations.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/metric-engine-bulk-insert:
- **Optimize Primary Key Construction**: Refactored `modify_batch_sparse` in `batch_modifier.rs` to use `BinaryBuilder` for more efficient primary key construction.
- **Add Fallback for Unsupported Bulk Inserts**: Updated `bulk_insert.rs` to handle unsupported bulk inserts by converting record batches to rows and using `RegionPutRequest`.
- **Implement Record Batch to Rows Conversion**: Added `record_batch_to_rows` function in `bulk_insert.rs` to convert `RecordBatch` to `api::v1::Rows` for fallback operations.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/metric-engine-bulk-insert:
Add test for handling null values in `record_batch_to_rows`
- Added a new test `test_record_batch_to_rows_with_null_values` in `bulk_insert.rs` to verify the handling of null values in the `record_batch_to_rows` function.
- The test checks the conversion of a `RecordBatch` with null values in various fields to ensure correct row creation and schema handling.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/metric-engine-bulk-insert:
Add fallback path for unsupported status and improve error context handling
- **`bulk_insert.rs`**:
  - Added a fallback path for `PartitionTreeMemtable` in case of unsupported status code.
  - Enhanced error handling by using `with_context` for better error messages when timestamp and value columns are not found in `RecordBatch`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
---------
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat(http): improve error logging with client IP
- Add logging to ErrorResponse::from_error_message()
- Add middleware to log HTTP errors with client IP
Closes #7328
Signed-off-by: maximk777 <maximkirienkov777@gmail.com>
* fix(http): address review comments for error logging
Restore rich Debug logging in from_error(), add URI/method/matched path
to client IP middleware, and only log when client address is available.
Signed-off-by: evenyag <realevenyag@gmail.com>
---------
Signed-off-by: maximk777 <maximkirienkov777@gmail.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
Co-authored-by: evenyag <realevenyag@gmail.com>
* feat: support write flat as primary key format
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: migrate flush to always use FlatSource
Add FormatType propagation in SstWriteRequest and use it to choose
Flat vs PrimaryKey write paths (write_all_flat vs
write_all_flat_as_primary_key) in AccessLayer and WriteCache. Make
compactor and flush derive the sst_write_format from region options or
engine config. Simplify flush logic and remove the old memtable_source
helper. Update tests to set default sst_write_format.
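A rough sketch of the selection logic described above, under stated assumptions: the `FormatType` enum here is a hypothetical mirror of the type named in this commit, the precedence (region option first, engine config second) is an assumption, and `write_path` only returns the name of the write function mentioned above rather than calling it.

```rust
/// Hypothetical mirror of the format type named in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FormatType {
    PrimaryKey,
    Flat,
}

/// Assumed precedence: a per-region option wins, otherwise the engine default applies.
fn resolve_write_format(region_option: Option<FormatType>, engine_default: FormatType) -> FormatType {
    region_option.unwrap_or(engine_default)
}

/// Maps the format to the write path named in the commit message.
/// The real functions take a source and write options; this only shows the dispatch.
fn write_path(format: FormatType) -> &'static str {
    match format {
        FormatType::Flat => "write_all_flat",
        FormatType::PrimaryKey => "write_all_flat_as_primary_key",
    }
}
```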
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: use flat source in compaction
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: read parquet sequentially as flat batches
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: remove new_batch_with_binary in favor of new_record_batch_with_binary
Replace PrimaryKeyWriteFormat with FlatWriteFormat in test_read_large_binary
test and use new_record_batch_with_binary directly, removing the now-unused
new_batch_with_binary function and its BinaryArray import.
Signed-off-by: evenyag <realevenyag@gmail.com>
* test: add tests for PrimaryKeyWriteFormat::convert_flat_batch
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: remove Either from SstWriteRequest
Signed-off-by: evenyag <realevenyag@gmail.com>
* fix: handle index build mode
Signed-off-by: evenyag <realevenyag@gmail.com>
* fix: consider sparse encoding and last non null in flush
Signed-off-by: evenyag <realevenyag@gmail.com>
* test: add unit tests for field_column_start edge cases
Signed-off-by: evenyag <realevenyag@gmail.com>
---------
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat(procedure): detect potential deadlock when parent/child share lock keys
Add a deadlock detection mechanism in submit_subprocedure() to warn
when a child procedure's lock_key overlaps with its parent's lock_key.
When this happens, the parent holds the lock while waiting for the child
to complete (at child_notify.notified().await), but the child blocks
forever trying to acquire the same lock. This is a classic Hold-and-Wait
deadlock.
The detection:
- Emits a warn! log in all builds (visible in production)
- Triggers debug_assert!(false) in debug/test builds for early CI detection
This partially addresses the TODO at lines 121-122 and is a follow-up
to the discussion in: https://github.com/GreptimeTeam/greptimedb/issues/7692
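The overlap check can be sketched as a small pure function. This is an illustration only: it models lock keys as plain strings and ignores lock modes, and `overlapping_lock_keys` is a hypothetical name, not the function added by this change.

```rust
use std::collections::HashSet;

/// Returns the parent's lock keys that the child also requests.
/// A non-empty result means the child could block on a lock its waiting parent holds.
fn overlapping_lock_keys<'a>(parent: &'a [String], child: &[String]) -> Vec<&'a str> {
    let child_keys: HashSet<&str> = child.iter().map(String::as_str).collect();
    parent
        .iter()
        .map(String::as_str)
        .filter(|key| child_keys.contains(key))
        .collect()
}
```

A caller would log a warning when the result is non-empty; the change described above additionally asserts in debug and test builds.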
Signed-off-by: YZL0v3ZZ <2055877225@qq.com>
* style: fix trailing whitespace
Signed-off-by: YZL0v3ZZ <2055877225@qq.com>
* refactor(procedure): extract deadlock detection into a testable pure function
Signed-off-by: YZL0v3ZZ <2055877225@qq.com>
* fix(procedure): preserve lock mode when detecting parent/child deadlock
Signed-off-by: YZL0v3ZZ <2055877225@qq.com>
* re-run ci check
Signed-off-by: YZL0v3ZZ <2055877225@qq.com>
---------
Signed-off-by: YZL0v3ZZ <2055877225@qq.com>
* refactor/prom-related-code:
### Refactor Byte Handling and Improve Decoding Logic
- **`prom_decode.rs`**: Removed `Bytes` usage in favor of `Vec<u8>` for handling raw data, improving memory management and simplifying the decoding process.
- **`prom_store.rs`**: Updated `try_decompress` function to return `Vec<u8>` instead of `Bytes`, aligning with the new data handling approach.
- **`prom_row_builder.rs`**: Modified `TablesBuilder` to use `Vec<u8>` for `raw_data`, enhancing data manipulation capabilities.
- **`proto.rs`**: Refactored `PromWriteRequest` decoding logic to use `Vec<u8>`, optimizing the buffer management and decoding flow.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* refactor: mod structure
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* refactor/prom-related-code:
- **Refactor `prom_store.rs` and `prom_remote_write/mod.rs`:** Moved `decode_remote_write_request` and `try_decompress` functions from `prom_store.rs` to `prom_remote_write/mod.rs`. This change centralizes the logic related to remote write request decoding and decompression.
- **Update `PromValidationMode` in `validation.rs`:** Implemented `Default` trait using the `#[derive(Default)]` attribute for `PromValidationMode` and updated related methods to use `Result` instead of `std::result::Result`.
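For reference, deriving `Default` on an enum looks roughly like this. Only the `Strict` variant is named in this log; the `Lossy` variant and the choice of `Strict` as the default are assumptions made for the sketch.

```rust
/// Sketch of an enum with a derived default.
/// `Lossy` is a hypothetical variant; which variant is the default is assumed.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
enum PromValidationMode {
    #[default]
    Strict,
    Lossy,
}

/// Shows that the derived impl replaces a hand-written `impl Default`.
fn default_mode() -> PromValidationMode {
    PromValidationMode::default()
}
```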
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* refactor/prom-related-code:
### Remove `proto.rs` and Update References
- **Removed**: Deleted the `proto.rs` file, which contained re-exports for Prometheus remote write decode types.
- **Updated References**: Adjusted references to `PromSeriesProcessor` and `PromWriteRequest` in `prom_decode.rs` and `prom_store.rs` to import directly from `prom_remote_write`.
- **Modified Modules**: Removed the `proto` module from `lib.rs`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* fix: lint
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* fix: remove assert_eq
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* refactor/prom-related-code:
### Refactor Prometheus Remote Write Module
- **Modularization of `prom_remote_write`:**
  - Split `PromValidationMode` and `validate_label_name` into a new `validation` module.
  - Moved `PromSeriesProcessor` and `PromWriteRequest` to a `decode` module.
  - Separated `PromLabel` into a `types` module and adjusted visibility.
- **Visibility Adjustments:**
  - Changed `PromTimeSeries` and `PromLabel` structs to `pub(crate)` for internal use.
- **File Updates:**
  - Updated references in `prom_decode.rs`, `http.rs`, `prom_store.rs`, `decode.rs`, `mod.rs`, `row_builder.rs`, `types.rs`, `prom_store_test.rs`, and `test_util.rs` to reflect module changes.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
---------
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* test: fix unstable index meta list test
Signed-off-by: evenyag <realevenyag@gmail.com>
* fix: raise bucket_size threshold so sizes in [512, 999] are no longer bucketed to 0
Signed-off-by: evenyag <realevenyag@gmail.com>
---------
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: initial function rewriter for json_get
* feat: make sure rewrite rule is applied
* feat: keep analyzer's default rules
* feat: implement rewriter for arrow_cast
* test: add unit test for the rewriter
* chore: format
* refactor: extract some more functions
* Apply suggestion from @waynexia
Co-authored-by: Ruihang Xia <waynestxia@gmail.com>
---------
Co-authored-by: Ruihang Xia <waynestxia@gmail.com>
* perf: optimize Prometheus label name decoding with byte-level validation
Add `decode_label_name` and `validate_label_name` to skip redundant
UTF-8 validation for Prometheus label names, which are guaranteed ASCII
(`[a-zA-Z_][a-zA-Z0-9_]*`). Rename `validate_bytes` to `validate_utf8`
for clarity and add benchmarks for label name validation and UTF-8
validation (std vs simdutf8).
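A minimal sketch of the idea, not the code in this change: once every byte is known to match the ASCII label grammar, the slice is valid UTF-8 by construction and the generic UTF-8 scan can be skipped. The function names below are hypothetical.

```rust
/// Checks `[a-zA-Z_][a-zA-Z0-9_]*` directly on raw bytes.
fn is_valid_label_name(name: &[u8]) -> bool {
    match name.split_first() {
        Some((first, rest)) => {
            (first.is_ascii_alphabetic() || *first == b'_')
                && rest.iter().all(|b| b.is_ascii_alphanumeric() || *b == b'_')
        }
        // An empty name does not match the grammar.
        None => false,
    }
}

/// Validates the label grammar and converts to `&str` without a second UTF-8 pass.
fn decode_label_name_sketch(name: &[u8]) -> Option<&str> {
    if is_valid_label_name(name) {
        // SAFETY: every byte was just checked to be ASCII, and ASCII is valid UTF-8.
        Some(unsafe { std::str::from_utf8_unchecked(name) })
    } else {
        None
    }
}
```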
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* perf(servers): optimize validate_label_name with lookup table and loop unrolling
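One way a lookup table and manual unrolling might be combined is sketched below. The table layout, the unroll factor of four, and all names are assumptions for illustration, not the implementation in this commit.

```rust
/// Bit set when a byte may start a label name.
const FIRST: u8 = 0b01;
/// Bit set when a byte may appear after the first position.
const REST: u8 = 0b10;

/// Builds the 256-entry classification table at compile time.
const fn build_table() -> [u8; 256] {
    let mut table = [0u8; 256];
    let mut i = 0usize;
    while i < 256 {
        let b = i as u8;
        if b.is_ascii_alphabetic() || b == b'_' {
            table[i] = FIRST | REST;
        } else if b.is_ascii_digit() {
            table[i] = REST;
        }
        i += 1;
    }
    table
}

static TABLE: [u8; 256] = build_table();

/// Validates `[a-zA-Z_][a-zA-Z0-9_]*` with one table load per byte.
fn validate_with_table(name: &[u8]) -> bool {
    let (first, rest) = match name.split_first() {
        Some((first, rest)) => (*first, rest),
        None => return false,
    };
    if TABLE[first as usize] & FIRST == 0 {
        return false;
    }
    let mut chunks = rest.chunks_exact(4);
    for c in &mut chunks {
        // AND four lookups together so a single branch covers four bytes.
        let ok = TABLE[c[0] as usize]
            & TABLE[c[1] as usize]
            & TABLE[c[2] as usize]
            & TABLE[c[3] as usize];
        if ok & REST == 0 {
            return false;
        }
    }
    // Handle the trailing one to three bytes individually.
    chunks.remainder().iter().all(|&b| TABLE[b as usize] & REST != 0)
}
```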
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* perf/decode-prom-2:
- **Refactor UTF-8 Validation and Label Decoding**:
  - Removed `validate_utf8` method and integrated label name validation directly in `decode_label_name` in `http.rs`.
  - Updated `decode_label_name` to always enforce Prometheus label name validation across all modes.
  - Adjusted test cases in `http.rs` to reflect the new validation logic.
- **Enhance Label Validation in `prom_row_builder.rs`**:
  - Replaced UTF-8 validation with direct label name validation using `validate_label_name`.
  - Updated `decode_label_name` usage to return `&str` and adjusted related logic.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* perf/decode-prom-2:
**Refactor `TableBuilder` to Use `RawBytes` for Column Indexes**
- Updated `TableBuilder` in `prom_row_builder.rs` to use `RawBytes` instead of `Vec<u8>` for `col_indexes`.
- Modified `with_capacity` method to directly insert `RawBytes` for timestamp and value columns.
- Adjusted schema handling to use `to_owned` for `tag_name` and directly insert `raw_tag_name` into `col_indexes`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* perf/decode-prom-2:
### Refactor `PromWriteRequest` Method and Enhance Data Handling
- **Refactor Method**: Renamed the `merge` method to `decode` in `PromWriteRequest` to better reflect its functionality. Updated references in `prom_decode.rs`, `prom_store.rs`, and `prom_row_builder.rs`.
- **Enhance Data Handling**: Introduced `raw_data` field in `PromWriteRequest` to store a clone of the buffer for potential future use. Updated the `clear` method to reset `raw_data`.
Files affected: `prom_decode.rs`, `prom_store.rs`, `prom_row_builder.rs`, `proto.rs`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* perf/decode-prom-2:
**Commit Summary:**
- **Enhancement in `prom_row_builder.rs`:**
  - Added a new field `raw_data` of type `Bytes` to `TablesBuilder`.
  - Implemented `set_raw_data` method to update `raw_data`.
  - Modified `clear` method to reset `raw_data`.
- **Refactor in `proto.rs`:**
  - Removed `raw_data` field from `PromWriteRequest`.
  - Updated `decode_and_process` method to use `set_raw_data` from `TablesBuilder` for handling raw data.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* chore: remove duplicated validation
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* perf/decode-prom-2:
### Refactor `TablesBuilder` and `TableBuilder` to Use Lifetime Annotations
- Updated `prom_store.rs`:
  - Modified `PROM_WRITE_REQUEST_POOL` and `decode_remote_write_request` to use lifetime annotations for `PromWriteRequest` and `TablesBuilder`.
- Updated `prom_row_builder.rs`:
  - Refactored `TablesBuilder` and `TableBuilder` structs to include lifetime annotations.
  - Adjusted methods in `TablesBuilder` and `TableBuilder` to accommodate lifetime changes.
- Updated `proto.rs`:
  - Added lifetime annotations to `PromWriteRequest` and its methods.
  - Modified `add_to_table_data` to use lifetime annotations for `TablesBuilder`.
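The effect of such lifetime annotations can be illustrated with a small, hypothetical builder that keys its column index on byte slices borrowed from the request buffer instead of owned copies. `TableBuilderSketch` and `column_index` are made-up names; the real structs carry more state.

```rust
use std::collections::HashMap;

/// Hypothetical builder whose keys borrow from a buffer that outlives it (`'a`).
struct TableBuilderSketch<'a> {
    col_indexes: HashMap<&'a [u8], usize>,
}

impl<'a> TableBuilderSketch<'a> {
    fn new() -> Self {
        Self {
            col_indexes: HashMap::new(),
        }
    }

    /// Returns the column index for `name`, assigning the next free index if unseen.
    /// No allocation is needed for the key because it borrows from the buffer.
    fn column_index(&mut self, name: &'a [u8]) -> usize {
        let next = self.col_indexes.len();
        *self.col_indexes.entry(name).or_insert(next)
    }
}
```

The compiler then enforces that the buffer is not dropped or mutated while the builder still holds slices into it.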
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* chore: fmt
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
---------
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat: cast filters type for scanbench
Signed-off-by: evenyag <realevenyag@gmail.com>
* chore: pub file_range mod
This lets other modules use the public `FileRange` struct.
Signed-off-by: evenyag <realevenyag@gmail.com>
* fix: add api as dev-dependency to cmd for clippy
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: support profiling after warmup
Signed-off-by: evenyag <realevenyag@gmail.com>
---------
Signed-off-by: evenyag <realevenyag@gmail.com>
* perf/prom-decode:
**Refactor `PromLabel` to Use Raw Byte Slices**
- Updated `PromLabel` struct in `proto.rs` to use `RawBytes` for `name` and `value` fields, replacing `Bytes` with static byte slices.
- Modified test cases in `prom_row_builder.rs` to accommodate changes in `PromLabel` by using byte literals.
- Simplified `merge_bytes` function in `proto.rs` to directly assign byte slices, removing unnecessary memory operations.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* perf/prom-decode:
- **Add UTF-8 Validation**: Introduced `validate_bytes` method in `http.rs` to validate UTF-8 encoding using `simdutf8` for `PromValidationMode::Strict`.
- **Update Column Indexing**: Modified `prom_row_builder.rs` to use `Vec<u8>` for `col_indexes` keys, ensuring UTF-8 validation for label names.
- **Dependency Update**: Added `simdutf8` version `0.1.5` to `Cargo.toml` and updated `Cargo.lock` to include this new dependency.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* fix: style issues
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
---------
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* chore(version): refresh build info on demand
Introduce a `refresh-build-info` feature to `common-version` to control
whether build timestamps are updated. By default, timestamps are no longer
refreshed, and `shadow.rs` regeneration is skipped if it already exists.
This prevents the build script from invalidating incremental compilation
results when nothing else has changed. CI and release builds are updated
to explicitly enable this feature.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* chore/refresh-build-info-on-demand:
### Update Build Configuration
- **Remove `refresh-build-info` Feature:**
  - Removed the `refresh-build-info` feature from `action.yml`, `release.yml`, and `Cargo.toml`.
  - Updated `build.rs` to refresh timestamps by default in release builds, with an option to disable via `DISABLE_BUILD_INFO`.
- **Modify GitHub Actions:**
  - Updated `.github/actions/build-linux-artifacts/action.yml` and `.github/workflows/release.yml` to exclude `refresh-build-info` from the `features` list.
- **Enhance Build Script Logic:**
  - Adjusted logic in `build.rs` to handle timestamp refreshing based on build profile and environment variables.
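The decision described above might reduce to a predicate like the one below. This is a sketch under assumptions: it treats any set value of `DISABLE_BUILD_INFO` as "disabled", and `should_refresh_build_info` is a hypothetical helper, not the code in `build.rs`. In a real build script the profile would come from the `PROFILE` environment variable that Cargo sets.

```rust
/// Returns true when the build script should refresh the build timestamp.
/// `profile` is the Cargo build profile ("release" or "debug");
/// `disable` is the value of `DISABLE_BUILD_INFO`, if set (assumed: any value disables).
fn should_refresh_build_info(profile: &str, disable: Option<&str>) -> bool {
    profile == "release" && disable.is_none()
}

/// Reads the inputs from the environment, as a build script could.
fn should_refresh_from_env() -> bool {
    let profile = std::env::var("PROFILE").unwrap_or_else(|_| "debug".to_string());
    let disable = std::env::var("DISABLE_BUILD_INFO").ok();
    should_refresh_build_info(&profile, disable.as_deref())
}
```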
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
---------
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>