* feat/flat-for-time-series:
### Commit Message
Enhance `TimeSeriesMemtable` with Record Batch Support
- **`time_series.rs`**:
- Introduced `BatchToRecordBatchContext` to facilitate conversion of batch iterators to record batch iterators.
- Added `build_record_batch` method in `TimeSeriesIterBuilder` to support record batch creation.
- Implemented multiple test cases to validate the functionality of record batch creation, including tests for projections,
deduplication, sequence filtering, and data correctness.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/flat-for-time-series:
Refactor `TimeSeriesMemtable` and `TimeSeriesIterBuilder`
- Renamed `adapter_context` to `batch_to_record_batch` in `TimeSeriesMemtable` for clarity.
- Simplified `MemtableRangeContext` initialization by removing the `batch_to_record_batch` parameter.
- Added `is_record_batch` method to `TimeSeriesIterBuilder` to indicate record batch status.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/flat-for-time-series:
### Add Time Range Filtering and Predicate Group Enhancements
- **`memtable.rs`**: Updated `IterBuilder` to include `time_range` parameter in `build_record_batch` method, enhancing record batch iteration with time range filtering.
- **`time_series.rs`**: Modified `TimeSeriesIterBuilder` to use `PredicateGroup` instead of `Predicate`, and integrated `PruneTimeIterator` for time-based filtering.
- **`memtable_util.rs`**: Removed unused `Predicate` import, reflecting changes in predicate handling.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
---------
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
chore: remove `GrpcQueryHandler::put_record_batch`; use `GrpcQueryHandler::handle_put_record_batch_stream` instead
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* fix: resolve optimization issue for extended query
* fix: type cast from subquery
* chore: update error information in sqlness
* chore: switch to released pgwire
* refactor: remove optimize function completely
* chore: add more tests
* test: attempt to fix the fuzz issue
* fix: try to resolve the test issue
* perf: support group accumulators for state wrapper
* new tests and avoid clone
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
---------
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
* feat(metric-engine): support bulk inserts
Implement `RegionRequest::BulkInserts` to support efficient columnar data
ingestion in the metric engine.
Key changes:
- Implement `bulk_insert_region` to handle logical-to-physical region mapping
and dispatch writes.
- Add `batch_modifier` for `RecordBatch` transformations, specifically for
`__tsid` generation and sparse primary key encoding.
- Integrate `BulkInserts` into the `MetricEngine` request handling logic.
- Provide a row-based fallback mechanism if the underlying storage doesn't
support bulk writes.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/metric-engine-bulk-insert:
### Update `bulk_insert.rs` to Support Partition Expression Version
- **Enhancements**:
- Added support for `partition_expr_version` in `RegionBulkInsertsRequest` and `RegionPutRequest`.
- Modified the handling of `partition_expr_version` to be dynamically set from the `request` object.
Files affected:
- `src/metric-engine/src/engine/bulk_insert.rs`
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* fix: cargo lock revert
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* add doc for conversions
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* chore: simplify test
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/metric-engine-bulk-insert:
### Refactor `bulk_insert.rs` in `metric-engine`
- **Refactor Functionality**:
- Replaced `resolve_tag_columns` with `resolve_tag_columns_from_metadata` to streamline tag column resolution.
- Moved logic for resolving tag columns directly into `resolve_tag_columns_from_metadata`, removing the need for an external function call.
- **Enhancements**:
- Improved error handling and context provision for missing physical regions and columns.
- Optimized tag column sorting and index management within the batch processing logic.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/metric-engine-bulk-insert:
### Refactor `record_batch_to_rows` Function in `bulk_insert.rs`
- Simplified the `record_batch_to_rows` function by removing the `logical_metadata` parameter and directly validating column types within the function.
- Enhanced error handling for timestamp, value, and tag columns by checking their data types and providing detailed error messages.
- Replaced the use of `Helper::try_into_vector` with direct downcasting to `TimestampMillisecondArray`, `Float64Array`, and `StringArray` for improved type safety and clarity.
- Updated the construction of `api::v1::Rows` to directly handle null values and construct `api::v1::Value` objects accordingly.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/metric-engine-bulk-insert:
### Commit Message
Refactor `bulk_insert.rs` to optimize state access
- Moved the state read operation inside a new block to limit its scope and improve code clarity.
- Adjusted logic for processing `tag_columns` and `non_tag_indices` to work within the new block structure.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/metric-engine-bulk-insert:
### Refactor `compute_tsid_array` Function
- **Refactored `compute_tsid_array` function**: Modified the function signature to accept `tag_arrays` as a parameter instead of building it internally. This change affects the following files:
- `src/metric-engine/src/batch_modifier.rs`
- **Updated test cases**: Adjusted test cases to accommodate the new `compute_tsid_array` function signature by passing `tag_arrays` explicitly.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* docs: add doc for bulk_insert_region
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/metric-engine-bulk-insert:
### Commit Message
Refactor `bulk_insert.rs` in `metric-engine`:
- Removed error handling for unsupported status codes in `write_data` method.
- Eliminated `record_batch_to_rows` function, simplifying the data insertion process.
- Streamlined the `write_data` method by removing fallback logic for unsupported operations.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/metric-engine-bulk-insert:
- **Optimize Primary Key Construction**: Refactored `modify_batch_sparse` in `batch_modifier.rs` to use `BinaryBuilder` for more efficient primary key construction.
- **Add Fallback for Unsupported Bulk Inserts**: Updated `bulk_insert.rs` to handle unsupported bulk inserts by converting record batches to rows and using `RegionPutRequest`.
- **Implement Record Batch to Rows Conversion**: Added `record_batch_to_rows` function in `bulk_insert.rs` to convert `RecordBatch` to `api::v1::Rows` for fallback operations.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/metric-engine-bulk-insert:
Add test for handling null values in `record_batch_to_rows`
- Added a new test `test_record_batch_to_rows_with_null_values` in `bulk_insert.rs` to verify the handling of null values in the `record_batch_to_rows` function.
- The test checks the conversion of a `RecordBatch` with null values in various fields to ensure correct row creation and schema handling.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/metric-engine-bulk-insert:
Add fallback path for unsupported status and improve error context handling
- **`bulk_insert.rs`**:
- Added a fallback path for `PartitionTreeMemtable` in case of unsupported status code.
- Enhanced error handling by using `with_context` for better error messages when timestamp and value columns are not found in `RecordBatch`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
---------
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat(http): improve error logging with client IP
- Add logging to ErrorResponse::from_error_message()
- Add middleware to log HTTP errors with client IP
Closes #7328
Signed-off-by: maximk777 <maximkirienkov777@gmail.com>
* fix(http): address review comments for error logging
Restore rich Debug logging in from_error(), add URI/method/matched path
to client IP middleware, and only log when client address is available.
Signed-off-by: evenyag <realevenyag@gmail.com>
---------
Signed-off-by: maximk777 <maximkirienkov777@gmail.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
Co-authored-by: evenyag <realevenyag@gmail.com>
* feat: support write flat as primary key format
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: migrate flush to always use FlatSource
Add FormatType propagation in SstWriteRequest and use it to choose
Flat vs PrimaryKey write paths (write_all_flat vs
write_all_flat_as_primary_key) in AccessLayer and WriteCache. Make
compactor and flush derive the sst_write_format from region options or
engine config. Simplify flush logic and remove the old memtable_source
helper. Update tests to set default sst_write_format.
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: compaction use flat source
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: read parquet sequentially as flat batches
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: remove new_batch_with_binary in favor of new_record_batch_with_binary
Replace PrimaryKeyWriteFormat with FlatWriteFormat in test_read_large_binary
test and use new_record_batch_with_binary directly, removing the now-unused
new_batch_with_binary function and its BinaryArray import.
Signed-off-by: evenyag <realevenyag@gmail.com>
* test: add tests for PrimaryKeyWriteFormat::convert_flat_batch
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: remove Either from SstWriteRequest
Signed-off-by: evenyag <realevenyag@gmail.com>
* fix: handle index build mode
Signed-off-by: evenyag <realevenyag@gmail.com>
* fix: consider sparse encoding and last non null in flush
Signed-off-by: evenyag <realevenyag@gmail.com>
* test: add unit tests for field_column_start edge cases
Signed-off-by: evenyag <realevenyag@gmail.com>
---------
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat(procedure): detect potential deadlock when parent/child share lock keys
Add a deadlock detection mechanism in submit_subprocedure() to warn
when a child procedure's lock_key overlaps with its parent's lock_key.
When this happens, the parent holds the lock while waiting for the child
to complete (at child_notify.notified().await), but the child blocks
forever trying to acquire the same lock. This is a classic Hold-and-Wait
deadlock.
The detection:
- Emits a warn! log in all builds (visible in production)
- Triggers debug_assert!(false) in debug/test builds for early CI detection
This partially addresses the TODO at lines 121-122 and is a follow-up
to the discussion in: https://github.com/GreptimeTeam/greptimedb/issues/7692
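The overlap check described above can be sketched as a pure function. This is a minimal illustration, not the code in this change: `LockMode`, `LockKey`, and `conflicting_keys` are hypothetical names, and the sketch assumes that two shared locks on the same key do not block each other while any pairing involving an exclusive lock does.

```rust
use std::collections::HashMap;

/// Hypothetical lock mode; the real procedure framework's key type may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LockMode {
    Share,
    Exclusive,
}

/// Hypothetical lock key: a name plus the mode it is acquired in.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LockKey {
    pub name: String,
    pub mode: LockMode,
}

impl LockKey {
    pub fn share(name: &str) -> Self {
        LockKey { name: name.to_string(), mode: LockMode::Share }
    }

    pub fn exclusive(name: &str) -> Self {
        LockKey { name: name.to_string(), mode: LockMode::Exclusive }
    }
}

/// Returns the names of keys on which the child would block while the
/// parent still holds them. Assumption: share/share never conflicts.
pub fn conflicting_keys(parent: &[LockKey], child: &[LockKey]) -> Vec<String> {
    // Record the strongest mode the parent holds for each key name.
    let mut held: HashMap<&str, LockMode> = HashMap::new();
    for key in parent {
        let entry = held.entry(key.name.as_str()).or_insert(key.mode);
        if key.mode == LockMode::Exclusive {
            *entry = LockMode::Exclusive;
        }
    }

    let mut conflicts = Vec::new();
    for key in child {
        if let Some(&parent_mode) = held.get(key.name.as_str()) {
            let blocks =
                parent_mode == LockMode::Exclusive || key.mode == LockMode::Exclusive;
            if blocks && !conflicts.contains(&key.name) {
                conflicts.push(key.name.clone());
            }
        }
    }
    conflicts
}
```

A caller could log a warning and trigger `debug_assert!` whenever the returned list is non-empty; keeping the check free of I/O is what makes it easy to unit test.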
Signed-off-by: YZL0v3ZZ <2055877225@qq.com>
* style: fix trailing whitespace
Signed-off-by: YZL0v3ZZ <2055877225@qq.com>
* refactor(procedure): extract deadlock detection into a testable pure function
Signed-off-by: YZL0v3ZZ <2055877225@qq.com>
* fix(procedure): preserve lock mode when detecting parent/child deadlock
Signed-off-by: YZL0v3ZZ <2055877225@qq.com>
* re-run ci check
Signed-off-by: YZL0v3ZZ <2055877225@qq.com>
---------
Signed-off-by: YZL0v3ZZ <2055877225@qq.com>
* refactor/prom-related-code:
### Commit Message
Refactor Byte Handling and Improve Decoding Logic
- **`prom_decode.rs`**: Removed `Bytes` usage in favor of `Vec<u8>` for handling raw data, improving memory management and simplifying the decoding process.
- **`prom_store.rs`**: Updated `try_decompress` function to return `Vec<u8>` instead of `Bytes`, aligning with the new data handling approach.
- **`prom_row_builder.rs`**: Modified `TablesBuilder` to use `Vec<u8>` for `raw_data`, enhancing data manipulation capabilities.
- **`proto.rs`**: Refactored `PromWriteRequest` decoding logic to use `Vec<u8>`, optimizing the buffer management and decoding flow.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* refactor: mod structure
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* refactor/prom-related-code:
- **Refactor `prom_store.rs` and `prom_remote_write/mod.rs`:** Moved `decode_remote_write_request` and `try_decompress` functions from `prom_store.rs` to `prom_remote_write/mod.rs`. This change centralizes the logic related to remote write request decoding and decompression.
- **Update `PromValidationMode` in `validation.rs`:** Implemented `Default` trait using the `#[derive(Default)]` attribute for `PromValidationMode` and updated related methods to use `Result` instead of `std::result::Result`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* refactor/prom-related-code:
### Remove `proto.rs` and Update References
- **Removed**: Deleted the `proto.rs` file, which contained re-exports for Prometheus remote write decode types.
- **Updated References**: Adjusted references to `PromSeriesProcessor` and `PromWriteRequest` in `prom_decode.rs` and `prom_store.rs` to import directly from `prom_remote_write`.
- **Modified Modules**: Removed the `proto` module from `lib.rs`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* fix: lint
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* fix: remove assert_eq
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* refactor/prom-related-code:
### Refactor Prometheus Remote Write Module
- **Modularization of `prom_remote_write`:**
- Split `PromValidationMode` and `validate_label_name` into a new `validation` module.
- Moved `PromSeriesProcessor` and `PromWriteRequest` to a `decode` module.
- Separated `PromLabel` into a `types` module and adjusted visibility.
- **Visibility Adjustments:**
- Changed `PromTimeSeries` and `PromLabel` structs to `pub(crate)` for internal use.
- **File Updates:**
- Updated references in `prom_decode.rs`, `http.rs`, `prom_store.rs`, `decode.rs`, `mod.rs`, `row_builder.rs`, `types.rs`, `prom_store_test.rs`, and `test_util.rs` to reflect module changes.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
---------
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* test: fix unstable index meta list test
Signed-off-by: evenyag <realevenyag@gmail.com>
* fix: raise bucket_size threshold so sizes in [512, 999] are no longer bucketed to 0
Signed-off-by: evenyag <realevenyag@gmail.com>
---------
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: initial function rewriter for json_get
* feat: make sure rewrite rule is applied
* feat: keep analyzer's default rules
* feat: implement rewriter for arrow_cast
* test: add unit test for the rewriter
* chore: format
* refactor: extract some more functions
* Apply suggestion from @waynexia
Co-authored-by: Ruihang Xia <waynestxia@gmail.com>
---------
Co-authored-by: Ruihang Xia <waynestxia@gmail.com>
* perf: optimize Prometheus label name decoding with byte-level validation
Add `decode_label_name` and `validate_label_name` to skip redundant
UTF-8 validation for Prometheus label names, which are guaranteed ASCII
(`[a-zA-Z_][a-zA-Z0-9_]*`). Rename `validate_bytes` to `validate_utf8`
for clarity and add benchmarks for label name validation and UTF-8
validation (std vs simdutf8).
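The idea of skipping UTF-8 validation can be sketched as follows. The function names come from this commit, but the signatures and bodies below are assumptions made for illustration, not the code that was merged: once every byte is known to match `[a-zA-Z_][a-zA-Z0-9_]*`, the slice is pure ASCII and therefore already valid UTF-8.

```rust
/// Checks `[a-zA-Z_][a-zA-Z0-9_]*` byte by byte (assumed signature).
pub fn validate_label_name(name: &[u8]) -> bool {
    match name.split_first() {
        Some((&first, rest)) => {
            (first.is_ascii_alphabetic() || first == b'_')
                && rest.iter().all(|&b| b.is_ascii_alphanumeric() || b == b'_')
        }
        // An empty name is not a valid label name.
        None => false,
    }
}

/// Converts raw bytes to `&str` without a second UTF-8 pass (assumed signature).
pub fn decode_label_name(raw: &[u8]) -> Option<&str> {
    if validate_label_name(raw) {
        // SAFETY: every byte passed the ASCII check above, and any ASCII
        // byte sequence is valid UTF-8.
        Some(unsafe { std::str::from_utf8_unchecked(raw) })
    } else {
        None
    }
}
```

For example, `decode_label_name(b"job")` yields `Some("job")`, while a name starting with a digit or containing a non-ASCII byte yields `None`.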
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* perf(servers): optimize validate_label_name with lookup table and loop unrolling
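One way a lookup table and manual unrolling might be combined is sketched below. This is an assumption-laden illustration rather than the merged implementation: `validate_label_name_lut`, `build_table`, the bit flags, and the unroll factor of 4 are all hypothetical choices.

```rust
/// Bit set for bytes allowed as the first character: `[a-zA-Z_]`.
const FIRST: u8 = 0b01;
/// Bit set for bytes allowed in later positions: `[a-zA-Z0-9_]`.
const REST: u8 = 0b10;

/// Builds the 256-entry classification table at compile time.
const fn build_table() -> [u8; 256] {
    let mut table = [0u8; 256];
    let mut i = 0usize;
    while i < 256 {
        let b = i as u8;
        let alpha =
            (b >= b'a' && b <= b'z') || (b >= b'A' && b <= b'Z') || b == b'_';
        let digit = b >= b'0' && b <= b'9';
        if alpha {
            table[i] = FIRST | REST;
        } else if digit {
            table[i] = REST;
        }
        i += 1;
    }
    table
}

static TABLE: [u8; 256] = build_table();

/// Table-driven check of `[a-zA-Z_][a-zA-Z0-9_]*` (hypothetical name).
pub fn validate_label_name_lut(name: &[u8]) -> bool {
    let (first, rest) = match name.split_first() {
        Some((&first, rest)) => (first, rest),
        None => return false,
    };
    if TABLE[first as usize] & FIRST == 0 {
        return false;
    }
    // Unrolled by 4: AND the four table entries so one branch covers a chunk.
    let mut chunks = rest.chunks_exact(4);
    for c in &mut chunks {
        let ok = TABLE[c[0] as usize]
            & TABLE[c[1] as usize]
            & TABLE[c[2] as usize]
            & TABLE[c[3] as usize];
        if ok & REST == 0 {
            return false;
        }
    }
    // Handle the trailing 0..=3 bytes one at a time.
    chunks.remainder().iter().all(|&b| TABLE[b as usize] & REST != 0)
}
```

The table replaces several range comparisons per byte with a single indexed load, and ANDing four entries together means a chunk is rejected as soon as any one of its bytes lacks the `REST` bit.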
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* perf/decode-prom-2:
- **Refactor UTF-8 Validation and Label Decoding**:
- Removed `validate_utf8` method and integrated label name validation directly in `decode_label_name` in `http.rs`.
- Updated `decode_label_name` to always enforce Prometheus label name validation across all modes.
- Adjusted test cases in `http.rs` to reflect the new validation logic.
- **Enhance Label Validation in `prom_row_builder.rs`**:
- Replaced UTF-8 validation with direct label name validation using `validate_label_name`.
- Updated `decode_label_name` usage to return `&str` and adjusted related logic.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* perf/decode-prom-2:
**Refactor `TableBuilder` to Use `RawBytes` for Column Indexes**
- Updated `TableBuilder` in `prom_row_builder.rs` to use `RawBytes` instead of `Vec<u8>` for `col_indexes`.
- Modified `with_capacity` method to directly insert `RawBytes` for timestamp and value columns.
- Adjusted schema handling to use `to_owned` for `tag_name` and directly insert `raw_tag_name` into `col_indexes`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* perf/decode-prom-2:
### Commit Message
Refactor `PromWriteRequest` Method and Enhance Data Handling
- **Refactor Method**: Renamed the `merge` method to `decode` in `PromWriteRequest` to better reflect its functionality. Updated references in `prom_decode.rs`, `prom_store.rs`, and `prom_row_builder.rs`.
- **Enhance Data Handling**: Introduced `raw_data` field in `PromWriteRequest` to store a clone of the buffer for potential future use. Updated the `clear` method to reset `raw_data`.
Files affected: `prom_decode.rs`, `prom_store.rs`, `prom_row_builder.rs`, `proto.rs`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* perf/decode-prom-2:
**Commit Summary:**
- **Enhancement in `prom_row_builder.rs`:**
- Added a new field `raw_data` of type `Bytes` to `TablesBuilder`.
- Implemented `set_raw_data` method to update `raw_data`.
- Modified `clear` method to reset `raw_data`.
- **Refactor in `proto.rs`:**
- Removed `raw_data` field from `PromWriteRequest`.
- Updated `decode_and_process` method to use `set_raw_data` from `TablesBuilder` for handling raw data.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* chore: remove duplicated validation
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* perf/decode-prom-2:
### Commit Message
Refactor `TablesBuilder` and `TableBuilder` to Use Lifetime Annotations
- Updated `prom_store.rs`:
- Modified `PROM_WRITE_REQUEST_POOL` and `decode_remote_write_request` to use lifetime annotations for `PromWriteRequest` and `TablesBuilder`.
- Updated `prom_row_builder.rs`:
- Refactored `TablesBuilder` and `TableBuilder` structs to include lifetime annotations.
- Adjusted methods in `TablesBuilder` and `TableBuilder` to accommodate lifetime changes.
- Updated `proto.rs`:
- Added lifetime annotations to `PromWriteRequest` and its methods.
- Modified `add_to_table_data` to use lifetime annotations for `TablesBuilder`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* chore: fmt
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
---------
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>