fix: avoid cloning serialized view plan on resolve
- Change `ViewInfoValue.view_info` from `Vec<u8>` to
`common_base::bytes::Bytes` so resolving a view no longer clones
the full serialized plan buffer on every decode.
- To keep the change narrow, the metadata write boundary still
accepts `Vec<u8>` and converts once when constructing/updating
`ViewInfoValue`. The hot read path now uses a cheap clone of the
stored bytes.
- The benchmarks introduced with this change show up to an 82% improvement
in resolution time.
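The cheap-clone read path described above can be sketched with std types, using `Arc<[u8]>` as a stand-in for `common_base::bytes::Bytes` (field and method names here are illustrative, not the real metadata types):

```rust
use std::sync::Arc;

// Illustrative stand-in: an Arc-backed buffer whose clone only bumps a
// reference count instead of copying the serialized plan.
#[derive(Clone)]
struct ViewInfoValue {
    view_info: Arc<[u8]>, // previously `Vec<u8>`, cloned in full on every decode
}

impl ViewInfoValue {
    // Write boundary still accepts `Vec<u8>` and converts once.
    fn new(plan: Vec<u8>) -> Self {
        Self { view_info: plan.into() }
    }

    // Hot read path: a cheap clone of the shared bytes.
    fn plan_bytes(&self) -> Arc<[u8]> {
        Arc::clone(&self.view_info)
    }
}
```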
Signed-off-by: Yao ACHI <achi.noel@hotmail.com>
* fix(datatypes): compare ConstantVector rhs inner in vector equality
When either operand is a ConstantVector, the recursive equal() call must
compare lhs.inner() against rhs.inner(). The second argument incorrectly
used lhs twice, breaking equality when only the rhs was constant.
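A minimal sketch of the bug and its fix, with simplified stand-in types (the real code lives in the datatypes crate):

```rust
// Simplified stand-in for the vector types involved in the bug.
enum Vector {
    Plain(Vec<i64>),
    Constant(Box<Vector>),
}

fn equal(lhs: &Vector, rhs: &Vector) -> bool {
    match (lhs, rhs) {
        (Vector::Constant(l), _) => equal(l, rhs),
        // The buggy version passed lhs's inner here twice instead of
        // unwrapping rhs, breaking equality when only rhs was constant.
        (_, Vector::Constant(r)) => equal(lhs, r),
        (Vector::Plain(a), Vector::Plain(b)) => a == b,
    }
}
```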
Signed-off-by: Weixie Cui <cuiweixie@gmail.com>
* fix: review
Signed-off-by: Weixie Cui <cuiweixie@gmail.com>
---------
Signed-off-by: Weixie Cui <cuiweixie@gmail.com>
* bench(promql): add range-function benchmark suite
* perf(promql): use flat buffers in range function hot loops
* perf(promql): reuse quantile scratch buffers
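The scratch-buffer reuse above can be sketched like this: a caller-owned buffer is cleared and refilled per evaluation instead of allocating a fresh `Vec` each time (a nearest-rank quantile with an illustrative signature, not the promql crate's actual API):

```rust
// Reuse one scratch buffer across quantile evaluations to avoid a
// per-call allocation in the hot loop.
fn quantile_with_scratch(scratch: &mut Vec<f64>, values: &[f64], q: f64) -> f64 {
    scratch.clear();
    scratch.extend_from_slice(values);
    scratch.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let rank = q * (scratch.len() - 1) as f64;
    scratch[rank.round() as usize]
}
```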
* feat: metric batch 2s PoC
Signed-off-by: jeremyhi <fengjiachun@gmail.com>
* chore: max_concurrent_flushes
Signed-off-by: jeremyhi <fengjiachun@gmail.com>
* chore: work channel size
Signed-off-by: jeremyhi <fengjiachun@gmail.com>
* feat(servers): add metrics and logs for pending rows batch flush
Add the `FLUSH_ELAPSED` histogram metric to track the duration of pending
rows batch flushes in the Prometheus store protocol handler. This provides
better observability into the performance and latency of the batcher.
Also update telemetry by:
- Recording elapsed time for both successful and failed flush operations.
- Adding an informational log upon successful flush including row count and duration.
- Including elapsed time in error logs when a flush fails.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat(servers): implement columnar batching for pending rows
Refactor PendingRowsBatcher to use columnar batching for the metrics
store. Incoming RowInsertRequests are now converted to RecordBatches,
partitioned, and flushed via BulkInsert requests to datanodes.
- Enhance MultiDimPartitionRule to handle scalar boolean predicates.
- Add metrics for tracking flush failures and dropped rows.
- Update dependencies to support columnar batching in servers.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat(servers): add backpressure for pending rows
Implement backpressure in PendingRowsBatcher by limiting in-flight
requests with a semaphore and making the submission wait for the flush
result. This ensures Prometheus write requests are throttled and only
return once the data has been successfully flushed to datanodes.
- Add max_inflight_requests to PromStoreOptions.
- Use oneshot channels to notify submitters of flush completion.
- Limit concurrent requests using a new inflight_semaphore.
- Update PendingRowsBatcher::submit to wait for the flush outcome.
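A std-only sketch of the scheme above (the real code uses tokio's `Semaphore` and oneshot channels; the blocking shape and names here are illustrative):

```rust
use std::sync::mpsc;
use std::sync::{Arc, Condvar, Mutex};

// Counting semaphore limiting in-flight flushes.
struct Semaphore {
    permits: Mutex<usize>,
    cv: Condvar,
}

impl Semaphore {
    fn new(max_inflight: usize) -> Self {
        Self { permits: Mutex::new(max_inflight), cv: Condvar::new() }
    }
    fn acquire(&self) {
        let mut p = self.permits.lock().unwrap();
        while *p == 0 {
            p = self.cv.wait(p).unwrap();
        }
        *p -= 1;
    }
    fn release(&self) {
        *self.permits.lock().unwrap() += 1;
        self.cv.notify_one();
    }
}

// Submission acquires a permit, hands the flusher a oneshot-style channel,
// and blocks until the flush outcome arrives, throttling the writer.
fn submit(
    sem: &Arc<Semaphore>,
    flush: impl FnOnce(mpsc::Sender<Result<usize, String>>),
) -> Result<usize, String> {
    sem.acquire();
    let (tx, rx) = mpsc::channel();
    flush(tx);
    let outcome = rx.recv().unwrap();
    sem.release();
    outcome
}
```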
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat: add stage-level metrics for bulk ingestion
Introduce histograms to track the elapsed time of various stages in the
metric engine bulk insert path and the server's pending rows batcher.
This provides better observability into the performance bottlenecks
of the ingestion pipeline.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* refactor: remove bulk insert row fallback and unused imports
- `src/metric-engine/src/engine/bulk_insert.rs`: Removed the fallback mechanism that converted record batches to rows when bulk inserts were unsupported, along with related helper functions and unused imports.
- `src/operator/src/insert.rs`: Removed an unused import (`common_time::TimeToLive::Instant`).
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat(servers): columnar Prom remote write
Optimize the Prometheus remote write path by allowing direct conversion
from decoded Prometheus samples to Arrow RecordBatches. This bypasses
intermediate row-based representations when `PendingRowsBatcher` is
active and no pipeline is used, improving ingestion efficiency.
- Implement `as_record_batch_groups` in `TablesBuilder` and `PromWriteRequest`.
- Add `submit_prom_record_batch_groups` to `PendingRowsBatcher`.
- Introduce `DecodedPromWriteRequest` in `prom_store`.
- Implement row-to-RecordBatch conversion logic in `prom_row_builder`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* Revert "feat(servers): columnar Prom remote write"
This reverts commit efbb63c12a3e7fcec03858ea0351efd94fec8242.
* refactor(servers): improve row to RecordBatch conversion
- Use `snafu::ensure` for row validation in `rows_to_record_batch`.
- Add explicit type hint for `MutableVector` to improve clarity.
- Reorganize and clean up imports in `pending_rows_batcher.rs`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* perf(servers): use arrow builders for row conversion
This commit optimizes the conversion from `api::v1::Rows` to `RecordBatch`
by using Arrow builders directly. This avoids the overhead of
`MutableVector` and `common_recordbatch`, leading to better performance
in the `pending_rows_batcher`.
Additionally, the `#[allow(dead_code)]` attribute is removed from
`modify_batch_sparse` in the metric engine as it is now utilized.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* perf(metric-engine): optimize batch modification
Optimize `modify_batch_sparse` by reusing buffers, using Arrow
builders, and employing fast-path encoding methods. This reduces
allocations and avoids redundant downcasting and serializer overhead.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/metric-engine-support-bulk:
**Add Environment Variable for Batch Sync Control**
- `pending_rows_batcher.rs`: Introduced an environment variable `PENDING_ROWS_BATCH_SYNC` to control the synchronization behavior of batch processing. If set to true, the function waits for the flush result; otherwise, it returns immediately with the total row count.
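Combined with the later "parse PENDING_ROWS_BATCH_SYNC once" change, the flag lookup can be sketched with `std::sync::OnceLock` so the environment variable is read and parsed a single time (helper name is an assumption):

```rust
use std::sync::OnceLock;

// Parse the PENDING_ROWS_BATCH_SYNC environment variable once and cache
// the result; later calls only read the cached bool.
fn batch_sync_enabled() -> bool {
    static SYNC: OnceLock<bool> = OnceLock::new();
    *SYNC.get_or_init(|| {
        std::env::var("PENDING_ROWS_BATCH_SYNC")
            .map(|v| v.eq_ignore_ascii_case("true"))
            .unwrap_or(false)
    })
}
```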
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* wip
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* chore: update and fix clippy
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* fix: failing test
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* picking-pending-rows-batcher:
Remove Unused Code and Simplify Error Handling
- **`src/error.rs`**: Removed the `BatcherQueueFull` error variant and its associated logic, simplifying the error handling by removing unused code.
- **`src/http/prom_store.rs`**: Eliminated the `try_decompress` function, streamlining the decompression logic by directly using `snappy_decompress` in `decode_remote_read_request`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* chore: parse PENDING_ROWS_BATCH_SYNC once
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* chore: revert unrelated changes
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* **Refactor Prometheus Write Handling**
- **`prom_store.rs`**: Introduced `pre_write` method in `PromStoreProtocolHandler` to handle pre-write checks for Prometheus remote write requests. Updated `write` method to utilize `pre_write`.
- **`server.rs`**: Modified `PendingRowsBatcher` initialization to conditionally create a batcher based on `with_metric_engine` flag.
- **`http/prom_store.rs`**: Integrated `pre_write` checks before submitting requests to `PendingRowsBatcher`.
- **`query_handler.rs`**: Added `pre_write` method to `PromStoreProtocolHandler` trait for pre-write operations.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* picking-pending-rows-batcher:
- **Fix Label Typo**: Corrected a typo in the label value from `"flush_wn ite_region"` to `"flush_write_region"` in `pending_rows_batcher.rs`.
- **Refactor Array Building Logic**: Introduced a macro `build_array!` to streamline the construction of `ArrayRef` for different data types, reducing code duplication in `pending_rows_batcher.rs`.
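The `build_array!` idea can be sketched as follows: the macro collapses the per-type boilerplate of turning a column of row values into an array (a plain `Vec` stands in for arrow's `ArrayRef`; the enum is illustrative):

```rust
// Collapse per-type array construction: one arm per value variant instead
// of a hand-written loop for each data type.
macro_rules! build_array {
    ($rows:expr, $variant:path) => {
        $rows
            .iter()
            .map(|v| match v {
                $variant(x) => Some(*x),
                _ => None, // non-matching variants become nulls
            })
            .collect::<Vec<_>>()
    };
}

enum Value {
    I64(i64),
    F64(f64),
}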
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* format toml
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* picking-pending-rows-batcher:
### Update PromStore and PendingRowsBatcher Configuration
- **`prom_store.rs`**: Set `pending_rows_flush_interval` to `Duration::ZERO` to disable automatic flushing.
- **`pending_rows_batcher.rs`**: Enhance validation to disable the batcher when `flush_interval` is zero or configuration values like `max_batch_rows`, `max_concurrent_flushes`, `worker_channel_capacity`, or `max_inflight_requests` are zero, preventing potential panics or deadlocks.
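The validation described above might look like this (struct and field names are assumptions, not the real config type):

```rust
use std::time::Duration;

// Illustrative config mirroring the described guard.
struct BatcherConfig {
    flush_interval: Duration,
    max_batch_rows: usize,
    max_concurrent_flushes: usize,
    worker_channel_capacity: usize,
    max_inflight_requests: usize,
}

impl BatcherConfig {
    // Disable the batcher when any knob is zero: a zero-capacity channel
    // or zero-permit semaphore would panic or deadlock.
    fn batcher_enabled(&self) -> bool {
        !self.flush_interval.is_zero()
            && self.max_batch_rows > 0
            && self.max_concurrent_flushes > 0
            && self.worker_channel_capacity > 0
            && self.max_inflight_requests > 0
    }
}
```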
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* picking-pending-rows-batcher:
### Update `pending_rows_flush_interval` to Zero
- **Files Modified**:
- `src/frontend/src/service_config/prom_store.rs`
- `tests-integration/tests/http.rs`
- **Key Changes**:
- Updated `pending_rows_flush_interval` from `Duration::from_secs(2)` to `Duration::ZERO` in `prom_store.rs`.
- Changed `pending_rows_flush_interval` configuration from `"2s"` to `"0s"` in `http.rs`.
These changes set the default flush interval to zero, disabling automatic flushing by default.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* picking-pending-rows-batcher:
**Add Worker Management Enhancements**
- **`metrics.rs`**: Introduced `PENDING_WORKERS` gauge to track active pending rows batch workers.
- **`pending_rows_batcher.rs`**:
- Added worker idle timeout logic with `WORKER_IDLE_TIMEOUT_MULTIPLIER`.
- Implemented worker management functions: `spawn_worker`, `remove_worker_if_same_channel`, and `should_close_worker_on_idle_timeout`.
- Enhanced worker lifecycle management to handle idle workers and ensure proper cleanup.
- **Tests**: Added unit tests for worker removal and idle timeout logic.
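The idle-timeout check can be sketched as a pure predicate (the multiplier value and exact comparison are assumptions):

```rust
use std::time::Duration;

// Close a worker once it has been idle for the flush interval times a
// multiplier; name mirrors the commit, value is illustrative.
const WORKER_IDLE_TIMEOUT_MULTIPLIER: u32 = 10;

fn should_close_worker_on_idle_timeout(idle_for: Duration, flush_interval: Duration) -> bool {
    idle_for >= flush_interval * WORKER_IDLE_TIMEOUT_MULTIPLIER
}
```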
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* fix: clippy
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
---------
Signed-off-by: jeremyhi <fengjiachun@gmail.com>
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
Co-authored-by: jeremyhi <fengjiachun@gmail.com>
Allow flush edits with equal entry ids when flushed sequence advances, so close-time flush after truncate still succeeds for skip-wal regions while stale pre-truncate flushes are rejected. Add a regression test for create->truncate->write->close timing.
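The acceptance rule described above can be sketched as a pure predicate (signature and argument names are illustrative):

```rust
// Accept a flush edit when its entry id advances, or when the entry id is
// equal but the flushed sequence still advances (the close-time flush after
// truncate on a skip-wal region); stale pre-truncate flushes are rejected.
fn accept_flush_edit(edit_entry_id: u64, last_entry_id: u64, edit_seq: u64, flushed_seq: u64) -> bool {
    edit_entry_id > last_entry_id
        || (edit_entry_id == last_entry_id && edit_seq > flushed_seq)
}
```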
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/flat-for-time-series:
Enhance `TimeSeriesMemtable` with Record Batch Support
- **`time_series.rs`**:
- Introduced `BatchToRecordBatchContext` to facilitate conversion of batch iterators to record batch iterators.
- Added `build_record_batch` method in `TimeSeriesIterBuilder` to support record batch creation.
- Implemented multiple test cases to validate the functionality of record batch creation, including tests for projections,
deduplication, sequence filtering, and data correctness.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/flat-for-time-series:
Refactor `TimeSeriesMemtable` and `TimeSeriesIterBuilder`
- Renamed `adapter_context` to `batch_to_record_batch` in `TimeSeriesMemtable` for clarity.
- Simplified `MemtableRangeContext` initialization by removing the `batch_to_record_batch` parameter.
- Added `is_record_batch` method to `TimeSeriesIterBuilder` to indicate record batch status.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/flat-for-time-series:
### Add Time Range Filtering and Predicate Group Enhancements
- **`memtable.rs`**: Updated `IterBuilder` to include `time_range` parameter in `build_record_batch` method, enhancing record batch iteration with time range filtering.
- **`time_series.rs`**: Modified `TimeSeriesIterBuilder` to use `PredicateGroup` instead of `Predicate`, and integrated `PruneTimeIterator` for time-based filtering.
- **`memtable_util.rs`**: Removed unused `Predicate` import, reflecting changes in predicate handling.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
---------
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
chore: remove GrpcQueryHandler::put_record_batch; use GrpcQueryHandler::handle_put_record_batch_stream instead
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* fix: resolve optimization issue for extended query
* fix: type cast from subquery
* chore: update error information in sqlness
* chore: switch to released pgwire
* refactor: remove optimize function completely
* chore: add more tests
* test: attempt to fix the fuzz issue
* fix: try to resolve the test issue
* perf: support group accumulators for state wrapper
* new tests and avoid clone
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
---------
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
* feat(metric-engine): support bulk inserts
Implement `RegionRequest::BulkInserts` to support efficient columnar data
ingestion in the metric engine.
Key changes:
- Implement `bulk_insert_region` to handle logical-to-physical region mapping
and dispatch writes.
- Add `batch_modifier` for `RecordBatch` transformations, specifically for
`__tsid` generation and sparse primary key encoding.
- Integrate `BulkInserts` into the `MetricEngine` request handling logic.
- Provide a row-based fallback mechanism if the underlying storage doesn't
support bulk writes.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/metric-engine-bulk-insert:
### Update `bulk_insert.rs` to Support Partition Expression Version
- **Enhancements**:
- Added support for `partition_expr_version` in `RegionBulkInsertsRequest` and `RegionPutRequest`.
- Modified the handling of `partition_expr_version` to be dynamically set from the `request` object.
Files affected:
- `src/metric-engine/src/engine/bulk_insert.rs`
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* fix: cargo lock revert
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* add doc for conversions
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* chore: simplify test
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/metric-engine-bulk-insert:
### Refactor `bulk_insert.rs` in `metric-engine`
- **Refactor Functionality**:
- Replaced `resolve_tag_columns` with `resolve_tag_columns_from_metadata` to streamline tag column resolution.
- Moved logic for resolving tag columns directly into `resolve_tag_columns_from_metadata`, removing the need for an external function call.
- **Enhancements**:
- Improved error handling and context provision for missing physical regions and columns.
- Optimized tag column sorting and index management within the batch processing logic.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/metric-engine-bulk-insert:
### Refactor `record_batch_to_rows` Function in `bulk_insert.rs`
- Simplified the `record_batch_to_rows` function by removing the `logical_metadata` parameter and directly validating column types within the function.
- Enhanced error handling for timestamp, value, and tag columns by checking their data types and providing detailed error messages.
- Replaced the use of `Helper::try_into_vector` with direct downcasting to `TimestampMillisecondArray`, `Float64Array`, and `StringArray` for improved type safety and clarity.
- Updated the construction of `api::v1::Rows` to directly handle null values and construct `api::v1::Value` objects accordingly.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/metric-engine-bulk-insert:
Refactor `bulk_insert.rs` to optimize state access
- Moved the state read operation inside a new block to limit its scope and improve code clarity.
- Adjusted logic for processing `tag_columns` and `non_tag_indices` to work within the new block structure.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/metric-engine-bulk-insert:
### Refactor `compute_tsid_array` Function
- **Refactored `compute_tsid_array` function**: Modified the function signature to accept `tag_arrays` as a parameter instead of building it internally. This change affects the following files:
- `src/metric-engine/src/batch_modifier.rs`
- **Updated test cases**: Adjusted test cases to accommodate the new `compute_tsid_array` function signature by passing `tag_arrays` explicitly.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* docs: add doc for bulk_insert_region
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/metric-engine-bulk-insert:
Refactor `bulk_insert.rs` in `metric-engine`:
- Removed error handling for unsupported status codes in `write_data` method.
- Eliminated `record_batch_to_rows` function, simplifying the data insertion process.
- Streamlined the `write_data` method by removing fallback logic for unsupported operations.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/metric-engine-bulk-insert:
- **Optimize Primary Key Construction**: Refactored `modify_batch_sparse` in `batch_modifier.rs` to use `BinaryBuilder` for more efficient primary key construction.
- **Add Fallback for Unsupported Bulk Inserts**: Updated `bulk_insert.rs` to handle unsupported bulk inserts by converting record batches to rows and using `RegionPutRequest`.
- **Implement Record Batch to Rows Conversion**: Added `record_batch_to_rows` function in `bulk_insert.rs` to convert `RecordBatch` to `api::v1::Rows` for fallback operations.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/metric-engine-bulk-insert:
Add test for handling null values in `record_batch_to_rows`
- Added a new test `test_record_batch_to_rows_with_null_values` in `bulk_insert.rs` to verify the handling of null values in the `record_batch_to_rows` function.
- The test checks the conversion of a `RecordBatch` with null values in various fields to ensure correct row creation and schema handling.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/metric-engine-bulk-insert:
Add fallback path for unsupported status and improve error context handling
- **`bulk_insert.rs`**:
- Added a fallback path for `PartitionTreeMemtable` in case of unsupported status code.
- Enhanced error handling by using `with_context` for better error messages when timestamp and value columns are not found in `RecordBatch`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
---------
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat(http): improve error logging with client IP
- Add logging to ErrorResponse::from_error_message()
- Add middleware to log HTTP errors with client IP
Closes #7328
Signed-off-by: maximk777 <maximkirienkov777@gmail.com>
* fix(http): address review comments for error logging
Restore rich Debug logging in from_error(), add URI/method/matched path
to client IP middleware, and only log when client address is available.
Signed-off-by: evenyag <realevenyag@gmail.com>
---------
Signed-off-by: maximk777 <maximkirienkov777@gmail.com>
Signed-off-by: evenyag <realevenyag@gmail.com>
Co-authored-by: evenyag <realevenyag@gmail.com>