* fix: add overflow check before interleave()
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: pass batches and column index to check_interleave_bytes_overflow
Refactor check_interleave_bytes_overflow to accept batches and a column
index directly, avoiding the intermediate Vec collection of arrays.
Signed-off-by: evenyag <realevenyag@gmail.com>
---------
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: support alter from primary_key to flat
Signed-off-by: evenyag <realevenyag@gmail.com>
* chore: alter flat to primary_key
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: change default_experimental_flat_format to true
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: compute channel size from splitted batch size
Signed-off-by: evenyag <realevenyag@gmail.com>
* test: add tests for split and channel size
Signed-off-by: evenyag <realevenyag@gmail.com>
* fix: always set sst_format from manifest on region open
sanitize_region_options did not set options.sst_format when the
default (PrimaryKey) matched the manifest value, leaving it as None
after reopen. This caused the alter format change to appear lost.
Signed-off-by: evenyag <realevenyag@gmail.com>
* test: fix tests
Signed-off-by: evenyag <realevenyag@gmail.com>
* test: show create table after alteration
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor!: rename default_experimental_flat_format to default_flat_format
The flat format is no longer experimental. Remove "experimental" from
the config field name, doc comments, and all references.
Signed-off-by: evenyag <realevenyag@gmail.com>
* chore: fix clippy
Signed-off-by: evenyag <realevenyag@gmail.com>
---------
Signed-off-by: evenyag <realevenyag@gmail.com>
* perf/schema-align:
**Refactor and Enhance Error Handling in `pending_rows_batcher.rs`**
- **Refactored `record_failure` Macro**: Moved the `record_failure` macro outside of the `flush_batch_physical` function to improve code reuse and maintainability.
- **Enhanced Batch Transformation**: Introduced `transform_logical_batches_to_physical` function to handle the transformation of logical table batches into physical format.
- **Batch Concatenation**: Added `concat_modified_batches` function to concatenate modified batches into a single batch.
- **Region Write Splitting**: Implemented `split_and_encode_region_writes` function to split combined batches into region-specific writes based on partition rules.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* perf/schema-align:
Add tests for `transform_logical_batches_to_physical` in `pending_rows_batcher.rs`
- Implemented `mock_tag_batch` function to create mock `RecordBatch` instances for testing.
- Added multiple test cases for `transform_logical_batches_to_physical`:
- `test_transform_logical_batches_to_physical_success`: Verifies successful transformation of logical to physical batches.
- `test_transform_logical_batches_to_physical_taxonomy_failure`: Tests failure scenario when column IDs are missing.
- `test_transform_logical_batches_to_physical_multiple_batches`: Checks handling of multiple batches.
- `test_transform_logical_batches_to_physical_mixed_success_failure`: Tests mixed success and failure scenarios.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* perf/schema-align: refactor `flush_batch_physical` for better testability
Introduced several traits to abstract dependencies on CatalogManager, PartitionRuleManager,
and NodeManager, enabling easier unit testing with mock implementations.
- Added `PhysicalFlushCatalogProvider`, `PhysicalFlushPartitionProvider`, and `PhysicalFlushNodeRequester` traits.
- Implemented adapters for existing managers to satisfy the new traits.
- Refactored `flush_batch_physical` to use these traits instead of concrete manager references.
- Modularized region write planning, resolution, and encoding into standalone functions.
- Added comprehensive unit tests for the refactored logic, including edge cases for table lookup and region routing.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* perf/schema-align:
### Enhance Error Handling and Simplify Code in `error.rs` and `pending_rows_batcher.rs`
- **Error Handling Improvements**:
- Added new error variants `Partition` and `MetricEngine` in `error.rs` to handle specific error cases.
- Updated error propagation using `ResultExt` and `context` for better error messages and handling in `pending_rows_batcher.rs`.
- **Code Simplification**:
- Removed `FlushWriteResult` enum and refactored `flush_region_writes_concurrently` to return `Result<()>`.
- Simplified error handling in `flush_batch_physical` and related functions by removing `first_error` and using `Result` for error propagation.
- **Test Adjustments**:
- Updated tests to align with the new error handling approach, ensuring they check for specific error messages and conditions.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* perf/schema-align: refactor `PendingBatch` to use `Option` for cleaner state management
Refactored `PendingBatch` in `pending_rows_batcher.rs` to use `Option<PendingBatch>`
within the worker loop. This change simplifies initialization and cleanup logic
by leveraging `Option::get_or_insert_with` and `Option::take`.
- Updated `PendingBatch` fields `created_at` and `ctx` to be non-optional.
- Modified `drain_batch` to take `&mut Option<PendingBatch>` and return the
drained batch, removing the need for `flush_with_error`.
- Simplified the worker loop logic for batch creation and flushing.
- Added a unit test `test_drain_batch_takes_initialized_pending_batch_from_option`
to verify the new draining logic.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* perf/schema-align: share errors across waiters using `Arc<Error>`
Enhanced error reporting in `PendingRowsBatcher` by using `Arc<Error>` in
`FlushWaiter` and `WorkerCommand`. This allows the same error instance to be
shared among all waiters of a batch, avoiding redundant error string conversions
and providing more structured error information.
- Added `SubmitBatch` variant to `Error` in `error.rs`.
- Updated `FlushWaiter` and `WorkerCommand` to use `std::result::Result<(), Arc<Error>>`.
- Refactored `notify_waiters` to distribute the shared `Arc<Error>`.
- Added `SubmitBatchSnafu` context when receiving results from the worker.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* perf/schema-align: export types for benchmarking
Exported several internal types and traits from `pending_rows_batcher.rs` to enable
external benchmarking of the physical batch flushing logic.
- Made `PhysicalTableMetadata`, `PhysicalFlushCatalogProvider`,
`PhysicalFlushPartitionProvider`, `PhysicalFlushNodeRequester`,
`TableBatch`, and `flush_batch_physical` public.
- Added a new criterion benchmark `flush_batch_physical.rs` to measure the
performance of physical batch flushing with varying numbers of logical
tables and rows per table.
- Registered the new benchmark in `src/servers/Cargo.toml`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* fix: typo
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* refactor(servers): improve error handling and documentation in batcher
Refactored error handling in `pending_rows_batcher.rs` by using `ArrowSnafu`
for RecordBatch projection errors and simplified partition rule fetching.
Added comprehensive documentation for `flush_batch_physical` and updated
error display for `SubmitBatch`.
- Added `Location` to `Arrow` error variant for better traceability.
- Updated `SubmitBatch` display to include source error.
- Replaced manual error mapping with `context(error::ArrowSnafu)` in
`strip_partition_columns_from_batch`.
- Added doc comments to `flush_batch_physical` outlining the pipeline steps.
- Optimized capacity allocation in `transform_logical_batches_to_physical`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* refactor(servers): clarify physical table metadata and simplify planned batch
Renamed `name_to_ids` to `col_name_to_ids` in `PhysicalTableMetadata` to
better reflect its purpose. Refactored `PlannedRegionBatch` to use a
`num_rows()` method instead of storing a redundant `row_count` field.
- Updated `PhysicalTableMetadata` and its usages in `pending_rows_batcher.rs`
and benchmarks.
- Removed `row_count` field from `PlannedRegionBatch` and added a `num_rows()`
helper.
- Cleaned up manual `with_context` closures for table lookups.
- Fixed a minor formatting issue in worker command processing.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* refactor(servers): simplify flush write structs and centralize metrics
Removed redundant `row_count` fields from `FlushRegionWrite` and
`PlannedRegionBatch` (made the helper method test-only). Centralized the
incrementing of `FLUSH_TOTAL` and `FLUSH_ROWS` metrics into `flush_batch`
to avoid duplication and ensure consistency.
- Removed `row_count` from `FlushRegionWrite` and `PlannedRegionBatch`.
- Marked `PlannedRegionBatch::num_rows()` as `#[cfg(test)]`.
- Updated `flush_batch` to handle `FLUSH_TOTAL` and `FLUSH_ROWS` metrics.
- Simplified concurrent and sequential flush logic by removing local metric
updates.
- Cleaned up related tests to match the structural changes.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
---------
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/auto-schema-align:
- **Error Handling Improvements**:
- Removed `CatalogSnafu` context from various `.await` calls in `dashboard.rs`, `influxdb.rs`, `jaeger.rs`, `prometheus.rs`, `event.rs`, and `pipeline.rs` to streamline error handling.
- **Prometheus Store Enhancements**:
- Added support for auto-creating tables and adding missing Prometheus tag columns in `prom_store.rs` and `pending_rows_batcher.rs`.
- Introduced `PendingRowsSchemaAlterer` trait for schema alterations in `pending_rows_batcher.rs`.
- **Test Additions**:
- Added tests for new Prometheus store functionalities in `prom_store.rs` and `pending_rows_batcher.rs`.
- **Error Message Improvements**:
- Enhanced error messages for catalog access in `error.rs`.
- **Server Configuration Updates**:
- Updated server configuration to include Prometheus store options in `server.rs`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* reformat
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/auto-schema-align:
### Add DataTypes Error Handling and Column Renaming Logic
- **`error.rs`**: Introduced a new `DataTypes` error variant to handle errors from `datatypes::error::Error`. Updated `ErrorExt` implementation to include `DataTypes`.
- **`pending_rows_batcher.rs`**: Added functions `find_prom_special_column_names` and `rename_prom_special_columns_for_existing_schema` to handle renaming of special Prometheus columns. Updated `build_prom_create_table_schema` to simplify error handling with
`ConcreteDataType`.
- **Tests**: Added a test case `test_rename_prom_special_columns_for_existing_schema` to verify the renaming logic for Prometheus special columns.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/auto-schema-align:
- Refactored `PendingRowsBatcher` to accommodate Prometheus record batches:
- Introduced `accommodate_record_batch_for_target_schema` to normalize incoming record batches against existing table schemas.
- Removed `collect_missing_prom_tag_columns` and `rename_prom_special_columns_for_existing_schema` in favor of the new function.
- Added `unzip_logical_region_schema` to extract schema components.
- Updated tests in `pending_rows_batcher.rs`:
- Added tests for `accommodate_record_batch_for_target_schema` to verify handling of missing tag columns and renaming of special columns.
- Ensured error handling for missing timestamp and field columns in target schema.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/auto-schema-align:
### Commit Summary
- **Enhancement in Table Creation Logic**: Updated `prom_store.rs` to modify the handling of `table_options` during table creation. Specifically, `table_options` are now extended differently based on the `AutoCreateTableType`. For `Physical` tables, enforced
`sst_format=flat` to optimize pending-rows writes by leveraging bulk memtables.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/auto-schema-align:
Enhance Performance Monitoring in `pending_rows_batcher.rs`
- Added performance monitoring timers to various stages of the `PendingRowsBatcher` process, including schema cache checks, table resolution, schema creation, and record batch alignment.
- Improved schema handling by adding timers around schema alteration and missing column addition processes.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/auto-schema-align:
- **Enhance Concurrent Write Handling**: Introduced `FlushRegionWrite` and `FlushWriteResult` structs to manage region writes and their results. Added `flush_region_writes_concurrently` function to handle concurrent flushing of region writes based on
`should_dispatch_concurrently` logic in `pending_rows_batcher.rs`.
- **Testing Enhancements**: Added tests for concurrent dispatching of region writes and the logic for determining concurrent dispatch in `pending_rows_batcher.rs`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/auto-schema-align:
### Add Histogram for Flush Stage Elapsed Time
- **`metrics.rs`**: Introduced a new `HistogramVec` named `PENDING_ROWS_BATCH_FLUSH_STAGE_ELAPSED` to track the elapsed time of pending rows batch flush stages.
- **`pending_rows_batcher.rs`**: Replaced instances of `PENDING_ROWS_BATCH_INGEST_STAGE_ELAPSED` with `PENDING_ROWS_BATCH_FLUSH_STAGE_ELAPSED` to measure the elapsed time for various flush stages, including `flush_write_region`, `flush_concat_table_batches`,
`flush_resolve_table`, `flush_fetch_partition_rule`, `flush_split_record_batch`, `flush_filter_record_batch`, `flush_resolve_region_leader`, and `flush_encode_ipc`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* Add design doc for physical table batching in PendingRowsBatcher
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* Add implementation plan for physical table batching in PendingRowsBatcher
* feat/auto-schema-align:
### Commit Message
**Enhance Metric Engine with Physical Batch Processing**
- **Add `metric-engine` Dependency**: Updated `Cargo.lock` and `Cargo.toml` to include `metric-engine` as a workspace dependency.
- **Expose Batch Modifier Functions**: Changed visibility of `TagColumnInfo`, `compute_tsid_array`, and `modify_batch_sparse` in `batch_modifier.rs` to public, and made `batch_modifier` a public module in `lib.rs`.
- **Implement Physical Batch Processing**:
- Added functions `bulk_insert_physical_region` and `bulk_insert_logical_region` in `bulk_insert.rs` to handle physical and logical batch insertions.
- Updated `pending_rows_batcher.rs` to attempt physical batch processing before falling back to logical processing, including new functions `flush_batch_physical` and `flush_batch_per_logical_table`.
- **Enhance Testing**:
- Added tests for physical region passthrough and empty batch handling in `bulk_insert.rs`.
- Introduced `with_mito_config` in `test_util.rs` for customized test environments.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/auto-schema-align:
### Enhance Batch Processing for Table Creation and Alteration
- **`prom_store.rs`**:
- Added `create_tables_if_missing_batch` and
`add_missing_prom_tag_columns_batch` methods to handle batch creation of tables
and batch alteration to add missing tag columns.
- Implemented logic to determine missing tables and columns, and perform batch
operations accordingly.
- **`pending_rows_batcher.rs`**:
- Updated `PendingRowsBatcher` to utilize batch methods for creating tables an
adding missing columns.
- Enhanced logic to resolve table schemas and accommodate record batches after
batch operations.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* perf: concurrent catalog lookups and eliminate redundant concat_batches on ingest path
Replace sequential catalog_manager.table() calls with concurrent
futures::future::join_all in align_table_batches_to_region_schema.
This affects all three lookup loops: initial table resolution,
post-create resolution, and post-alter schema refresh. Reduces
O(N) sequential RPC latency to O(1) wall-clock time for requests
with many distinct logical tables (e.g. Prometheus remote_write).
Remove the per-logical-table concat_batches in flush_batch_physical.
Instead of merging all chunks of a table into one RecordBatch before
calling modify_batch_sparse, apply modify_batch_sparse directly to
each chunk and collect all modified chunks for a single final concat.
This eliminates one full data copy per logical table on the flush path.
* refactor: extract Prometheus schema alignment helpers into prom_row_builder module
Move six functions and their eight unit tests from pending_rows_batcher.rs
(~2386 lines) into a new prom_row_builder.rs module (~776 lines), leaving
the batcher at ~1665 lines focused on flush/worker machinery.
Extracted functions:
- accommodate_record_batch_for_target_schema (normalize incoming batch
against existing table schema)
- unzip_logical_region_schema (extract ts/field/tag columns)
- build_prom_create_table_schema (build ColumnSchema vec for table creation)
- align_record_batch_to_schema (reorder/fill/cast columns to target schema)
- rows_to_record_batch (convert proto Rows to Arrow RecordBatch)
- build_arrow_array (build Arrow arrays from proto values)
Cleaned up 12 now-unused imports from pending_rows_batcher.rs.
* feat/auto-schema-align:
### Enhance `PendingRowsBatcher` and `prom_row_builder` for Efficient Schema Handling
- **`pending_rows_batcher.rs`:**
- Refactored `submit` method to integrate table batch building and alignment into a single method `build_and_align_table_batches`.
- Removed intermediate `RecordBatch` creation, optimizing the process by directly converting proto `RowInsertRequests` into aligned `RecordBatch`es.
- Enhanced schema handling by identifying missing columns directly from proto schemas.
- **`prom_row_builder.rs`:**
- Introduced `rows_to_aligned_record_batch` for direct conversion of proto `Rows` into aligned `RecordBatch`es.
- Added `identify_missing_columns_from_proto` to detect absent tag columns without intermediate `RecordBatch`.
- Implemented `build_prom_create_table_schema_from_proto` to construct table schemas directly from proto schemas.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/auto-schema-align:
Add elapsed time metrics for bulk insert operations
- Updated `bulk_insert` method in `bulk_insert.rs` to record elapsed time metrics using `MITO_OPERATION_ELAPSED` for both physical and logical regions.
- Added a new test `test_bulk_insert_records_elapsed_metric` to verify that the elapsed time metric is recorded correctly during bulk insert operations.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* remove flush per logical region
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/auto-schema-align:
**Refactor `flush_batch` and `flush_batch_physical` functions**
- Removed unused `catalog` and `schema` variables from `flush_batch` in `pending_rows_batcher.rs`.
- Updated `flush_batch_physical` to directly use `ctx.current_catalog()` and `ctx.current_schema()` for resolving table names.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/auto-schema-align:
### Remove Unused Function and Associated Test
- **File:** `src/servers/src/prom_row_builder.rs`
- Removed the unused function `build_prom_create_table_schema` which was responsible for building a `Vec<ColumnSchema>` from an Arrow schema.
- Deleted the associated test `test_build_prom_create_table_schema_from_request_schema` that validated the removed function.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/auto-schema-align:
- **Remove Test**: Deleted the `test_bulk_insert_records_elapsed_metric` test from `bulk_insert.rs`.
- **Refactor Table Resolution**: Introduced `TableResolutionPlan` struct and refactored table resolution logic in `pending_rows_batcher.rs`.
- **Enhance Table Handling**: Added functions for collecting non-empty table rows, unique table schemas, and handling table creation and alteration in `pending_rows_batcher.rs`.
- **Add Tests**: Implemented tests for `collect_non_empty_table_rows` and `collect_unique_table_schemas` in `pending_rows_batcher.rs`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/auto-schema-align:
- **Refactor Error Handling**: Updated error handling in `pending_rows_batcher.rs` and `prom_row_builder.rs` to use `Snafu` error context for more descriptive error messages.
- **Remove Unused Functionality**: Eliminated the `rows_to_record_batch` function and related test in `prom_row_builder.rs` as it was redundant.
- **Simplify Function Return Types**: Modified `rows_to_aligned_record_batch` in `prom_row_builder.rs` to return only `RecordBatch` without missing columns, simplifying the function's interface and related tests.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/auto-schema-align:
### Add Helper Function for Table Options in `prom_store.rs`
- Introduced `fill_metric_physical_table_options` function to encapsulate logic for setting table options, ensuring the use of flat SST format and physical table metadata.
- Updated `Instance` implementation to utilize the new helper function for setting table options.
- Added a unit test `test_metric_physical_table_options_forces_flat_sst_format` to verify the correct application of table options.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/auto-schema-align:
- **Refactor `PendingRowsBatcher`**: Simplified worker retrieval logic in `get_or_spawn_worker` method by using a more concise conditional check.
- **Metrics Update**: Added `PENDING_ROWS_BATCH_FLUSH_STAGE_ELAPSED` metric in `pending_rows_batcher.rs`.
- **Remove Unused Code**: Deleted multiple test functions related to record batch alignment and schema preparation in `pending_rows_batcher.rs` and `prom_row_builder.rs`.
- **Function Visibility Change**: Made `build_prom_create_table_schema_from_proto` public in `prom_row_builder.rs`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* chore: remove plan
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/auto-schema-align:
### Refactor and Simplify Schema Alteration Logic
- **Removed Unused Methods**: Deleted `create_table_if_missing` and `add_missing_prom_tag_columns` methods from `PendingRowsSchemaAlterer` trait in `prom_store.rs` and `pending_rows_batcher.rs`.
- **Error Handling Improvement**: Enhanced error handling in `create_tables_if_missing_batch` method to return a specific error message for unsupported `AutoCreateTableType` in `prom_store.rs`.
- **Visibility Change**: Made `as_str` method public in `AutoCreateTableType` enum in `insert.rs` to support external access.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/auto-schema-align:
### Commit Message
Improve safety in `prom_row_builder.rs`
- Updated `unzip_logical_region_schema` to use `saturating_sub` for safer capacity calculation of `tag_columns`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/auto-schema-align:
Add TODO comments for future improvements in `pending_rows_batcher.rs`
- Added a TODO comment to consider bounding the `flush_region_writes_concurrently` function.
- Added a TODO comment to potentially limit the maximum rows to concatenate in the `flush_batch_physical` function.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/auto-schema-align:
### Commit Message
Enhance error handling in `pending_rows_batcher.rs`
- Updated `collect_unique_table_schemas` to return a `Result` type, enabling error handling for duplicate table names.
- Modified the function to return an error when duplicate table names are found in `table_rows`.
- Adjusted test cases to handle the new `Result` return type in `collect_unique_table_schemas`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/auto-schema-align:
- **Refactor `partition_columns` Method**: Updated the `partition_columns` method in `multi_dim.rs`, `partition.rs`, and `splitter.rs` to return a slice reference instead of a cloned vector, improving performance by avoiding unnecessary cloning.
- **Enhance Partition Handling**: Added functions `collect_tag_columns_and_non_tag_indices` and `strip_partition_columns_from_batch` in `pending_rows_batcher.rs` to manage partition columns more efficiently, including stripping partition columns from record batches.
- **Update Tests**: Modified existing tests and added new ones in `pending_rows_batcher.rs` to verify the functionality of partition column handling, ensuring correct behavior of the new methods.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/auto-schema-align:
### Enhance Schema Handling and Validation in `pending_rows_batcher.rs`
- **Schema Validation Enhancements**:
- Added checks for essential columns (`timestamp`, `value`) in `collect_tag_columns_and_non_tag_indices`.
- Introduced `PHYSICAL_REGION_ESSENTIAL_COLUMN_COUNT` to ensure minimum column count in `strip_partition_columns_from_batch`.
- Improved error handling for unexpected data types and duplicated columns.
- **Function Modifications**:
- Updated `strip_partition_columns_from_batch` to project essential columns without lookup.
- Modified `flush_batch_physical` to use `essential_col_indices` instead of `non_tag_indices`.
- **Test Enhancements**:
- Added tests for schema validation, including checks for unexpected data types and duplicated columns.
- Verified correct projection of essential columns in `strip_partition_columns_from_batch`.
Files affected: `pending_rows_batcher.rs`, `tests`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/auto-schema-align:
- **Add `smallvec` Dependency**: Updated `Cargo.lock` and `Cargo.toml` to include `smallvec` as a workspace dependency.
- **Refactor Function**: Renamed `collect_tag_columns_and_non_tag_indices` to `columns_taxonomy` in `pending_rows_batcher.rs` and updated its return type to use `SmallVec`.
- **Update Tests**: Modified test cases in `pending_rows_batcher.rs` to reflect changes in function name and return type.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/auto-schema-align:
**Refactor `pending_rows_batcher.rs` to Simplify Table ID Handling**
- Updated `TableBatch` struct to use `TableId` directly instead of `Option<u32>` for `table_id`.
- Simplified logic in `flush_batch_physical` by removing the check for `None` in `table_id`.
- Adjusted related logic in `start_worker` to accommodate the change in `table_id` handling.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/auto-schema-align:
### Enhance Batch Processing Logic
- **`pending_rows_batcher.rs`**:
- Moved column taxonomy resolution inside the loop to handle schema variations across batches.
- Added checks to skip processing if both tag columns and essential column indices are empty.
- **Tests**:
- Added `test_modify_batch_sparse_with_taxonomy_per_batch` to verify batch modification logic with varying schemas.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/auto-schema-align:
### Remove Primary Key Column Check in `pending_rows_batcher.rs`
- Removed the check for the primary key column and other essential column names in the function `strip_partition_columns_from_batch` within `pending_rows_batcher.rs`.
- Simplified the logic by eliminating the validation of column order against expected essential names.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/auto-schema-align:
### Refactor error handling and iteration in `otlp.rs` and `pending_rows_batcher.rs`
- **`otlp.rs`**: Simplified error handling by removing `CatalogSnafu` context when awaiting table retrieval.
- **`pending_rows_batcher.rs`**: Streamlined iteration over tables by removing unnecessary `into_iter()` calls, improving code readability and efficiency.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* chore/metrics-for-bulk:
Add timing metrics for batch processing in `pending_rows_batcher.rs`
- Introduced `modify_elapsed` and `columns_taxonomy_elapsed` to measure time spent in `modify_batch_sparse` and `columns_taxonomy` functions.
- Updated `flush_batch_physical` to record these metrics using `PENDING_ROWS_BATCH_FLUSH_STAGE_ELAPSED`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/auto-schema-align:
### Commit Summary
- **Remove Unused Code**: Eliminated the `#[allow(dead_code)]` attribute from the `compute_tsid_array` function in `batch_modifier.rs`.
- **Error Handling Improvement**: Enhanced error handling in `flush_batch_physical` function by adjusting the `match` block in `pending_rows_batcher.rs`.
- **Simplify Logic**: Streamlined the logic in `rows_to_aligned_record_batch` by removing unnecessary type casting in `prom_row_builder.rs`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/auto-schema-align:
**Refactor `flush_batch_physical` in `pending_rows_batcher.rs`:**
- Moved partition column stripping logic to a single location before processing region batches.
- Updated the use of `combined_batch` to `stripped_batch` for consistency in batch processing.
- Removed redundant partition column stripping logic within the region batch loop.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/auto-schema-align:
### Update `batch_modifier.rs` Documentation and Parameter Naming
- Enhanced documentation for `compute_tsid_array` and `modify_batch_sparse` functions to clarify their logic and parameters.
- Renamed parameter `non_tag_column_indices` to `extra_column_indices` in `modify_batch_sparse` for better clarity.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
---------
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
fix: avoid cloning serialized view plan on resolve
- Change `ViewInfoValue.view_info` from `Vec<u8>` to
`common_base::bytes::Bytes` so resolving a view no longer clones
the full serialized plan buffer on every decode.
- To keep the change narrow, the metadata write boundary still
accepts `Vec<u8>` and converts once when constructing/updating
`ViewInfoValue`. The hot read path now uses a cheap clone of the
stored bytes.
- The benches introduced revealed up to 82% resolution time
improvment.
Signed-off-by: Yao ACHI <achi.noel@hotmail.com>
* fix(datatypes): compare ConstantVector rhs inner in vector equality
When either operand is a ConstantVector, the recursive equal() call must
compare lhs.inner() against rhs.inner(). The second argument incorrectly
used lhs twice, breaking equality when only the rhs was constant.
Signed-off-by: Weixie Cui <cuiweixie@gmail.com>
* fix: review
Signed-off-by: Weixie Cui <cuiweixie@gmail.com>
---------
Signed-off-by: Weixie Cui <cuiweixie@gmail.com>
* bench(promql): add range-function benchmark suite
* perf(promql): use flat buffers in range function hot loops
* perf(promql): reuse quantile scratch buffers
* feat: metric batch 2s PoC
Signed-off-by: jeremyhi <fengjiachun@gmail.com>
* chore: max_concurrent_flushes
Signed-off-by: jeremyhi <fengjiachun@gmail.com>
* chore: work channel size
Signed-off-by: jeremyhi <fengjiachun@gmail.com>
* feat(servers): add metrics and logs for pending rows batch flush
Add the `FLUSH_ELAPSED` histogram metric to track the duration of pending
rows batch flushes in the Prometheus store protocol handler. This provides
better observability into the performance and latency of the batcher.
Also update telemetry by:
- Recording elapsed time for both successful and failed flush operations.
- Adding an informational log upon successful flush including row count and duration.
- Including elapsed time in error logs when a flush fails.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat(servers): implement columnar batching for pending rows
Refactor PendingRowsBatcher to use columnar batching for the metrics
store. Incoming RowInsertRequests are now converted to RecordBatches,
partitioned, and flushed via BulkInsert requests to datanodes.
- Enhance MultiDimPartitionRule to handle scalar boolean predicates.
- Add metrics for tracking flush failures and dropped rows.
- Update dependencies to support columnar batching in servers.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat(servers): add backpressure for pending rows
Implement backpressure in PendingRowsBatcher by limiting in-flight
requests with a semaphore and making the submission wait for the flush
result. This ensures Prometheus write requests are throttled and only
return once the data has been successfully flushed to datanodes.
- Add max_inflight_requests to PromStoreOptions.
- Use oneshot channels to notify submitters of flush completion.
- Limit concurrent requests using a new inflight_semaphore.
- Update PendingRowsBatcher::submit to wait for the flush outcome.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat: add stage-level metrics for bulk ingestion
Introduce histograms to track the elapsed time of various stages in the
metric engine bulk insert path and the server's pending rows batcher.
This provides better observability into the performance bottlenecks
of the ingestion pipeline.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* - `src/metric-engine/src/engine/bulk_insert.rs`: Removed the fallback mechanism that converted record batches to rows when bulk inserts were unsupported, along with related helper functions and unused imports.
- `src/operator/src/insert.rs`: Removed an unused import (`common_time::TimeToLive::Instant`).
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat(servers): columnar Prom remote write
Optimize the Prometheus remote write path by allowing direct conversion
from decoded Prometheus samples to Arrow RecordBatches. This bypasses
intermediate row-based representations when `PendingRowsBatcher` is
active and no pipeline is used, improving ingestion efficiency.
- Implement `as_record_batch_groups` in `TablesBuilder` and `PromWriteRequest`.
- Add `submit_prom_record_batch_groups` to `PendingRowsBatcher`.
- Introduce `DecodedPromWriteRequest` in `prom_store`.
- Implement row-to-RecordBatch conversion logic in `prom_row_builder`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* Revert "feat(servers): columnar Prom remote write"
This reverts commit efbb63c12a3e7fcec03858ea0351efd94fec8242.
* refactor(servers): improve row to RecordBatch conversion
- Use `snafu::ensure` for row validation in `rows_to_record_batch`.
- Add explicit type hint for `MutableVector` to improve clarity.
- Reorganize and clean up imports in `pending_rows_batcher.rs`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* perf(servers): use arrow builders for row conversion
This commit optimizes the conversion from `api::v1::Rows` to `RecordBatch`
by using Arrow builders directly. This avoids the overhead of
`MutableVector` and `common_recordbatch`, leading to better performance
in the `pending_rows_batcher`.
Additionally, the `#[allow(dead_code)]` attribute is removed from
`modify_batch_sparse` in the metric engine as it is now utilized.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* perf(metric-engine): optimize batch modification
Optimize `modify_batch_sparse` by reusing buffers, using Arrow
builders, and employing fast-path encoding methods. This reduces
allocations and avoids redundant downcasting and serializer overhead.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/metric-engine-support-bulk:
**Add Environment Variable for Batch Sync Control**
- `pending_rows_batcher.rs`: Introduced an environment variable `PENDING_ROWS_BATCH_SYNC` to control the synchronization behavior of batch processing. If set to true, the function will wait for the flush result; otherwise, it will return immediatel
with the total rows count.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* wip
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* chore: update and fix clippy
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* fix: failing test
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* picking-pending-rows-batcher:
### Commit Message
Remove Unused Code and Simplify Error Handling
- **`src/error.rs`**: Removed the `BatcherQueueFull` error variant and its associated logic, simplifying the error handling by removing unused code.
- **`src/http/prom_store.rs`**: Eliminated the `try_decompress` function, streamlining the decompression logic by directly using `snappy_decompress` in `decode_remote_read_request`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* chore: parse PENDING_ROWS_BATCH_SYNC once
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* chore: revert unrelated changes
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* **Refactor Prometheus Write Handling**
- **`prom_store.rs`**: Introduced `pre_write` method in `PromStoreProtocolHandler` to handle pre-write checks for Prometheus remote write requests. Updated `write` method to utilize `pre_write`.
- **`server.rs`**: Modified `PendingRowsBatcher` initialization to conditionally create a batcher based on `with_metric_engine` flag.
- **`http/prom_store.rs`**: Integrated `pre_write` checks before submitting requests to `PendingRowsBatcher`.
- **`query_handler.rs`**: Added `pre_write` method to `PromStoreProtocolHandler` trait for pre-write operations.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* picking-pending-rows-batcher:
- **Fix Label Typo**: Corrected a typo in the label value from `"flush_wn ite_region"` to `"flush_write_region"` in `pending_rows_batcher.rs`.
- **Refactor Array Building Logic**: Introduced a macro `build_array!` to streamline the construction of `ArrayRef` for different data types, reducing code duplication in `pending_rows_batcher.rs`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* format toml
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* picking-pending-rows-batcher:
### Update PromStore and PendingRowsBatcher Configuration
- **`prom_store.rs`**: Set `pending_rows_flush_interval` to `Duration::ZERO` to disable automatic flushing.
- **`pending_rows_batcher.rs`**: Enhance validation to disable the batcher when `flush_interval` is zero or configuration values like `max_batch_rows`, `max_concurrent_flushes`, `worker_channel_capacity`, or `max_inflight_requests` are zero, preventing potential panics or deadlocks.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* picking-pending-rows-batcher:
### Update `pending_rows_flush_interval` to Zero
- **Files Modified**:
- `src/frontend/src/service_config/prom_store.rs`
- `tests-integration/tests/http.rs`
- **Key Changes**:
- Updated `pending_rows_flush_interval` from `Duration::from_secs(2)` to `Duration::ZERO` in `prom_store.rs`.
- Changed `pending_rows_flush_interval` configuration from `"2s"` to `"0s"` in `http.rs`.
These changes set the flush interval to zero, potentially affecting how frequently pending rows are flushed.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* picking-pending-rows-batcher:
**Add Worker Management Enhancements**
- **`metrics.rs`**: Introduced `PENDING_WORKERS` gauge to track active pending rows batch workers.
- **`pending_rows_batcher.rs`**:
- Added worker idle timeout logic with `WORKER_IDLE_TIMEOUT_MULTIPLIER`.
- Implemented worker management functions: `spawn_worker`, `remove_worker_if_same_channel`, and `should_close_worker_on_idle_timeout`.
- Enhanced worker lifecycle management to handle idle workers and ensure proper cleanup.
- **Tests**: Added unit tests for worker removal and idle timeout logic.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* fix: clippy
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
---------
Signed-off-by: jeremyhi <fengjiachun@gmail.com>
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
Co-authored-by: jeremyhi <fengjiachun@gmail.com>
Allow flush edits with equal entry ids when flushed sequence advances, so close-time flush after truncate still succeeds for skip-wal regions while stale pre-truncate flushes are rejected. Add a regression test for create->truncate->write->close timing.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>