greptimedb

mirror of https://github.com/GreptimeTeam/greptimedb.git synced 2026-05-25 17:30:41 +00:00

Author	SHA1	Message	Date
Yvan Wang	d1873ca31d	fix(metric-engine): validate column types and require time index in verify_rows (#8018 ) * fix(metric-engine): validate column types and require time index in verify_rows The remote-write path into the metric engine previously bypassed schema validation. When a row's time index column carried a non-timestamp datatype (e.g. a string), the request reached mito's ValueBuilder::push for the timestamp builder and panicked instead of surfacing a typed error. Cache the (column_id, data_type, semantic_type) tuple for each physical column on PhysicalRegionState and use it in verify_rows to: - reject columns whose datatype or semantic type disagrees with the physical region's schema (mirrors mito's WriteRequest::check_schema) - reject requests that omit the time index column entirely Field columns stay optional; tag completeness needs per-logical-region metadata that verify_rows doesn't have and is left to a follow-up. Fixes #7990. Signed-off-by: BootstrapperSBL <yvanwww01@gmail.com> * refactor(metric-engine): simplify PhysicalColumnInfo construction - Add From<ColumnMetadata> and From<&ColumnMetadata> for PhysicalColumnInfo so call sites can use metadata.into() instead of repeating the field list. - Replace the four struct-literal constructions in create.rs, open.rs and alter.rs with the conversion. - In verify_rows, pass &col.column_name to ColumnNotFoundSnafu instead of cloning it explicitly (snafu's context handles the conversion). Signed-off-by: BootstrapperSBL <yvanwww01@gmail.com> * perf(metric-engine): cache time index column name in PhysicalRegionState verify_rows previously scanned every physical column on each row batch to find the timestamp column. Since the time index is fixed at region creation and never changes, stash its name on PhysicalRegionState when the region is first registered and read it directly from there. add_physical_columns carries a debug_assert to document the invariant that alter never introduces a new time index. 
Signed-off-by: BootstrapperSBL <yvanwww01@gmail.com> * perf(metric-engine): borrow physical column names when building name_to_id On the row-write path we built a HashMap<String, ColumnId> by cloning every column name out of the physical region's cached state. The map is scoped to the block that holds the state's read guard, so there's no need to own the keys. Switch the map to HashMap<&str, ColumnId> and widen RowsIter::new / IterIndex::new to accept any key type that borrows as str. Existing test helpers that pass HashMap<String, ColumnId> keep working through the Borrow<str> bound. Signed-off-by: BootstrapperSBL <yvanwww01@gmail.com> * fix: validate metric rows against physical schema Cache physical column metadata in the metric engine state so row validation and row modification can use the same source of truth for column IDs, data types, and semantic types. Validate incoming metric rows against the physical schema before writes. Put requests now require the time index and the expected field column, while delete requests keep accepting primary-key-plus-timestamp payloads by skipping the field completeness check. Pass physical column metadata directly into RowsIter instead of rebuilding a name-to-column-id map at each call site, and cover the new validation paths with tests for missing time indexes, missing fields, and duplicate field columns. Signed-off-by: evenyag <realevenyag@gmail.com> * fix: do not allow adding a new field Signed-off-by: evenyag <realevenyag@gmail.com> * fix: fill default value for fields Signed-off-by: evenyag <realevenyag@gmail.com> * fix: fill default for nullable fields Signed-off-by: evenyag <realevenyag@gmail.com> --------- Signed-off-by: BootstrapperSBL <yvanwww01@gmail.com> Signed-off-by: evenyag <realevenyag@gmail.com> Co-authored-by: BootstrapperSBL <yvanwww01@gmail.com> Co-authored-by: evenyag <realevenyag@gmail.com>	2026-05-07 12:41:07 +00:00
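The validation described in the commit above (#8018) can be illustrated with a minimal sketch. It is not the metric engine's code: `DataType`, `SemanticType`, `RequestColumn`, `VerifyError`, `verify_columns` and `demo_schema` are hypothetical names made up for this example, and the column names in `demo_schema` are illustrative only. Only the `(column_id, data_type, semantic_type)` tuple and the two checks (type agreement, time index present) come from the commit message.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-ins for the engine's real types.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DataType { TimestampMillisecond, Float64, String }

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SemanticType { Tag, Field, Timestamp }

// Cached per physical column, mirroring the tuple named in the commit message.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy)]
struct PhysicalColumnInfo { column_id: u32, data_type: DataType, semantic_type: SemanticType }

// One column of an incoming write request (hypothetical shape).
struct RequestColumn<'a> { name: &'a str, data_type: DataType, semantic_type: SemanticType }

#[derive(Debug, PartialEq, Eq)]
enum VerifyError { ColumnNotFound(String), TypeMismatch(String), MissingTimeIndex }

// Reject unknown columns, columns whose data type or semantic type disagrees
// with the cached physical schema, and requests without the time index.
fn verify_columns(
    physical: &HashMap<&str, PhysicalColumnInfo>,
    time_index: &str,
    request: &[RequestColumn<'_>],
) -> Result<(), VerifyError> {
    let mut has_time_index = false;
    for col in request {
        let info = physical
            .get(col.name)
            .ok_or_else(|| VerifyError::ColumnNotFound(col.name.to_string()))?;
        if info.data_type != col.data_type || info.semantic_type != col.semantic_type {
            return Err(VerifyError::TypeMismatch(col.name.to_string()));
        }
        has_time_index |= col.name == time_index;
    }
    if has_time_index { Ok(()) } else { Err(VerifyError::MissingTimeIndex) }
}

// Illustrative schema: one timestamp, one field, one tag.
fn demo_schema() -> HashMap<&'static str, PhysicalColumnInfo> {
    HashMap::from([
        ("ts", PhysicalColumnInfo { column_id: 0, data_type: DataType::TimestampMillisecond, semantic_type: SemanticType::Timestamp }),
        ("val", PhysicalColumnInfo { column_id: 1, data_type: DataType::Float64, semantic_type: SemanticType::Field }),
        ("host", PhysicalColumnInfo { column_id: 2, data_type: DataType::String, semantic_type: SemanticType::Tag }),
    ])
}
```

With this shape, a request whose `ts` column carries a string datatype yields a typed `TypeMismatch` error rather than reaching a timestamp builder, which is the failure mode the commit message describes.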
QuakeWang	45e990b7f3	refactor: propagate flush reasons through FlushRegions path (#8051 ) * feat: propagate flush reasons through FlushRegions path Signed-off-by: QuakeWang <wangfuzheng0814@foxmail.com> * refactor: address flush reason review feedback Signed-off-by: QuakeWang <wangfuzheng0814@foxmail.com> * refactor: keep flush instruction helper name Signed-off-by: QuakeWang <wangfuzheng0814@foxmail.com> --------- Signed-off-by: QuakeWang <wangfuzheng0814@foxmail.com>	2026-05-01 02:28:55 +00:00
Ning Sun	b8951a3514	feat: persist our column_id to parquet field_id (#8032 ) * feat: persist our column_id to parquet field_id * refactor: avoid clone field when possible * chore: fmt * chore: address style suggestions	2026-04-30 15:40:24 +00:00
shuiyisong	0effc30778	chore: update the opendal to 0.56 rc2 (#8003 ) * chore: update opendal version Signed-off-by: shuiyisong <xixing.sys@gmail.com> * chore: update opendal version Signed-off-by: shuiyisong <xixing.sys@gmail.com> * chore: fix test Signed-off-by: shuiyisong <xixing.sys@gmail.com> * fix: grpc init Signed-off-by: shuiyisong <xixing.sys@gmail.com> * fix: dep versions Signed-off-by: shuiyisong <xixing.sys@gmail.com> * fix: remove aws-lc-rs in reqwest Signed-off-by: shuiyisong <xixing.sys@gmail.com> * chore: rebase main and fix compile Signed-off-by: shuiyisong <xixing.sys@gmail.com> * fix: remove unused deps Signed-off-by: shuiyisong <xixing.sys@gmail.com> * Revert "fix: remove aws-lc-rs in reqwest" This reverts commit `90bfafca06`. * chore: remove aws-lc-sys from blacklist Signed-off-by: shuiyisong <xixing.sys@gmail.com> * chore: fix sqlness Signed-off-by: shuiyisong <xixing.sys@gmail.com> * chore: add tls deps Signed-off-by: shuiyisong <xixing.sys@gmail.com> * fix: idempotent install in rds Signed-off-by: shuiyisong <xixing.sys@gmail.com> * fix: test Signed-off-by: shuiyisong <xixing.sys@gmail.com> * chore: use aws-lc-sys as possible Signed-off-by: shuiyisong <xixing.sys@gmail.com> * fix: lint Signed-off-by: shuiyisong <xixing.sys@gmail.com> * fix: address comments Signed-off-by: shuiyisong <xixing.sys@gmail.com> * chore: address CR issue Signed-off-by: shuiyisong <xixing.sys@gmail.com> Signed-off-by: evenyag <realevenyag@gmail.com> * fix: sync opendal compat adapter with upstream Signed-off-by: evenyag <realevenyag@gmail.com> * fix: address compat clippy warnings Signed-off-by: evenyag <realevenyag@gmail.com> --------- Signed-off-by: shuiyisong <xixing.sys@gmail.com> Signed-off-by: evenyag <realevenyag@gmail.com> Co-authored-by: evenyag <realevenyag@gmail.com>	2026-04-26 09:59:48 +00:00
fys	c90e4147de	refactor: introduce the ProjectInput structure (#7908 ) * refactor: introduce the ProjectInput structure * remove unused import * fix: cr * fix: cr * fix: code review * add more unit test * avoid clone of input.projection	2026-04-14 09:29:33 +00:00
fys	62013217c7	fix: cargo check -p common-meta (#7964 ) fix: moka feature	2026-04-14 08:27:22 +00:00
Yingwen	233e35c0c9	feat!: switch default sst format to flat (#7909 ) * feat: support alter from primary_key to flat Signed-off-by: evenyag <realevenyag@gmail.com> * chore: alter flat to primary_key Signed-off-by: evenyag <realevenyag@gmail.com> * feat: change default_experimental_flat_format to true Signed-off-by: evenyag <realevenyag@gmail.com> * feat: compute channel size from split batch size Signed-off-by: evenyag <realevenyag@gmail.com> * test: add tests for split and channel size Signed-off-by: evenyag <realevenyag@gmail.com> * fix: always set sst_format from manifest on region open sanitize_region_options did not set options.sst_format when the default (PrimaryKey) matched the manifest value, leaving it as None after reopen. This caused the alter format change to appear lost. Signed-off-by: evenyag <realevenyag@gmail.com> * test: fix tests Signed-off-by: evenyag <realevenyag@gmail.com> * test: show create table after alteration Signed-off-by: evenyag <realevenyag@gmail.com> * refactor!: rename default_experimental_flat_format to default_flat_format The flat format is no longer experimental. Remove "experimental" from the config field name, doc comments, and all references. Signed-off-by: evenyag <realevenyag@gmail.com> * chore: fix clippy Signed-off-by: evenyag <realevenyag@gmail.com> --------- Signed-off-by: evenyag <realevenyag@gmail.com>	2026-04-03 04:14:02 +00:00
Lei, HUANG	2b4e12c358	feat: auto-align Prometheus schemas in pending rows batching (#7877 ) * feat/auto-schema-align: - Error Handling Improvements: - Removed `CatalogSnafu` context from various `.await` calls in `dashboard.rs`, `influxdb.rs`, `jaeger.rs`, `prometheus.rs`, `event.rs`, and `pipeline.rs` to streamline error handling. - Prometheus Store Enhancements: - Added support for auto-creating tables and adding missing Prometheus tag columns in `prom_store.rs` and `pending_rows_batcher.rs`. - Introduced `PendingRowsSchemaAlterer` trait for schema alterations in `pending_rows_batcher.rs`. - Test Additions: - Added tests for new Prometheus store functionalities in `prom_store.rs` and `pending_rows_batcher.rs`. - Error Message Improvements: - Enhanced error messages for catalog access in `error.rs`. - Server Configuration Updates: - Updated server configuration to include Prometheus store options in `server.rs`. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * reformat Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat/auto-schema-align: ### Add DataTypes Error Handling and Column Renaming Logic - `error.rs`: Introduced a new `DataTypes` error variant to handle errors from `datatypes::error::Error`. Updated `ErrorExt` implementation to include `DataTypes`. - `pending_rows_batcher.rs`: Added functions `find_prom_special_column_names` and `rename_prom_special_columns_for_existing_schema` to handle renaming of special Prometheus columns. Updated `build_prom_create_table_schema` to simplify error handling with `ConcreteDataType`. - Tests: Added a test case `test_rename_prom_special_columns_for_existing_schema` to verify the renaming logic for Prometheus special columns. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat/auto-schema-align: - Refactored `PendingRowsBatcher` to accommodate Prometheus record batches: - Introduced `accommodate_record_batch_for_target_schema` to normalize incoming record batches against existing table schemas. 
- Removed `collect_missing_prom_tag_columns` and `rename_prom_special_columns_for_existing_schema` in favor of the new function. - Added `unzip_logical_region_schema` to extract schema components. - Updated tests in `pending_rows_batcher.rs`: - Added tests for `accommodate_record_batch_for_target_schema` to verify handling of missing tag columns and renaming of special columns. - Ensured error handling for missing timestamp and field columns in target schema. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat/auto-schema-align: ### Commit Summary - Enhancement in Table Creation Logic: Updated `prom_store.rs` to modify the handling of `table_options` during table creation. Specifically, `table_options` are now extended differently based on the `AutoCreateTableType`. For `Physical` tables, enforced `sst_format=flat` to optimize pending-rows writes by leveraging bulk memtables. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat/auto-schema-align: Enhance Performance Monitoring in `pending_rows_batcher.rs` - Added performance monitoring timers to various stages of the `PendingRowsBatcher` process, including schema cache checks, table resolution, schema creation, and record batch alignment. - Improved schema handling by adding timers around schema alteration and missing column addition processes. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat/auto-schema-align: - Enhance Concurrent Write Handling: Introduced `FlushRegionWrite` and `FlushWriteResult` structs to manage region writes and their results. Added `flush_region_writes_concurrently` function to handle concurrent flushing of region writes based on `should_dispatch_concurrently` logic in `pending_rows_batcher.rs`. - Testing Enhancements: Added tests for concurrent dispatching of region writes and the logic for determining concurrent dispatch in `pending_rows_batcher.rs`. 
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat/auto-schema-align: ### Add Histogram for Flush Stage Elapsed Time - `metrics.rs`: Introduced a new `HistogramVec` named `PENDING_ROWS_BATCH_FLUSH_STAGE_ELAPSED` to track the elapsed time of pending rows batch flush stages. - `pending_rows_batcher.rs`: Replaced instances of `PENDING_ROWS_BATCH_INGEST_STAGE_ELAPSED` with `PENDING_ROWS_BATCH_FLUSH_STAGE_ELAPSED` to measure the elapsed time for various flush stages, including `flush_write_region`, `flush_concat_table_batches`, `flush_resolve_table`, `flush_fetch_partition_rule`, `flush_split_record_batch`, `flush_filter_record_batch`, `flush_resolve_region_leader`, and `flush_encode_ipc`. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * Add design doc for physical table batching in PendingRowsBatcher Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * Add implementation plan for physical table batching in PendingRowsBatcher * feat/auto-schema-align: ### Commit Message Enhance Metric Engine with Physical Batch Processing - Add `metric-engine` Dependency: Updated `Cargo.lock` and `Cargo.toml` to include `metric-engine` as a workspace dependency. - Expose Batch Modifier Functions: Changed visibility of `TagColumnInfo`, `compute_tsid_array`, and `modify_batch_sparse` in `batch_modifier.rs` to public, and made `batch_modifier` a public module in `lib.rs`. - Implement Physical Batch Processing: - Added functions `bulk_insert_physical_region` and `bulk_insert_logical_region` in `bulk_insert.rs` to handle physical and logical batch insertions. - Updated `pending_rows_batcher.rs` to attempt physical batch processing before falling back to logical processing, including new functions `flush_batch_physical` and `flush_batch_per_logical_table`. - Enhance Testing: - Added tests for physical region passthrough and empty batch handling in `bulk_insert.rs`. - Introduced `with_mito_config` in `test_util.rs` for customized test environments. 
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat/auto-schema-align: ### Enhance Batch Processing for Table Creation and Alteration - `prom_store.rs`: - Added `create_tables_if_missing_batch` and `add_missing_prom_tag_columns_batch` methods to handle batch creation of tables and batch alteration to add missing tag columns. - Implemented logic to determine missing tables and columns, and perform batch operations accordingly. - `pending_rows_batcher.rs`: - Updated `PendingRowsBatcher` to utilize batch methods for creating tables and adding missing columns. - Enhanced logic to resolve table schemas and accommodate record batches after batch operations. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * perf: concurrent catalog lookups and eliminate redundant concat_batches on ingest path Replace sequential catalog_manager.table() calls with concurrent futures::future::join_all in align_table_batches_to_region_schema. This affects all three lookup loops: initial table resolution, post-create resolution, and post-alter schema refresh. Reduces O(N) sequential RPC latency to O(1) wall-clock time for requests with many distinct logical tables (e.g. Prometheus remote_write). Remove the per-logical-table concat_batches in flush_batch_physical. Instead of merging all chunks of a table into one RecordBatch before calling modify_batch_sparse, apply modify_batch_sparse directly to each chunk and collect all modified chunks for a single final concat. This eliminates one full data copy per logical table on the flush path. * refactor: extract Prometheus schema alignment helpers into prom_row_builder module Move six functions and their eight unit tests from pending_rows_batcher.rs (~2386 lines) into a new prom_row_builder.rs module (~776 lines), leaving the batcher at ~1665 lines focused on flush/worker machinery.
Extracted functions: - accommodate_record_batch_for_target_schema (normalize incoming batch against existing table schema) - unzip_logical_region_schema (extract ts/field/tag columns) - build_prom_create_table_schema (build ColumnSchema vec for table creation) - align_record_batch_to_schema (reorder/fill/cast columns to target schema) - rows_to_record_batch (convert proto Rows to Arrow RecordBatch) - build_arrow_array (build Arrow arrays from proto values) Cleaned up 12 now-unused imports from pending_rows_batcher.rs. * feat/auto-schema-align: ### Enhance `PendingRowsBatcher` and `prom_row_builder` for Efficient Schema Handling - `pending_rows_batcher.rs`: - Refactored `submit` method to integrate table batch building and alignment into a single method `build_and_align_table_batches`. - Removed intermediate `RecordBatch` creation, optimizing the process by directly converting proto `RowInsertRequests` into aligned `RecordBatch`es. - Enhanced schema handling by identifying missing columns directly from proto schemas. - `prom_row_builder.rs`: - Introduced `rows_to_aligned_record_batch` for direct conversion of proto `Rows` into aligned `RecordBatch`es. - Added `identify_missing_columns_from_proto` to detect absent tag columns without intermediate `RecordBatch`. - Implemented `build_prom_create_table_schema_from_proto` to construct table schemas directly from proto schemas. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat/auto-schema-align: Add elapsed time metrics for bulk insert operations - Updated `bulk_insert` method in `bulk_insert.rs` to record elapsed time metrics using `MITO_OPERATION_ELAPSED` for both physical and logical regions. - Added a new test `test_bulk_insert_records_elapsed_metric` to verify that the elapsed time metric is recorded correctly during bulk insert operations. 
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * remove flush per logical region Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat/auto-schema-align: Refactor `flush_batch` and `flush_batch_physical` functions - Removed unused `catalog` and `schema` variables from `flush_batch` in `pending_rows_batcher.rs`. - Updated `flush_batch_physical` to directly use `ctx.current_catalog()` and `ctx.current_schema()` for resolving table names. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat/auto-schema-align: ### Remove Unused Function and Associated Test - File: `src/servers/src/prom_row_builder.rs` - Removed the unused function `build_prom_create_table_schema` which was responsible for building a `Vec<ColumnSchema>` from an Arrow schema. - Deleted the associated test `test_build_prom_create_table_schema_from_request_schema` that validated the removed function. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat/auto-schema-align: - Remove Test: Deleted the `test_bulk_insert_records_elapsed_metric` test from `bulk_insert.rs`. - Refactor Table Resolution: Introduced `TableResolutionPlan` struct and refactored table resolution logic in `pending_rows_batcher.rs`. - Enhance Table Handling: Added functions for collecting non-empty table rows, unique table schemas, and handling table creation and alteration in `pending_rows_batcher.rs`. - Add Tests: Implemented tests for `collect_non_empty_table_rows` and `collect_unique_table_schemas` in `pending_rows_batcher.rs`. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat/auto-schema-align: - Refactor Error Handling: Updated error handling in `pending_rows_batcher.rs` and `prom_row_builder.rs` to use `Snafu` error context for more descriptive error messages. - Remove Unused Functionality: Eliminated the `rows_to_record_batch` function and related test in `prom_row_builder.rs` as it was redundant. 
- Simplify Function Return Types: Modified `rows_to_aligned_record_batch` in `prom_row_builder.rs` to return only `RecordBatch` without missing columns, simplifying the function's interface and related tests. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat/auto-schema-align: ### Add Helper Function for Table Options in `prom_store.rs` - Introduced `fill_metric_physical_table_options` function to encapsulate logic for setting table options, ensuring the use of flat SST format and physical table metadata. - Updated `Instance` implementation to utilize the new helper function for setting table options. - Added a unit test `test_metric_physical_table_options_forces_flat_sst_format` to verify the correct application of table options. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat/auto-schema-align: - Refactor `PendingRowsBatcher`: Simplified worker retrieval logic in `get_or_spawn_worker` method by using a more concise conditional check. - Metrics Update: Added `PENDING_ROWS_BATCH_FLUSH_STAGE_ELAPSED` metric in `pending_rows_batcher.rs`. - Remove Unused Code: Deleted multiple test functions related to record batch alignment and schema preparation in `pending_rows_batcher.rs` and `prom_row_builder.rs`. - Function Visibility Change: Made `build_prom_create_table_schema_from_proto` public in `prom_row_builder.rs`. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * chore: remove plan Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat/auto-schema-align: ### Refactor and Simplify Schema Alteration Logic - Removed Unused Methods: Deleted `create_table_if_missing` and `add_missing_prom_tag_columns` methods from `PendingRowsSchemaAlterer` trait in `prom_store.rs` and `pending_rows_batcher.rs`. - Error Handling Improvement: Enhanced error handling in `create_tables_if_missing_batch` method to return a specific error message for unsupported `AutoCreateTableType` in `prom_store.rs`. 
- Visibility Change: Made `as_str` method public in `AutoCreateTableType` enum in `insert.rs` to support external access. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat/auto-schema-align: ### Commit Message Improve safety in `prom_row_builder.rs` - Updated `unzip_logical_region_schema` to use `saturating_sub` for safer capacity calculation of `tag_columns`. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat/auto-schema-align: Add TODO comments for future improvements in `pending_rows_batcher.rs` - Added a TODO comment to consider bounding the `flush_region_writes_concurrently` function. - Added a TODO comment to potentially limit the maximum rows to concatenate in the `flush_batch_physical` function. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat/auto-schema-align: ### Commit Message Enhance error handling in `pending_rows_batcher.rs` - Updated `collect_unique_table_schemas` to return a `Result` type, enabling error handling for duplicate table names. - Modified the function to return an error when duplicate table names are found in `table_rows`. - Adjusted test cases to handle the new `Result` return type in `collect_unique_table_schemas`. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat/auto-schema-align: - Refactor `partition_columns` Method: Updated the `partition_columns` method in `multi_dim.rs`, `partition.rs`, and `splitter.rs` to return a slice reference instead of a cloned vector, improving performance by avoiding unnecessary cloning. - Enhance Partition Handling: Added functions `collect_tag_columns_and_non_tag_indices` and `strip_partition_columns_from_batch` in `pending_rows_batcher.rs` to manage partition columns more efficiently, including stripping partition columns from record batches. - Update Tests: Modified existing tests and added new ones in `pending_rows_batcher.rs` to verify the functionality of partition column handling, ensuring correct behavior of the new methods. 
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat/auto-schema-align: ### Enhance Schema Handling and Validation in `pending_rows_batcher.rs` - Schema Validation Enhancements: - Added checks for essential columns (`timestamp`, `value`) in `collect_tag_columns_and_non_tag_indices`. - Introduced `PHYSICAL_REGION_ESSENTIAL_COLUMN_COUNT` to ensure minimum column count in `strip_partition_columns_from_batch`. - Improved error handling for unexpected data types and duplicated columns. - Function Modifications: - Updated `strip_partition_columns_from_batch` to project essential columns without lookup. - Modified `flush_batch_physical` to use `essential_col_indices` instead of `non_tag_indices`. - Test Enhancements: - Added tests for schema validation, including checks for unexpected data types and duplicated columns. - Verified correct projection of essential columns in `strip_partition_columns_from_batch`. Files affected: `pending_rows_batcher.rs`, `tests`. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat/auto-schema-align: - Add `smallvec` Dependency: Updated `Cargo.lock` and `Cargo.toml` to include `smallvec` as a workspace dependency. - Refactor Function: Renamed `collect_tag_columns_and_non_tag_indices` to `columns_taxonomy` in `pending_rows_batcher.rs` and updated its return type to use `SmallVec`. - Update Tests: Modified test cases in `pending_rows_batcher.rs` to reflect changes in function name and return type. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat/auto-schema-align: Refactor `pending_rows_batcher.rs` to Simplify Table ID Handling - Updated `TableBatch` struct to use `TableId` directly instead of `Option<u32>` for `table_id`. - Simplified logic in `flush_batch_physical` by removing the check for `None` in `table_id`. - Adjusted related logic in `start_worker` to accommodate the change in `table_id` handling. 
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat/auto-schema-align: ### Enhance Batch Processing Logic - `pending_rows_batcher.rs`: - Moved column taxonomy resolution inside the loop to handle schema variations across batches. - Added checks to skip processing if both tag columns and essential column indices are empty. - Tests: - Added `test_modify_batch_sparse_with_taxonomy_per_batch` to verify batch modification logic with varying schemas. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat/auto-schema-align: ### Remove Primary Key Column Check in `pending_rows_batcher.rs` - Removed the check for the primary key column and other essential column names in the function `strip_partition_columns_from_batch` within `pending_rows_batcher.rs`. - Simplified the logic by eliminating the validation of column order against expected essential names. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat/auto-schema-align: ### Refactor error handling and iteration in `otlp.rs` and `pending_rows_batcher.rs` - `otlp.rs`: Simplified error handling by removing `CatalogSnafu` context when awaiting table retrieval. - `pending_rows_batcher.rs`: Streamlined iteration over tables by removing unnecessary `into_iter()` calls, improving code readability and efficiency. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * chore/metrics-for-bulk: Add timing metrics for batch processing in `pending_rows_batcher.rs` - Introduced `modify_elapsed` and `columns_taxonomy_elapsed` to measure time spent in `modify_batch_sparse` and `columns_taxonomy` functions. - Updated `flush_batch_physical` to record these metrics using `PENDING_ROWS_BATCH_FLUSH_STAGE_ELAPSED`. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat/auto-schema-align: ### Commit Summary - Remove Unused Code: Eliminated the `#[allow(dead_code)]` attribute from the `compute_tsid_array` function in `batch_modifier.rs`. 
- Error Handling Improvement: Enhanced error handling in `flush_batch_physical` function by adjusting the `match` block in `pending_rows_batcher.rs`. - Simplify Logic: Streamlined the logic in `rows_to_aligned_record_batch` by removing unnecessary type casting in `prom_row_builder.rs`. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat/auto-schema-align: Refactor `flush_batch_physical` in `pending_rows_batcher.rs`: - Moved partition column stripping logic to a single location before processing region batches. - Updated the use of `combined_batch` to `stripped_batch` for consistency in batch processing. - Removed redundant partition column stripping logic within the region batch loop. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat/auto-schema-align: ### Update `batch_modifier.rs` Documentation and Parameter Naming - Enhanced documentation for `compute_tsid_array` and `modify_batch_sparse` functions to clarify their logic and parameters. - Renamed parameter `non_tag_column_indices` to `extra_column_indices` in `modify_batch_sparse` for better clarity. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> --------- Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>	2026-04-01 02:45:26 +00:00
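One step in the commit above (#7877) changes `collect_unique_table_schemas` to return a `Result` and to fail on duplicate table names. A minimal sketch of that idea follows, under stated assumptions: `collect_unique_tables` and its `(name, row_count)` input are hypothetical simplifications for illustration, not the signature used in `pending_rows_batcher.rs`.

```rust
use std::collections::HashMap;

// Hypothetical simplification: each entry is (table name, row count).
// Returns an error as soon as a table name appears twice, instead of
// silently keeping only one of the two entries.
fn collect_unique_tables<'a>(
    table_rows: &[(&'a str, usize)],
) -> Result<HashMap<&'a str, usize>, String> {
    let mut unique = HashMap::with_capacity(table_rows.len());
    for &(name, row_count) in table_rows {
        // `insert` returns the previous value if the key was already present.
        if unique.insert(name, row_count).is_some() {
            return Err(format!("duplicate table name: {name}"));
        }
    }
    Ok(unique)
}
```

Returning an error here surfaces a malformed request to the caller early, rather than letting a later stage operate on an ambiguous table set.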
Ning Sun	e14404c677	chore: update rust toolchain to 2026-03-21 (#7849 ) * chore: update rust toolchain to 2026-03-21 * chore: new format * fix: lint * chore: resolve lint issues * chore: remove as_millis_f64 * chore: deps up	2026-03-30 12:13:14 +00:00
Lei, HUANG	b57dfc18dc	feat: pending rows batching for metrics (#7831 ) * feat: metric batch 2s PoC Signed-off-by: jeremyhi <fengjiachun@gmail.com> * chore: max_concurrent_flushes Signed-off-by: jeremyhi <fengjiachun@gmail.com> * chore: work channel size Signed-off-by: jeremyhi <fengjiachun@gmail.com> * feat(servers): add metrics and logs for pending rows batch flush Add the `FLUSH_ELAPSED` histogram metric to track the duration of pending rows batch flushes in the Prometheus store protocol handler. This provides better observability into the performance and latency of the batcher. Also update telemetry by: - Recording elapsed time for both successful and failed flush operations. - Adding an informational log upon successful flush including row count and duration. - Including elapsed time in error logs when a flush fails. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat(servers): implement columnar batching for pending rows Refactor PendingRowsBatcher to use columnar batching for the metrics store. Incoming RowInsertRequests are now converted to RecordBatches, partitioned, and flushed via BulkInsert requests to datanodes. - Enhance MultiDimPartitionRule to handle scalar boolean predicates. - Add metrics for tracking flush failures and dropped rows. - Update dependencies to support columnar batching in servers. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat(servers): add backpressure for pending rows Implement backpressure in PendingRowsBatcher by limiting in-flight requests with a semaphore and making the submission wait for the flush result. This ensures Prometheus write requests are throttled and only return once the data has been successfully flushed to datanodes. - Add max_inflight_requests to PromStoreOptions. - Use oneshot channels to notify submitters of flush completion. - Limit concurrent requests using a new inflight_semaphore. - Update PendingRowsBatcher::submit to wait for the flush outcome. 
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat: add stage-level metrics for bulk ingestion Introduce histograms to track the elapsed time of various stages in the metric engine bulk insert path and the server's pending rows batcher. This provides better observability into the performance bottlenecks of the ingestion pipeline. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * - `src/metric-engine/src/engine/bulk_insert.rs`: Removed the fallback mechanism that converted record batches to rows when bulk inserts were unsupported, along with related helper functions and unused imports. - `src/operator/src/insert.rs`: Removed an unused import (`common_time::TimeToLive::Instant`). Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat(servers): columnar Prom remote write Optimize the Prometheus remote write path by allowing direct conversion from decoded Prometheus samples to Arrow RecordBatches. This bypasses intermediate row-based representations when `PendingRowsBatcher` is active and no pipeline is used, improving ingestion efficiency. - Implement `as_record_batch_groups` in `TablesBuilder` and `PromWriteRequest`. - Add `submit_prom_record_batch_groups` to `PendingRowsBatcher`. - Introduce `DecodedPromWriteRequest` in `prom_store`. - Implement row-to-RecordBatch conversion logic in `prom_row_builder`. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * Revert "feat(servers): columnar Prom remote write" This reverts commit efbb63c12a3e7fcec03858ea0351efd94fec8242. * refactor(servers): improve row to RecordBatch conversion - Use `snafu::ensure` for row validation in `rows_to_record_batch`. - Add explicit type hint for `MutableVector` to improve clarity. - Reorganize and clean up imports in `pending_rows_batcher.rs`. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * perf(servers): use arrow builders for row conversion This commit optimizes the conversion from `api::v1::Rows` to `RecordBatch` by using Arrow builders directly. 
This avoids the overhead of `MutableVector` and `common_recordbatch`, leading to better performance in the `pending_rows_batcher`. Additionally, the `#[allow(dead_code)]` attribute is removed from `modify_batch_sparse` in the metric engine as it is now utilized. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * perf(metric-engine): optimize batch modification Optimize `modify_batch_sparse` by reusing buffers, using Arrow builders, and employing fast-path encoding methods. This reduces allocations and avoids redundant downcasting and serializer overhead. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat/metric-engine-support-bulk: Add Environment Variable for Batch Sync Control - `pending_rows_batcher.rs`: Introduced an environment variable `PENDING_ROWS_BATCH_SYNC` to control the synchronization behavior of batch processing. If set to true, the function will wait for the flush result; otherwise, it will return immediately with the total row count. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * wip Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * chore: update and fix clippy Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * fix: failing test Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * picking-pending-rows-batcher: ### Commit Message Remove Unused Code and Simplify Error Handling - `src/error.rs`: Removed the `BatcherQueueFull` error variant and its associated logic, simplifying the error handling by removing unused code. - `src/http/prom_store.rs`: Eliminated the `try_decompress` function, streamlining the decompression logic by directly using `snappy_decompress` in `decode_remote_read_request`. 
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * chore: parse PENDING_ROWS_BATCH_SYNC once Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * chore: revert unrelated changes Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * Refactor Prometheus Write Handling - `prom_store.rs`: Introduced `pre_write` method in `PromStoreProtocolHandler` to handle pre-write checks for Prometheus remote write requests. Updated `write` method to utilize `pre_write`. - `server.rs`: Modified `PendingRowsBatcher` initialization to conditionally create a batcher based on `with_metric_engine` flag. - `http/prom_store.rs`: Integrated `pre_write` checks before submitting requests to `PendingRowsBatcher`. - `query_handler.rs`: Added `pre_write` method to `PromStoreProtocolHandler` trait for pre-write operations. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * picking-pending-rows-batcher: - Fix Label Typo: Corrected a typo in the label value from `"flush_wn ite_region"` to `"flush_write_region"` in `pending_rows_batcher.rs`. - Refactor Array Building Logic: Introduced a macro `build_array!` to streamline the construction of `ArrayRef` for different data types, reducing code duplication in `pending_rows_batcher.rs`. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * format toml Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * picking-pending-rows-batcher: ### Update PromStore and PendingRowsBatcher Configuration - `prom_store.rs`: Set `pending_rows_flush_interval` to `Duration::ZERO` to disable automatic flushing. - `pending_rows_batcher.rs`: Enhance validation to disable the batcher when `flush_interval` is zero or configuration values like `max_batch_rows`, `max_concurrent_flushes`, `worker_channel_capacity`, or `max_inflight_requests` are zero, preventing potential panics or deadlocks. 
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * picking-pending-rows-batcher: ### Update `pending_rows_flush_interval` to Zero - Files Modified: - `src/frontend/src/service_config/prom_store.rs` - `tests-integration/tests/http.rs` - Key Changes: - Updated `pending_rows_flush_interval` from `Duration::from_secs(2)` to `Duration::ZERO` in `prom_store.rs`. - Changed `pending_rows_flush_interval` configuration from `"2s"` to `"0s"` in `http.rs`. These changes set the flush interval to zero, potentially affecting how frequently pending rows are flushed. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * picking-pending-rows-batcher: Add Worker Management Enhancements - `metrics.rs`: Introduced `PENDING_WORKERS` gauge to track active pending rows batch workers. - `pending_rows_batcher.rs`: - Added worker idle timeout logic with `WORKER_IDLE_TIMEOUT_MULTIPLIER`. - Implemented worker management functions: `spawn_worker`, `remove_worker_if_same_channel`, and `should_close_worker_on_idle_timeout`. - Enhanced worker lifecycle management to handle idle workers and ensure proper cleanup. - Tests: Added unit tests for worker removal and idle timeout logic. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * fix: clippy Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> --------- Signed-off-by: jeremyhi <fengjiachun@gmail.com> Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> Co-authored-by: jeremyhi <fengjiachun@gmail.com>	2026-03-27 02:19:00 +00:00
Lei, HUANG	dc98e0215b	feat(metric-engine): support bulk inserts with put fallback (#7792 ) * feat(metric-engine): support bulk inserts Implement `RegionRequest::BulkInserts` to support efficient columnar data ingestion in the metric engine. Key changes: - Implement `bulk_insert_region` to handle logical-to-physical region mapping and dispatch writes. - Add `batch_modifier` for `RecordBatch` transformations, specifically for `__tsid` generation and sparse primary key encoding. - Integrate `BulkInserts` into the `MetricEngine` request handling logic. - Provide a row-based fallback mechanism if the underlying storage doesn't support bulk writes. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat/metric-engine-bulk-insert: ### Update `bulk_insert.rs` to Support Partition Expression Version - Enhancements: - Added support for `partition_expr_version` in `RegionBulkInsertsRequest` and `RegionPutRequest`. - Modified the handling of `partition_expr_version` to be dynamically set from the `request` object. Files affected: - `src/metric-engine/src/engine/bulk_insert.rs` Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * fix: cargo lock revert Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * add doc for conversions Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * chore: simplify test Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat/metric-engine-bulk-insert: ### Refactor `bulk_insert.rs` in `metric-engine` - Refactor Functionality: - Replaced `resolve_tag_columns` with `resolve_tag_columns_from_metadata` to streamline tag column resolution. - Moved logic for resolving tag columns directly into `resolve_tag_columns_from_metadata`, removing the need for an external function call. - Enhancements: - Improved error handling and context provision for missing physical regions and columns. - Optimized tag column sorting and index management within the batch processing logic. 
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat/metric-engine-bulk-insert: ### Refactor `record_batch_to_rows` Function in `bulk_insert.rs` - Simplified the `record_batch_to_rows` function by removing the `logical_metadata` parameter and directly validating column types within the function. - Enhanced error handling for timestamp, value, and tag columns by checking their data types and providing detailed error messages. - Replaced the use of `Helper::try_into_vector` with direct downcasting to `TimestampMillisecondArray`, `Float64Array`, and `StringArray` for improved type safety and clarity. - Updated the construction of `api::v1::Rows` to directly handle null values and construct `api::v1::Value` objects accordingly. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat/metric-engine-bulk-insert: ## Commit Message Refactor `bulk_insert.rs` to optimize state access - Moved the state read operation inside a new block to limit its scope and improve code clarity. - Adjusted logic for processing `tag_columns` and `non_tag_indices` to work within the new block structure. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat/metric-engine-bulk-insert: ### Refactor `compute_tsid_array` Function - Refactored `compute_tsid_array` function: Modified the function signature to accept `tag_arrays` as a parameter instead of building it internally. This change affects the following files: - `src/metric-engine/src/batch_modifier.rs` - Updated test cases: Adjusted test cases to accommodate the new `compute_tsid_array` function signature by passing `tag_arrays` explicitly. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * docs: add doc for bulk_insert_region Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat/metric-engine-bulk-insert: ### Commit Message Refactor `bulk_insert.rs` in `metric-engine`: - Removed error handling for unsupported status codes in `write_data` method. - Eliminated `record_batch_to_rows` function, simplifying the data insertion process. 
- Streamlined the `write_data` method by removing fallback logic for unsupported operations. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat/metric-engine-bulk-insert: - Optimize Primary Key Construction: Refactored `modify_batch_sparse` in `batch_modifier.rs` to use `BinaryBuilder` for more efficient primary key construction. - Add Fallback for Unsupported Bulk Inserts: Updated `bulk_insert.rs` to handle unsupported bulk inserts by converting record batches to rows and using `RegionPutRequest`. - Implement Record Batch to Rows Conversion: Added `record_batch_to_rows` function in `bulk_insert.rs` to convert `RecordBatch` to `api::v1::Rows` for fallback operations. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat/metric-engine-bulk-insert: Add test for handling null values in `record_batch_to_rows` - Added a new test `test_record_batch_to_rows_with_null_values` in `bulk_insert.rs` to verify the handling of null values in the `record_batch_to_rows` function. - The test checks the conversion of a `RecordBatch` with null values in various fields to ensure correct row creation and schema handling. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat/metric-engine-bulk-insert: Add fallback path for unsupported status and improve error context handling - `bulk_insert.rs`: - Added a fallback path for `PartitionTreeMemtable` in case of unsupported status code. - Enhanced error handling by using `with_context` for better error messages when timestamp and value columns are not found in `RecordBatch`. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> --------- Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>	2026-03-17 11:28:06 +00:00
LFC	5eac4f10aa	chore: remove dependency on "atty" (#7725 ) Signed-off-by: luofucong <luofc@foxmail.com>	2026-02-26 09:58:01 +00:00
Weny Xu	df04267c54	fix(repartition): reject writes on deallocating regions during region merge (#7694 ) * feat(meta): add write route policy to region route with backward compatibility Signed-off-by: WenyXu <wenymedia@gmail.com> * fix(meta): use partition_expr compatibility accessor in repartition matching Signed-off-by: WenyXu <wenymedia@gmail.com> * feat(meta): introduce staging partition rule enum for repartition instructions Signed-off-by: WenyXu <wenymedia@gmail.com> * feat(datanode): plumb staging partition rule enum through heartbeat handlers Signed-off-by: WenyXu <wenymedia@gmail.com> * feat(meta): mark pending-deallocate regions as reject-all during merge staging Signed-off-by: WenyXu <wenymedia@gmail.com> * feat(partition): exclude reject-all regions from write partitioning Signed-off-by: WenyXu <wenymedia@gmail.com> * feat(mito): store staging partition rule enum in region state Signed-off-by: WenyXu <wenymedia@gmail.com> * feat(mito): reject writes in staging when partition rule is reject-all Signed-off-by: WenyXu <wenymedia@gmail.com> * feat(meta): send enter staging instruction with reject-all Signed-off-by: WenyXu <wenymedia@gmail.com> * fix(repartition): preserve reject-all on exit, merge enter-staging instructions, and allow staged bulk writes Signed-off-by: WenyXu <wenymedia@gmail.com> * refactor: refactor to ignore all writes Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: apply suggestions Signed-off-by: WenyXu <wenymedia@gmail.com> * refactor: rename StagingPartitionRule to StagingPartitionDirective across staging flow Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: add comments Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: clippy Signed-off-by: WenyXu <wenymedia@gmail.com> * refactor: nit Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: apply suggestions Signed-off-by: WenyXu <wenymedia@gmail.com> * refactor: rename Signed-off-by: WenyXu <wenymedia@gmail.com> --------- Signed-off-by: WenyXu <wenymedia@gmail.com>	2026-02-25 07:04:38 +00:00
Weny Xu	0ed3b83099	refactor: rename partition rule version to partition expr version (#7696 ) * refactor: rename partition rule version to partition expr version Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: update proto Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: clippy Signed-off-by: WenyXu <wenymedia@gmail.com> --------- Signed-off-by: WenyXu <wenymedia@gmail.com>	2026-02-10 10:12:47 +00:00
Weny Xu	8026b23834	feat: partition rule version validation for writes and staging (#7628 ) * feat: verify partition rule Signed-off-by: WenyXu <wenymedia@gmail.com> * feat: add partition version cache Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: header check Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: fmt toml Signed-off-by: WenyXu <wenymedia@gmail.com> * refactor: minor refactor Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: header Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: fix clippy Signed-off-by: WenyXu <wenymedia@gmail.com> * fix: fix unit tests Signed-off-by: WenyXu <wenymedia@gmail.com> * refactor: minor refactor Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: apply suggestions Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: nit Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: nit Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: apply suggestions Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: apply suggestions Signed-off-by: WenyXu <wenymedia@gmail.com> --------- Signed-off-by: WenyXu <wenymedia@gmail.com>	2026-02-06 12:16:34 +00:00
Weny Xu	5bfc728d32	fix(repartition): improve physical region allocation and compaction read path correctness (#7621 ) * fix: fix metadata region Signed-off-by: WenyXu <wenymedia@gmail.com> * fix: adjust repartition flow and compaction read compatibility Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: remove logs Signed-off-by: WenyXu <wenymedia@gmail.com> * refactor: rename compaction mapper and pk projection Signed-off-by: WenyXu <wenymedia@gmail.com> * refactor: rename `CompactionProjectionMapper` Signed-off-by: WenyXu <wenymedia@gmail.com> * refactor: clarify compaction projection naming Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: add comments Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: fmt Signed-off-by: WenyXu <wenymedia@gmail.com> * feat: allow create physical table with internal columns Signed-off-by: WenyXu <wenymedia@gmail.com> * test: add tests Signed-off-by: WenyXu <wenymedia@gmail.com> * fix: fix template logic Signed-off-by: WenyXu <wenymedia@gmail.com> * fix: fix unit test Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: apply suggestions Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: apply suggestions Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: update sqlness result Signed-off-by: WenyXu <wenymedia@gmail.com> --------- Signed-off-by: WenyXu <wenymedia@gmail.com>	2026-01-28 04:04:05 +00:00
Weny Xu	d0c610f3c7	feat: add `partial_drop` to `DropRequest` (#7597 ) * feat: add `partial_drop` to `DropRequest` Signed-off-by: WenyXu <wenymedia@gmail.com> * feat: handle non-partial-drop drop task Signed-off-by: WenyXu <wenymedia@gmail.com> * feat: remove files immediately Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: update proto Signed-off-by: WenyXu <wenymedia@gmail.com> --------- Signed-off-by: WenyXu <wenymedia@gmail.com>	2026-01-27 10:46:52 +00:00
jeremyhi	007e6cc860	chore: batch requests in metric engine (#7535 ) * chore: batch requests in metric engine Signed-off-by: jeremyhi <fengjiachun@gmail.com> * chore: fix clippy Signed-off-by: jeremyhi <fengjiachun@gmail.com> * fix: merge multiple schemas Signed-off-by: jeremyhi <fengjiachun@gmail.com> * chore: add tracing span Signed-off-by: jeremyhi <fengjiachun@gmail.com> * feat: add metrics for batch put Signed-off-by: jeremyhi <fengjiachun@gmail.com> * feat: add sparse and dense encoding test case Signed-off-by: jeremyhi <fengjiachun@gmail.com> * chore: avoid allocation of vec Signed-off-by: jeremyhi <fengjiachun@gmail.com> * chore: table_id_for_row Signed-off-by: jeremyhi <fengjiachun@gmail.com> * chore: by comment Signed-off-by: jeremyhi <fengjiachun@gmail.com> * feat: algorithm to reduce hash lookups by using array indexing Signed-off-by: jeremyhi <fengjiachun@gmail.com> --------- Signed-off-by: jeremyhi <fengjiachun@gmail.com>	2026-01-16 02:57:57 +00:00
Weny Xu	2ae20daa62	feat: add sync region instruction for repartition procedure (#7562 ) * feat: add sync region instruction for repartition procedure This commit introduces a new sync region instruction and integrates it into the repartition procedure flow, specifically for metric engine tables. Changes: - Add SyncRegion instruction type and SyncRegionsReply in instruction.rs - Implement SyncRegionHandler in datanode to handle sync region requests - Add SyncRegion state in repartition procedure to sync newly allocated regions - Integrate sync region step after enter_staging_region for metric engine tables - Add sync_region flag and allocated_region_ids to PersistentContext - Make SyncRegionFromRequest serializable for instruction transmission - Add test utilities and mock support for sync region operations The sync region step is conditionally executed based on the table engine type, ensuring that newly allocated regions in metric engine tables are properly synced from their source regions before proceeding with manifest remapping. 
Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: add logs Signed-off-by: WenyXu <wenymedia@gmail.com> * feat(repartition): improve staging region handling and support metric engine repartition - Reorder sync region flow: move SyncRegion from EnterStagingRegion to RepartitionStart to sync before applying staging - Add ExitStaging metadata update state to properly clear staging leader info after repartition completes - Update build_template_from_raw_table_info to optionally skip metric engine internal columns when creating region requests - Fix region state transition: set_dropping now expects specific state (Staging or Writable) for proper validation - Adjust region drop and copy handlers to handle staging regions correctly - Add comprehensive test cases for metric engine SPLIT/MERGE partition operations on physical tables with logical tables - Improve logging for table route updates, region drops, and repartition operations Signed-off-by: WenyXu <wenymedia@gmail.com> * refactor: removes code duplication Signed-off-by: WenyXu <wenymedia@gmail.com> * fix: update result Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: refine comments Signed-off-by: WenyXu <wenymedia@gmail.com> * feat: add error strategy support for flush region and flush pending deallocate regions - Add `ErrorStrategy` enum in `procedure/utils.rs`: - Supports `Ignore` and `Retry` strategies for error handling - Refactor `flush_region` to accept `error_strategy` parameter - Extract `handle_flush_region_reply` helper function for better code organization - Add pending deallocate region support: - Add `pending_deallocate_region_ids` field to `PersistentContext` - Implement `flush_pending_deallocate_regions` in `EnterStagingRegion` state - Flush pending deallocate regions before entering staging regions to ensure data consistency - Update error handling: - `flush_leader_region`: Use `ErrorStrategy::Ignore` to skip unreachable datanodes - `sync_region`: Use `ErrorStrategy::Retry` for critical 
operations - `enter_staging_region`: Use `ErrorStrategy::Retry` when flushing pending deallocate regions This change improves the robustness of the repartition procedure by: 1. Providing flexible error handling strategies for flush operations 2. Ensuring pending deallocate regions are properly flushed before repartitioning 3. Preventing data inconsistency during region migration Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: apply suggestions from CR Signed-off-by: WenyXu <wenymedia@gmail.com> * fix: compile Signed-off-by: WenyXu <wenymedia@gmail.com> --------- Signed-off-by: WenyXu <wenymedia@gmail.com>	2026-01-15 04:52:57 +00:00
Weny Xu	2f242927a8	feat(repartition): implement region deallocation for repartition procedure (#7522 ) * feat: implement deallocate regions for repartition procedure Signed-off-by: WenyXu <wenymedia@gmail.com> * feat(metric-engine): add force flag to drop physical regions with associated logical regions Signed-off-by: WenyXu <wenymedia@gmail.com> * feat: update table metadata after deallocating regions Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: apply suggestions Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: update proto Signed-off-by: WenyXu <wenymedia@gmail.com> --------- Signed-off-by: WenyXu <wenymedia@gmail.com>	2026-01-07 06:13:48 +00:00
Weny Xu	294f19fa1d	feat(metric-engine): support sync logical regions from source region (#7438 ) * chore: move file Signed-off-by: WenyXu <wenymedia@gmail.com> * feat(metric-engine): support sync logical regions from source region Signed-off-by: WenyXu <wenymedia@gmail.com> * fix: fix unit tests Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: apply suggestions Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: add comments Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: add comments Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: apply suggestions from CR Signed-off-by: WenyXu <wenymedia@gmail.com> --------- Signed-off-by: WenyXu <wenymedia@gmail.com>	2025-12-25 09:06:58 +00:00
Lei, HUANG	3ad0b60c4b	chore(metric-engine): set default compaction time window for data region (#7474 ) chore: set compaction time window for metric engine data region to 1 day by default Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>	2025-12-25 03:55:17 +00:00
Weny Xu	e1b18614ee	feat(mito2): implement `ApplyStagingManifest` request handling (#7456 ) * feat(mito2): implement `ApplyStagingManifest` request handling Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: apply suggestions from CR Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: fmt Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: apply suggestions from CR Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: apply suggestions from CR Signed-off-by: WenyXu <wenymedia@gmail.com> * fix: fix logic Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: update proto Signed-off-by: WenyXu <wenymedia@gmail.com> --------- Signed-off-by: WenyXu <wenymedia@gmail.com>	2025-12-24 09:05:09 +00:00
Weny Xu	f7d5c87ac0	feat: introduce `copy_region_from` for mito engine (#7389 ) * feat: introduce `copy_region_from` Signed-off-by: WenyXu <wenymedia@gmail.com> * fix: fix clippy Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: apply suggestions from CR Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: apply suggestions Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: apply suggestions from CR Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: apply suggestions from CR Signed-off-by: WenyXu <wenymedia@gmail.com> --------- Signed-off-by: WenyXu <wenymedia@gmail.com>	2025-12-16 06:12:06 +00:00
discord9	9197e818ec	refactor: use versioned index for index file (#7309 ) * refactor: use versioned index for index file Signed-off-by: discord9 <discord9@163.com> * fix: sst entry table Signed-off-by: discord9 <discord9@163.com> * update sqlness Signed-off-by: discord9 <discord9@163.com> * chore: unit type Signed-off-by: discord9 <discord9@163.com> * fix: missing version Signed-off-by: discord9 <discord9@163.com> * more fix build index Signed-off-by: discord9 <discord9@163.com> * fix: use proper index id Signed-off-by: discord9 <discord9@163.com> * pcr Signed-off-by: discord9 <discord9@163.com> * test: update Signed-off-by: discord9 <discord9@163.com> * clippy Signed-off-by: discord9 <discord9@163.com> * test: test_list_ssts fixed Signed-off-by: discord9 <discord9@163.com> * test: fix test Signed-off-by: discord9 <discord9@163.com> * feat: stuff Signed-off-by: discord9 <discord9@163.com> * fix: clean temp index file on abort&delete all index version when delete file Signed-off-by: discord9 <discord9@163.com> * docs: explain Signed-off-by: discord9 <discord9@163.com> * fix: actually clean up tmp dir Signed-off-by: discord9 <discord9@163.com> * clippy Signed-off-by: discord9 <discord9@163.com> * clean tmp dir only when write cache enabled Signed-off-by: discord9 <discord9@163.com> * refactor: add version to index cache Signed-off-by: discord9 <discord9@163.com> * per review Signed-off-by: discord9 <discord9@163.com> * test: update size Signed-off-by: discord9 <discord9@163.com> * per review Signed-off-by: discord9 <discord9@163.com> --------- Signed-off-by: discord9 <discord9@163.com>	2025-12-09 07:31:12 +00:00
Lei, HUANG	931556dbd3	perf(metric-engine)!: Replace mur3 with fxhash for faster TSID generation (#7316 ) * feat/change-tsid-gen: perf(metric-engine): replace mur3 with fxhash for faster TSID generation - Switches from mur3::Hasher128 to fxhash::FxHasher for TSID hashing - Pre-computes label-name hash when no nulls are present, avoiding redundant work - Adds fast-path for rows without nulls; falls back to slow path otherwise - Updates Cargo.toml and lockfile to reflect dependency change Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat/change-tsid-gen: fix: only check primary-key labels for null when re-using cached hash - Rename has_null() → has_null_labels() and restrict the check to the primary-key columns so that non-label NULLs do not force a full TSID re-computation. - Update expected hashes in tests to match the new logic. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat/change-tsid-gen: test: add comprehensive TSID generation tests for label ordering and null handling Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat/change-tsid-gen: bench: add criterion benchmark for TSID generator - Compare original mur3 vs current fxhash fast/slow paths - Test 2, 5, 10 label sets plus null-value slow path - Add mur3 & criterion dev-deps; register bench target Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat/change-tsid-gen: test: stabilize metric-engine tests by fixing non-deterministic row order - Add ORDER BY to SELECTs in TTL tests to ensure consistent output - Update expected __tsid values after hash function change - Swap expected OTLP metric rows to match new ordering Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * feat/change-tsid-gen: refactor: simplify Default impls and remove redundant code - Replace manual Default for TsidGenerator with derive - Remove unnecessary into_iter() call - Simplify Option::unwrap_or_else to unwrap_or Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> --------- Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>	2025-12-02 08:38:29 +00:00
Weny Xu	8346acb900	feat: introduce `EnterStagingRequest` for `RegionEngine` (#7261 ) * feat: introduce `EnterStagingRequest` for region engine Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: apply suggestions from CR Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: apply suggestions from CR Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: apply suggestions from CR Signed-off-by: WenyXu <wenymedia@gmail.com> * refactor: improve error handling in staging mode entry Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: apply suggestions from CR Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: apply suggestions from CR Signed-off-by: WenyXu <wenymedia@gmail.com> --------- Signed-off-by: WenyXu <wenymedia@gmail.com>	2025-11-28 09:02:32 +00:00
Ruihang Xia	4c07d2d5de	fix: metric engine deadlock when altering a group of tables (#7308 ) Signed-off-by: Ruihang Xia <waynestxia@gmail.com>	2025-11-27 09:45:06 +00:00
Weny Xu	6b6d1ce7c4	feat: introduce `remap_manifests` for `RegionEngine` (#7265 ) * refactor: consolidate RegionManifestOptions creation logic Signed-off-by: WenyXu <wenymedia@gmail.com> * feat: introduce `remap_manifests` for `RegionEngine` Signed-off-by: WenyXu <wenymedia@gmail.com> * Apply suggestions from code review Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * chore: apply suggestions from CR Signed-off-by: WenyXu <wenymedia@gmail.com> --------- Signed-off-by: WenyXu <wenymedia@gmail.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-11-25 12:09:20 +00:00
LFC	4a7c16586b	refactor: remove `Vector`s from `RecordBatch` completely (#7184 ) * refactor: remove `Vector`s from `RecordBatch` completely Signed-off-by: luofucong <luofc@foxmail.com> * resolve PR comments Signed-off-by: luofucong <luofc@foxmail.com> * resolve PR comments Signed-off-by: luofucong <luofc@foxmail.com> --------- Signed-off-by: luofucong <luofc@foxmail.com>	2025-11-21 08:53:35 +00:00
fys	c5173fccfc	chore: add default value to sparse_primary_key_encoding config item (#7273 )	2025-11-21 08:22:55 +00:00
discord9	29bbff3c90	feat: gc worker only local regions&test (#7203 ) * feat: gc worker only on local region Signed-off-by: discord9 <discord9@163.com> * more check Signed-off-by: discord9 <discord9@163.com> * chore: stuff Signed-off-by: discord9 <discord9@163.com> * fix: ignore async index file for now Signed-off-by: discord9 <discord9@163.com> * fix: file removal rate calc Signed-off-by: discord9 <discord9@163.com> * chore: per review Signed-off-by: discord9 <discord9@163.com> * chore: per review Signed-off-by: discord9 <discord9@163.com> * clippy Signed-off-by: discord9 <discord9@163.com> --------- Signed-off-by: discord9 <discord9@163.com>	2025-11-18 02:45:09 +00:00
Weny Xu	47937961f6	feat(metric)!: enable sparse primary key encoding by default (#7195 ) * feat(metric): enable sparse primary key encoding by default Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: update config.md Signed-off-by: WenyXu <wenymedia@gmail.com> * fix: fix unit tests Signed-off-by: WenyXu <wenymedia@gmail.com> * fix: fix unit tests Signed-off-by: WenyXu <wenymedia@gmail.com> * fix sqlness Signed-off-by: WenyXu <wenymedia@gmail.com> * Update src/mito-codec/src/key_values.rs Co-authored-by: Yingwen <realevenyag@gmail.com> * feat: only allow setting primary key encoding for metric engine Signed-off-by: evenyag <realevenyag@gmail.com> * feat: support deleting rows from logical region instead of physical region This keeps the behavior the same as put. It's easier to support sparse encoding for deleting logical regions. Now the metric engine doesn't support deleting rows from a physical region directly. Signed-off-by: evenyag <realevenyag@gmail.com> * test: update sqlness Signed-off-by: evenyag <realevenyag@gmail.com> * chore: remove unused error Signed-off-by: WenyXu <wenymedia@gmail.com> --------- Signed-off-by: WenyXu <wenymedia@gmail.com> Signed-off-by: evenyag <realevenyag@gmail.com> Co-authored-by: Yingwen <realevenyag@gmail.com>	2025-11-11 06:33:51 +00:00
Ruihang Xia	30192d9802	feat: disable default compression for `__op_type` column (#7196 ) * feat: disable default compression for `__op_type` column Signed-off-by: Ruihang Xia <waynestxia@gmail.com> * update test Signed-off-by: Ruihang Xia <waynestxia@gmail.com> * revert unrelated code Signed-off-by: Ruihang Xia <waynestxia@gmail.com> --------- Signed-off-by: Ruihang Xia <waynestxia@gmail.com>	2025-11-10 07:59:25 +00:00
Weny Xu	9de680f456	refactor: add support for batch region upgrade operations part2 (#7160 ) * add tests for metric engines Signed-off-by: WenyXu <wenymedia@gmail.com> * feat: catchup in background Signed-off-by: WenyXu <wenymedia@gmail.com> * refactor: replace sequential catchup with batch processing Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: unit tests Signed-off-by: WenyXu <wenymedia@gmail.com> * remove single catchup Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: remove unused error Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: refine catchup tests Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: add unit tests Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: apply suggestions Signed-off-by: WenyXu <wenymedia@gmail.com> --------- Signed-off-by: WenyXu <wenymedia@gmail.com>	2025-11-03 06:01:38 +00:00
Weny Xu	6960a0183a	refactor: add support for batch region upgrade operations part1 (#7155 ) * refactor: convert UpgradeRegion instruction to batch operation Signed-off-by: WenyXu <wenymedia@gmail.com> * feat: introduce `handle_batch_catchup_requests` fn for mito engine Signed-off-by: WenyXu <wenymedia@gmail.com> * test: add tests Signed-off-by: WenyXu <wenymedia@gmail.com> * feat: introduce `handle_batch_catchup_requests` fn for metric engine Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: suggestion and add ser/de tests Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: add comments Signed-off-by: WenyXu <wenymedia@gmail.com> * fix: fix unit tests Signed-off-by: WenyXu <wenymedia@gmail.com> --------- Signed-off-by: WenyXu <wenymedia@gmail.com>	2025-10-31 03:08:38 +00:00
Sicong Hu	30894d7599	feat(mito): Optimize async index building with priority-based batching (#7034 ) * feat: add priority-based batching to IndexBuildScheduler Signed-off-by: SNC123 <sinhco@outlook.com> * fix: clean old puffin-related cache Signed-off-by: SNC123 <sinhco@outlook.com> * test: add test for IndexBuildScheduler Signed-off-by: SNC123 <sinhco@outlook.com> * feat: different index file id for read and async write Signed-off-by: SNC123 <sinhco@outlook.com> * feat: different index file id for delete Signed-off-by: SNC123 <sinhco@outlook.com> * chore: clippy Signed-off-by: SNC123 <sinhco@outlook.com> * fix: apply suggestions Signed-off-by: SNC123 <sinhco@outlook.com> * fix: apply comments Signed-off-by: SNC123 <sinhco@outlook.com> * combine files and index files Signed-off-by: SNC123 <sinhco@outlook.com> * feat: add index_file_id into ManifestSstEntry Signed-off-by: SNC123 <sinhco@outlook.com> * Update src/mito2/src/gc.rs Signed-off-by: SNC123 <sinhco@outlook.com> * resolve conflicts Signed-off-by: SNC123 <sinhco@outlook.com> * fix: sqlness Signed-off-by: SNC123 <sinhco@outlook.com> * chore: fmt Signed-off-by: SNC123 <sinhco@outlook.com> --------- Signed-off-by: SNC123 <sinhco@outlook.com>	2025-10-31 02:13:17 +00:00
shuiyisong	a20ac4f9e5	feat: prefix option for timestamp index and value column (#7125 ) * refactor: use GREPTIME_TIMESTAMP const Signed-off-by: shuiyisong <xixing.sys@gmail.com> * feat: add config for default ts col name Signed-off-by: shuiyisong <xixing.sys@gmail.com> * refactor: replace GREPTIME_TIMESTAMP with function get Signed-off-by: shuiyisong <xixing.sys@gmail.com> * chore: update config doc * fix: test Signed-off-by: shuiyisong <xixing.sys@gmail.com> * chore: remove opts on flownode and metasrv Signed-off-by: shuiyisong <xixing.sys@gmail.com> * chore: add validation for ts column name Signed-off-by: shuiyisong <xixing.sys@gmail.com> * chore: use get_or_init to avoid test error Signed-off-by: shuiyisong <xixing.sys@gmail.com> * chore: fmt Signed-off-by: shuiyisong <xixing.sys@gmail.com> * chore: update docs Signed-off-by: shuiyisong <xixing.sys@gmail.com> * chore: using empty string to disable prefix Signed-off-by: shuiyisong <xixing.sys@gmail.com> * chore: update comment Signed-off-by: shuiyisong <xixing.sys@gmail.com> * chore: address CR issues Signed-off-by: shuiyisong <xixing.sys@gmail.com> --------- Signed-off-by: shuiyisong <xixing.sys@gmail.com>	2025-10-27 08:00:03 +00:00
Yingwen	4c70b4c31d	feat: store estimated series num in file meta (#7126 ) * feat: add num_series to FileMeta Signed-off-by: evenyag <realevenyag@gmail.com> * feat: add SeriesEstimator to collect num_series Signed-off-by: evenyag <realevenyag@gmail.com> * fix: set num_series in compactor Signed-off-by: evenyag <realevenyag@gmail.com> * chore: print num_series in Debug for FileMeta Signed-off-by: evenyag <realevenyag@gmail.com> * style: fmt code Signed-off-by: evenyag <realevenyag@gmail.com> * style: fix clippy Signed-off-by: evenyag <realevenyag@gmail.com> * fix: increase series count when next ts <= last Signed-off-by: evenyag <realevenyag@gmail.com> * test: add tests for SeriesEstimator Signed-off-by: evenyag <realevenyag@gmail.com> * feat: add num_series to ssts_manifest table Signed-off-by: evenyag <realevenyag@gmail.com> * test: update sqlness tests Signed-off-by: evenyag <realevenyag@gmail.com> * test: fix metric engine list entry test Signed-off-by: evenyag <realevenyag@gmail.com> --------- Signed-off-by: evenyag <realevenyag@gmail.com>	2025-10-24 05:53:48 +00:00
Sicong Hu	a1af4dce0c	feat: implement three build types for async index build (#7029 ) * feat: impl four types index build Signed-off-by: SNC123 <sinhco@outlook.com> * test: add tests for four types index build Signed-off-by: SNC123 <sinhco@outlook.com> * test: add sqlness test for manual index build Signed-off-by: SNC123 <sinhco@outlook.com> * fix: add region request support and correct sqlness Signed-off-by: SNC123 <sinhco@outlook.com> * fix: update cargo.toml for proto and resolve conflicts Signed-off-by: SNC123 <sinhco@outlook.com> * fix: rebase Signed-off-by: SNC123 <sinhco@outlook.com> * chore: clippy Signed-off-by: SNC123 <sinhco@outlook.com> * fix: toml fmt and correct sqlness Signed-off-by: SNC123 <sinhco@outlook.com> * fix: correct sqlness result Signed-off-by: SNC123 <sinhco@outlook.com> * refactor: extract manual build logic Signed-off-by: SNC123 <sinhco@outlook.com> * apply suggestions Signed-off-by: SNC123 <sinhco@outlook.com> * feat: abort index build process Signed-off-by: SNC123 <sinhco@outlook.com> * clippy Signed-off-by: SNC123 <sinhco@outlook.com> * chore: wrap `should_abort_index` Signed-off-by: SNC123 <sinhco@outlook.com> * chore: clippy Signed-off-by: SNC123 <sinhco@outlook.com> --------- Signed-off-by: SNC123 <sinhco@outlook.com>	2025-10-21 02:48:28 +00:00
dennis zhuang	8a2371a05c	feat: supports large string (#7097 ) * feat: supports large string Signed-off-by: Dennis Zhuang <killme2008@gmail.com> * chore: add doc for extract_string_vector_values Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Dennis Zhuang <killme2008@gmail.com> * chore: refactor by cr comments Signed-off-by: Dennis Zhuang <killme2008@gmail.com> * chore: changes by cr comments Signed-off-by: Dennis Zhuang <killme2008@gmail.com> * refactor: extract_string_vector_values Signed-off-by: Dennis Zhuang <killme2008@gmail.com> * feat: remove large string type and refactor string vector Signed-off-by: Dennis Zhuang <killme2008@gmail.com> * chore: revert some changes Signed-off-by: Dennis Zhuang <killme2008@gmail.com> * feat: adds large string type Signed-off-by: Dennis Zhuang <killme2008@gmail.com> * chore: impl default for StringSizeType Signed-off-by: Dennis Zhuang <killme2008@gmail.com> * fix: tests and test compatibility Signed-off-by: Dennis Zhuang <killme2008@gmail.com> * test: update sqlness tests Signed-off-by: Dennis Zhuang <killme2008@gmail.com> * chore: remove panic Signed-off-by: Dennis Zhuang <killme2008@gmail.com> --------- Signed-off-by: Dennis Zhuang <killme2008@gmail.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-10-17 01:46:11 +00:00
Ning Sun	2e6ea1167f	refactor: update valueref coerce function name based on its semantics (#7098 )	2025-10-16 09:11:40 +00:00
LFC	8fe17d43d5	chore: update rust to nightly 2025-10-01 (#7069 ) * chore: update rust to nightly 2025-10-01 Signed-off-by: luofucong <luofc@foxmail.com> * chore: nix update --------- Signed-off-by: luofucong <luofc@foxmail.com> Co-authored-by: Ning Sun <sunning@greptime.com>	2025-10-11 07:30:52 +00:00
Ning Sun	749a5ab165	feat: struct value and vector (#7033 ) * feat: struct value Signed-off-by: Ning Sun <sunning@greptime.com> * feat: update for proto module * feat: wip struct type * feat: implement more vector operations * feat: make datatype and api * feat: reoslve some compilation issues * feat: resolve all compilation issues * chore: format update * test: resolve tests * test: test and refactor value-to-pb * feat: add more tests and fix for value types * chore: remove dbg * feat: test and fix iterator * fix: resolve struct_type issue * refactor: use vec for struct items * chore: update proto to main branch * refactor: address some of review issues * refactor: update for further review * Add validation on new methods * feat: update struct/list json serialization * refactor: reimplement get in struct_vector * refactor: struct vector functions * refactor: fix lint issue * refactor: address review comments --------- Signed-off-by: Ning Sun <sunning@greptime.com>	2025-10-10 21:49:51 +00:00
Lei, HUANG	c92ab4217f	fix: avoid truncating SST statistics during flush (#6977 ) fix/disable-parquet-stats-truncate: - Update `memcomparable` Dependency: Switched from crates.io to a Git repository for `memcomparable` in `Cargo.lock`, `mito-codec/Cargo.toml`, and removed it from `mito2/Cargo.toml`. - Enhance Parquet Writer Properties: Added `set_statistics_truncate_length` and `set_column_index_truncate_length` to `WriterProperties` in `parquet.rs`, `bulk/part.rs`, `partition_tree/data.rs`, and `writer.rs`. - Add Test for Corrupt Scan: Introduced a new test module `scan_corrupt.rs` in `mito2/src/engine` to verify handling of corrupt data. - Update Test Data: Modified test data in `flush.rs` to reflect changes in file sizes and sequences. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>	2025-09-17 03:02:52 +00:00
Lei, HUANG	9096c5ebbf	chore: bump sequence on region edit (#6947 ) * chore/update-sequence-on-region-edit: ### Commit Message Refactor `get_last_seq_num` Method Across Engines - Change Return Type: Updated the `get_last_seq_num` method to return `Result<SequenceNumber, BoxedError>` instead of `Result<Option<SequenceNumber>, BoxedError>` in the following files: - `src/datanode/src/tests.rs` - `src/file-engine/src/engine.rs` - `src/metric-engine/src/engine.rs` - `src/metric-engine/src/engine/read.rs` - `src/mito2/src/engine.rs` - `src/query/src/optimizer/test_util.rs` - `src/store-api/src/region_engine.rs` - Enhance Region Edit Handling: Modified `RegionWorkerLoop` in `src/mito2/src/worker/handle_manifest.rs` to update file sequences during region edits. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * add committed_sequence to RegionEdit Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * chore/update-sequence-on-region-edit: ### Commit Message Refactor sequence retrieval method - Renamed Method: Changed `get_last_seq_num` to `get_committed_sequence` across multiple files to better reflect its purpose of retrieving the latest committed sequence. - Affected files: `tests.rs`, `engine.rs` in `file-engine`, `metric-engine`, `mito2`, `test_util.rs`, and `region_engine.rs`. - Removed Unused Struct: Deleted `RegionSequencesRequest` struct from `region_request.rs` as it is no longer needed. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * chore/update-sequence-on-region-edit: Add Committed Sequence Handling in Region Engine - `engine.rs`: Introduced a new test module `bump_committed_sequence_test` to verify committed sequence handling. - `bump_committed_sequence_test.rs`: Added a test to ensure the committed sequence is correctly updated and persisted across region reopenings. - `action.rs`: Updated `RegionManifest` and `RegionManifestBuilder` to include `committed_sequence` for tracking. 
- `manager.rs`: Adjusted manifest size assertion to accommodate new committed sequence data. - `opener.rs`: Implemented logic to override committed sequence during region opening. - `version.rs`: Added `set_committed_sequence` method to update the committed sequence in `VersionControl`. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * chore/update-sequence-on-region-edit: Enhance `test_bump_committed_sequence` in `bump_committed_sequence_test.rs` - Updated the test to include row operations using `build_rows`, `put_rows`, and `rows_schema` to verify the committed sequence behavior. - Adjusted assertions to reflect changes in committed sequence after row operations and region edits. - Added comments to clarify the expected behavior of committed sequence after reopening the region and replaying the WAL. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * chore/update-sequence-on-region-edit: Enhance Region Sequence Management - `bump_committed_sequence_test.rs`: Updated test to handle region reopening and sequence management, ensuring committed sequences are correctly set and verified after edits. - `opener.rs`: Improved committed sequence handling by overriding it only if the manifest's sequence is greater than the replayed sequence. Added logging for mutation sequence replay. - `region_write_ctx.rs`: Modified `push_mutation` and `push_bulk` methods to adopt sequence numbers from parameters, enhancing sequence management during write operations. - `handle_write.rs`: Updated `RegionWorkerLoop` to pass sequence numbers in `push_bulk` and `push_mutation` methods, ensuring consistent sequence handling. Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> * chore/update-sequence-on-region-edit: ### Remove Debug Logging from `opener.rs` - Removed debug logging for mutation sequences in `opener.rs` to clean up the output and improve performance. 
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com> --------- Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>	2025-09-16 16:22:25 +00:00
Zhenchi	db42ad42dc	feat: add visible to sst entry for staging mode (#6964 ) Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>	2025-09-15 09:05:54 +00:00
Zhenchi	21ee981b49	feat: add origin_region_id and node_id to sst entry (#6937 ) Signed-off-by: Zhenchi <zhongzc_arch@outlook.com>	2025-09-09 09:45:15 +00:00
Ruihang Xia	c9377e7c5a	build: bump rust edition to 2024 (#6920 ) * bump edition Signed-off-by: Ruihang Xia <waynestxia@gmail.com> * format Signed-off-by: Ruihang Xia <waynestxia@gmail.com> * gen keyword Signed-off-by: Ruihang Xia <waynestxia@gmail.com> * lifetime and env var Signed-off-by: Ruihang Xia <waynestxia@gmail.com> * one more gen fix Signed-off-by: Ruihang Xia <waynestxia@gmail.com> * lifetime of temporaries in tail expressions Signed-off-by: Ruihang Xia <waynestxia@gmail.com> * format again Signed-off-by: Ruihang Xia <waynestxia@gmail.com> * clippy nested if Signed-off-by: Ruihang Xia <waynestxia@gmail.com> * clippy let and return Signed-off-by: Ruihang Xia <waynestxia@gmail.com> --------- Signed-off-by: Ruihang Xia <waynestxia@gmail.com>	2025-09-08 02:37:18 +00:00
Weny Xu	658d07bfc8	feat: add `written_bytes_since_open` column to `region_statistics` table (#6904 ) * feat: add `write_bytes` column to `region_statistics` table Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: update comments Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: rename `write_bytes` to `written_bytes` Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: rename `written_bytes` to `written_bytes_since_open` Signed-off-by: WenyXu <wenymedia@gmail.com> * chore: apply suggestions Signed-off-by: WenyXu <wenymedia@gmail.com> --------- Signed-off-by: WenyXu <wenymedia@gmail.com>	2025-09-05 07:27:30 +00:00

186 Commits