* feat: use arrow-pg for encode_row
* refactor: remove bytea and datetime module
* feat: port more encodings to arrow-pg
* feat: implement intervalstyle
* chore: format
* chore: remove error that is no longer used
* chore: use released arrow-pg
* Apply suggestions from code review
Co-authored-by: LFC <990479+MichaelScofield@users.noreply.github.com>
---------
Co-authored-by: LFC <990479+MichaelScofield@users.noreply.github.com>
* feat: impl vector index scan in storage
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* feat: fallback to read remote blob when blob not found
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* chore: refactor encoding and decoding and apply suggestions
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* fix: license
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* test: add apply_with_k tests
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* chore: apply suggestions
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* fix: forgot to align nulls when the vector column is not in the batch
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* test: add test for when the vector column is not in a batch while building
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
---------
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* feat: add repartition procedure factory support to DdlManager
- Introduce RepartitionProcedureFactory trait for creating and registering
repartition procedures
- Implement DefaultRepartitionProcedureFactory for metasrv with full support
- Implement StandaloneRepartitionProcedureFactory for standalone (unsupported)
- Add procedure loader registration for RepartitionProcedure and
RepartitionGroupProcedure
- Add helper methods to TableMetadataAllocator for allocator access
- Add error types for repartition procedure operations
- Update DdlManager to accept and use RepartitionProcedureFactoryRef
Signed-off-by: WenyXu <wenymedia@gmail.com>
* feat: integrate repartition procedure into DdlManager
- Add submit_repartition_task() to handle repartition from alter table
- Route Repartition operations in submit_alter_table_task() to repartition factory
- Refactor: rename submit_procedure() to execute_procedure_and_wait()
- Make all DDL operations wait for completion by default
- Add submit_procedure() for fire-and-forget submissions
- Add CreateRepartitionProcedure error type
- Add placeholder Repartition handling in grpc-expr (unsupported)
- Update greptime-proto dependency
Signed-off-by: WenyXu <wenymedia@gmail.com>
* feat: implement ALTER TABLE REPARTITION procedure submission
Signed-off-by: WenyXu <wenymedia@gmail.com>
* refactor(repartition): handle central region in apply staging manifest
- Introduce ApplyStagingManifestInstructions struct to organize instructions
- Add special handling for central region when applying staging manifests
- Transition state from UpdateMetadata to RepartitionEnd after applying staging manifests
- Remove next_state() method in RepartitionStart and inline state transitions
- Improve logging and expression serialization in DDL statement executor
- Move repartition tests from standalone to distributed test suite
Signed-off-by: WenyXu <wenymedia@gmail.com>
* chore: apply suggestions from CR
Signed-off-by: WenyXu <wenymedia@gmail.com>
* chore: update proto
Signed-off-by: WenyXu <wenymedia@gmail.com>
---------
Signed-off-by: WenyXu <wenymedia@gmail.com>
* refactor: rename WalOptionsAllocator to WalProvider
The name "WalOptionsAllocator" was misleading because:
- For RaftEngine variant, it doesn't actually allocate anything
- The actual allocation logic lives in KafkaTopicPool
"WalProvider" better describes its role as providing WAL options
based on the configured WAL backend (RaftEngine or Kafka).
Changes:
- Rename `WalOptionsAllocator` to `WalProvider`
- Rename `WalOptionsAllocatorRef` to `WalProviderRef`
- Rename `build_wal_options_allocator` to `build_wal_provider`
- Rename module `wal_options_allocator` to `wal_provider`
- Rename error types: `BuildWalOptionsAllocator` -> `BuildWalProvider`,
`StartWalOptionsAllocator` -> `StartWalProvider`
Signed-off-by: WenyXu <wenymedia@gmail.com>
* refactor(meta): extract allocator traits from TableMetadataAllocator
Refactor TableMetadataAllocator to use trait-based dependency injection
for better testability and separation of concerns.
Changes:
- Add `ResourceIdAllocator` trait to abstract ID allocation
- Add `WalOptionsAllocator` trait to abstract WAL options allocation
- Implement traits for `Sequence` and `WalProvider`
- Remove duplicate `allocate_region_wal_options` function
- Rename `table_id_sequence` to `table_id_allocator` for consistency
- Rename `TableIdSequenceHandler` to `TableIdAllocatorHandler`
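The trait-based injection described above can be sketched roughly as follows. This is a simplified, synchronous illustration only: the method name `next_id`, the `FixedId` test double, and the reduced `Sequence` and `TableMetadataAllocator` shapes are assumptions, not the actual signatures in the codebase.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical, synchronous stand-in for the `ResourceIdAllocator` trait.
pub trait ResourceIdAllocator {
    fn next_id(&self) -> u32;
}

/// Simplified stand-in for `Sequence`: a monotonically increasing counter.
pub struct Sequence {
    next: AtomicU32,
}

impl Sequence {
    pub fn new(start: u32) -> Self {
        Self { next: AtomicU32::new(start) }
    }
}

impl ResourceIdAllocator for Sequence {
    fn next_id(&self) -> u32 {
        // Returns the current value and advances the counter by one.
        self.next.fetch_add(1, Ordering::Relaxed)
    }
}

/// Hypothetical test double that always hands out the same id.
pub struct FixedId(pub u32);

impl ResourceIdAllocator for FixedId {
    fn next_id(&self) -> u32 {
        self.0
    }
}

/// Reduced allocator that depends only on the trait, not on `Sequence`.
pub struct TableMetadataAllocator {
    table_id_allocator: Box<dyn ResourceIdAllocator>,
}

impl TableMetadataAllocator {
    pub fn new(table_id_allocator: Box<dyn ResourceIdAllocator>) -> Self {
        Self { table_id_allocator }
    }

    pub fn allocate_table_id(&self) -> u32 {
        self.table_id_allocator.next_id()
    }
}
```

Because the allocator only sees the trait, a unit test can pass in `FixedId` instead of a real sequence.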
Signed-off-by: WenyXu <wenymedia@gmail.com>
* feat(meta): add max_region_number tracking to PhysicalTableRouteValue
Add `max_region_number` field to track the highest region number ever
allocated for a table. This value only increases when regions are added
and never decreases when regions are dropped, ensuring unique region
numbers across the table's lifetime.
Changes:
- Add `max_region_number` field to `PhysicalTableRouteValue`
- Implement custom `Deserialize` for backward compatibility
- Update `update_region_routes` to maintain max_region_number
- Calculate max_region_number from region_routes in `new()`
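A minimal sketch of the "only increases, never decreases" rule, assuming region routes are reduced to bare region numbers and an empty table starts at 0. The struct layout and the `region_numbers` accessor are illustrative assumptions; the backward-compatible `Deserialize` is not shown.

```rust
/// Simplified stand-in for `PhysicalTableRouteValue`.
#[derive(Debug, Default)]
pub struct PhysicalTableRouteValue {
    region_numbers: Vec<u32>,
    max_region_number: u32,
}

impl PhysicalTableRouteValue {
    /// Derives the initial maximum from the given regions (0 if there are none).
    pub fn new(region_numbers: Vec<u32>) -> Self {
        let max_region_number = region_numbers.iter().copied().max().unwrap_or(0);
        Self { region_numbers, max_region_number }
    }

    /// Replaces the routes; the recorded maximum can grow but never shrinks,
    /// so dropping the highest region does not free its number for reuse.
    pub fn update_region_routes(&mut self, region_numbers: Vec<u32>) {
        let incoming_max = region_numbers.iter().copied().max().unwrap_or(0);
        self.max_region_number = self.max_region_number.max(incoming_max);
        self.region_numbers = region_numbers;
    }

    pub fn max_region_number(&self) -> u32 {
        self.max_region_number
    }

    pub fn region_numbers(&self) -> &[u32] {
        &self.region_numbers
    }
}
```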
Signed-off-by: WenyXu <wenymedia@gmail.com>
* refactor: extract TableRouteAllocator trait from TableMetadataAllocator
- Add TableRouteAllocator trait for abstracting region route allocation
- Implement blanket impl for all PeerAllocator types
- Add PeerAllocator impl for Arc<T> to support trait object delegation
- Update TableMetadataAllocator to use TableRouteAllocatorRef
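The blanket impl and the `Arc<T>` delegation might look roughly like this. The method names `alloc` and `allocate`, the `u64` peer ids, and the `RoundRobin` allocator are hypothetical simplifications; the real traits are likely async and carry richer types.

```rust
use std::sync::Arc;

/// Hypothetical, simplified peer allocator: returns `count` peer ids.
pub trait PeerAllocator {
    fn alloc(&self, count: usize) -> Vec<u64>;
}

/// Hypothetical, simplified route allocator: pairs region numbers with peers.
pub trait TableRouteAllocator {
    fn allocate(&self, regions: usize) -> Vec<(u32, u64)>;
}

/// Blanket impl: every `PeerAllocator` is also a `TableRouteAllocator`.
impl<T: PeerAllocator> TableRouteAllocator for T {
    fn allocate(&self, regions: usize) -> Vec<(u32, u64)> {
        self.alloc(regions)
            .into_iter()
            .enumerate()
            .map(|(number, peer)| (number as u32, peer))
            .collect()
    }
}

/// Delegating impl so `Arc<dyn PeerAllocator>` satisfies the blanket impl too.
impl<T: PeerAllocator + ?Sized> PeerAllocator for Arc<T> {
    fn alloc(&self, count: usize) -> Vec<u64> {
        (**self).alloc(count)
    }
}

/// Illustrative allocator that cycles through a fixed peer list.
pub struct RoundRobin {
    pub peers: Vec<u64>,
}

impl PeerAllocator for RoundRobin {
    fn alloc(&self, count: usize) -> Vec<u64> {
        self.peers.iter().copied().cycle().take(count).collect()
    }
}
```

Without the `Arc<T>` impl, a shared trait object would not pick up the blanket impl, since `dyn PeerAllocator` itself is unsized.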
Signed-off-by: WenyXu <wenymedia@gmail.com>
* refactor: rename TableRouteAllocator to RegionRoutesAllocator
- Rename table_route.rs to region_routes.rs
- Rename TableRouteAllocator trait to RegionRoutesAllocator
- Rename wal_option.rs to wal_options.rs for consistency
- Update TableMetadataAllocator to use new naming
Signed-off-by: WenyXu <wenymedia@gmail.com>
* feat(meta-srv): implement region allocation for repartition procedure
This commit implements the region allocation phase of the repartition procedure,
which handles allocating new regions when a table needs to be split into more partitions.
Key changes:
- Refactor `RegionRoutesAllocator::allocate` to accept `(region_number, partition_expr)` tuples
for more flexible region number assignment
- Simplify `AllocationPlanEntry` by removing `regions_to_allocate` and `regions_to_deallocate`
fields (now derived from source/target counts)
- Add `convert_allocation_plan_to_repartition_plan` function to handle allocation, equal,
and deallocation cases
- Fix `RepartitionPlanEntry::allocate_regions()` to return target regions (was incorrectly
returning source regions)
- Implement complete `AllocateRegion` state with:
- Region route allocation via `RegionRoutesAllocator`
- WAL options allocation via `WalOptionsAllocator`
- Operating region registration for concurrency control
- Region creation on datanodes via `CreateTableExecutor`
- Table route metadata update
- Add `TableRouteValue::max_region_number()` helper method
- Add comprehensive unit tests for plan conversion and allocation logic
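As a rough illustration of deriving the allocation, equal, and deallocation cases from source/target counts, the comparison could be sketched as below. `RegionDelta` and `region_delta` are hypothetical names; the real `convert_allocation_plan_to_repartition_plan` works on full plan entries, not bare counts.

```rust
use std::cmp::Ordering;

/// Hypothetical outcome of comparing source and target region counts.
#[derive(Debug, PartialEq, Eq)]
pub enum RegionDelta {
    /// More targets than sources: this many new regions must be allocated.
    Allocate(usize),
    /// Same number of regions on both sides.
    Equal,
    /// Fewer targets than sources: this many regions become redundant.
    Deallocate(usize),
}

pub fn region_delta(source_count: usize, target_count: usize) -> RegionDelta {
    match target_count.cmp(&source_count) {
        Ordering::Greater => RegionDelta::Allocate(target_count - source_count),
        Ordering::Equal => RegionDelta::Equal,
        Ordering::Less => RegionDelta::Deallocate(source_count - target_count),
    }
}
```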
Signed-off-by: WenyXu <wenymedia@gmail.com>
* chore: apply suggestions from CR
Signed-off-by: WenyXu <wenymedia@gmail.com>
* chore: apply suggestions from CR
Signed-off-by: WenyXu <wenymedia@gmail.com>
---------
Signed-off-by: WenyXu <wenymedia@gmail.com>
* feat: impl vector index building
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* feat: supports flat format
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* ci: add vector_index feature to test
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* chore: apply suggestions
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* chore: apply suggestions from copilot
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
---------
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* feat/allow-one-to-many-pipeline:
### Enhance Pipeline Processing for One-to-Many Transformations
- **Support One-to-Many Transformations**:
- Updated `processor.rs`, `etl.rs`, `vrl_processor.rs`, and `greptime.rs` to handle one-to-many transformations by allowing VRL processors to return arrays, expanding each element into separate rows.
- Introduced `transform_array_elements` and `values_to_rows` functions to facilitate this transformation.
- **Error Handling Enhancements**:
- Added new error types in `error.rs` to handle cases where array elements are not objects and for transformation failures.
- **Testing Enhancements**:
- Added tests in `pipeline.rs` to verify one-to-many transformations, single object processing, and error handling for non-object array elements.
- **Context Management**:
- Modified `ctx_req.rs` to clone `ContextOpt` when adding rows, ensuring correct context management during transformations.
- **Server Pipeline Adjustments**:
- Updated `pipeline.rs` in `servers` to handle transformed outputs with one-to-many row expansions, ensuring correct row padding and request formation.
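The expansion rule (a single object stays one row, an array becomes one row per element, non-object elements are rejected) can be sketched as follows. The `Value`, `Row`, and `ExpandError` types and the `expand_to_rows` function are simplified assumptions, not the pipeline's actual types.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for a processor output value.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Int(i64),
    Str(String),
    Object(BTreeMap<String, Value>),
    Array(Vec<Value>),
}

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum ExpandError {
    ArrayElementMustBeObject { index: usize },
    UnsupportedOutput,
}

pub type Row = BTreeMap<String, Value>;

/// One object yields one row; an array yields one row per object element.
pub fn expand_to_rows(output: Value) -> Result<Vec<Row>, ExpandError> {
    match output {
        Value::Object(fields) => Ok(vec![fields]),
        Value::Array(elements) => elements
            .into_iter()
            .enumerate()
            .map(|(index, element)| match element {
                Value::Object(fields) => Ok(fields),
                _ => Err(ExpandError::ArrayElementMustBeObject { index }),
            })
            .collect(),
        _ => Err(ExpandError::UnsupportedOutput),
    }
}
```

An empty array falls out of the same path and produces zero rows.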
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/allow-one-to-many-pipeline:
Add one-to-many VRL pipeline test in `http.rs`
- Introduced `test_pipeline_one_to_many_vrl` to verify VRL processor's ability to expand a single input row into multiple output rows.
- Updated `http_tests!` macro to include the new test.
- Implemented test scenarios for single and multiple input rows, ensuring correct data transformation and row count validation.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/allow-one-to-many-pipeline:
### Add Tests for VRL Pipeline Transformations
- **File:** `src/pipeline/src/etl.rs`
- Added tests for one-to-many VRL pipeline expansion to ensure multiple output rows from a single input.
- Introduced tests to verify backward compatibility for single object output.
- Implemented tests to confirm zero rows are produced from empty arrays.
- Added validation tests to ensure array elements must be objects.
- Developed tests for one-to-many transformations with table suffix hints from VRL.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/allow-one-to-many-pipeline:
### Enhance Pipeline Transformation with Per-Row Table Suffixes
- **`src/pipeline/src/etl.rs`**: Updated `TransformedOutput` to include per-row table suffixes, allowing for more flexible routing of transformed data. Modified `PipelineExecOutput` and related methods to handle the new structure.
- **`src/pipeline/src/etl/transform/transformer/greptime.rs`**: Enhanced `values_to_rows` to support per-row table suffix extraction and application.
- **`src/pipeline/tests/common.rs`** and **`src/pipeline/tests/pipeline.rs`**: Adjusted tests to validate the new per-row table suffix functionality, ensuring backward compatibility and correct behavior in one-to-many transformations.
- **`src/servers/src/pipeline.rs`**: Modified `run_custom_pipeline` to process transformed outputs with per-row table suffixes, grouping rows by `(opt, table_name)` for insertion.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/allow-one-to-many-pipeline:
### Update VRL Processor Type Checks
- **File:** `vrl_processor.rs`
- **Changes:** Updated type checking logic to use `contains_object()` and `contains_array()` methods instead of `is_object()` and `is_array()`. This change ensures compatibility with VRL type inference that may return multiple possible types.
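To illustrate why an exact check fails when inference yields several possible types, here is a sketch using a hypothetical bitset `Kind`. It is not VRL's actual type; it only mirrors the distinction between "is exactly" and "may be".

```rust
/// Hypothetical set of possible types, stored as bit flags.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Kind(u8);

impl Kind {
    pub const OBJECT: Kind = Kind(0b001);
    pub const ARRAY: Kind = Kind(0b010);
    pub const BYTES: Kind = Kind(0b100);

    /// Union of two sets of possible types.
    pub fn or(self, other: Kind) -> Kind {
        Kind(self.0 | other.0)
    }

    /// True only when the set is exactly {object}.
    pub fn is_object(self) -> bool {
        self == Kind::OBJECT
    }

    /// True when object is one of the possible types.
    pub fn contains_object(self) -> bool {
        self.0 & Kind::OBJECT.0 != 0
    }

    /// True only when the set is exactly {array}.
    pub fn is_array(self) -> bool {
        self == Kind::ARRAY
    }

    /// True when array is one of the possible types.
    pub fn contains_array(self) -> bool {
        self.0 & Kind::ARRAY.0 != 0
    }
}
```

A program inferred as "object or array" fails both exact checks in this model, yet passes both `contains_*` checks.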
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/allow-one-to-many-pipeline:
- **Enhance Error Handling**: Added new error types `ArrayElementMustBeObjectSnafu` and `TransformArrayElementSnafu` to improve error handling in `etl.rs` and `greptime.rs`.
- **Refactor Error Usage**: Moved error usage declarations in `transform_array_elements` and `values_to_rows` functions to the top of the file for better organization in `etl.rs` and `greptime.rs`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/allow-one-to-many-pipeline:
### Update `greptime.rs` to Enhance Error Handling
- **Error Handling**: Modified the `values_to_rows` function to handle invalid array elements based on the `skip_error` parameter. If `skip_error` is true, invalid elements are skipped; otherwise, an error is returned.
- **Testing**: Added unit tests in `greptime.rs` to verify the behavior of `values_to_rows` with different `skip_error` settings, ensuring correct processing of valid objects and appropriate error handling for invalid elements.
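The `skip_error` behavior described above might be sketched like this. The `Element` type, the row shape, the `String` error, and the function name `values_to_rows_sketch` are illustrative assumptions.

```rust
/// Simplified array element: either a valid object or something else.
#[derive(Debug, Clone, PartialEq)]
pub enum Element {
    Object(Vec<(String, i64)>),
    Other(String),
}

pub type Row = Vec<(String, i64)>;

/// Converts elements to rows. Invalid elements are skipped when `skip_error`
/// is true; otherwise the first invalid element aborts with an error.
pub fn values_to_rows_sketch(elements: Vec<Element>, skip_error: bool) -> Result<Vec<Row>, String> {
    let mut rows = Vec::new();
    for (index, element) in elements.into_iter().enumerate() {
        match element {
            Element::Object(fields) => rows.push(fields),
            Element::Other(_) if skip_error => continue,
            Element::Other(raw) => {
                return Err(format!("array element {} must be an object, got {}", index, raw));
            }
        }
    }
    Ok(rows)
}
```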
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/allow-one-to-many-pipeline:
### Commit Summary
- **Enhance `TransformedOutput` Structure**: Refactored `TransformedOutput` to use a `HashMap` for grouping rows by `ContextOpt`, allowing for per-row configuration options. Updated methods in `PipelineExecOutput` to support the new structure (`src/pipeline/src/etl.rs`).
- **Add New Transformation Method**: Introduced `transform_array_elements_to_hashmap` to handle array inputs with per-row `ContextOpt` in `HashMap` format (`src/pipeline/src/etl.rs`).
- **Update Pipeline Execution**: Modified `run_custom_pipeline` to process `TransformedOutput` using the new `HashMap` structure, ensuring rows are grouped by `ContextOpt` and table name (`src/servers/src/pipeline.rs`).
- **Add Tests for New Structure**: Implemented tests to verify the functionality of the new `HashMap` structure in `TransformedOutput`, including scenarios for one-to-many mapping, single object input, and empty arrays (`src/pipeline/src/etl.rs`).
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/allow-one-to-many-pipeline:
### Refactor `values_to_rows` to Return `HashMap` Grouped by `ContextOpt`
- **`etl.rs`**:
- Updated `values_to_rows` to return a `HashMap` grouped by `ContextOpt` instead of a vector.
- Adjusted logic to handle single object and array inputs, ensuring rows are grouped by their `ContextOpt`.
- Modified functions to extract rows from default `ContextOpt` and apply table suffixes accordingly.
- **`greptime.rs`**:
- Enhanced `values_to_rows` to handle errors gracefully with `skip_error` logic.
- Added logic to group rows by `ContextOpt` for array inputs.
- **Tests**:
- Updated existing tests to validate the new `HashMap` return structure.
- Added a new test to verify correct grouping of rows by per-element `ContextOpt`.
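Grouping rows by their `ContextOpt` can be sketched with a `HashMap` and the entry API. The single `ttl` field on `ContextOpt`, the row shape, and the `group_by_ctx` helper are assumptions made for this example.

```rust
use std::collections::HashMap;

/// Simplified stand-in for `ContextOpt`; only a TTL option is modeled.
#[derive(Debug, Clone, Default, PartialEq, Eq, Hash)]
pub struct ContextOpt {
    pub ttl: Option<String>,
}

/// Assumed shape: row values plus an optional table suffix.
pub type RowWithTableSuffix = (Vec<i64>, Option<String>);

/// Groups rows by their per-element `ContextOpt`, keeping input order per group.
pub fn group_by_ctx(
    rows: Vec<(ContextOpt, RowWithTableSuffix)>,
) -> HashMap<ContextOpt, Vec<RowWithTableSuffix>> {
    let mut grouped: HashMap<ContextOpt, Vec<RowWithTableSuffix>> = HashMap::new();
    for (opt, row) in rows {
        grouped.entry(opt).or_default().push(row);
    }
    grouped
}
```

Rows without any per-element option end up under the default `ContextOpt` key.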
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/allow-one-to-many-pipeline:
### Refactor and Enhance Error Handling in ETL Pipeline
- **Refactored Functionality**:
- Replaced `transform_array_elements` with `transform_array_elements_by_ctx` in `etl.rs` to streamline transformation logic and improve error handling.
- Updated `values_to_rows` in `greptime.rs` to use `or_default` for cleaner code.
- **Enhanced Error Handling**:
- Introduced `unwrap_or_continue_if_err` macro in `etl.rs` to allow skipping errors based on pipeline context, improving robustness in data processing.
These changes enhance the maintainability and error resilience of the ETL pipeline.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/allow-one-to-many-pipeline:
### Update `Row` Handling in ETL Pipeline
- **Refactor `Row` Type**: Introduced `RowWithTableSuffix` type alias to simplify handling of rows with optional table suffixes across the ETL pipeline.
- **Modify Function Signatures**: Updated function signatures in `etl.rs` and `greptime.rs` to use `RowWithTableSuffix` for better clarity and consistency.
- **Enhance Test Coverage**: Adjusted test logic in `greptime.rs` to align with the new `RowWithTableSuffix` type, ensuring correct grouping and processing of rows by TTL.
Files affected: `etl.rs`, `greptime.rs`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
---------
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/change-tsid-gen:
perf(metric-engine): replace mur3 with fxhash for faster TSID generation
- Switches from mur3::Hasher128 to fxhash::FxHasher for TSID hashing
- Pre-computes label-name hash when no nulls are present, avoiding redundant work
- Adds fast-path for rows without nulls; falls back to slow path otherwise
- Updates Cargo.toml and lockfile to reflect dependency change
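One way to picture the fast/slow split is shown below. This sketch uses the standard library's `DefaultHasher` in place of `fxhash::FxHasher` to stay dependency-free, and the struct layout, method names, and hashing order are assumptions rather than the actual implementation.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Sketch of a generator that caches the hasher state after all label names.
pub struct TsidGenerator {
    label_name_state: DefaultHasher,
}

impl TsidGenerator {
    /// Feeds every label name once, up front.
    pub fn new(label_names: &[&str]) -> Self {
        let mut hasher = DefaultHasher::new();
        for name in label_names {
            name.hash(&mut hasher);
        }
        Self { label_name_state: hasher }
    }

    /// Fast path: no label is null, so resume from the cached name state
    /// and hash only the values.
    pub fn fast_path(&self, values: &[&str]) -> u64 {
        let mut hasher = self.label_name_state.clone();
        for value in values {
            value.hash(&mut hasher);
        }
        hasher.finish()
    }
}

/// Slow path: skip null labels, hashing the remaining names and then values.
pub fn slow_path(labels: &[(&str, Option<&str>)]) -> u64 {
    let mut hasher = DefaultHasher::new();
    for (name, value) in labels {
        if value.is_some() {
            name.hash(&mut hasher);
        }
    }
    for (_, value) in labels {
        if let Some(v) = value {
            v.hash(&mut hasher);
        }
    }
    hasher.finish()
}
```

In this model both paths agree whenever no label is null, so the fast path only removes the repeated name hashing.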
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/change-tsid-gen:
fix: only check primary-key labels for null when re-using cached hash
- Rename has_null() → has_null_labels() and restrict the check to the
primary-key columns so that non-label NULLs do not force a full
TSID re-computation.
- Update expected hashes in tests to match the new logic.
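A sketch of restricting the null check to primary-key columns, assuming a row is a slice of optional values and the label columns are given by index. The signature is hypothetical, and treating a missing column as null is a choice made only for this example.

```rust
/// Returns true if any primary-key (label) column in the row is null.
/// Nulls in other columns are ignored, so they do not force the slow path.
pub fn has_null_labels(row: &[Option<&str>], primary_key_indices: &[usize]) -> bool {
    primary_key_indices
        .iter()
        // A column index outside the row is treated as null in this sketch.
        .any(|&index| row.get(index).map_or(true, |value| value.is_none()))
}
```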
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/change-tsid-gen:
test: add comprehensive TSID generation tests for label ordering and null handling
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/change-tsid-gen:
bench: add criterion benchmark for TSID generator
- Compare original mur3 vs current fxhash fast/slow paths
- Test 2, 5, 10 label sets plus null-value slow path
- Add mur3 & criterion dev-deps; register bench target
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/change-tsid-gen:
test: stabilize metric-engine tests by fixing non-deterministic row order
- Add ORDER BY to SELECTs in TTL tests to ensure consistent output
- Update expected __tsid values after hash function change
- Swap expected OTLP metric rows to match new ordering
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/change-tsid-gen:
refactor: simplify Default impls and remove redundant code
- Replace manual Default for TsidGenerator with derive
- Remove unnecessary into_iter() call
- Simplify Option::unwrap_or_else to unwrap_or
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
---------
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat(mysql): add SHOW WARNINGS support and return warnings for unsupported SET variables
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* feat(function): add MySQL IF() function and PostgreSQL description functions for connector compatibility
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* fix: show tables for mysql
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* fix: partitions table in information_schema and add starrocks external catalog compatibility
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* refactor: async udf
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* fix: set warnings
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* feat: impl pg_my_temp_schema and make description functions simple
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* test: add test for issue 7313
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* feat: apply suggestions
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* fix: partition_expression and partition_description
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* fix: test
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* fix: unit tests
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* fix: search_path only works for pg
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* feat: improve warnings processing
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* fix: warnings while writing affected rows and refactor
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* chore: improve ShobjDescriptionFunction signature
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* refactor: array_to_boolean
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
---------
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* feat: divide parquet and puffin index
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: download index files when we open the region
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: use different label for parquet/puffin
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: control parallelism and cache size by env
Signed-off-by: evenyag <realevenyag@gmail.com>
* fix: change gauge to counter
Signed-off-by: evenyag <realevenyag@gmail.com>
* fix: correct file type labels in file cache
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: move env to config and change cache ratio to percent
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: checks capacity before download and refine metrics
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: change open to return MitoRegionRef
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: extract download to FileCache
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: run load cache task in write cache
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: check region state before downloading files
Signed-off-by: evenyag <realevenyag@gmail.com>
* chore: update config docs and test
Signed-off-by: evenyag <realevenyag@gmail.com>
* fix: use file id from index_file_id to compute puffin key
Signed-off-by: evenyag <realevenyag@gmail.com>
* fix: skip loading cache in some states
Signed-off-by: evenyag <realevenyag@gmail.com>
---------
Signed-off-by: evenyag <realevenyag@gmail.com>