* meta-srv: introduce SelectorWrapper to wrap configured selector
- `SelectorWrapper trait`: add `SelectorWrapper` trait and `SelectorWrapperRef` in `src/meta-srv/src/metasrv.rs` to support decorating selectors
- `metasrv bootstrap`: apply `SelectorWrapperRef` in `src/meta-srv/src/bootstrap.rs` to wrap the configured selector, and add unit tests to verify the behavior
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat(meta-srv): add selector factory plugin hook
- `SelectorFactory`: replace selector wrapper registration with a bootstrap-time factory context in `src/meta-srv/src/metasrv.rs`
- `metasrv_builder`: build the configured base selector before invoking plugin factories in `src/meta-srv/src/bootstrap.rs`
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
---------
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat: expose node info for placement selectors
Return `NodeInfo` from `PeerDiscovery` methods and keep OSS selectors mapping back to `Peer`.
Carry `__greptime_origin_frontend.addr` from frontend create-table DDLs into selector `extensions`, and thread `PeerAllocContext` through table-route allocation.
Persist datanode `NodeInfo` when heartbeat stats are absent so collected env vars remain available after restart.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* fix: skip datanode node info without stats
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* fix: avoid unnecessary workload clones
Skip workload cloning for inactive nodes and for active node-info lookups without workload filters.
Files: `src/meta-srv/src/discovery/utils.rs`
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* fix: require frontend origin address
Require `StatementExecutor` to carry a concrete frontend origin address and always attach it to meta DDL query contexts.
Files: `src/operator/src/statement.rs`, `src/operator/src/statement/ddl.rs`, `src/operator/src/utils.rs`, `src/frontend/src/instance/builder.rs`, `src/frontend/src/heartbeat.rs`, `src/flow/src/server.rs`, `src/cmd/src/standalone.rs`, `src/cmd/src/flownode.rs`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* refactor: reuse resolved frontend address
Resolve the frontend peer address once in the frontend builder, store it on the instance, and reuse it for heartbeat and flow invoker origins.
Files: `src/frontend/src/instance/builder.rs`, `src/frontend/src/instance.rs`, `src/frontend/src/heartbeat.rs`, `src/cmd/src/frontend.rs`, `src/cmd/src/standalone.rs`, `src/frontend/src/frontend.rs`, `src/frontend/src/heartbeat/tests.rs`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* fix: preserve datanode lease liveness
Filter active datanode node infos through lease timestamps and workloads while preserving node info fields such as reported env vars.
Files: `src/meta-srv/src/discovery/utils.rs`, `src/meta-srv/src/discovery/lease.rs`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* Remove stale datanode lease helper
- `discovery`: remove the obsolete `alive_datanodes` helper and related tests in `src/meta-srv/src/discovery/utils.rs` and `src/meta-srv/src/discovery/lease.rs`
- `integration`: update cluster and standalone setup paths in `tests-integration/src/cluster.rs` and `tests-integration/src/standalone.rs`
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/env-based-region-selector-oss: simplify lease discovery
- `lease-discovery`: simplify logic and remove unused utilities in `src/meta-srv/src/discovery/lease.rs` and `src/meta-srv/src/discovery/utils.rs`
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
---------
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat: support reporting env vars in heartbeat messages to metasrv
Add `heartbeat_env_vars` config option for datanode and frontend. When
configured, the specified environment variable values are read at startup
and sent to metasrv in every heartbeat via the `extensions` map. Metasrv
extracts and stores them in `NodeInfo` for use in routing decisions
(e.g. AZ-aware region placement).
- Add `EnvVars` helper in `common/meta/src/datanode.rs` following the
existing `GcStat` extension pattern with `into_extensions`/`from_extensions`
- Add `env_vars: HashMap<String, String>` field to `NodeInfo` in
`common/meta/src/cluster.rs` with `#[serde(default)]` for backward compat
- Add `heartbeat_env_vars: Vec<String>` config field to `DatanodeOptions`,
`FrontendOptions`, and `StandaloneOptions`
- Inject env vars into heartbeat `extensions` in both datanode and frontend
heartbeat tasks (`datanode/src/heartbeat.rs`, `frontend/src/heartbeat.rs`)
- Extract env vars from `req.extensions` in all three metasrv
`CollectXxxClusterInfoHandler`s
- Update `NodeInfo` construction sites in `meta-client`,
`discovery/lease.rs`, and `standalone/information_extension.rs`
- Update expected TOML output in `tests-integration/tests/http.rs`
- Add unit tests for `EnvVars` round-trip and `NodeInfo` backward compat
Signed-off-by: Lei, HUANG <leih@nvidia.com>
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* refactor: address heartbeat env review feedback
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* chore: log error on deserialization failure
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* refactor: send heartbeat env vars once
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* fix: resend heartbeat env vars after reconnect
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* revert: keep env vars in every heartbeat
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
---------
Signed-off-by: Lei, HUANG <leih@nvidia.com>
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* refactor(meta-srv): move group rollback ownership to parent repartition procedure
- Parent procedure now owns partial rollback based on failed/unknown subprocedures
- rollback order: group metadata first, then allocated-region cleanup
- original_target_routes captured during build-plan, persisted in RepartitionPlanEntry
- rollback_group_metadata_routes moved to utils as parent-owned helper
- Group subprocedure no longer supports rollback (rollback_supported = false)
- Removed UpdateMetadata::RollbackStaging from group state machine
- Deleted redundant group rollback tests and helpers
BREAKING CHANGE: group Procedure no longer handles rollback; parent procedure
is responsible for crash recovery and selecting which plans to roll back.
Signed-off-by: WenyXu <wenymedia@gmail.com>
* chore: update comments
Signed-off-by: WenyXu <wenymedia@gmail.com>
* chore: apply suggestions from CR
Signed-off-by: WenyXu <wenymedia@gmail.com>
---------
Signed-off-by: WenyXu <wenymedia@gmail.com>
* refactor: remove the `RawTableMeta` and `RawTableInfo` to make codes more concise
Signed-off-by: luofucong <luofc@foxmail.com>
* fix ci
Signed-off-by: luofucong <luofc@foxmail.com>
* fix ci
Signed-off-by: luofucong <luofc@foxmail.com>
---------
Signed-off-by: luofucong <luofc@foxmail.com>
* fix: send get file ref to all regions
Signed-off-by: discord9 <discord9@163.com>
* refactor: return err on fail to get table route
Signed-off-by: discord9 <discord9@163.com>
* refactor: batch get
Signed-off-by: discord9 <discord9@163.com>
* chore: add loggin in all places
Signed-off-by: discord9 <discord9@163.com>
---------
Signed-off-by: discord9 <discord9@163.com>
* feat: add sync region instruction for repartition procedure
This commit introduces a new sync region instruction and integrates it
into the repartition procedure flow, specifically for metric engine tables.
Changes:
- Add SyncRegion instruction type and SyncRegionsReply in instruction.rs
- Implement SyncRegionHandler in datanode to handle sync region requests
- Add SyncRegion state in repartition procedure to sync newly allocated regions
- Integrate sync region step after enter_staging_region for metric engine tables
- Add sync_region flag and allocated_region_ids to PersistentContext
- Make SyncRegionFromRequest serializable for instruction transmission
- Add test utilities and mock support for sync region operations
The sync region step is conditionally executed based on the table engine type,
ensuring that newly allocated regions in metric engine tables are properly
synced from their source regions before proceeding with manifest remapping.
Signed-off-by: WenyXu <wenymedia@gmail.com>
* chore: add logs
Signed-off-by: WenyXu <wenymedia@gmail.com>
* feat(repartition): improve staging region handling and support metric engine repartition
- Reorder sync region flow: move SyncRegion from EnterStagingRegion to RepartitionStart to sync before applying staging
- Add ExitStaging metadata update state to properly clear staging leader info after repartition completes
- Update build_template_from_raw_table_info to optionally skip metric engine internal columns when creating region requests
- Fix region state transition: set_dropping now expects specific state (Staging or Writable) for proper validation
- Adjust region drop and copy handlers to handle staging regions correctly
- Add comprehensive test cases for metric engine SPLIT/MERGE partition operations on physical tables with logical tables
- Improve logging for table route updates, region drops, and repartition operations
Signed-off-by: WenyXu <wenymedia@gmail.com>
* refactor: removes code duplication
Signed-off-by: WenyXu <wenymedia@gmail.com>
* fix: update result
Signed-off-by: WenyXu <wenymedia@gmail.com>
* chore: refine comments
Signed-off-by: WenyXu <wenymedia@gmail.com>
* feat: add error strategy support for flush region and flush pending deallocate regions
- **Add `ErrorStrategy` enum** in `procedure/utils.rs`:
- Supports `Ignore` and `Retry` strategies for error handling
- Refactor `flush_region` to accept `error_strategy` parameter
- Extract `handle_flush_region_reply` helper function for better code organization
- **Add pending deallocate region support**:
- Add `pending_deallocate_region_ids` field to `PersistentContext`
- Implement `flush_pending_deallocate_regions` in `EnterStagingRegion` state
- Flush pending deallocate regions before entering staging regions to ensure data consistency
- **Update error handling**:
- `flush_leader_region`: Use `ErrorStrategy::Ignore` to skip unreachable datanodes
- `sync_region`: Use `ErrorStrategy::Retry` for critical operations
- `enter_staging_region`: Use `ErrorStrategy::Retry` when flushing pending deallocate regions
This change improves the robustness of the repartition procedure by:
1. Providing flexible error handling strategies for flush operations
2. Ensuring pending deallocate regions are properly flushed before repartitioning
3. Preventing data inconsistency during region migration
Signed-off-by: WenyXu <wenymedia@gmail.com>
* chore: apply suggestions from CR
Signed-off-by: WenyXu <wenymedia@gmail.com>
* fix: compile
Signed-off-by: WenyXu <wenymedia@gmail.com>
---------
Signed-off-by: WenyXu <wenymedia@gmail.com>
* fix(meta): update topic region mapping during table route updates
Fix a bug in `build_create_txn` where the parameter order was incorrect
(`(topic, region_id)` -> `(region_id, topic)`), and add support for updating
topic region mappings during repartition operations.
- Add `build_update_txn` method to handle topic region mapping updates
- Integrate topic region update into `update_table_route` transaction
- Add WAL options merging and validation logic for repartition
- Update allocate/deallocate procedures to pass WAL options
- Add comprehensive tests for all scenarios
This ensures topic region mappings stay in sync with table routes during
repartition, preventing data inconsistencies.
Signed-off-by: WenyXu <wenymedia@gmail.com>
* feat(meta): handle region_not_found in region migration
Add support for detecting and handling regions that exist in migration
tasks but are no longer present in table routes (e.g., removed after
repartition). This prevents unnecessary retries and cleans up related
resources.
Changes:
- Add `region_not_found` field to `SubmitRegionMigrationTaskResult` and
`RegionMigrationAnalysis` structs
- Update `analyze_region_migration_task` to detect regions missing from
current table routes
- Deregister failure detectors for `region_not_found` regions in supervisor
- Change `table_regions()` return type from `HashMap<TableId, Vec<RegionId>>`
to `HashMap<TableId, HashSet<RegionId>>` for better performance
- Add test cases for `region_not_found` handling
This fixes the issue where migration tasks would continue retrying on
regions that have been removed after repartition operations.
Signed-off-by: WenyXu <wenymedia@gmail.com>
* fix: fix clippy
Signed-off-by: WenyXu <wenymedia@gmail.com>
---------
Signed-off-by: WenyXu <wenymedia@gmail.com>
* feat: add repartition procedure factory support to DdlManager
- Introduce RepartitionProcedureFactory trait for creating and registering
repartition procedures
- Implement DefaultRepartitionProcedureFactory for metasrv with full support
- Implement StandaloneRepartitionProcedureFactory for standalone (unsupported)
- Add procedure loader registration for RepartitionProcedure and
RepartitionGroupProcedure
- Add helper methods to TableMetadataAllocator for allocator access
- Add error types for repartition procedure operations
- Update DdlManager to accept and use RepartitionProcedureFactoryRef
Signed-off-by: WenyXu <wenymedia@gmail.com>
* feat: integrate repartition procedure into DdlManager
- Add submit_repartition_task() to handle repartition from alter table
- Route Repartition operations in submit_alter_table_task() to repartition factory
- Refactor: rename submit_procedure() to execute_procedure_and_wait()
- Make all DDL operations wait for completion by default
- Add submit_procedure() for fire-and-forget submissions
- Add CreateRepartitionProcedure error type
- Add placeholder Repartition handling in grpc-expr (unsupported)
- Update greptime-proto dependency
Signed-off-by: WenyXu <wenymedia@gmail.com>
* feat: implement ALTER TABLE REPARTITION procedure submission
Signed-off-by: WenyXu <wenymedia@gmail.com>
* refactor(repartition): handle central region in apply staging manifest
- Introduce ApplyStagingManifestInstructions struct to organize instructions
- Add special handling for central region when applying staging manifests
- Transition state from UpdateMetadata to RepartitionEnd after applying staging manifests
- Remove next_state() method in RepartitionStart and inline state transitions
- Improve logging and expression serialization in DDL statement executor
- Move repartition tests from standalone to distributed test suite
Signed-off-by: WenyXu <wenymedia@gmail.com>
* chore: apply suggestions from CR
Signed-off-by: WenyXu <wenymedia@gmail.com>
* chore: update proto
Signed-off-by: WenyXu <wenymedia@gmail.com>
---------
Signed-off-by: WenyXu <wenymedia@gmail.com>
fix: fix SQL table identifier quoting for election and RDS kv-backend
- Quote MySQL table names with backticks and PostgreSQL tables with double quotes in election and RDS kv-backend SQL
- Update related tests to use quoted identifiers and cover hyphenated table names
- Ensure dynamic SQL using table names is safe for special characters in identifiers
Signed-off-by: WenyXu <wenymedia@gmail.com>
* refactor: rename WalOptionsAllocator to WalProvider
The name "WalOptionsAllocator" was misleading because:
- For RaftEngine variant, it doesn't actually allocate anything
- The actual allocation logic lives in KafkaTopicPool
"WalProvider" better describes its role as providing WAL options
based on the configured WAL backend (RaftEngine or Kafka).
Changes:
- Rename `WalOptionsAllocator` to `WalProvider`
- Rename `WalOptionsAllocatorRef` to `WalProviderRef`
- Rename `build_wal_options_allocator` to `build_wal_provider`
- Rename module `wal_options_allocator` to `wal_provider`
- Rename error types: `BuildWalOptionsAllocator` -> `BuildWalProvider`,
`StartWalOptionsAllocator` -> `StartWalProvider`
Signed-off-by: WenyXu <wenymedia@gmail.com>
* refactor(meta): extract allocator traits from TableMetadataAllocator
Refactor TableMetadataAllocator to use trait-based dependency injection
for better testability and separation of concerns.
Changes:
- Add `ResourceIdAllocator` trait to abstract ID allocation
- Add `WalOptionsAllocator` trait to abstract WAL options allocation
- Implement traits for `Sequence` and `WalProvider`
- Remove duplicate `allocate_region_wal_options` function
- Rename `table_id_sequence` to `table_id_allocator` for consistency
- Rename `TableIdSequenceHandler` to `TableIdAllocatorHandler`
Signed-off-by: WenyXu <wenymedia@gmail.com>
* feat(meta): add max_region_number tracking to PhysicalTableRouteValue
Add `max_region_number` field to track the highest region number ever
allocated for a table. This value only increases when regions are added
and never decreases when regions are dropped, ensuring unique region
numbers across the table's lifetime.
Changes:
- Add `max_region_number` field to `PhysicalTableRouteValue`
- Implement custom `Deserialize` for backward compatibility
- Update `update_region_routes` to maintain max_region_number
- Calculate max_region_number from region_routes in `new()`
Signed-off-by: WenyXu <wenymedia@gmail.com>
* refactor: extract TableRouteAllocator trait from TableMetadataAllocator
- Add TableRouteAllocator trait for abstracting region route allocation
- Implement blanket impl for all PeerAllocator types
- Add PeerAllocator impl for Arc<T> to support trait object delegation
- Update TableMetadataAllocator to use TableRouteAllocatorRef
Signed-off-by: WenyXu <wenymedia@gmail.com>
* refactor: rename TableRouteAllocator to RegionRoutesAllocator
- Rename table_route.rs to region_routes.rs
- Rename TableRouteAllocator trait to RegionRoutesAllocator
- Rename wal_option.rs to wal_options.rs for consistency
- Update TableMetadataAllocator to use new naming
Signed-off-by: WenyXu <wenymedia@gmail.com>
* feat(meta-srv): implement region allocation for repartition procedure
This commit implements the region allocation phase of the repartition procedure,
which handles allocating new regions when a table needs to be split into more partitions.
Key changes:
- Refactor `RegionRoutesAllocator::allocate` to accept `(region_number, partition_expr)` tuples
for more flexible region number assignment
- Simplify `AllocationPlanEntry` by removing `regions_to_allocate` and `regions_to_deallocate`
fields (now derived from source/target counts)
- Add `convert_allocation_plan_to_repartition_plan` function to handle allocation, equal,
and deallocation cases
- Fix `RepartitionPlanEntry::allocate_regions()` to return target regions (was incorrectly
returning source regions)
- Implement complete `AllocateRegion` state with:
- Region route allocation via `RegionRoutesAllocator`
- WAL options allocation via `WalOptionsAllocator`
- Operating region registration for concurrency control
- Region creation on datanodes via `CreateTableExecutor`
- Table route metadata update
- Add `TableRouteValue::max_region_number()` helper method
- Add comprehensive unit tests for plan conversion and allocation logic
Signed-off-by: WenyXu <wenymedia@gmail.com>
* chore: apply suggestions from CR
Signed-off-by: WenyXu <wenymedia@gmail.com>
* chore: apply suggestions from CR
Signed-off-by: WenyXu <wenymedia@gmail.com>
---------
Signed-off-by: WenyXu <wenymedia@gmail.com>
* feat: implement deallocate regions for repartition procedure
Signed-off-by: WenyXu <wenymedia@gmail.com>
* feat(metric-engine): add force flag to drop physical regions with associated logical regions
Signed-off-by: WenyXu <wenymedia@gmail.com>
* feat: update table metadata after deallocating regions
Signed-off-by: WenyXu <wenymedia@gmail.com>
* chore: apply suggestions
Signed-off-by: WenyXu <wenymedia@gmail.com>
* chore: update proto
Signed-off-by: WenyXu <wenymedia@gmail.com>
---------
Signed-off-by: WenyXu <wenymedia@gmail.com>