* feat: support reporting env vars in heartbeat messages to metasrv
Add `heartbeat_env_vars` config option for datanode and frontend. When
configured, the specified environment variable values are read at startup
and sent to metasrv in every heartbeat via the `extensions` map. Metasrv
extracts and stores them in `NodeInfo` for use in routing decisions
(e.g. AZ-aware region placement).
- Add `EnvVars` helper in `common/meta/src/datanode.rs` following the
existing `GcStat` extension pattern with `into_extensions`/`from_extensions`
- Add `env_vars: HashMap<String, String>` field to `NodeInfo` in
`common/meta/src/cluster.rs` with `#[serde(default)]` for backward compat
- Add `heartbeat_env_vars: Vec<String>` config field to `DatanodeOptions`,
`FrontendOptions`, and `StandaloneOptions`
- Inject env vars into heartbeat `extensions` in both datanode and frontend
heartbeat tasks (`datanode/src/heartbeat.rs`, `frontend/src/heartbeat.rs`)
- Extract env vars from `req.extensions` in all three metasrv
`CollectXxxClusterInfoHandler`s
- Update `NodeInfo` construction sites in `meta-client`,
`discovery/lease.rs`, and `standalone/information_extension.rs`
- Update expected TOML output in `tests-integration/tests/http.rs`
- Add unit tests for `EnvVars` round-trip and `NodeInfo` backward compat
Signed-off-by: Lei, HUANG <leih@nvidia.com>
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* refactor: address heartbeat env review feedback
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* chore: log error on deserialization failure
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* refactor: send heartbeat env vars once
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* fix: resend heartbeat env vars after reconnect
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* revert: keep env vars in every heartbeat
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
---------
Signed-off-by: Lei, HUANG <leih@nvidia.com>
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat(mito2): extract and cache primary key range for SST files
Extracts primary key ranges from SST files during flush and compaction, and caches them in FileHandle for future use (e.g., overlapping checks).
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/compaction-overlapping-check:
- **Enhance Primary Key Range Logic**: Updated the `primary_key_ranges_overlap` function in `run.rs` to use `chunk()` for comparing byte ranges, improving accuracy in overlap detection.
- **Refactor Run Assignment Logic**: Simplified the logic for assigning items to runs in `run.rs` by removing redundant match statements and using `is_empty()` and `iter().any()` for cleaner checks.
- **Add Test for Transitivity Break**: Introduced a new test `test_find_sorted_runs_handles_2d_transitivity_break` in `run.rs` to ensure correct handling of transitivity breaks in sorted runs.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/compaction-overlapping-check:
- **Remove PK-disjoint detection logic**: Simplified the compaction logic in `twcs.rs` by removing the `has_time_overlapping_pairs` function and related logic for PK-disjoint detection. This includes the removal of the `append_mode_force_compact` condition and associated tests.
- **Update compaction trigger settings**: Modified `append_mode_test.rs` to set `compaction.twcs.trigger_file_num` to "2" and adjusted the expected number of files in the scanner assertion from 1 to 2.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* chore: rebase main
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/compaction-overlapping-check:
### Remove Unused Import in `compactor.rs`
- Removed the unused import `compact_request` from `compactor.rs` to clean up the codebase.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* fix: tighten `mito2` file lifecycle handling
Refine compaction, flush, and SST/version bookkeeping across `src/mito2/src/compaction/*`, `src/mito2/src/flush.rs`, `src/mito2/src/region/*`, `src/mito2/src/sst/*`, and related tests/utilities.
* fix: reuse primary key range merge in twcs compaction
Centralize primary key range merging so can call the shared helper from .
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* refactor(mito2): simplify `FileHandle` initialization and internalize primary key range extraction
- Updated `FileHandle::new` to automatically compute the primary key range directly from `FileMeta`.
- Restricted `FileHandle::new_with_primary_key_range` to be test-only by adding the `#[cfg(test)]` attribute.
- Simplified `SstVersion::add_files` by adopting the updated `FileHandle::new` instead of manually providing the primary key range.
Modified files:
- `src/mito2/src/sst/file.rs`
- `src/mito2/src/sst/version.rs`
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/compaction-overlapping-check:
### Improve File Management and Documentation
- **`twcs.rs`**: Added a comment to clarify the merging of small files when there are no overlapping files.
- **`version.rs` (in `region` and `sst`)**: Enhanced documentation for `add_files` method, explaining its functionality and panic conditions. Simplified the file handle creation logic in `sst/version.rs`.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat/compaction-overlapping-check:
### Enhance Primary Key Range Handling in `opener.rs`
- Updated logic in `opener.rs` to set the primary key range for file handles when it is not already defined. This change ensures that the primary key range is extracted and set using `extract_primary_key_range` when necessary.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* docs: add docs for the necessity of checking pk and timesmaps while find overlapping files.
* chore: address review comments
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
---------
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
fix/flaky-test:
### Add Dynamic Port Selection for Standalone Tests
- **`cli.rs`**: Implemented functions `random_standalone_addrs` and `choose_random_unused_port_offset` to dynamically select unused ports for standalone tests, enhancing test reliability.
- Updated `test_export_create_table_with_quoted_names` to use dynamically assigned ports for HTTP, RPC, MySQL, and PostgreSQL addresses.
Signed-off-by: Lei, HUANG <mrsatangel@gmail.com>
* feat: support write flat as primary key format
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: migrate flush to always use FlatSource
Add FormatType propagation in SstWriteRequest and use it to choose
Flat vs PrimaryKey write paths (write_all_flat vs
write_all_flat_as_primary_key) in AccessLayer and WriteCache. Make
compactor and flush derive the sst_write_format from region options or
engine config. Simplify flush logic and remove the old memtable_source
helper. Update tests to set default sst_write_format.
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: compaction use flat source
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: read parquet sequentially as flat batches
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: remove new_batch_with_binary in favor of new_record_batch_with_binary
Replace PrimaryKeyWriteFormat with FlatWriteFormat in test_read_large_binary
test and use new_record_batch_with_binary directly, removing the now-unused
new_batch_with_binary function and its BinaryArray import.
Signed-off-by: evenyag <realevenyag@gmail.com>
* test: add tests for PrimaryKeyWriteFormat::convert_flat_batch
Signed-off-by: evenyag <realevenyag@gmail.com>
* refactor: remove Either from SstWriteRequest
Signed-off-by: evenyag <realevenyag@gmail.com>
* fix: handle index build mode
Signed-off-by: evenyag <realevenyag@gmail.com>
* fix: consider sparse encoding and last non null in flush
Signed-off-by: evenyag <realevenyag@gmail.com>
* test: add unit tests for field_column_start edge cases
Signed-off-by: evenyag <realevenyag@gmail.com>
---------
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: cast filters type for scanbench
Signed-off-by: evenyag <realevenyag@gmail.com>
* chore: pub file_range mod
So we can use the pub struct FileRange in other places
Signed-off-by: evenyag <realevenyag@gmail.com>
* fix: add api as dev-dependency to cmd for clippy
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: support profiling after warmup
Signed-off-by: evenyag <realevenyag@gmail.com>
---------
Signed-off-by: evenyag <realevenyag@gmail.com>
* feat: impl vector index query
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* feat: remove VectorSearchRule and merge it into scan hint rule
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* refactor: vector search hint
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* test: join and subquery
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* fix: clippy when feature disabled
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* fix: push hint only when column is non-nullable or an explicit IS NOT NULL filter exists
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* fix: transformed = true
Co-authored-by: Yingwen <realevenyag@gmail.com>
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* chore: remove adpater vector hint
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* chore: revert transformed
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
---------
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
Co-authored-by: Yingwen <realevenyag@gmail.com>
* feat: add repartition procedure factory support to DdlManager
- Introduce RepartitionProcedureFactory trait for creating and registering
repartition procedures
- Implement DefaultRepartitionProcedureFactory for metasrv with full support
- Implement StandaloneRepartitionProcedureFactory for standalone (unsupported)
- Add procedure loader registration for RepartitionProcedure and
RepartitionGroupProcedure
- Add helper methods to TableMetadataAllocator for allocator access
- Add error types for repartition procedure operations
- Update DdlManager to accept and use RepartitionProcedureFactoryRef
Signed-off-by: WenyXu <wenymedia@gmail.com>
* feat: integrate repartition procedure into DdlManager
- Add submit_repartition_task() to handle repartition from alter table
- Route Repartition operations in submit_alter_table_task() to repartition factory
- Refactor: rename submit_procedure() to execute_procedure_and_wait()
- Make all DDL operations wait for completion by default
- Add submit_procedure() for fire-and-forget submissions
- Add CreateRepartitionProcedure error type
- Add placeholder Repartition handling in grpc-expr (unsupported)
- Update greptime-proto dependency
Signed-off-by: WenyXu <wenymedia@gmail.com>
* feat: implement ALTER TABLE REPARTITION procedure submission
Signed-off-by: WenyXu <wenymedia@gmail.com>
* refactor(repartition): handle central region in apply staging manifest
- Introduce ApplyStagingManifestInstructions struct to organize instructions
- Add special handling for central region when applying staging manifests
- Transition state from UpdateMetadata to RepartitionEnd after applying staging manifests
- Remove next_state() method in RepartitionStart and inline state transitions
- Improve logging and expression serialization in DDL statement executor
- Move repartition tests from standalone to distributed test suite
Signed-off-by: WenyXu <wenymedia@gmail.com>
* chore: apply suggestions from CR
Signed-off-by: WenyXu <wenymedia@gmail.com>
* chore: update proto
Signed-off-by: WenyXu <wenymedia@gmail.com>
---------
Signed-off-by: WenyXu <wenymedia@gmail.com>
* refactor: rename WalOptionsAllocator to WalProvider
The name "WalOptionsAllocator" was misleading because:
- For RaftEngine variant, it doesn't actually allocate anything
- The actual allocation logic lives in KafkaTopicPool
"WalProvider" better describes its role as providing WAL options
based on the configured WAL backend (RaftEngine or Kafka).
Changes:
- Rename `WalOptionsAllocator` to `WalProvider`
- Rename `WalOptionsAllocatorRef` to `WalProviderRef`
- Rename `build_wal_options_allocator` to `build_wal_provider`
- Rename module `wal_options_allocator` to `wal_provider`
- Rename error types: `BuildWalOptionsAllocator` -> `BuildWalProvider`,
`StartWalOptionsAllocator` -> `StartWalProvider`
Signed-off-by: WenyXu <wenymedia@gmail.com>
* refactor(meta): extract allocator traits from TableMetadataAllocator
Refactor TableMetadataAllocator to use trait-based dependency injection
for better testability and separation of concerns.
Changes:
- Add `ResourceIdAllocator` trait to abstract ID allocation
- Add `WalOptionsAllocator` trait to abstract WAL options allocation
- Implement traits for `Sequence` and `WalProvider`
- Remove duplicate `allocate_region_wal_options` function
- Rename `table_id_sequence` to `table_id_allocator` for consistency
- Rename `TableIdSequenceHandler` to `TableIdAllocatorHandler`
Signed-off-by: WenyXu <wenymedia@gmail.com>
* feat(meta): add max_region_number tracking to PhysicalTableRouteValue
Add `max_region_number` field to track the highest region number ever
allocated for a table. This value only increases when regions are added
and never decreases when regions are dropped, ensuring unique region
numbers across the table's lifetime.
Changes:
- Add `max_region_number` field to `PhysicalTableRouteValue`
- Implement custom `Deserialize` for backward compatibility
- Update `update_region_routes` to maintain max_region_number
- Calculate max_region_number from region_routes in `new()`
Signed-off-by: WenyXu <wenymedia@gmail.com>
* refactor: extract TableRouteAllocator trait from TableMetadataAllocator
- Add TableRouteAllocator trait for abstracting region route allocation
- Implement blanket impl for all PeerAllocator types
- Add PeerAllocator impl for Arc<T> to support trait object delegation
- Update TableMetadataAllocator to use TableRouteAllocatorRef
Signed-off-by: WenyXu <wenymedia@gmail.com>
* refactor: rename TableRouteAllocator to RegionRoutesAllocator
- Rename table_route.rs to region_routes.rs
- Rename TableRouteAllocator trait to RegionRoutesAllocator
- Rename wal_option.rs to wal_options.rs for consistency
- Update TableMetadataAllocator to use new naming
Signed-off-by: WenyXu <wenymedia@gmail.com>
* feat(meta-srv): implement region allocation for repartition procedure
This commit implements the region allocation phase of the repartition procedure,
which handles allocating new regions when a table needs to be split into more partitions.
Key changes:
- Refactor `RegionRoutesAllocator::allocate` to accept `(region_number, partition_expr)` tuples
for more flexible region number assignment
- Simplify `AllocationPlanEntry` by removing `regions_to_allocate` and `regions_to_deallocate`
fields (now derived from source/target counts)
- Add `convert_allocation_plan_to_repartition_plan` function to handle allocation, equal,
and deallocation cases
- Fix `RepartitionPlanEntry::allocate_regions()` to return target regions (was incorrectly
returning source regions)
- Implement complete `AllocateRegion` state with:
- Region route allocation via `RegionRoutesAllocator`
- WAL options allocation via `WalOptionsAllocator`
- Operating region registration for concurrency control
- Region creation on datanodes via `CreateTableExecutor`
- Table route metadata update
- Add `TableRouteValue::max_region_number()` helper method
- Add comprehensive unit tests for plan conversion and allocation logic
Signed-off-by: WenyXu <wenymedia@gmail.com>
* chore: apply suggestions from CR
Signed-off-by: WenyXu <wenymedia@gmail.com>
* chore: apply suggestions from CR
Signed-off-by: WenyXu <wenymedia@gmail.com>
---------
Signed-off-by: WenyXu <wenymedia@gmail.com>
* feat: impl vector index building
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* feat: supports flat format
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* ci: add vector_index feature to test
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* chore: apply suggestions
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
* chore: apply suggestions from copilot
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>
---------
Signed-off-by: Dennis Zhuang <killme2008@gmail.com>