* feat: enable submitting wal prune procedure periodically
* chore: fix and add options
* test: add unit test
* test: fix unit test
* test: enable active_wal_pruning in test
* test: update default config
* chore: update config name
* refactor: use semaphore to control the number of prune process
* refactor: use split client for wal prune manager and topic creator
* chore: add configs
* chore: apply review comments
* fix: use tracker properly
* fix: use guard to track semaphore
* test: update unit tests
* chore: update config name
* chore: use prunable_entry_id
* refactor: semaphore to only limit the process of submitting
* chore: remove legacy sort
* chore: better configs
* fix: update config.md
* chore: respect fmt
* test: update unit tests
* chore: use interval_at
* fix: fix unit test
* test: fix unit test
* test: fix unit test
* chore: apply review comments
* docs: update config docs
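The semaphore commits above ("use guard to track semaphore", "semaphore to only limit the process of submitting") describe a common pattern: a permit is held by an RAII guard only while a prune procedure is being submitted, and it is returned when the guard is dropped. The following is a minimal, standard-library-only sketch of that pattern, not the project's code. `Semaphore`, `Permit`, and `try_submit` are hypothetical names, and the real implementation may well use an async semaphore instead.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical counting semaphore; only non-blocking acquisition is sketched.
pub struct Semaphore {
    permits: Mutex<usize>,
}

/// RAII guard: the permit goes back to the semaphore when the guard is dropped.
pub struct Permit {
    semaphore: Arc<Semaphore>,
}

impl Semaphore {
    pub fn new(permits: usize) -> Arc<Self> {
        Arc::new(Self { permits: Mutex::new(permits) })
    }

    /// Takes a permit if one is free; returns `None` otherwise.
    pub fn try_acquire(semaphore: &Arc<Self>) -> Option<Permit> {
        let mut permits = semaphore.permits.lock().unwrap();
        if *permits == 0 {
            return None;
        }
        *permits -= 1;
        Some(Permit { semaphore: Arc::clone(semaphore) })
    }

    pub fn available_permits(&self) -> usize {
        *self.permits.lock().unwrap()
    }
}

impl Drop for Permit {
    fn drop(&mut self) {
        *self.semaphore.permits.lock().unwrap() += 1;
    }
}

/// Runs `submit` for `topic` only if a permit is free. The permit covers the
/// submission step alone and is released as soon as `submit` returns.
pub fn try_submit<F: FnOnce(&str)>(semaphore: &Arc<Semaphore>, topic: &str, submit: F) -> bool {
    match Semaphore::try_acquire(semaphore) {
        Some(_permit) => {
            submit(topic);
            true
        }
        None => false,
    }
}
```

Because the guard owns the permit, an early return or a panic inside `submit` cannot leak it, which is the usual reason to prefer a guard over manual release calls.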
* fix/frontend-node-state: Refactor NodeInfoKey and Context Handling in Meta Server
• Removed unused cluster_id from NodeInfoKey struct.
• Updated HeartbeatHandlerGroup to return Context alongside HeartbeatResponse.
• Added current_node_info to Context for tracking node information.
• Implemented on_node_disconnect in Context to handle node disconnection events, specifically for Frontend roles.
• Adjusted register_pusher function to return PusherId directly.
• Updated tests to accommodate changes in Context structure.
* fix/frontend-node-state: Refactor Heartbeat Handler Context Management
Refactored the HeartbeatHandlerGroup::handle method to use a mutable reference for Context instead of passing it by value. This change simplifies the
context management by eliminating the need to return the context with the response. Updated the Metasrv implementation to align with this new context
handling approach, improving code clarity and reducing unnecessary context cloning.
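As a rough illustration of the refactor described above, the sketch below has a handler group take `&mut Context` so that handlers update the caller's context in place and only the response is returned. The type names `HeartbeatHandlerGroup`, `Context`, and `HeartbeatResponse` and the field `current_node_info` come from the commit messages. Everything else (the trait shape, the `CollectNodeInfo` and `Ack` handlers, the string-typed request) is an assumption made for the example, not the actual Meta Server API.

```rust
/// Simplified stand-in for the heartbeat context (fields are assumptions).
#[derive(Debug, Default)]
pub struct Context {
    pub current_node_info: Option<String>,
}

#[derive(Debug, Default, PartialEq)]
pub struct HeartbeatResponse {
    pub acked: bool,
}

/// Hypothetical handler trait: each handler may mutate the shared context.
pub trait HeartbeatHandler {
    fn handle(&self, request: &str, ctx: &mut Context, resp: &mut HeartbeatResponse);
}

/// Records which node sent the heartbeat.
pub struct CollectNodeInfo;

impl HeartbeatHandler for CollectNodeInfo {
    fn handle(&self, request: &str, ctx: &mut Context, _resp: &mut HeartbeatResponse) {
        ctx.current_node_info = Some(request.to_string());
    }
}

/// Marks the response as acknowledged.
pub struct Ack;

impl HeartbeatHandler for Ack {
    fn handle(&self, _request: &str, _ctx: &mut Context, resp: &mut HeartbeatResponse) {
        resp.acked = true;
    }
}

pub struct HeartbeatHandlerGroup {
    handlers: Vec<Box<dyn HeartbeatHandler>>,
}

impl HeartbeatHandlerGroup {
    pub fn new(handlers: Vec<Box<dyn HeartbeatHandler>>) -> Self {
        Self { handlers }
    }

    /// The context is borrowed mutably, so it does not need to be returned
    /// next to the response and is never cloned here.
    pub fn handle(&self, request: &str, ctx: &mut Context) -> HeartbeatResponse {
        let mut resp = HeartbeatResponse::default();
        for handler in &self.handlers {
            handler.handle(request, ctx, &mut resp);
        }
        resp
    }
}
```

After `handle` returns, the caller still owns the context and can read `current_node_info`, for example when the connection closes.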
* revert: clean cluster info on disconnect
* fix/frontend-node-state: Add Frontend Expiry Listener and Update NodeInfoKey Conversion
• Introduced FrontendExpiryListener to manage the expiration of frontend nodes, including its integration with leadership change notifications.
• Modified NodeInfoKey conversion to use references, enhancing efficiency and consistency across the codebase.
• Updated collect_cluster_info_handler and metasrv to incorporate the new listener and conversion changes.
• Added frontend_expiry module to the project structure for better organization and maintainability.
* chore: add config for node expiry
* add some doc
* fix: clippy
* fix/frontend-node-state:
### Refactor Node Expiry Handling
- **Configuration Update**: Removed `node_expiry_tick` from `metasrv.example.toml` and `MetasrvOptions` in `metasrv.rs`.
- **Module Renaming**: Renamed `frontend_expiry.rs` to `node_expiry_listener.rs` and updated references in `lib.rs`.
- **Code Refactoring**: Replaced `FrontendExpiryListener` with `NodeExpiryListener` in `node_expiry_listener.rs` and `metasrv.rs`, removing the tick interval and adjusting logic to use a fixed 60-second interval for node expiry checks.
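
The expiry check that such a listener runs on each 60-second tick can be sketched as a pure function. This is only an illustration under assumptions: `NodeInfo`, `last_activity_ms`, `expired_nodes`, and the `expiry` parameter are hypothetical, and the real listener may filter by node role or read its state from a key-value backend.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Fixed check interval, as described in the commit message above.
pub const CHECK_INTERVAL: Duration = Duration::from_secs(60);

/// Hypothetical per-node record; only the last activity timestamp is modelled.
#[derive(Debug, Clone)]
pub struct NodeInfo {
    pub last_activity_ms: u64,
}

/// Returns the ids of nodes whose last activity is older than `expiry`,
/// sorted so that the result is deterministic.
pub fn expired_nodes(nodes: &HashMap<u64, NodeInfo>, now_ms: u64, expiry: Duration) -> Vec<u64> {
    let expiry_ms = expiry.as_millis() as u64;
    let mut expired: Vec<u64> = nodes
        .iter()
        // saturating_sub guards against timestamps slightly ahead of `now_ms`.
        .filter(|(_, info)| now_ms.saturating_sub(info.last_activity_ms) > expiry_ms)
        .map(|(id, _)| *id)
        .collect();
    expired.sort_unstable();
    expired
}
```

Keeping the check free of timers and I/O makes it easy to unit test. A periodic task would call it once per `CHECK_INTERVAL` and remove the returned ids.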
* fix/frontend-node-state:
Improve logging in `node_expiry_listener.rs`
- Enhanced warning message to include peer information when an unrecognized node info key is encountered in `node_expiry_listener.rs`.
* docs: update config docs
* fix/frontend-node-state:
**Refactor Context Handling in Heartbeat Services**
- Updated `HeartbeatHandlerGroup` in `handler.rs` to pass `Context` by value instead of by mutable reference, allowing for more flexible context
management.
- Modified `Metasrv` implementation in `heartbeat.rs` to clone `Context` when passing to `handle` method, ensuring thread safety and consistency in
asynchronous operations.
* refactor: rename grpc options
* refactor: make the arg clearer
* chore: comments on server_addr
* chore: fix test
* chore: remove the store_addr alias
* refactor: cli option rpc_server_addr
* chore: keep store-addr alias
* chore: by comment
* feat: set max log files to 720 by default, info log only
* expose max_log_files in tomls
* include dir info when panicking, limit max_log_files of err_log to 30, and that of slow_queries to opt.max_log_files
* fix clippy
* update config.md
* update expected config str
* limit err_log max files to `max_log_files` too, include err info when panicking, put `max_log_files` in the right position
* fix typos
* chore: config
Co-authored-by: Lei, HUANG <6406592+v0y4g3r@users.noreply.github.com>
---------
Co-authored-by: dennis zhuang <killme2008@gmail.com>
Co-authored-by: Lei, HUANG <6406592+v0y4g3r@users.noreply.github.com>
* feat: Use DATANODE_LEASE_SECS from distributed_time_constants for heartbeat pause duration
* feat: introduce `RegionFailureDetectorController` to manage region failure detectors
* feat: add `RegionFailureDetectorController` to `DdlContext`
* feat: add `region_failure_detector_controller` to `Context` in region migration
* feat: register region failure detectors during rollback region migration procedure
* feat: deregister region failure detectors during drop table procedure
* feat: register region failure detectors during create table procedure
* fix: update meta config
* chore: apply suggestions from CR
* chore: avoid cloning
* chore: rename
* chore: reduce the size of the test
* chore: apply suggestions from CR
* chore: move channel initialization into `RegionSupervisor::channel`
* chore: minor refactor
* chore: rename ident
* set global runtime size
* fix: resolve PR comments
* fix: log the whole option
* fix ci
* debug ci
* debug ci
---------
Co-authored-by: Weny Xu <wenymedia@gmail.com>
* refactor: use string type instead of Option type for '--store-key-prefix'
Signed-off-by: zyy17 <zyylsxm@gmail.com>
* chore: refine for code review comments
---------
Signed-off-by: zyy17 <zyylsxm@gmail.com>
* feat: remote write metric task
* chore: pass standalone task to frontend
* chore: change name to system metric
* fix: add header and rename to export metrics