Mirror of https://github.com/neondatabase/neon.git (synced 2026-05-27 01:50:38 +00:00)
## Problem

Shard splits worked, but weren't yet safe against failures (e.g. a node crash during a split).

Related: #6676

## Summary of changes

- Introduce async rwlocks at the scope of Tenant and Node:
  - an exclusive tenant lock is used to protect splits
  - an exclusive node lock is used to protect the new reconciliation process that happens when setting a node active
  - exclusive locks are used in both cases when doing persistent updates (e.g. node scheduling configuration) where the update to the DB and to in-memory state needs to be atomic.
- Add failpoints to shard splitting in control plane and pageserver code.
- Implement error handling in the control plane for shard splits: this detaches child shards and ensures parent shards are re-attached.
- Crash-safety for storage controller restarts requires little effort: we already reconcile with nodes over a storage controller restart, so as long as we reset any incomplete splits in the DB on restart (added in this PR), things are implicitly cleaned up.
- Implement reconciliation with offline nodes before they transition to active:
  - (in this context, reconciliation means something like startup_reconcile, not literally the Reconciler)
  - This covers cases where split abort cannot reach a node to clean it up: the cleanup will eventually happen when the node is marked active, as part of reconciliation.
  - This also covers the case where a node was unavailable when the storage controller started but becomes available later: previously this allowed it to skip the startup reconcile.
- The storage controller now terminates on panics. We only use panics for true "should never happen" assertions, and these cases can leave us in an unusable state if we keep running (e.g. panicking in a shard split). In the unlikely event that we get into a crashloop as a result, we'll rely on Kubernetes to back us off.
- Add `test_sharding_split_failures`, which exercises a variety of failure cases during shard split.
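The split error handling described above (take the exclusive tenant lock, run the split, and on failure detach the children and re-attach the parents) can be modelled in-process as below. This is a minimal, hypothetical sketch, not the storage controller's code: `TenantShards`, `split_tenant`, `do_split`, `failpoint` and the failpoint names are invented for illustration, `std::sync::RwLock` stands in for the async tenant lock, and attaching or detaching a shard is reduced to editing a set rather than calling pageservers.

```rust
use std::collections::BTreeSet;
use std::sync::RwLock;

/// Hypothetical in-memory record of which shards are attached for one tenant.
#[derive(Debug, Default)]
struct TenantShards {
    attached: BTreeSet<String>,
}

#[derive(Debug, PartialEq)]
enum SplitError {
    /// A failure injected at the named (hypothetical) failpoint.
    Injected(&'static str),
}

/// Stand-in for a real failpoint: fails only when `fail_at` names this point.
fn failpoint(fail_at: Option<&str>, name: &'static str) -> Result<(), SplitError> {
    if fail_at == Some(name) {
        Err(SplitError::Injected(name))
    } else {
        Ok(())
    }
}

/// Forward path of a split: attach the children, then detach the parents.
fn do_split(
    shards: &mut TenantShards,
    parents: &[&str],
    children: &[&str],
    fail_at: Option<&str>,
) -> Result<(), SplitError> {
    for child in children {
        shards.attached.insert(child.to_string());
    }
    failpoint(fail_at, "after-attach-children")?;
    for parent in parents {
        shards.attached.remove(*parent);
    }
    failpoint(fail_at, "after-detach-parents")?;
    Ok(())
}

/// Runs a split under the exclusive tenant lock and rolls back on any error.
fn split_tenant(
    tenant: &RwLock<TenantShards>,
    parents: &[&str],
    children: &[&str],
    fail_at: Option<&str>,
) -> Result<(), SplitError> {
    // Exclusive (write) lock: no other operation on this tenant runs concurrently.
    let mut shards = tenant.write().unwrap();
    if let Err(err) = do_split(&mut shards, parents, children, fail_at) {
        // Abort: detach every child shard and make sure every parent is attached again.
        for child in children {
            shards.attached.remove(*child);
        }
        for parent in parents {
            shards.attached.insert(parent.to_string());
        }
        return Err(err);
    }
    Ok(())
}
```

Whichever failpoint fires, the tenant ends up with only its parent shards attached, which is the state the real abort path aims to restore.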
55 lines
1.8 KiB
Rust
```rust
use std::{collections::HashMap, sync::Arc};

/// A map of locks covering some arbitrary identifiers. Useful if you have a collection of objects but don't
/// want to embed a lock in each one, or if your locking granularity is different to your object granularity.
/// For example, used in the storage controller where the objects are tenant shards, but sometimes locking
/// is needed at a tenant-wide granularity.
pub(crate) struct IdLockMap<T>
where
    T: Eq + PartialEq + std::hash::Hash,
{
    /// A synchronous lock for getting/setting the async locks that our callers will wait on.
    entities: std::sync::Mutex<std::collections::HashMap<T, Arc<tokio::sync::RwLock<()>>>>,
}
```
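The two-level structure above (a short-lived synchronous mutex guarding a map from key to a shared per-key lock) can be sketched with the standard library alone. This is a simplified, hypothetical analogue, not the code in this file: `KeyedLocks` and `lock_for` are invented names, and it hands back the `Arc<std::sync::RwLock<()>>` itself rather than a future, because std's lock guards borrow the lock and cannot be returned by value the way tokio's owned guards can.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::sync::{Arc, Mutex, RwLock};

/// Hypothetical std-only analogue of `IdLockMap`: one `RwLock<()>` per key,
/// created on demand and shared via `Arc`.
struct KeyedLocks<K: Eq + Hash> {
    entities: Mutex<HashMap<K, Arc<RwLock<()>>>>,
}

impl<K: Eq + Hash> KeyedLocks<K> {
    fn new() -> Self {
        Self {
            entities: Mutex::new(HashMap::new()),
        }
    }

    /// Returns the lock for `key`, creating it if needed. The map mutex is held only
    /// while the entry is looked up or inserted, never while the caller waits on the
    /// per-key lock.
    fn lock_for(&self, key: K) -> Arc<RwLock<()>> {
        let mut map = self.entities.lock().unwrap();
        let lock = map.entry(key).or_default();
        Arc::clone(lock)
    }
}
```

Keying the map by tenant ID gives the tenant-wide granularity mentioned in the doc comment: an operation on one shard takes the tenant's lock shared, a split takes it exclusively, and tenants never contend with each other.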
```rust
impl<T> IdLockMap<T>
where
    T: Eq + PartialEq + std::hash::Hash,
{
    /// Returns a future that resolves to a shared (read) guard on the lock for `key`,
    /// creating the lock if it does not exist yet. The synchronous `entities` mutex is
    /// released when this function returns, so it is never held across an `.await`.
    pub(crate) fn shared(
        &self,
        key: T,
    ) -> impl std::future::Future<Output = tokio::sync::OwnedRwLockReadGuard<()>> {
        let mut locked = self.entities.lock().unwrap();
        let entry = locked.entry(key).or_default();
        entry.clone().read_owned()
    }

    /// Like [`Self::shared`], but resolves to an exclusive (write) guard.
    pub(crate) fn exclusive(
        &self,
        key: T,
    ) -> impl std::future::Future<Output = tokio::sync::OwnedRwLockWriteGuard<()>> {
        let mut locked = self.entities.lock().unwrap();
        let entry = locked.entry(key).or_default();
        entry.clone().write_owned()
    }

    /// Rather than building a lock guard that re-takes the [`Self::entities`] lock, we just do
    /// periodic housekeeping to avoid the map growing indefinitely. Entries whose lock is not
    /// currently held (i.e. `try_write` succeeds) are removed.
    pub(crate) fn housekeeping(&self) {
        let mut locked = self.entities.lock().unwrap();
        locked.retain(|_k, lock| lock.try_write().is_err())
    }
}
```
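One subtlety of the `try_write` predicate in `housekeeping`: it only detects a lock that is currently held. A future returned by `shared` or `exclusive` that has not been polled yet holds a clone of the `Arc` but not the lock, so its entry can be evicted; a later caller for the same key would then get a fresh `RwLock` that does not exclude the first one. A more conservative predicate keeps any entry whose `Arc` is still referenced outside the map. The std-only sketch below compares the two; `LockTable`, `retain_if_locked` and `retain_if_referenced` are hypothetical names, and `std::sync::RwLock` stands in for tokio's.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

type LockTable = HashMap<&'static str, Arc<RwLock<()>>>;

/// The predicate used by `housekeeping` above: keep an entry only while its lock is held.
fn retain_if_locked(map: &mut LockTable) {
    map.retain(|_key, lock| lock.try_write().is_err());
}

/// A more conservative predicate: keep an entry while anyone besides the map holds its
/// `Arc`, which also covers callers that have cloned the lock but not yet acquired it.
fn retain_if_referenced(map: &mut LockTable) {
    map.retain(|_key, lock| Arc::strong_count(lock) > 1);
}
```

Because housekeeping holds the map mutex, no new clone can be taken from the map while the predicate runs; concurrent drops can only lower the count, which at worst keeps an entry until the next pass.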
```rust
impl<T> Default for IdLockMap<T>
where
    T: Eq + PartialEq + std::hash::Hash,
{
    fn default() -> Self {
        Self {
            entities: std::sync::Mutex::new(HashMap::new()),
        }
    }
}
```
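`Default` is implemented by hand rather than derived, plausibly because `#[derive(Default)]` on a generic struct adds a `T: Default` bound, which identifier types such as tenant IDs need not satisfy. The hypothetical `LockTableByKey` and `TenantId` below (not part of this file) show a hand-written impl working with a key type that has no `Default`.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::sync::{Arc, Mutex, RwLock};

/// Hypothetical key type with no `Default` implementation.
#[derive(Debug, PartialEq, Eq, Hash)]
struct TenantId(u128);

/// A generic wrapper shaped like `IdLockMap`. Deriving `Default` here would generate
/// `impl<T: Default + Eq + Hash> Default`, so `LockTableByKey::<TenantId>::default()`
/// would not compile.
struct LockTableByKey<T: Eq + Hash> {
    entities: Mutex<HashMap<T, Arc<RwLock<()>>>>,
}

/// Hand-written impl without a `T: Default` bound: an empty map needs no default key.
impl<T: Eq + Hash> Default for LockTableByKey<T> {
    fn default() -> Self {
        Self {
            entities: Mutex::new(HashMap::new()),
        }
    }
}
```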