storcon: do not detach tenants when all nodes are unavailable

Previously, when all nodes in the cluster became unavailable at the same
time, we would detach all tenant shards. This was due to a bug in
`Service::node_configure`. If all nodes are unavailable, there's no
chance of rescheduling anything, so we should leave the intent states
untouched.

This commit adds a special case which detects this situation and skips
any rescheduling.
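
The guard can be illustrated in isolation with a minimal sketch. It is not
the storage controller's code: `Node` here is a hypothetical stand-in with
only an availability flag, and `shards_to_demote` and its
`(shard id, node id)` pairs are names invented for this example. Only the
`any(Node::is_available)` check mirrors the diff.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the storage controller's node type.
#[derive(Debug, Clone)]
struct Node {
    available: bool,
}

impl Node {
    fn is_available(&self) -> bool {
        self.available
    }
}

/// Returns the shards whose attachment would be demoted after
/// `offline_node` became unavailable (illustrative helper, not a real API).
fn shards_to_demote(
    nodes: &HashMap<u64, Node>,
    attached: &[(u32, u64)], // (shard id, node the shard is attached to)
    offline_node: u64,
) -> Vec<u32> {
    // Special case: if no node is available there is nowhere to move
    // anything, so leave every shard's intent untouched.
    if !nodes.values().any(Node::is_available) {
        return Vec::new();
    }

    // Otherwise, only shards attached to the offline node are affected.
    attached
        .iter()
        .filter(|(_, node_id)| *node_id == offline_node)
        .map(|(shard_id, _)| *shard_id)
        .collect()
}
```

With one node still available, the shards on the offline node are selected
for demotion. Once every node is unavailable, the function returns an empty
list and nothing is detached.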
Vlad Lazar
2024-06-14 11:32:04 +01:00
parent 8270b58f39
commit 677c1662a4


@@ -4312,6 +4312,13 @@ impl Service {
continue;
}
if !new_nodes.values().any(Node::is_available) {
// Special case for when all nodes are unavailable: there is no point
// trying to reschedule since there's nowhere else to go. Without this
// branch we incorrectly detach tenants in response to node unavailability.
continue;
}
if tenant_shard.intent.demote_attached(scheduler, node_id) {
tenant_shard.sequence = tenant_shard.sequence.next();