mirror of
https://github.com/neondatabase/neon.git
synced 2025-12-22 21:59:59 +00:00
storcon: attempt all non-essential location config calls during reconciliations (#12745)
## Problem

We saw the following in the field.

Context and observations:
* The storage controller keeps track of the latest generation, and of the pageserver that issued it, in the database.
* When the storage controller needs to proxy a request (e.g. timeline creation) to the pageservers, it uses the pageserver that issued the latest generation according to the db (`generation_pageserver`).
* pageserver-2.cell-2 got into a bad state and wasn't able to apply `location_config` calls (e.g. to detach a shard).

What happened:
1. pageserver-2.cell-2 was a secondary for our shard, since we were not able to detach it.
2. The control plane asked to detach the tenant (presumably because it was idle):
   a. In response, storcon clears `generation_pageserver` in the db and attempts to detach all locations.
   b. It tries to detach pageserver-2.cell-2 first, but fails, which fails the entire reconciliation and leaves the good attached location in place.
   c. Success is returned to the control plane anyway.
3. The control plane asks to re-attach the tenant:
   a. In response, storcon performs a reconciliation.
   b. It finds that the observed state matches the intent (remember, we did not detach the primary at step 2).
   c. It therefore skips incrementing the generation and setting the `generation_pageserver` column.

Now any request that needs to be proxied to a pageserver and relies on the `generation_pageserver` db column fails, because that column is not set.

## Summary of changes

1. Do all non-essential `location_config` calls (setting up secondaries, detaches) at the end of the reconciliation. Previously, we bailed out of the reconciliation on the first failure. With this patch we attempt all of the RPCs, which allows the observed state to update even if another RPC failed for unrelated reasons.
2. If the overall reconciliation failed, we don't want to remove nodes from the observed state, as a safeguard. With the previous change, we would get a deletion delta to process, which would then be ignored. Ignoring it is not the right thing to do, since that leaves the observed state out of sync with the db state. Hence, on reconciliation failure, map deletions from the observed state to the uncertain state; a future reconciliation will query the node to refresh its observed state.

Closes LKB-204
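The first change follows a common "attempt everything, report the first failure" pattern. The sketch below is illustrative only (a hypothetical `apply_all` helper, not the actual storcon code): every call is attempted so that side effects such as observed-state updates happen for the healthy nodes, and only afterwards is the first error surfaced.

```rust
// Illustrative sketch (hypothetical helper, not the storcon implementation):
// attempt an operation on every item, remember only the first error, and
// report it after all attempts have been made.
fn apply_all<T, E>(
    items: Vec<T>,
    mut apply: impl FnMut(&T) -> Result<(), E>,
) -> Result<(), E> {
    let mut first_err = None;
    for item in &items {
        if let Err(err) = apply(item) {
            // Keep going: later calls may still succeed and take effect.
            if first_err.is_none() {
                first_err = Some(err);
            }
        }
    }
    match first_err {
        Some(err) => Err(err),
        None => Ok(()),
    }
}

fn main() {
    // Node 1 fails, but nodes 2 and 3 still get their config applied.
    let mut applied = Vec::new();
    let res = apply_all(vec![1, 2, 3], |&n| {
        if n == 1 {
            Err(format!("node {n} unreachable"))
        } else {
            applied.push(n);
            Ok(())
        }
    });
    assert_eq!(applied, vec![2, 3]);
    assert_eq!(res, Err("node 1 unreachable".to_string()));
    println!("applied={applied:?}, res={res:?}");
}
```

With the pre-patch behavior (`?` on the first failed call), `applied` would be empty here; with the attempt-all pattern, the later calls still run.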
```diff
@@ -981,6 +981,7 @@ impl Reconciler {
             ));
         }
 
+        let mut first_err = None;
         for (node, conf) in changes {
             if self.cancel.is_cancelled() {
                 return Err(ReconcileError::Cancel);
@@ -990,7 +991,12 @@ impl Reconciler {
             // shard _available_ (the attached location), and configuring secondary locations
             // can be done lazily when the node becomes available (via background reconciliation).
             if node.is_available() {
-                self.location_config(&node, conf, None, false).await?;
+                let res = self.location_config(&node, conf, None, false).await;
+                if let Err(err) = res {
+                    if first_err.is_none() {
+                        first_err = Some(err);
+                    }
+                }
             } else {
                 // If the node is unavailable, we skip and consider the reconciliation successful: this
                 // is a common case where a pageserver is marked unavailable: we demote a location on
@@ -1002,6 +1008,10 @@ impl Reconciler {
             }
         }
 
+        if let Some(err) = first_err {
+            return Err(err);
+        }
+
         // The condition below identifies a detach. We must have no attached intent and
         // must have been attached to something previously. Pass this information to
         // the [`ComputeHook`] such that it can update its tenant-wide state.
@@ -1530,10 +1530,19 @@ impl Service {
             // so that waiters will see the correct error after waiting.
             tenant.set_last_error(result.sequence, e);
 
-            // Skip deletions on reconcile failures
-            let upsert_deltas =
-                deltas.filter(|delta| matches!(delta, ObservedStateDelta::Upsert(_)));
-            tenant.apply_observed_deltas(upsert_deltas);
+            // If the reconciliation failed, don't clear the observed state for places where we
+            // detached. Instead, mark the observed state as uncertain.
+            let failed_reconcile_deltas = deltas.map(|delta| {
+                if let ObservedStateDelta::Delete(node_id) = delta {
+                    ObservedStateDelta::Upsert(Box::new((
+                        node_id,
+                        ObservedStateLocation { conf: None },
+                    )))
+                } else {
+                    delta
+                }
+            });
+            tenant.apply_observed_deltas(failed_reconcile_deltas);
         }
     }
```
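The effect of the `Service` change can be seen in isolation with simplified stand-in types (the real `ObservedStateDelta` and `ObservedStateLocation` live in the storage controller and carry richer config): on a failed reconciliation, each deletion delta is rewritten into an upsert with `conf: None`, i.e. the "uncertain" state that a future reconciliation resolves by querying the node.

```rust
// Simplified stand-ins for the storage controller types, for illustration only.
#[derive(Debug, PartialEq)]
struct ObservedStateLocation {
    conf: Option<String>, // stand-in for the real location config; None = uncertain
}

#[derive(Debug, PartialEq)]
enum ObservedStateDelta {
    Upsert(Box<(u64, ObservedStateLocation)>),
    Delete(u64),
}

// On reconciliation failure, soften deletions into "uncertain" upserts instead
// of dropping the node from the observed state.
fn soften_deletes(
    deltas: impl Iterator<Item = ObservedStateDelta>,
) -> impl Iterator<Item = ObservedStateDelta> {
    deltas.map(|delta| {
        if let ObservedStateDelta::Delete(node_id) = delta {
            ObservedStateDelta::Upsert(Box::new((
                node_id,
                ObservedStateLocation { conf: None },
            )))
        } else {
            delta
        }
    })
}

fn main() {
    let out: Vec<_> =
        soften_deletes(vec![ObservedStateDelta::Delete(7)].into_iter()).collect();
    assert_eq!(
        out,
        vec![ObservedStateDelta::Upsert(Box::new((
            7,
            ObservedStateLocation { conf: None }
        )))]
    );
    println!("{out:?}");
}
```

Compared with the removed behavior (filtering deltas down to upserts only), this keeps the observed state consistent with the db: the node is still tracked, just with an unknown config.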