mirror of
https://github.com/neondatabase/neon.git
synced 2025-12-22 21:59:59 +00:00
## Problem We saw the following in the field: Context and observations: * The storage controller keeps track of the latest generations and the pageserver that issued the latest generation in the database * When the storage controller needs to proxy a request (e.g. timeline creation) to the pageservers, it will find use the pageserver that issued the latest generation from the db (generation_pageserver). * pageserver-2.cell-2 got into a bad state and wasn't able to apply location_config (e.g. detach a shard) What happened: 1. pageserver-2.cell-2 was a secondary for our shard since we were not able to detach it 2. control plane asked to detach a tenant (presumably because it was idle) a. In response storcon clears the generation_pageserver from the db and attempts to detach all locations b. it tries to detach pageserver-2.cell-2 first, but fails, which fails the entire reconciliation leaving the good attached location still there c. return success to cplane 3. control plane asks to re-attach the tenant a. In response storcon performs a reconciliation b. it finds that the observed state matches the intent (remember we did not detach the primary at step(2)) c. skips incrementing the genration and setting the generation_pageserver column Now any requests that need to be proxied to pageservers and rely on the generation_pageserver db column fail because that's not set ## Summary of changes 1. We do all non-essential location config calls (setting up secondaries, detaches) at the end of the reconciliation. Previously, we bailed out of the reconciliation on the first failure. With this patch we attempt all of the RPCs. This allows the observed state to update even if another RPC failed for unrelated reasons. 2. If the overall reconciliation failed, we don't want to remove nodes from the observed state as a safe-guard. With the previous patch, we'll get a deletion delta to process, which would be ignored. Ignoring it is not the right thing to do since it's out of sync with the db state. Hence, on reconciliation failures map deletion from the observed state to the uncertain state. Future reconciliation will query the node to refresh their observed state. Closes LKB-204