feat(timeline_detach_ancestor): success idempotency (#8354)

Right now timeline detach ancestor reports an error (409, "no ancestor")
on a new attempt after successful completion. This makes it troublesome
for storage controller retries. Fix it to respond with `200 OK` as if
the operation had just completed quickly.

Additionally, the returned timeline identifiers in the 200 OK response
are now ordered so that responses between different nodes for error
comparison are done by the storage controller added in #8353.

Design-wise, this PR introduces a new strategy for accessing the latest
uploaded IndexPart:
`RemoteTimelineClient::initialized_upload_queue(&self) ->
Result<UploadQueueAccessor<'_>, NotInitialized>`. It should be a more
scalable way to query the latest uploaded `IndexPart` than to add a
query method for each question directly on `RemoteTimelineClient`.

GC blocking will need to be introduced to make the operation fully
idempotent. However, it is idempotent for the cases demonstrated by
tests.

Cc: #6994
This commit is contained in:
Joonas Koivunen
2024-07-15 20:47:53 +03:00
committed by GitHub
parent 04448ac323
commit 730db859c7
9 changed files with 632 additions and 74 deletions

View File

@@ -2830,9 +2830,10 @@ impl Service {
match e {
// no ancestor (ever)
Error::ApiError(StatusCode::CONFLICT, msg) => {
ApiError::Conflict(format!("{node}: {msg}"))
}
Error::ApiError(StatusCode::CONFLICT, msg) => ApiError::Conflict(format!(
"{node}: {}",
msg.strip_prefix("Conflict: ").unwrap_or(&msg)
)),
// too many ancestors
Error::ApiError(StatusCode::BAD_REQUEST, msg) => {
ApiError::BadRequest(anyhow::anyhow!("{node}: {msg}"))
@@ -2859,8 +2860,6 @@ impl Service {
let any = results.pop().expect("we must have at least one response");
// FIXME: the ordering is not stable yet on pageserver, should be (ancestor_lsn,
// TimelineId)
let mismatching = results
.iter()
.filter(|(_, res)| res != &any.1)