From 411c71b486e71fcce301121b58c48caa7dd65c01 Mon Sep 17 00:00:00 2001 From: Christian Schwarz Date: Fri, 5 May 2023 18:32:41 +0200 Subject: [PATCH] document current tenant attach API semantics (#4151) We currently return 202 as soon as the tenant is allocated in memory before we've written out the marker file. So, the /attach API currently does not have a transactional character. For example, it can happen that we respond with a 202 and then crash before writing out the marker file. In such a case, it is important that the client 1. observes the lost attach (by polling tenant status and observing 404) 2. and consequently retries the attach. It has to do it in this loop until it observes the tenant as "Active" in the tenant status. If the client doesn't follow this protocol and instead goes to another pageserver to attach the tenant, we risk a split-brain situation where both the first and second pageserver write to the tenant's S3 state. The improved description highlights the consequences of this behavior for clients that use the /attach endpoint. The tenant relocation that is currently being implemented in cloud#4740 implements retries of Attach and it does poll afterwards, but, it polls `has_in_progress_downloads`. That is incorrect, as described in the patch body. The motivation for this write-up is that, in a future PR, we'll extend the /attach endpoint with an option to provide the tenant config. If we decide to leave the non-transactional behavior of /attach unmodified, we will be able to avoid persisting the tenant config. Conversely, if we decide that the /attach API should become transactional, we'll need to persist the tenant config in the attach-marker-file before acknowledging receipt of the /attach operation. refs https://github.com/neondatabase/cloud/pull/4740 refs https://github.com/neondatabase/neon/issues/2238 refs https://github.com/neondatabase/neon/issues/1555 --- pageserver/src/http/openapi_spec.yml | 24 +++++++++++++++++++++++- 1 file changed, 23 insertions(+), 1 deletion(-) diff --git a/pageserver/src/http/openapi_spec.yml b/pageserver/src/http/openapi_spec.yml index 95f6e96a5b..877b367b40 100644 --- a/pageserver/src/http/openapi_spec.yml +++ b/pageserver/src/http/openapi_spec.yml @@ -340,7 +340,29 @@ paths: format: hex post: - description: Schedules attach operation to happen in the background for given tenant + description: | + Schedules attach operation to happen in the background for the given tenant. + As soon as the caller sends this request, it must assume the pageserver + starts writing to the tenant's S3 state unless it receives one of the + distinguished errors below that state otherwise. + + The method to identify whether a request has arrived at the pageserver, and + whether it has succeeded, is to poll for the tenant status to reach "Active" + or "Broken" state. These values are currently not explicitly documented in + the API spec. + Polling for `has_in_progress_downloads == false` is INCORRECT because that + value can turn `false` during shutdown while the Attach operation is still + unfinished. + + There is no way to cancel an in-flight request. + + If a client receives a not-distinguished response, e.g., a network timeout, + it MUST retry the /attach request and poll again for tenant status. + + In any case, it must + * NOT ASSUME that the /attach request has been lost in the network, + * NOT ASSUME that the request has been lost based on a subsequent + tenant status request returning 404. (The request may still be in flight!) responses: "202": description: Tenant attaching scheduled