Avoid the storage controller in test_tenant_creation_fails (#8392)

As described in #8385, the likely source of flakiness in
test_tenant_creation_fails is the following sequence of events:

1. test instructs the storage controller to create the tenant
2. storage controller adds the tenant, persists it to the database, and
issues a creation request
3. the pageserver restarts with the failpoint disabled
4. storage controller's background reconciliation still wants to create
the tenant
5. pageserver gets new request to create the tenant from background
reconciliation
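
The sequence above can be sketched with a toy model. This is only an
illustration: `ToyPageserver` and `ToyController` are hypothetical names
invented here and are not part of the real test fixtures or services.

```python
class ToyPageserver:
    """Hypothetical stand-in for a pageserver (not the real API)."""

    def __init__(self, failpoint_enabled: bool) -> None:
        self.failpoint_enabled = failpoint_enabled
        self.tenants = set()

    def create_tenant(self, tenant_id: str) -> None:
        # With the failpoint armed, creation aborts before the tenant exists.
        if self.failpoint_enabled:
            raise ConnectionError("simulated crash at failpoint")
        self.tenants.add(tenant_id)

    def restart(self, failpoint_enabled: bool) -> None:
        # Step 3: come back up, here with the failpoint disabled.
        self.failpoint_enabled = failpoint_enabled


class ToyController:
    """Hypothetical stand-in for the storage controller (not the real API)."""

    def __init__(self, pageserver: ToyPageserver) -> None:
        self.pageserver = pageserver
        self.persisted = set()

    def create_tenant(self, tenant_id: str) -> None:
        # Step 2: the intent is persisted before the request is issued.
        self.persisted.add(tenant_id)
        self.pageserver.create_tenant(tenant_id)

    def reconcile(self) -> None:
        # Steps 4-5: background reconciliation retries anything persisted
        # that the pageserver does not have yet.
        for tenant_id in sorted(self.persisted):
            if tenant_id not in self.pageserver.tenants:
                self.pageserver.create_tenant(tenant_id)


tenant_id = "tenant-a"
pageserver = ToyPageserver(failpoint_enabled=True)
controller = ToyController(pageserver)

try:
    controller.create_tenant(tenant_id)  # steps 1-2
    creation_failed = False
except ConnectionError:
    creation_failed = True

pageserver.restart(failpoint_enabled=False)  # step 3
exists_before_reconcile = tenant_id in pageserver.tenants
controller.reconcile()  # steps 4-5
exists_after_reconcile = tenant_id in pageserver.tenants
```

In this model, a test asserting that the tenant does not exist passes or
fails depending on whether it runs before or after `reconcile()`, which is
the kind of timing dependence described above.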

This commit avoids the storage controller entirely. That approach has its
own set of issues, as the re-attach request will not include the tenant,
but it's still useful to test for non-existence of the tenant.

The generation is also not optional any more during tenant attachment.
If you omit it, the pageserver yields an error. We change the signature
of `tenant_attach` to reflect that.
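
A minimal sketch of what the signature change means for callers, assuming
a stand-in function (`tenant_attach_sketch` is hypothetical and only builds
a dict; it is not the real client method and does not reflect the real
request format):

```python
from typing import Any, Dict, Optional


def tenant_attach_sketch(
    tenant_id: str,
    generation: int,
    config: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical stand-in mirroring the new parameter order."""
    config = config or {}
    return {"tenant_id": tenant_id, "generation": generation, "config": config}


# Passing the generation works positionally or by keyword.
body = tenant_attach_sketch("tenant-a", 1)
body_kw = tenant_attach_sketch(tenant_id="tenant-a", generation=1)

# Omitting it now fails at the call site instead of at the pageserver.
try:
    tenant_attach_sketch("tenant-a")  # type: ignore[call-arg]
    missing_generation_rejected = False
except TypeError:
    missing_generation_rejected = True
```

With a required parameter, a forgotten generation surfaces as a `TypeError`
in the test code itself rather than as an error response from the server.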

Alternative to #8385
Fixes #8266
Arpad Müller
2024-07-16 12:19:28 +02:00
committed by GitHub
parent e6dadcd2f3
commit 66337097de
3 changed files with 5 additions and 12 deletions

@@ -2786,8 +2786,8 @@ class NeonPageserver(PgProtocol, LogUtils):
             )
         return client.tenant_attach(
             tenant_id,
+            generation,
             config,
-            generation=generation,
         )

     def tenant_detach(self, tenant_id: TenantId):

@@ -238,8 +238,8 @@ class PageserverHttpClient(requests.Session, MetricsGetter):
     def tenant_attach(
         self,
         tenant_id: Union[TenantId, TenantShardId],
+        generation: int,
         config: None | Dict[str, Any] = None,
-        generation: Optional[int] = None,
     ):
         config = config or {}

@@ -45,17 +45,10 @@ def test_tenant_creation_fails(neon_simple_env: NeonEnv):
     # Failure to write a config to local disk makes the pageserver assume that local disk is bad and abort the process
     pageserver_http.configure_failpoints(("tenant-config-before-write", "return"))
-    # Storage controller will see a torn TCP connection when the crash point is reached, and follow an unclean 500 error path
-    neon_simple_env.storage_controller.allowed_errors.extend(
-        [
-            ".*Reconcile not done yet while creating tenant.*",
-            ".*Reconcile error: receive body: error sending request.*",
-            ".*Error processing HTTP request: InternalServerError.*",
-        ]
-    )
+    tenant_id = TenantId.generate()
-    with pytest.raises(Exception, match="error sending request"):
-        _ = neon_simple_env.neon_cli.create_tenant()
+    with pytest.raises(requests.exceptions.ConnectionError, match="Connection aborted"):
+        neon_simple_env.pageserver.http_client().tenant_attach(tenant_id=tenant_id, generation=1)
     # Any files left behind on disk during failed creation do not prevent
     # a retry from succeeding. Restart pageserver with no failpoints.