pageserver: on-demand activation of tenant on GET tenant status (#7250)

## Problem

(Follows https://github.com/neondatabase/neon/pull/7237)

Some API users will query a tenant to wait for it to activate.
Currently, we return the current status of the tenant, whatever that may
be. Under heavy load, a pageserver starting up might take a long time to
activate such a tenant.

## Summary of changes

- In `tenant_status` handler, call wait_to_become_active on the tenant.
If the tenant is currently waiting for activation, this causes it to
skip the queue, similiar to other API handlers that require an active
tenant, like timeline creation. This avoids external services waiting a
long time for activation when polling GET /v1/tenant/<id>.
This commit is contained in:
John Spray
2024-04-03 14:53:43 +01:00
committed by GitHub
parent 944313ffe1
commit 8b10407be4
2 changed files with 30 additions and 2 deletions

View File

@@ -993,11 +993,26 @@ async fn tenant_status(
check_permission(&request, Some(tenant_shard_id.tenant_id))?;
let state = get_state(&request);
// In tests, sometimes we want to query the state of a tenant without auto-activating it if it's currently waiting.
let activate = true;
#[cfg(feature = "testing")]
let activate = parse_query_param(&request, "activate")?.unwrap_or(activate);
let tenant_info = async {
let tenant = state
.tenant_manager
.get_attached_tenant_shard(tenant_shard_id)?;
if activate {
// This is advisory: we prefer to let the tenant activate on-demand when this function is
// called, but it is still valid to return 200 and describe the current state of the tenant
// if it doesn't make it into an active state.
tenant
.wait_to_become_active(ACTIVE_TENANT_TIMEOUT)
.await
.ok();
}
// Calculate total physical size of all timelines
let mut current_physical_size = 0;
for timeline in tenant.list_timelines().iter() {

View File

@@ -341,8 +341,21 @@ class PageserverHttpClient(requests.Session, MetricsGetter):
res = self.post(f"http://localhost:{self.port}/v1/tenant/{tenant_id}/ignore")
self.verbose_error(res)
def tenant_status(self, tenant_id: Union[TenantId, TenantShardId]) -> Dict[Any, Any]:
res = self.get(f"http://localhost:{self.port}/v1/tenant/{tenant_id}")
def tenant_status(
self, tenant_id: Union[TenantId, TenantShardId], activate: bool = False
) -> Dict[Any, Any]:
"""
:activate: hint the server not to accelerate activation of this tenant in response
to this query. False by default for tests, because they generally want to observed the
system rather than interfering with it. This is true by default on the server side,
because in the field if the control plane is GET'ing a tenant it's a sign that it wants
to do something with it.
"""
params = {}
if not activate:
params["activate"] = "false"
res = self.get(f"http://localhost:{self.port}/v1/tenant/{tenant_id}", params=params)
self.verbose_error(res)
res_json = res.json()
assert isinstance(res_json, dict)