storage controller: API-driven graceful migrations (#10913)

## Problem

The current migration API does a live migration, but if the destination
doesn't already have a secondary, that live migration is unlikely to be
able to warm up a tenant properly within its timeout (full warmup of a
big tenant can take tens of minutes).

Background optimisation code knows how to do this gracefully by creating
a secondary first, but we don't currently give a human a way to trigger
that.

Closes: https://github.com/neondatabase/neon/issues/10540

## Summary of changes

- Add `preferred_node` parameter to `TenantShard`, which is respected by
  `optimize_attachment`
- Modify the migration API to take an optional `prewarm=true` mode, in which we
  set `preferred_node` and call `optimize_attachment` rather than directly
  modifying the shard's intent state (see the sketch after this list)
- Require the `override_scheduler=true` flag when migrating to a
  less-than-optimal scheduling location (e.g. the wrong AZ)
- Add `origin_node_id` to migration API so that callers can ensure
they're moving from where they think they're moving from
- Add tests for the above
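
To make the new request shape concrete, here is a minimal sketch of how a caller might drive a graceful migration over the controller's HTTP API. The endpoint path, request body layout, and field names below are assumptions inferred from the description above, not a verbatim copy of the controller's API.

```python
import requests

CONTROLLER = "http://127.0.0.1:1234"  # storage controller address (placeholder)


def migrate_gracefully(tenant_shard_id: str, dest_node_id: int, origin_node_id: int) -> dict:
    """Ask the controller to migrate a shard by prewarming a secondary before cutover."""
    body = {
        "tenant_shard_id": tenant_shard_id,
        "node_id": dest_node_id,           # destination pageserver
        "origin_node_id": origin_node_id,  # fail if the shard is not attached where we expect
        "migration_config": {              # assumed structure for the new options
            "prewarm": True,               # create/warm a secondary first, then cut over
            "override_scheduler": False,   # refuse sub-optimal placements (e.g. wrong AZ)
        },
    }
    # The endpoint path is an assumption for illustration.
    response = requests.put(
        f"{CONTROLLER}/control/v1/tenant/{tenant_shard_id}/migrate",
        json=body,
        timeout=30,
    )
    response.raise_for_status()
    return response.json()
```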

The storcon_cli wrapper for this has a 'watch' mode that waits for
eventual cutover. It doesn't show the secondary's warmth evolving, because
the controller currently has no API for that: the passthrough API only
targets attached locations, not secondaries. It would be straightforward to
add a dedicated endpoint for secondary status later, then extend storcon_cli
to consume it and print a nice progress indicator.
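
As a rough illustration of what the 'watch' behaviour amounts to, the sketch below polls a describe-style endpoint until the shard reports the destination as its attached node. The endpoint path and response field names are illustrative assumptions, not the controller's actual API.

```python
import time

import requests


def watch_cutover(
    controller: str,
    tenant_id: str,
    shard_number: int,
    dest_node_id: int,
    poll_interval: float = 10.0,
    timeout: float = 3600.0,
) -> None:
    """Poll until the given shard reports dest_node_id as its attached node."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # Describe endpoint and field names are assumptions for illustration.
        desc = requests.get(f"{controller}/control/v1/tenant/{tenant_id}", timeout=30).json()
        attached = desc["shards"][shard_number].get("node_attached")
        if attached == dest_node_id:
            print(f"shard {shard_number} is now attached to node {dest_node_id}")
            return
        print(f"waiting: shard {shard_number} still attached to node {attached}")
        time.sleep(poll_interval)
    raise TimeoutError(f"shard {shard_number} did not cut over within {timeout}s")
```
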
Author: John Spray
Date: 2025-03-07 17:02:38 +00:00 (committed by GitHub)
Parent: 084fc4a757
Commit: 87e6117dfd
9 changed files with 707 additions and 120 deletions


```diff
@@ -16,6 +16,7 @@ from fixtures.neon_fixtures import (
     NeonPageserver,
     PageserverAvailability,
     PageserverSchedulingPolicy,
+    StorageControllerMigrationConfig,
 )
 from fixtures.pageserver.http import PageserverApiException, PageserverHttpClient
 from fixtures.pg_version import PgVersion
@@ -362,7 +363,10 @@ def test_storage_controller_many_tenants(
             dest_ps_id = desc["shards"][shard_number]["node_secondary"][0]
             f = executor.submit(
-                env.storage_controller.tenant_shard_migrate, tenant_shard_id, dest_ps_id
+                env.storage_controller.tenant_shard_migrate,
+                tenant_shard_id,
+                dest_ps_id,
+                StorageControllerMigrationConfig(prewarm=False, override_scheduler=True),
             )
         elif op == Operation.TENANT_PASSTHROUGH:
             # A passthrough read to shard zero
```