mirror of https://github.com/neondatabase/neon.git, synced 2026-05-28 10:30:40 +00:00
storage controller: check warmth of secondary before doing proactive migration (#7583)
## Problem

The logic in `Service::optimize_all` would sometimes choose to migrate a tenant to a secondary location that had only recently been created. `Reconciler::live_migrate` would then hit its 5 minute timeout while warming up the location, and proceed to attach the tenant to a location that did not have a warm enough local set of layer files for good performance.

Closes: #7532

## Summary of changes

- Add a pageserver API for checking the download progress of a secondary location.
- During `optimize_all`, connect to the pageservers of candidate optimization secondary locations, and check that they are warm.
- During shard split, do heatmap uploads and start secondary downloads, so that the new shards' secondary locations start downloading ASAP, rather than waiting minutes for background downloads to kick in.

I have intentionally not implemented this by continuously reading the status of locations, to avoid dealing with the scale challenge of efficiently polling and updating the status of 10k-100k locations. If we implement that in the future, this code can be simplified to act on the latest state of a location rather than fetching it inline during `optimize_all`.
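
The warmth check described in the summary can be sketched in Python as a filter over proposed migrations. This is a minimal, hypothetical model, not the storage controller's Rust code: `SecondaryProgress`, its fields, `is_warm` and `filter_optimizations` are illustrative names, and "warm" is assumed to mean that every byte listed in the secondary's heatmap has been downloaded.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


# Hypothetical shape of a secondary location's download progress report.
@dataclass
class SecondaryProgress:
    layers_downloaded: int
    layers_total: int
    bytes_downloaded: int
    bytes_total: int


def is_warm(progress: Optional[SecondaryProgress]) -> bool:
    # No report (e.g. pageserver unreachable, or no heatmap yet) counts as not warm.
    if progress is None:
        return False
    # Assumption: "warm" means all heatmap bytes are present locally.
    return progress.bytes_downloaded >= progress.bytes_total


Migration = Tuple[str, int]  # (tenant shard id, destination node id), both hypothetical


def filter_optimizations(
    candidates: List[Migration],
    fetch_progress: Callable[[str, int], Optional[SecondaryProgress]],
    log: Callable[[str], None],
) -> List[Migration]:
    """Keep only the migrations whose destination secondary is already warm."""
    ready = []
    for shard_id, node_id in candidates:
        # Progress is fetched inline for each candidate, not from a continuously polled cache.
        progress = fetch_progress(shard_id, node_id)
        if is_warm(progress):
            ready.append((shard_id, node_id))
        else:
            # Message modeled on the pattern the test greps for; not the real log text.
            log(f"Skipping migration of {shard_id} to {node_id} because secondary isn't ready: {progress}")
    return ready


# Usage: one warm secondary and one that is still downloading.
reports = {
    ("shard-0", 1): SecondaryProgress(10, 10, 4096, 4096),
    ("shard-1", 2): SecondaryProgress(3, 10, 1024, 4096),
}
messages: List[str] = []
ready = filter_optimizations(list(reports), lambda s, n: reports.get((s, n)), messages.append)
print(ready)  # [('shard-0', 1)]
```

Fetching progress per candidate keeps the sketch stateless, in line with the commit's choice not to poll every location continuously.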
@@ -287,6 +287,11 @@ def test_sharding_split_smoke(
         == shard_count
     )
 
+    # Make secondary downloads slow: this exercises the storage controller logic for not migrating an attachment
+    # during post-split optimization until the secondary is ready
+    for ps in env.pageservers:
+        ps.http_client().configure_failpoints([("secondary-layer-download-sleep", "return(1000)")])
+
     env.storage_controller.tenant_shard_split(tenant_id, shard_count=split_shard_count)
 
     post_split_pageserver_ids = [loc["node_id"] for loc in env.storage_controller.locate(tenant_id)]
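
The failpoint configured in this hunk makes secondary downloads artificially slow. Assuming, as its name suggests, that `secondary-layer-download-sleep` makes the pageserver sleep for the `return(N)` value in milliseconds before each layer download (an assumption; the pageserver side is not part of this diff), a rough lower bound on warm-up time is the layer count times N milliseconds. The helper names below are hypothetical.

```python
import re


def parse_return_ms(action: str) -> int:
    """Extract N from a failpoint action string of the form 'return(N)'."""
    match = re.fullmatch(r"return\((\d+)\)", action)
    if match is None:
        raise ValueError(f"unsupported failpoint action: {action!r}")
    return int(match.group(1))


def min_warmup_secs(layer_count: int, action: str) -> float:
    """Lower bound on warm-up time if each layer download first sleeps N ms (assumed)."""
    return layer_count * parse_return_ms(action) / 1000.0


# A hypothetical shard whose heatmap lists 20 layers needs at least 20 s to warm up.
print(min_warmup_secs(20, "return(1000)"))  # 20.0
```

A delay of this size keeps post-split optimizations waiting long enough for the controller to log that it is skipping a migration.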
@@ -300,7 +305,7 @@ def test_sharding_split_smoke(
 
     # Enough background reconciliations should result in the shards being properly distributed.
     # Run this before the workload, because its LSN-waiting code presumes stable locations.
-    env.storage_controller.reconcile_until_idle()
+    env.storage_controller.reconcile_until_idle(timeout_secs=60)
 
     workload.validate()
 
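
Because post-split optimizations now wait for secondaries to warm up, reaching an idle state presumably takes longer, which would explain the explicit `timeout_secs=60`. Below is a minimal sketch of a reconcile-until-idle loop with a deadline. It assumes a hypothetical `reconcile_once` callback that returns how many reconciliations it started; the real fixture's internals are not shown in this diff.

```python
import time
from typing import Callable


def reconcile_until_idle(
    reconcile_once: Callable[[], int],
    timeout_secs: float,
    poll_interval: float = 0.5,
) -> int:
    """Call reconcile_once() until it reports no outstanding work; return the rounds taken.

    Raises TimeoutError if work is still being started when the deadline passes.
    """
    deadline = time.monotonic() + timeout_secs
    rounds = 0
    while True:
        rounds += 1
        if reconcile_once() == 0:
            return rounds
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still reconciling after {timeout_secs}s")
        time.sleep(poll_interval)


# Usage: a fake controller that starts 3, then 2, then 1, then 0 reconciles.
pending = [3]


def fake_reconcile_once() -> int:
    started = pending[0]
    pending[0] = max(0, started - 1)
    return started


print(reconcile_until_idle(fake_reconcile_once, timeout_secs=60, poll_interval=0))  # 4
```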
@@ -342,6 +347,10 @@ def test_sharding_split_smoke(
     assert cancelled_reconciles is not None and int(cancelled_reconciles) == 0
     assert errored_reconciles is not None and int(errored_reconciles) == 0
 
+    # We should see that the migration of shards after the split waited for secondaries to warm up
+    # before happening
+    assert env.storage_controller.log_contains(".*Skipping.*because secondary isn't ready.*")
+
     env.storage_controller.consistency_check()
 
 def get_node_shard_counts(env: NeonEnv, tenant_ids):
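
The new assertion greps the controller log for evidence that a migration was deferred. Here is a minimal sketch of that kind of check with Python's `re` module, using `re.search` semantics. The real `log_contains` fixture's matching rules and return value are not shown here, and the sample log lines are invented.

```python
import re
from typing import Iterable, Optional

PATTERN = ".*Skipping.*because secondary isn't ready.*"


def log_contains(lines: Iterable[str], pattern: str) -> Optional[str]:
    """Return the first line that matches pattern anywhere, or None."""
    regex = re.compile(pattern)
    for line in lines:
        if regex.search(line):
            return line
    return None


sample_log = [
    "INFO reconcile complete for shard-0",
    "INFO Skipping migration of shard-1 to node 2 because secondary isn't ready",
]
print(log_contains(sample_log, PATTERN))  # prints the second sample line
```

With `re.search`, the leading and trailing `.*` in the pattern are redundant but harmless.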
@@ -1071,6 +1080,17 @@ def test_sharding_split_failures(
         finish_split()
         assert_split_done()
 
+    if isinstance(failure, StorageControllerFailpoint) and "post-complete" in failure.failpoint:
+        # On a post-complete failure, the controller will recover the post-split state
+        # after restart, but it will have missed the optimization part of the split function
+        # where secondary downloads are kicked off. This means that reconcile_until_idle
+        # will take a very long time if we wait for all optimizations to complete, because
+        # those optimizations will wait for secondary downloads.
+        #
+        # Avoid that by configuring the tenant into Essential scheduling mode, so that it will
+        # skip optimizations when we're exercising this particular failpoint.
+        env.storage_controller.tenant_policy_update(tenant_id, {"scheduling": "Essential"})
+
     # Having completed the split, pump the background reconciles to ensure that
     # the scheduler reaches an idle state
     env.storage_controller.reconcile_until_idle(timeout_secs=30)
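
Switching the tenant to `Essential` scheduling is what lets this failure case settle within the timeout. Below is a minimal sketch of the idea. It assumes a policy value that permits all scheduling (called `Active` here, a hypothetical name alongside the `Essential` value the test uses) and assumes, as the test comment describes, that `Essential` skips optimizations. The helper functions are illustrative only.

```python
from enum import Enum


class SchedulingPolicy(Enum):
    ACTIVE = "Active"        # assumed: normal scheduling, optimizations allowed
    ESSENTIAL = "Essential"  # per the test comment: optimizations are skipped


def apply_policy_update(current: SchedulingPolicy, body: dict) -> SchedulingPolicy:
    """Apply a body shaped like the one passed to tenant_policy_update in the test."""
    return SchedulingPolicy(body.get("scheduling", current.value))


def should_optimize(policy: SchedulingPolicy) -> bool:
    """Only a fully active tenant has optional migrations (optimizations) considered."""
    return policy is SchedulingPolicy.ACTIVE


policy = apply_policy_update(SchedulingPolicy.ACTIVE, {"scheduling": "Essential"})
print(policy, should_optimize(policy))  # SchedulingPolicy.ESSENTIAL False
```

With optimizations skipped, the final `reconcile_until_idle(timeout_secs=30)` no longer waits on secondary downloads that the restarted controller never kicked off.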