From ba9722a2fd913d30b639ec506c42761cdca44440 Mon Sep 17 00:00:00 2001
From: John Spray
Date: Fri, 3 Jan 2025 10:55:07 +0000
Subject: [PATCH] tests: add upload wait in test_scrubber_physical_gc_ancestors (#10260)

## Problem

We see periodic failures in `test_scrubber_physical_gc_ancestors`, where the logs show that the pageserver is creating image layers that should cause child shards to no longer reference their parents' layers, but then the scrubber runs and doesn't find any unreferenced layers.

https://neon-github-public-dev.s3.amazonaws.com/reports/pr-10256/12582034135/index.html#/testresult/78ea06dea6ba8dd3

From inspecting the code & test, it seems like this could be as simple as the test failing to wait for uploads before running the scrubber. It had a 2 second delay built in to satisfy the scrubber's time threshold checks, which on a lightly loaded machine would also have been easily enough for uploads to complete, but our test machines are more heavily loaded all the time.

## Summary of changes

- Wait for uploads to complete after generating image layers in `test_scrubber_physical_gc_ancestors`, so that the scrubber should reliably see the post-compaction metadata.
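
For illustration only, the difference between relying on a fixed delay and waiting explicitly can be sketched as below. This is not the test suite's helper: `poll_until` and `FakeUploadQueue` are hypothetical names invented here, and the actual fix simply passes `wait_until_uploaded=True` to `timeline_compact`.

```python
import time
from typing import Callable


def poll_until(
    condition: Callable[[], bool],
    timeout_s: float = 10.0,
    interval_s: float = 0.05,
) -> None:
    """Block until condition() is true; raise TimeoutError if it never is."""
    deadline = time.monotonic() + timeout_s
    while not condition():
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within timeout")
        time.sleep(interval_s)


class FakeUploadQueue:
    """Hypothetical stand-in for pending remote uploads (not a real fixture)."""

    def __init__(self, finishes_after_s: float) -> None:
        # Pretend all uploads complete this many seconds from now.
        self._done_at = time.monotonic() + finishes_after_s

    def is_empty(self) -> bool:
        return time.monotonic() >= self._done_at


# On a loaded machine the uploads may take longer than any fixed sleep,
# so wait on the condition itself instead of guessing a duration.
queue = FakeUploadQueue(finishes_after_s=0.2)
poll_until(queue.is_empty, timeout_s=5.0)
```

A fixed `time.sleep(2)` only works while uploads happen to finish within two seconds; the explicit wait returns as soon as the condition holds and fails loudly if it never does.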
---
 test_runner/regress/test_storage_scrubber.py | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/test_runner/regress/test_storage_scrubber.py b/test_runner/regress/test_storage_scrubber.py
index 198e4f0460..220c428531 100644
--- a/test_runner/regress/test_storage_scrubber.py
+++ b/test_runner/regress/test_storage_scrubber.py
@@ -266,7 +266,9 @@ def test_scrubber_physical_gc_ancestors(neon_env_builder: NeonEnvBuilder, shard_
     for shard in shards:
         ps = env.get_tenant_pageserver(shard)
         assert ps is not None
-        ps.http_client().timeline_compact(shard, timeline_id, force_image_layer_creation=True)
+        ps.http_client().timeline_compact(
+            shard, timeline_id, force_image_layer_creation=True, wait_until_uploaded=True
+        )
         ps.http_client().timeline_gc(shard, timeline_id, 0)
 
     # We will use a min_age_secs=1 threshold for deletion, let it pass