storcon: avoid multiple initdbs when shard 0 has stale locations (#11760)

## Problem

In #11727 I overlooked the case of multiple attached locations for shard
0.

I misread the code and thought `create_one` acts on one location, but it
actually acts on one _shard_, which is potentially multiple locations.

This was not a regression, but it meant that the fix was incomplete.
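
To make the distinction concrete, here is a minimal sketch of the shape `create_one` receives, using simplified stand-in types (the real storcon definitions differ): one shard maps to a latest attached location plus zero or more stale "other" locations, so a per-shard operation can fan out to several pageservers.

```rust
// Simplified, assumed types for illustration only; not the real storcon structs.
struct MutationLocation {
    node: String, // pageserver hosting this attached location (reduced to a name)
}

struct ShardMutationLocations {
    latest: MutationLocation,     // attached location in the newest generation
    other: Vec<MutationLocation>, // stale attached locations, possibly non-empty
}

fn main() {
    let locations = ShardMutationLocations {
        latest: MutationLocation { node: "pageserver-1".into() },
        other: vec![MutationLocation { node: "pageserver-2".into() }],
    };
    // One shard, but 1 + N attached locations: the overlooked case was N > 0 for
    // shard 0, where every location would otherwise run its own initdb.
    println!(
        "latest on {}, {} stale location(s)",
        locations.latest.node,
        locations.other.len()
    );
}
```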

## Summary of changes

- In `create_one`, when updating shard zero, have any "other" locations
use the initdb from shard 0

```diff
@@ -3663,7 +3663,7 @@ impl Service {
         locations: ShardMutationLocations,
         http_client: reqwest::Client,
         jwt: Option<String>,
-        create_req: TimelineCreateRequest,
+        mut create_req: TimelineCreateRequest,
     ) -> Result<TimelineInfo, ApiError> {
         let latest = locations.latest.node;
@@ -3682,6 +3682,15 @@ impl Service {
             .await
             .map_err(|e| passthrough_api_error(&latest, e))?;
 
+        // If we are going to create the timeline on some stale locations for shard 0, then ask them to re-use
+        // the initdb generated by the latest location, rather than generating their own. This avoids racing uploads
+        // of initdb to S3 which might not be binary-identical if different pageservers have different postgres binaries.
+        if tenant_shard_id.is_shard_zero() {
+            if let models::TimelineCreateRequestMode::Bootstrap { existing_initdb_timeline_id, .. } = &mut create_req.mode {
+                *existing_initdb_timeline_id = Some(create_req.new_timeline_id);
+            }
+        }
+
         // We propagate timeline creations to all attached locations such that a compute
         // for the new timeline is able to start regardless of the current state of the
         // tenant shard reconciliation.
```
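
The effect, as described by the new comment: when shard 0 has stale attached locations, only the latest location generates and uploads initdb, and the stale locations are pointed at that upload via `existing_initdb_timeline_id` (set to the new timeline's own id) rather than each running their own initdb, whose racing uploads might not be binary-identical across pageservers with different postgres binaries.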