From cfbef4d586f96b9f5e0648d0a7ea04db54b86962 Mon Sep 17 00:00:00 2001
From: Vlad Lazar
Date: Tue, 13 May 2025 14:02:25 +0100
Subject: [PATCH] safekeeper: downgrade stream from future WAL log (#11909)

## Problem

1. Safekeeper selection on the pageserver side isn't very dynamic. Once you connect to one safekeeper, you'll use that one for as long as the safekeeper keeps the connection alive. In principle, we could be more eager, since the wal receiver connection can be cancelled but we don't do that. We wait until the "session" is done and then we pick a new SK.
2. Picking a new SK is quite conservative. We will switch if:
   a. We haven't received anything from the SK within the last 10 seconds (wal_connect_timeout) or
   b. The candidate SK is 1GiB ahead or
   c. The candidate SK is in the same AZ as the PS or
   d. There's a candidate that is ahead and we've not had any WAL within the last 10 seconds (lagging_wal_timeout)

Hence, we can end up with pageservers that are requesting WAL which their safekeeper hasn't seen yet.

## Summary of changes

Downgrade warning log to info.
---
 safekeeper/src/send_wal.rs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/safekeeper/src/send_wal.rs b/safekeeper/src/send_wal.rs
index 33e3d0485c..05f827494e 100644
--- a/safekeeper/src/send_wal.rs
+++ b/safekeeper/src/send_wal.rs
@@ -513,7 +513,7 @@ impl SafekeeperPostgresHandler {
         let end_pos = end_watch.get();
 
         if end_pos < start_pos {
-            warn!(
+            info!(
                 "requested start_pos {} is ahead of available WAL end_pos {}",
                 start_pos, end_pos
             );
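
Note on the Problem section: the four switch conditions (a-d) can be sketched as a single decision function. This is only an illustration of the list as written, not the pageserver's actual selection code. The `Connection`, `Candidate`, and `SwitchReason` types, their fields, and the `switch_reason` function are hypothetical names invented for this sketch. Only the two timeout names, the 10-second values, and the 1GiB threshold come from the commit message.

```rust
use std::time::Duration;

// Values taken from the commit message; the constant names are illustrative.
const WAL_CONNECT_TIMEOUT: Duration = Duration::from_secs(10);
const LAGGING_WAL_TIMEOUT: Duration = Duration::from_secs(10);
const MAX_LSN_LEAD_BYTES: u64 = 1 << 30; // 1GiB

/// Hypothetical view of the safekeeper the pageserver is currently connected to.
#[derive(Debug, Clone)]
struct Connection {
    commit_lsn: u64,
    /// Time since any message (WAL or keepalive) arrived.
    since_last_message: Duration,
    /// Time since actual WAL arrived.
    since_last_wal: Duration,
}

/// Hypothetical view of another safekeeper that could be selected instead.
#[derive(Debug, Clone)]
struct Candidate {
    commit_lsn: u64,
    same_az_as_pageserver: bool,
}

#[derive(Debug, PartialEq, Eq)]
enum SwitchReason {
    NoMessages,        // condition a
    CandidateFarAhead, // condition b
    SameAz,            // condition c
    LaggingWal,        // condition d
}

/// Returns the first condition from the list that holds, or `None` if the
/// pageserver would stay with its current safekeeper.
fn switch_reason(current: &Connection, candidate: &Candidate) -> Option<SwitchReason> {
    // How far ahead the candidate is; zero if it is not ahead at all.
    let lead = candidate.commit_lsn.saturating_sub(current.commit_lsn);

    if current.since_last_message >= WAL_CONNECT_TIMEOUT {
        Some(SwitchReason::NoMessages)
    } else if lead >= MAX_LSN_LEAD_BYTES {
        Some(SwitchReason::CandidateFarAhead)
    } else if candidate.same_az_as_pageserver {
        Some(SwitchReason::SameAz)
    } else if lead > 0 && current.since_last_wal >= LAGGING_WAL_TIMEOUT {
        Some(SwitchReason::LaggingWal)
    } else {
        None
    }
}
```

Under these assumptions, a candidate that is ahead by less than 1GiB, sits in a different AZ, and competes with a connection that delivered WAL within the last 10 seconds triggers none of the conditions. The pageserver then stays on a safekeeper that may be behind the position it requests, which is the situation the downgraded log line reports.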