Mirror of https://github.com/neondatabase/neon.git (synced 2026-01-17 02:12:56 +00:00)
Fix LSN in keepalive messages, if no WAL has been sent yet
When a new connection is established to the safekeeper, the 'end_pos' field is initially set to Lsn::INVALID (i.e. 0/0). If there is no WAL to send to the client, we send KeepAlive messages with Lsn::INVALID. That confuses the pageserver: it concludes that the safekeeper is lagging far behind the tip of the branch and reconnects to a different safekeeper. The same thing then happens with the new safekeeper, until some WAL is streamed, which sets 'end_pos' to a valid value.

To fix, use 'start_pos' rather than 'end_pos' in the keepalive messages. When the safekeeper has sent all the WAL it has available, the two are equal. When the safekeeper has some WAL to send, it sends an XLogData message rather than a KeepAlive. If it did send a KeepAlive even when there was some WAL to send too, I think 'start_pos' would be the more correct value anyway.

Fixes https://github.com/neondatabase/neon/issues/3972
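The reasoning in the message can be illustrated with a small standalone sketch. It is not the safekeeper's code: `SenderState`, `keepalive_lsn_old`, `keepalive_lsn_new`, and `apparent_lag` are hypothetical names, `Lsn` is a reduced stand-in for the real type, and the lag calculation is only an assumption about how a receiver might judge a reported position against the branch tip.

```rust
/// Reduced stand-in for the real `Lsn` newtype: a 64-bit WAL position.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct Lsn(pub u64);

impl Lsn {
    /// 0/0, the value `end_pos` holds before any WAL has been streamed.
    pub const INVALID: Lsn = Lsn(0);
}

/// Hypothetical, simplified model of the two positions a WAL sender tracks.
pub struct SenderState {
    /// Position from which the next WAL would be sent.
    pub start_pos: Lsn,
    /// End of the WAL available to send; INVALID on a fresh connection.
    pub end_pos: Lsn,
}

impl SenderState {
    /// A fresh connection: the client asked to start at `start_pos`,
    /// and `end_pos` has not been set yet.
    pub fn new(start_pos: Lsn) -> Self {
        SenderState {
            start_pos,
            end_pos: Lsn::INVALID,
        }
    }

    /// Value placed in `sent_ptr` before the fix.
    pub fn keepalive_lsn_old(&self) -> u64 {
        self.end_pos.0
    }

    /// Value placed in `sent_ptr` after the fix.
    pub fn keepalive_lsn_new(&self) -> u64 {
        self.start_pos.0
    }
}

/// Assumed receiver-side view: how far behind the tip the sender appears.
pub fn apparent_lag(tip: Lsn, reported: u64) -> u64 {
    tip.0.saturating_sub(reported)
}
```

With a fresh `SenderState::new(Lsn(0x016B_9188))` and a branch tip at the same position, `keepalive_lsn_old()` reports 0, so the apparent lag equals the whole tip position, while `keepalive_lsn_new()` reports the start position and the apparent lag is zero.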
@@ -551,7 +551,7 @@ impl<IO: AsyncRead + AsyncWrite + Unpin> WalSender<'_, IO> {
                 self.pgb
                     .write_message(&BeMessage::KeepAlive(WalSndKeepAlive {
-                        sent_ptr: self.end_pos.0,
+                        sent_ptr: self.start_pos.0,
                         timestamp: get_current_timestamp(),
                         request_reply: true,
                     }))