Compare commits

2 Commits

Author SHA1 Message Date
Heikki Linnakangas
eb887422b7 Fix this slightly differently, by initializing end_pos to start_pos. 2023-05-03 20:53:31 +03:00
Heikki Linnakangas
9d4e3ac27f Fix LSN in keepalive messages, if no WAL has been sent yet
When a new connection is established to the safekeeper, the 'end_pos'
field is initially set to Lsn::INVALID (i.e. 0/0). If there is no WAL
to send to the client, we send KeepAlive messages with
Lsn::INVALID. That confuses the pageserver: it thinks that the safekeeper
is lagging far behind the tip of the branch, and reconnects
to a different safekeeper. Then the same thing happens with the new
safekeeper, until some WAL is streamed, which sets 'end_pos' to a valid
value.

To fix, use 'start_pos' rather than 'end_pos' in the keepalive
messages. When the safekeeper has sent all the WAL it has available,
the two are equal. When the safekeeper has some WAL to send, it sends
an XLogData message rather than a KeepAlive. Even if it did send a
KeepAlive while there was still WAL to send, I think 'start_pos' would
be the more correct value anyway.

Fixes https://github.com/neondatabase/neon/issues/3972
2023-05-03 17:06:18 +03:00
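
The effect described in the two commit messages can be illustrated with a minimal, self-contained sketch. Everything here is a simplified assumption for illustration: the `Lsn` newtype stands in for the real type, and `looks_lagging` with its `max_lag` threshold is a hypothetical receiver-side check, not the pageserver's actual logic.

```rust
/// Simplified stand-in for the real `Lsn` type (assumption for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Lsn(u64);

impl Lsn {
    /// 0/0, the "no valid position" value.
    const INVALID: Lsn = Lsn(0);
}

/// Behaviour before the fix: with no `stop_pos`, the end position starts as INVALID.
fn initial_end_pos_old(stop_pos: Option<Lsn>) -> Lsn {
    stop_pos.unwrap_or(Lsn::INVALID)
}

/// Behaviour after the fix: with no `stop_pos`, fall back to the requested start.
fn initial_end_pos_new(start_pos: Lsn, stop_pos: Option<Lsn>) -> Lsn {
    stop_pos.unwrap_or(start_pos)
}

/// Hypothetical receiver-side check: is the reported end position more than
/// `max_lag` bytes behind the branch tip?
fn looks_lagging(reported_end: Lsn, branch_tip: Lsn, max_lag: u64) -> bool {
    branch_tip.0.saturating_sub(reported_end.0) > max_lag
}

fn main() {
    // The client is fully caught up: it asks to start at the branch tip,
    // and no WAL has been sent on this connection yet.
    let start_pos = Lsn(0x0169_9000);
    let branch_tip = Lsn(0x0169_9000);
    let max_lag = 1024 * 1024; // hypothetical threshold, 1 MiB

    let old = initial_end_pos_old(None);
    let new = initial_end_pos_new(start_pos, None);

    // A keepalive carrying 0/0 makes a caught-up sender look far behind.
    assert!(looks_lagging(old, branch_tip, max_lag));
    // A keepalive carrying start_pos does not.
    assert!(!looks_lagging(new, branch_tip, max_lag));

    // With an explicit stop_pos, both variants agree.
    let stop = Some(Lsn(0x0170_0000));
    assert_eq!(initial_end_pos_old(stop), initial_end_pos_new(start_pos, stop));

    println!("old = {:?}, new = {:?}", old, new);
}
```

The two variants differ only when `stop_pos` is `None`, which is the case the commit messages describe: a fresh connection with no WAL sent yet.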

@@ -399,7 +399,14 @@ impl SafekeeperPostgresHandler {
} else {
None
};
- let end_pos = stop_pos.unwrap_or(Lsn::INVALID);
+ // How much WAL is immediately available for sending? If we have a
+ // 'stop_pos', we know we have all the WAL up to that point. Otherwise,
+ // initialize the value with the starting position. If we actually have
+ // more WAL available, wait_wal() will update the value on the first
+ // iteration. If the client requested a starting position that is ahead
+ // of what we have, we still report that as the end-of-WAL.
+ let end_pos = stop_pos.unwrap_or(start_pos);
info!(
"starting streaming from {:?} till {:?}",