neon/safekeeper
Heikki Linnakangas 9d4e3ac27f Fix LSN in keepalive messages, if no WAL has been sent yet
When a new connection is established to the safekeeper, the 'end_pos'
field is initially set to Lsn::INVALID (i.e. 0/0). If there is no WAL
to send to the client, we send KeepAlive messages with
Lsn::INVALID. That confuses the pageserver: it thinks that the safekeeper
is lagging far behind the tip of the branch, and will reconnect
to a different safekeeper. Then the same thing happens with the new
safekeeper, until some WAL is streamed which sets 'end_pos' to a valid
value.

To fix, use 'start_pos' rather than 'end_pos' in the keepalive
messages. When the safekeeper has sent all the WAL it has available,
they are equal. When the safekeeper has some WAL to send, it will send
an XLogData message rather than a KeepAlive. Even if it did send a
KeepAlive while there was still WAL to send, I think 'start_pos' would
be the more correct value anyway.
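
As a rough illustration of the reasoning above, here is a minimal,
self-contained sketch. It is not the actual safekeeper code: only the
field names 'start_pos' and 'end_pos' come from this commit message,
while the `WalSender` and `Message` types, the `sent_ptr` field, the
`flush_lsn` parameter and the `next_message` method are hypothetical
names invented for the example.

```rust
// Hypothetical, simplified model of a WAL sender loop (an assumption,
// not the real safekeeper implementation).

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Lsn(u64);

impl Lsn {
    /// 0/0, used as the "not set yet" value.
    const INVALID: Lsn = Lsn(0);
}

#[derive(Debug, PartialEq, Eq)]
enum Message {
    XLogData { start: Lsn, end: Lsn },
    KeepAlive { sent_ptr: Lsn },
}

struct WalSender {
    /// Position from which the next WAL chunk will be sent.
    start_pos: Lsn,
    /// End of the last chunk sent; INVALID until some WAL has been streamed.
    end_pos: Lsn,
}

impl WalSender {
    fn new(start_pos: Lsn) -> Self {
        WalSender { start_pos, end_pos: Lsn::INVALID }
    }

    /// `flush_lsn` stands for the end of the WAL available locally.
    fn next_message(&mut self, flush_lsn: Lsn) -> Message {
        if self.start_pos < flush_lsn {
            // There is WAL to send: stream it and advance both positions.
            self.end_pos = flush_lsn;
            let msg = Message::XLogData { start: self.start_pos, end: self.end_pos };
            self.start_pos = self.end_pos;
            msg
        } else {
            // Nothing to send. Reporting `end_pos` here would yield
            // Lsn::INVALID on a fresh connection; `start_pos` is always valid.
            Message::KeepAlive { sent_ptr: self.start_pos }
        }
    }
}

fn main() {
    let mut sender = WalSender::new(Lsn(0x1000));
    // Fresh connection, no new WAL: the keepalive carries start_pos.
    println!("{:?}", sender.next_message(Lsn(0x1000)));
    // New WAL arrives: it is streamed as XLogData.
    println!("{:?}", sender.next_message(Lsn(0x2000)));
    // Caught up again: start_pos and end_pos are now equal.
    println!("{:?}", sender.next_message(Lsn(0x2000)));
}
```

In this model, the first keepalive on a fresh connection reports the
requested start position instead of 0/0, so a receiver comparing it
against the tip of the branch would not see an apparent lag.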

Fixes https://github.com/neondatabase/neon/issues/3972
2023-05-03 17:06:18 +03:00