Fix truncateLsn initialization (#6396)

In 7f828890cf we changed the logic for persisting control_files.
Previously the file was updated whenever `peer_horizon_lsn` advanced by
more than one segment, which meant `peer_horizon_lsn` was initialized on
disk as soon as the safekeeper received its first `AppendRequest`.

This caused an issue with `truncateLsn`, which can now sometimes be
zero. This PR fixes it: `truncateLsn`/`peer_horizon_lsn` can never be
zero once `timeline_start_lsn` is known.
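
The invariant described above can be sketched in isolation. This is a minimal illustration, not the safekeeper's actual code: `Lsn`, `State`, `StartMsg`, and `apply_start_msg` are hypothetical, simplified stand-ins, and only the field names `timeline_start_lsn` and `peer_horizon_lsn` come from this change.

```rust
// Hypothetical stand-in for the safekeeper's LSN type; zero means "unknown".
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Lsn(u64);

// Hypothetical, reduced persistent state: only the two fields this fix touches.
#[derive(Debug)]
struct State {
    timeline_start_lsn: Lsn,
    peer_horizon_lsn: Lsn,
}

// Hypothetical message carrying the timeline start position from the proposer.
struct StartMsg {
    timeline_start_lsn: Lsn,
}

// Assumed shape of the handler: each field is initialized only while it is
// still zero, so a later message never overwrites a value that is already known.
fn apply_start_msg(state: &mut State, msg: &StartMsg) {
    if state.timeline_start_lsn == Lsn(0) {
        state.timeline_start_lsn = msg.timeline_start_lsn;
    }
    if state.peer_horizon_lsn == Lsn(0) {
        // Initialize peer_horizon_lsn together with timeline_start_lsn, so it
        // cannot remain zero once the timeline start is known.
        state.peer_horizon_lsn = msg.timeline_start_lsn;
    }
}
```

Under these assumptions, applying a message with a nonzero `timeline_start_lsn` to an all-zero state leaves both fields nonzero, and applying a second message changes neither.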

Closes https://github.com/neondatabase/neon/issues/6248
Arthur Petukhovsky
2024-01-18 21:55:24 +03:00
committed by GitHub
parent e8f773387d
commit a092127b17
2 changed files with 12 additions and 6 deletions


@@ -959,8 +959,8 @@ DetermineEpochStartLsn(WalProposer *wp)
}
/*
- * If propEpochStartLsn is 0 everywhere, we are bootstrapping -- nothing
- * was committed yet. Start streaming then from the basebackup LSN.
+ * If propEpochStartLsn is 0, it means flushLsn is 0 everywhere, we are bootstrapping
+ * and nothing was committed yet. Start streaming then from the basebackup LSN.
*/
if (wp->propEpochStartLsn == InvalidXLogRecPtr && !wp->config->syncSafekeepers)
{
@@ -973,12 +973,13 @@ DetermineEpochStartLsn(WalProposer *wp)
}
/*
- * If propEpochStartLsn is not 0, at least one msg with WAL was sent to
- * some connected safekeeper; it must have carried truncateLsn pointing to
- * the first record.
+ * Safekeepers are setting truncateLsn after timelineStartLsn is known, so it
+ * should never be zero at this point, if we know timelineStartLsn.
+ *
+ * timelineStartLsn can be zero only on the first syncSafekeepers run.
*/
Assert((wp->truncateLsn != InvalidXLogRecPtr) ||
- (wp->config->syncSafekeepers && wp->truncateLsn == wp->propEpochStartLsn));
+ (wp->config->syncSafekeepers && wp->truncateLsn == wp->timelineStartLsn));
/*
* We will be generating WAL since propEpochStartLsn, so we should set


@@ -742,6 +742,11 @@ where
state.timeline_start_lsn
);
}
+ if state.peer_horizon_lsn == Lsn(0) {
+ // Update peer_horizon_lsn as soon as we know where timeline starts.
+ // It means that peer_horizon_lsn cannot be zero after we know timeline_start_lsn.
+ state.peer_horizon_lsn = msg.timeline_start_lsn;
+ }
if state.local_start_lsn == Lsn(0) {
state.local_start_lsn = msg.start_streaming_at;
info!("setting local_start_lsn to {:?}", state.local_start_lsn);