Mirror of https://github.com/neondatabase/neon.git (synced 2026-01-03 19:42:55 +00:00)
Fix race condition after attaching tenant with branches. (#4170)
After tenant attach, there is a window where the child timeline is loaded and accepts GetPage requests, but its parent is not. If a GetPage request needs to traverse to the parent, it needs to wait for the parent timeline to become active, or it might miss some records on the parent timeline.

It's also possible that the parent timeline is active, but it hasn't yet received all the WAL up to the branch point from the safekeeper. This happens if a pageserver crashes soon after creating a timeline, so that the WAL leading to the branch point has not yet been uploaded to remote storage. After restart, the WAL will be re-streamed and ingested from the safekeeper, but that takes a while. Because of that, it's not enough to check that the parent timeline is active; we also need to wait for the WAL to arrive on the parent timeline, just like at the beginning of GetPage handling.

We probably should change the behavior at create_timeline so that a timeline can only be created after all the WAL up to the branch point has been uploaded to remote storage, but that's not currently the case and is out of scope for this PR (see GitHub issue #4218).

@NanoBjorn encountered this while working on tenant migration. After migrating a tenant with a parent and child branch, connecting to the child branch failed with an error like:

```
FATAL: "base/16385" is not a valid data directory
DETAIL: File "base/16385/PG_VERSION" is missing.
```

This commit adds two tests that reproduce the bug, with slightly different symptoms.
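The two conditions described above (the parent must be active, and the parent must have ingested WAL up to the branch point) can be illustrated with a minimal, standard-library-only sketch. Everything here is hypothetical and chosen for illustration: `Timeline`, `State`, `WaitError`, `wait_ready`, and the plain `u64` LSN are not the pageserver's actual types or API.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::time::{Duration, Instant};

// Hypothetical timeline lifecycle states (not the pageserver's real enum).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum State {
    Loading,
    Active,
    Broken,
}

#[derive(Debug, PartialEq, Eq)]
pub enum WaitError {
    Broken,
    Timeout,
}

struct Inner {
    state: State,
    last_record_lsn: u64,
}

// Hypothetical stand-in for a parent (ancestor) timeline.
pub struct Timeline {
    inner: Mutex<Inner>,
    changed: Condvar,
}

impl Timeline {
    pub fn new() -> Arc<Self> {
        Arc::new(Timeline {
            inner: Mutex::new(Inner { state: State::Loading, last_record_lsn: 0 }),
            changed: Condvar::new(),
        })
    }

    pub fn set_state(&self, state: State) {
        self.inner.lock().unwrap().state = state;
        self.changed.notify_all();
    }

    // Called as WAL is ingested; the LSN only moves forward.
    pub fn advance_lsn(&self, lsn: u64) {
        let mut guard = self.inner.lock().unwrap();
        guard.last_record_lsn = guard.last_record_lsn.max(lsn);
        drop(guard);
        self.changed.notify_all();
    }

    // Block until the timeline is Active AND has WAL up to `lsn`.
    // Checking only the state would miss the re-streaming case.
    pub fn wait_ready(&self, lsn: u64, timeout: Duration) -> Result<(), WaitError> {
        let deadline = Instant::now() + timeout;
        let mut guard = self.inner.lock().unwrap();
        loop {
            match guard.state {
                State::Broken => return Err(WaitError::Broken),
                State::Active if guard.last_record_lsn >= lsn => return Ok(()),
                _ => {}
            }
            let remaining = deadline.saturating_duration_since(Instant::now());
            if remaining.is_zero() {
                return Err(WaitError::Timeout);
            }
            guard = self.changed.wait_timeout(guard, remaining).unwrap().0;
        }
    }
}
```

In this sketch, a page lookup on the child that has to traverse to the parent would call `parent.wait_ready(branch_lsn, timeout)` before reading from it, so it waits for both the state change and the WAL.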
Committed by GitHub
Parent: edcf4d61a4
Commit: 2855c73990
@@ -146,6 +146,10 @@ pub const XLH_UPDATE_OLD_ALL_VISIBLE_CLEARED: u8 = (1 << 0) as u8;
pub const XLH_UPDATE_NEW_ALL_VISIBLE_CLEARED: u8 = (1 << 1) as u8;
pub const XLH_DELETE_ALL_VISIBLE_CLEARED: u8 = (1 << 0) as u8;

// From replication/message.h
pub const XLOG_LOGICAL_MESSAGE: u8 = 0x00;

// From rmgrlist.h
pub const RM_XLOG_ID: u8 = 0;
pub const RM_XACT_ID: u8 = 1;
pub const RM_SMGR_ID: u8 = 2;
@@ -157,6 +161,7 @@ pub const RM_RELMAP_ID: u8 = 7;
pub const RM_STANDBY_ID: u8 = 8;
pub const RM_HEAP2_ID: u8 = 9;
pub const RM_HEAP_ID: u8 = 10;
pub const RM_LOGICALMSG_ID: u8 = 21;

// from xlogreader.h
pub const XLR_INFO_MASK: u8 = 0x0F;
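As a rough illustration of how these constants fit together: in PostgreSQL, the low four bits of a WAL record's `xl_info` byte (`XLR_INFO_MASK`) are reserved for the WAL machinery, and the resource manager reads its own record type from the remaining high bits. The helper `is_logical_message` below is hypothetical and is not part of this diff; the three constants are repeated only so the sketch compiles on its own.

```rust
// Constants copied from the diff above so the sketch is self-contained.
pub const XLOG_LOGICAL_MESSAGE: u8 = 0x00;
pub const RM_LOGICALMSG_ID: u8 = 21;
pub const XLR_INFO_MASK: u8 = 0x0F;

/// Hypothetical helper: returns true if a WAL record header with the given
/// resource manager ID and info byte describes a logical message record.
pub fn is_logical_message(xl_rmid: u8, xl_info: u8) -> bool {
    // Clear the low bits reserved for the WAL machinery; what remains is
    // the rmgr-specific record type.
    let rmgr_info = xl_info & !XLR_INFO_MASK;
    xl_rmid == RM_LOGICALMSG_ID && rmgr_info == XLOG_LOGICAL_MESSAGE
}
```

Masking before the comparison matters because the reserved low bits may be set independently of the record type.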