Compare commits

...

5 Commits

Author SHA1 Message Date
Christian Schwarz
eefb60685d uninit marker creation & temp timeline dir creation: fail if already exists
We clean up uninit markers upon creation failure and during tenant load.
The previous patch made it so that cleanup is guaranteed during tenant load.
So, now we can drop the code that makes creation idempotent, thereby
adding robustness in depth and uncluttering the code a little.
2023-06-07 20:17:08 +02:00
Christian Schwarz
1684bf36c8 tenant load: make tempfile/tempdir cleanup logic bail out if it fails
Before this patch, when loading a tenant, we'd iterate the timeline directory
to figure out which timelines the tenant has, and also remove temporary
entries in the timelines directory that are created during timeline
creation.

When we would fail to clean up temporary entries, we'd log an error but
continue.
This error-ignoring behavior forces all code that runs later, e.g.,
timeline creation code, to idempotently re-try those cleanups.

It's much cleaner to remove such temp entries once during startup,
then expect them to be absent later.
Why is that? Because
1. it de-clutters the not-startup code
2. we can use O_EXCL when we create the uninit marker, giving
   us extra robustness in depth that we're not creating the same
   timeline concurrently
3. same exclusive-op argument for when we create the initdb dir

This patch refactors the timeline loading code to a fix-point
iteration loop that does the cleanups.

If a cleanup fails, we bail out.

Once we've reached the fixpoint (no temp stuff left), we assume
each remaining entry is a timeline dir, parse it into a TimelineId,
and continue the loading as usual.
2023-06-07 20:02:45 +02:00
Heikki Linnakangas
bc5ade701b Enable sanity check that disk_consistent_lsn is valid on created timeline.
This required changing a bunch of unit tests to use a valid initdb LSN.
2023-06-07 18:00:39 +02:00
Heikki Linnakangas
29257af821 Clean up timeline initialization code.
Clarify who's responsible for initializing the layer map. There were
previously two different ways to do it:

- create_empty_timeline and bootstrap_timeline let prepare_timeline()
  initialize an empty layer map.

- branch_timeline passed a flag to initialize_with_lock() to tell
  initialize_with_lock to call load_layer_map(). Because it was a
  newly created timeline, load_layer_map() never found any layer
  files, so it just initialized an empty layer map.

With this commit, prepare_new_timeline() always does it. The LSN to
initialize it with is passed as argument.

Other changes per function:

prepare_timeline:
- rename to 'prepare_new_timeline' to make it clear that it's only used
  when creating a new timeline, not when loading an existing timeline
- always initialize an empty layer map. The caller can pass the LSN to
  initialize it with. (Previously, prepare_timeline would optionally
  load the layer map at 'initdb_lsn'. Some caller used that, while others
  let initialize_with_lock do it

initialize_with_lock:
- As mentioned above, remove the option to load the layer map
- Acquire the 'timelines' lock in the function itself. None of the callers
  did any other work while holding the lock.
- Rename it to finish_creation() to make its intent more clear. It's only
  used when creating a new timeline now.

create_timeline_data:
- Rename to create_timeline_struct() for clarity. It just initializes
  the Timeline struct, not any other "data"

create_timeline_files:
- use create_dir rather than create_dir_all, to be a little more strict.
  We know that the parent directory should already exist, and the timeline
  directory should not exist.
- Move the call to create_timeline_struct() to the caller. It was just
  being "passed through"
2023-06-07 17:20:47 +02:00
Christian Schwarz
6ea0f41bd5 timeline_init_and_sync: don't hold Tenant::timelines while load_layer_map
This patch inlines `initialize_with_lock` and then reorganizes the code
such that we can `load_layer_map` without holding the
`Tenant::timelines` lock.

As a nice aside, we can get rid of the dummy() uninit mark, which has
always been a terrible hack.
2023-06-07 17:13:54 +02:00
6 changed files with 393 additions and 274 deletions

View File

@@ -1600,7 +1600,7 @@ pub fn create_test_timeline(
pg_version: u32,
ctx: &RequestContext,
) -> anyhow::Result<std::sync::Arc<Timeline>> {
let tline = tenant.create_test_timeline(timeline_id, Lsn(8), pg_version, ctx)?;
let tline = tenant.create_test_timeline(timeline_id, Lsn(4), pg_version, ctx)?;
let mut m = tline.begin_modification(Lsn(8));
m.init_empty()?;
m.commit()?;

View File

@@ -11,7 +11,7 @@
//! parent timeline, and the last LSN that has been written to disk.
//!
use anyhow::{bail, Context};
use anyhow::{bail, ensure, Context};
use futures::FutureExt;
use pageserver_api::models::TimelineState;
use remote_storage::DownloadError;
@@ -29,6 +29,7 @@ use std::collections::BTreeSet;
use std::collections::HashMap;
use std::ffi::OsStr;
use std::fs;
use std::fs::DirEntry;
use std::fs::File;
use std::fs::OpenOptions;
use std::io;
@@ -184,18 +185,13 @@ struct TimelineUninitMark {
}
impl UninitializedTimeline<'_> {
/// Ensures timeline data is valid, loads it into pageserver's memory and removes
/// uninit mark file on success.
/// Finish timeline creation: insert it into the Tenant's timelines map and remove the
/// uninit mark file.
///
/// This function launches the flush loop if not already done.
///
/// The caller is responsible for activating the timeline (function `.activate()`).
fn initialize_with_lock(
mut self,
_ctx: &RequestContext,
timelines: &mut HashMap<TimelineId, Arc<Timeline>>,
load_layer_map: bool,
) -> anyhow::Result<Arc<Timeline>> {
fn finish_creation(mut self) -> anyhow::Result<Arc<Timeline>> {
let timeline_id = self.timeline_id;
let tenant_id = self.owning_tenant.tenant_id;
@@ -203,25 +199,19 @@ impl UninitializedTimeline<'_> {
format!("No timeline for initalization found for {tenant_id}/{timeline_id}")
})?;
// Check that the caller initialized disk_consistent_lsn
let new_disk_consistent_lsn = new_timeline.get_disk_consistent_lsn();
// TODO it would be good to ensure that, but apparently a lot of our testing is dependend on that at least
// ensure!(new_disk_consistent_lsn.is_valid(),
// "Timeline {tenant_id}/{timeline_id} has invalid disk_consistent_lsn and cannot be initialized");
ensure!(
new_disk_consistent_lsn.is_valid(),
"new timeline {tenant_id}/{timeline_id} has invalid disk_consistent_lsn"
);
let mut timelines = self.owning_tenant.timelines.lock().unwrap();
match timelines.entry(timeline_id) {
Entry::Occupied(_) => anyhow::bail!(
"Found freshly initialized timeline {tenant_id}/{timeline_id} in the tenant map"
),
Entry::Vacant(v) => {
if load_layer_map {
new_timeline
.load_layer_map(new_disk_consistent_lsn)
.with_context(|| {
format!(
"Failed to load layermap for timeline {tenant_id}/{timeline_id}"
)
})?;
}
uninit_mark.remove_uninit_mark().with_context(|| {
format!(
"Failed to remove uninit mark file for timeline {tenant_id}/{timeline_id}"
@@ -250,9 +240,10 @@ impl UninitializedTimeline<'_> {
.await
.context("Failed to import basebackup")?;
// Flush the new layer files to disk, before we make the timeline as available to
// the outside world.
//
// Flush loop needs to be spawned in order to be able to flush.
// We want to run proper checkpoint before we mark timeline as available to outside world
// Thus spawning flush loop manually and skipping flush_loop setup in initialize_with_lock
raw_timeline.maybe_spawn_flush_loop();
fail::fail_point!("before-checkpoint-new-timeline", |_| {
@@ -264,10 +255,9 @@ impl UninitializedTimeline<'_> {
.await
.context("Failed to flush after basebackup import")?;
// Initialize without loading the layer map. We started with an empty layer map, and already
// updated it for the layers that we created during the import.
let mut timelines = self.owning_tenant.timelines.lock().unwrap();
let tl = self.initialize_with_lock(ctx, &mut timelines, false)?;
// All the data has been imported. Insert the Timeline into the tenant's timelines
// map and remove the uninit mark file.
let tl = self.finish_creation()?;
tl.activate(broker_client, None, ctx);
Ok(tl)
}
@@ -310,15 +300,6 @@ fn cleanup_timeline_directory(uninit_mark: TimelineUninitMark) {
}
impl TimelineUninitMark {
/// Useful for initializing timelines, existing on disk after the restart.
pub fn dummy() -> Self {
Self {
uninit_mark_deleted: true,
uninit_mark_path: PathBuf::new(),
timeline_path: PathBuf::new(),
}
}
fn new(uninit_mark_path: PathBuf, timeline_path: PathBuf) -> Self {
Self {
uninit_mark_deleted: false,
@@ -512,7 +493,7 @@ impl Tenant {
ancestor: Option<Arc<Timeline>>,
first_save: bool,
init_order: Option<&InitializationOrder>,
ctx: &RequestContext,
_ctx: &RequestContext,
) -> anyhow::Result<()> {
let tenant_id = self.tenant_id;
@@ -523,54 +504,38 @@ impl Tenant {
.context("merge_local_remote_metadata")?
.to_owned();
let timeline = {
let timeline = self.create_timeline_struct(
timeline_id,
up_to_date_metadata,
ancestor.clone(),
remote_client,
init_order,
)?;
let new_disk_consistent_lsn = timeline.get_disk_consistent_lsn();
anyhow::ensure!(
new_disk_consistent_lsn.is_valid(),
"Timeline {tenant_id}/{timeline_id} has invalid disk_consistent_lsn"
);
timeline
.load_layer_map(new_disk_consistent_lsn)
.with_context(|| {
format!("Failed to load layermap for timeline {tenant_id}/{timeline_id}")
})?;
{
// avoiding holding it across awaits
let mut timelines_accessor = self.timelines.lock().unwrap();
if timelines_accessor.contains_key(&timeline_id) {
anyhow::bail!(
"Timeline {tenant_id}/{timeline_id} already exists in the tenant map"
);
}
let dummy_timeline = self.create_timeline_data(
timeline_id,
up_to_date_metadata,
ancestor.clone(),
remote_client,
init_order,
)?;
let timeline = UninitializedTimeline {
owning_tenant: self,
timeline_id,
raw_timeline: Some((dummy_timeline, TimelineUninitMark::dummy())),
};
// Do not start walreceiver here. We do need loaded layer map for reconcile_with_remote
// But we shouldnt start walreceiver before we have all the data locally, because working walreceiver
// will ingest data which may require looking at the layers which are not yet available locally
match timeline.initialize_with_lock(ctx, &mut timelines_accessor, true) {
Ok(new_timeline) => new_timeline,
Err(e) => {
error!("Failed to initialize timeline {tenant_id}/{timeline_id}: {e:?}");
// FIXME using None is a hack, it wont hurt, just ugly.
// Ideally initialize_with_lock error should return timeline in the error
// Or return ownership of itself completely so somethin like into_broken
// can be called directly on Uninitielized timeline
// also leades to redundant .clone
let broken_timeline = self
.create_timeline_data(
timeline_id,
up_to_date_metadata,
ancestor.clone(),
None,
None,
)
.with_context(|| {
format!("creating broken timeline data for {tenant_id}/{timeline_id}")
})?;
broken_timeline.set_state(TimelineState::Broken);
timelines_accessor.insert(timeline_id, broken_timeline);
return Err(e);
match timelines_accessor.entry(timeline_id) {
Entry::Occupied(_) => {
// The uninit mark file acts as a lock that prevents another task from
// initializing the timeline at the same time.
unreachable!(
"Timeline {tenant_id}/{timeline_id} already exists in the tenant map"
);
}
Entry::Vacant(v) => {
v.insert(Arc::clone(&timeline));
timeline.maybe_spawn_flush_loop();
}
}
};
@@ -996,96 +961,93 @@ impl Tenant {
let tenant_id = self.tenant_id;
let conf = self.conf;
let span = info_span!("blocking");
let myself = Arc::clone(self);
let sorted_timelines: Vec<(_, _)> = tokio::task::spawn_blocking(move || {
let _g = span.entered();
let mut timelines_to_load: HashMap<TimelineId, TimelineMetadata> = HashMap::new();
let timelines_dir = conf.timelines_path(&tenant_id);
for entry in
std::fs::read_dir(&timelines_dir).context("list timelines directory for tenant")?
{
let entry = entry.context("read timeline dir entry")?;
let timeline_dir = entry.path();
// Prune out all the temporary directories, if any, until only (fully initialized) timeline dirs remain.
// This is essentially a fix-point operation.
//
// If we fail to do cleanup, we bail out. It's better to bail out here than
// to require all later code (timeline create) to deal with leftover temp entries.
// The blast radius is constrained to this tenant, so, it's not too bad.
let entries: Vec<DirEntry> = loop {
let mut entries = Vec::new();
for entry in std::fs::read_dir(&timelines_dir).with_context(|| {
format!(
"Failed to list timelines directory for tenant {}",
myself.tenant_id
)
})? {
let entry = entry.with_context(|| {
format!("cannot read timeline dir entry for {}", myself.tenant_id)
})?;
entries.push(entry);
}
if crate::is_temporary(&timeline_dir) {
info!(
"Found temporary timeline directory, removing: {}",
timeline_dir.display()
);
if let Err(e) = std::fs::remove_dir_all(&timeline_dir) {
error!(
"Failed to remove temporary directory '{}': {:?}",
timeline_dir.display(),
e
);
}
} else if is_uninit_mark(&timeline_dir) {
let timeline_uninit_mark_file = &timeline_dir;
info!(
"Found an uninit mark file {}, removing the timeline and its uninit mark",
timeline_uninit_mark_file.display()
);
let timeline_id = timeline_uninit_mark_file
.file_stem()
.and_then(OsStr::to_str)
.unwrap_or_default()
.parse::<TimelineId>()
.with_context(|| {
format!(
"Could not parse timeline id out of the timeline uninit mark name {}",
timeline_uninit_mark_file.display()
)
})?;
let timeline_dir = conf.timeline_path(&timeline_id, &tenant_id);
if let Err(e) =
remove_timeline_and_uninit_mark(&timeline_dir, timeline_uninit_mark_file)
{
error!("Failed to clean up uninit marked timeline: {e:?}");
}
} else {
let timeline_id = timeline_dir
.file_name()
.and_then(OsStr::to_str)
.unwrap_or_default()
.parse::<TimelineId>()
.with_context(|| {
format!(
"Could not parse timeline id out of the timeline dir name {}",
timeline_dir.display()
)
})?;
let timeline_uninit_mark_file =
conf.timeline_uninit_mark_file_path(tenant_id, timeline_id);
if timeline_uninit_mark_file.exists() {
let mut removed_something = false;
for entry in &entries {
let timeline_dir = entry.path();
if crate::is_temporary(&timeline_dir) {
// This branch is for the temporary dir created by the bootstrap_timeline procedure.
// It should always start with basebackup-.*.
info!(
%timeline_id,
"Found an uninit mark file, removing the timeline and its uninit mark",
"Found temporary directory of timeline_bootstrap, removing: {}",
timeline_dir.display()
);
if let Err(e) = remove_timeline_and_uninit_mark(
&timeline_dir,
&timeline_uninit_mark_file,
) {
error!("Failed to clean up uninit marked timeline: {e:?}");
}
continue;
}
let file_name = entry.file_name();
if let Ok(timeline_id) =
file_name.to_str().unwrap_or_default().parse::<TimelineId>()
{
let metadata = load_metadata(conf, timeline_id, tenant_id)
.context("failed to load metadata")?;
timelines_to_load.insert(timeline_id, metadata);
} else {
// A file or directory that doesn't look like a timeline ID
warn!(
"unexpected file or directory in timelines directory: {}",
file_name.to_string_lossy()
std::fs::remove_dir_all(&timeline_dir).with_context(|| format!("remove temporary timeline_bootstrap dir {timeline_dir:?}"))?;
removed_something = true;
} else if is_uninit_mark(&timeline_dir) {
// This branch is to remove timelines that didn't finish creating.
let timeline_uninit_mark_file = &timeline_dir;
info!(
"Found an uninit mark file {}, removing the timeline and its uninit mark",
timeline_uninit_mark_file.display()
);
let timeline_id = timeline_uninit_mark_file
.file_stem()
.and_then(OsStr::to_str)
.unwrap_or_default()
.parse::<TimelineId>()
.with_context(|| {
format!(
"Could not parse timeline id out of the timeline uninit mark name {}",
timeline_uninit_mark_file.display()
)
})?;
let timeline_dir = myself.conf.timeline_path(&timeline_id, &myself.tenant_id);
remove_timeline_and_uninit_mark(&timeline_dir, timeline_uninit_mark_file)?;
removed_something = true;
}
}
if removed_something {
continue;
}
break entries;
};
let mut timelines_to_load: HashMap<TimelineId, TimelineMetadata> = HashMap::new();
for entry in entries {
let timeline_dir = entry.path();
assert!(!crate::is_temporary(&timeline_dir), "removed above");
assert!(!is_uninit_mark(&timeline_dir), "removed above");
let timeline_id = timeline_dir
.file_name()
.and_then(OsStr::to_str)
.unwrap_or_default()
.parse::<TimelineId>()
.with_context(|| {
format!(
"Could not parse timeline id out of the timeline dir name {}",
timeline_dir.display()
)
})?;
let metadata = load_metadata(myself.conf, timeline_id, myself.tenant_id)
.context("failed to load metadata")?;
timelines_to_load.insert(timeline_id, metadata);
}
// Sort the array of timeline IDs into tree-order, so that parent comes before
@@ -1253,6 +1215,8 @@ impl Tenant {
drop(timelines);
let new_metadata = TimelineMetadata::new(
// Initialize disk_consistent LSN to 0, The caller must import some data to
// make it valid, before calling finish_creation()
Lsn(0),
None,
None,
@@ -1261,11 +1225,11 @@ impl Tenant {
initdb_lsn,
pg_version,
);
self.prepare_timeline(
self.prepare_new_timeline(
new_timeline_id,
&new_metadata,
timeline_uninit_mark,
true,
initdb_lsn,
None,
)
}
@@ -1284,8 +1248,14 @@ impl Tenant {
ctx: &RequestContext,
) -> anyhow::Result<Arc<Timeline>> {
let uninit_tl = self.create_empty_timeline(new_timeline_id, initdb_lsn, pg_version, ctx)?;
let mut timelines = self.timelines.lock().unwrap();
let tl = uninit_tl.initialize_with_lock(ctx, &mut timelines, true)?;
// Normally, you would need to import some data to the timeline before making it
// available. But our unit tests have no such initial data.
uninit_tl
.raw_timeline()?
.set_disk_consistent_lsn(initdb_lsn);
let tl = uninit_tl.finish_creation()?;
// The non-test code would call tl.activate() here.
tl.set_state(TimelineState::Active);
Ok(tl)
@@ -2160,7 +2130,12 @@ impl Tenant {
}
}
fn create_timeline_data(
/// Helper function to create a new Timeline struct.
///
/// The returned Timeline is in Loading state. The caller is responsible for
/// initializing any on-disk state, and for inserting the Timeline to the 'timelines'
/// map.
fn create_timeline_struct(
&self,
new_timeline_id: TimelineId,
new_metadata: &TimelineMetadata,
@@ -2580,7 +2555,7 @@ impl Tenant {
src_timeline: &Arc<Timeline>,
dst_id: TimelineId,
start_lsn: Option<Lsn>,
ctx: &RequestContext,
_ctx: &RequestContext,
) -> anyhow::Result<Arc<Timeline>> {
let src_id = src_timeline.timeline_id;
@@ -2665,17 +2640,15 @@ impl Tenant {
src_timeline.pg_version,
);
let new_timeline = {
let mut timelines = self.timelines.lock().unwrap();
self.prepare_timeline(
dst_id,
&metadata,
timeline_uninit_mark,
false,
Some(Arc::clone(src_timeline)),
)?
.initialize_with_lock(ctx, &mut timelines, true)?
};
let uninitialized_timeline = self.prepare_new_timeline(
dst_id,
&metadata,
timeline_uninit_mark,
start_lsn + 1,
Some(Arc::clone(src_timeline)),
)?;
let new_timeline = uninitialized_timeline.finish_creation()?;
// Root timeline gets its layers during creation and uploads them along with the metadata.
// A branch timeline though, when created, can get no writes for some time, hence won't get any layers created.
@@ -2707,7 +2680,10 @@ impl Tenant {
let timelines = self.timelines.lock().unwrap();
self.create_timeline_uninit_mark(timeline_id, &timelines)?
};
// create a `tenant/{tenant_id}/timelines/basebackup-{timeline_id}.{TEMP_FILE_SUFFIX}/`
// Init temporarily repo to get bootstrap data, this creates a directory in the `initdb_path` path
// 1. create a `tenant/{tenant_id}/timelines/basebackup-{timeline_id}.{TEMP_FILE_SUFFIX}/`
// temporary directory for basebackup files for the given timeline.
let initdb_path = path_with_suffix_extension(
self.conf
@@ -2715,26 +2691,20 @@ impl Tenant {
.join(format!("basebackup-{timeline_id}")),
TEMP_FILE_SUFFIX,
);
// an uninit mark was placed before, nothing else can access this timeline files
// current initdb was not run yet, so remove whatever was left from the previous runs
if initdb_path.exists() {
fs::remove_dir_all(&initdb_path).with_context(|| {
format!(
"Failed to remove already existing initdb directory: {}",
initdb_path.display()
)
})?;
}
// Init temporarily repo to get bootstrap data, this creates a directory in the `initdb_path` path
run_initdb(self.conf, &initdb_path, pg_version)?;
// this new directory is very temporary, set to remove it immediately after bootstrap, we don't need it
scopeguard::defer! {
if let Err(e) = fs::remove_dir_all(&initdb_path) {
// 2. the initdb directory is very temporary, set to remove it immediately after bootstrap, we don't need it
let remove_initdb_dir = || {
fs::remove_dir_all(&initdb_path)
.with_context(|| format!("remove temporary initdb directory {initdb_path:?}"))
};
let remove_initdb_dir_guard = scopeguard::guard(remove_initdb_dir, |remove_initdb_dir| {
if let Err(e) = remove_initdb_dir() {
// this is unlikely, but we will remove the directory on pageserver restart or another bootstrap call
error!("Failed to remove temporary initdb directory '{}': {}", initdb_path.display(), e);
error!("failed to {e:?}");
}
}
});
// 3. do it. if we bail out, the scopeguard takes care of removing the dir.
run_initdb(self.conf, &initdb_path, pg_version)?;
let pgdata_path = &initdb_path;
let pgdata_lsn = import_datadir::get_lsn_from_controlfile(pgdata_path)?.align();
@@ -2751,8 +2721,13 @@ impl Tenant {
pgdata_lsn,
pg_version,
);
let raw_timeline =
self.prepare_timeline(timeline_id, &new_metadata, timeline_uninit_mark, true, None)?;
let raw_timeline = self.prepare_new_timeline(
timeline_id,
&new_metadata,
timeline_uninit_mark,
pgdata_lsn,
None,
)?;
let tenant_id = raw_timeline.owning_tenant.tenant_id;
let unfinished_timeline = raw_timeline.raw_timeline()?;
@@ -2768,10 +2743,15 @@ impl Tenant {
format!("Failed to import pgdatadir for timeline {tenant_id}/{timeline_id}")
})?;
// Flush the new layer files to disk, before we mark the timeline as available to
// fail the bootstrap if we can't clean up after ourselves
remove_initdb_dir()?;
// no more cleanup necessary
let _ = scopeguard::ScopeGuard::into_inner(remove_initdb_dir_guard);
// Flush the new layer files to disk, before we make the timeline as available to
// the outside world.
//
// Thus spawn flush loop manually and skip flush_loop setup in initialize_with_lock
// Flush loop needs to be spawned in order to be able to flush.
unfinished_timeline.maybe_spawn_flush_loop();
fail::fail_point!("before-checkpoint-new-timeline", |_| {
@@ -2787,12 +2767,8 @@ impl Tenant {
)
})?;
// Initialize the timeline without loading the layer map, because we already updated the layer
// map above, when we imported the datadir.
let timeline = {
let mut timelines = self.timelines.lock().unwrap();
raw_timeline.initialize_with_lock(ctx, &mut timelines, false)?
};
// All done!
let timeline = raw_timeline.finish_creation()?;
info!(
"created root timeline {} timeline.lsn {}",
@@ -2803,14 +2779,28 @@ impl Tenant {
Ok(timeline)
}
/// Creates intermediate timeline structure and its files, without loading it into memory.
/// It's up to the caller to import the necesary data and import the timeline into memory.
fn prepare_timeline(
/// Creates intermediate timeline structure and its files, along with an uninit marker file.
///
/// An empty layer map is initialized, and new data and WAL can be imported starting
/// at 'disk_consistent_lsn'. After any initial data has been imported, call
/// `finish_creation` to insert the Timeline into the timelines map and to remove the
/// uninit mark file.
///
/// No background tasks are launched by this function.
///
/// In case of an error, the function returns it as Err() but also does best-effort cleanup
/// of the created on-disk state, including the `uninit_mark`.
/// If that cleanup fails, it will log the error at error! level.
/// It is guaranteed that the uninit marker is cleaned up last, so that, if cleanup is interrupted or fails,
/// the cleanup will resumed on pageserver restart.
/// In combination with O_EXCL being used to create the uninit marker, the resulting behavior
/// is that subsequent creation attempts will fail until the cleanup is complete.
fn prepare_new_timeline(
&self,
new_timeline_id: TimelineId,
new_metadata: &TimelineMetadata,
uninit_mark: TimelineUninitMark,
init_layers: bool,
start_lsn: Lsn,
ancestor: Option<Arc<Timeline>>,
) -> anyhow::Result<UninitializedTimeline> {
let tenant_id = self.tenant_id;
@@ -2828,33 +2818,27 @@ impl Tenant {
None
};
match self.create_timeline_files(
&uninit_mark.timeline_path,
new_timeline_id,
new_metadata,
ancestor,
remote_client,
) {
Ok(new_timeline) => {
if init_layers {
new_timeline.layers.write().unwrap().next_open_layer_at =
Some(new_timeline.initdb_lsn);
}
debug!(
"Successfully created initial files for timeline {tenant_id}/{new_timeline_id}"
);
Ok(UninitializedTimeline {
owning_tenant: self,
timeline_id: new_timeline_id,
raw_timeline: Some((new_timeline, uninit_mark)),
})
}
Err(e) => {
error!("Failed to create initial files for timeline {tenant_id}/{new_timeline_id}, cleaning up: {e:?}");
cleanup_timeline_directory(uninit_mark);
Err(e)
}
let timeline_struct = self
.create_timeline_struct(new_timeline_id, new_metadata, ancestor, remote_client, None)
.context("Failed to create timeline data structure")?;
timeline_struct.init_empty_layer_map(start_lsn);
if let Err(e) =
self.create_timeline_files(&uninit_mark.timeline_path, new_timeline_id, new_metadata)
{
error!("Failed to create initial files for timeline {tenant_id}/{new_timeline_id}, cleaning up: {e:?}");
cleanup_timeline_directory(uninit_mark);
return Err(e);
}
debug!("Successfully created initial files for timeline {tenant_id}/{new_timeline_id}");
Ok(UninitializedTimeline {
owning_tenant: self,
timeline_id: new_timeline_id,
raw_timeline: Some((timeline_struct, uninit_mark)),
})
}
fn create_timeline_files(
@@ -2862,13 +2846,8 @@ impl Tenant {
timeline_path: &Path,
new_timeline_id: TimelineId,
new_metadata: &TimelineMetadata,
ancestor: Option<Arc<Timeline>>,
remote_client: Option<RemoteTimelineClient>,
) -> anyhow::Result<Arc<Timeline>> {
let timeline_data = self
.create_timeline_data(new_timeline_id, new_metadata, ancestor, remote_client, None)
.context("Failed to create timeline data structure")?;
crashsafe::create_dir_all(timeline_path).context("Failed to create timeline directory")?;
) -> anyhow::Result<()> {
crashsafe::create_dir(timeline_path).context("Failed to create timeline directory")?;
fail::fail_point!("after-timeline-uninit-mark-creation", |_| {
anyhow::bail!("failpoint after-timeline-uninit-mark-creation");
@@ -2882,8 +2861,7 @@ impl Tenant {
true,
)
.context("Failed to create timeline metadata")?;
Ok(timeline_data)
Ok(())
}
/// Attempts to create an uninit mark file for the timeline initialization.
@@ -2911,7 +2889,10 @@ impl Tenant {
let uninit_mark_path = self
.conf
.timeline_uninit_mark_file_path(tenant_id, timeline_id);
fs::File::create(&uninit_mark_path)
fs::OpenOptions::new()
.create_new(true)
.write(true)
.open(&uninit_mark_path)
.context("Failed to create uninit mark file")
.and_then(|_| {
crashsafe::fsync_file_and_parent(&uninit_mark_path)
@@ -3010,7 +2991,7 @@ fn remove_timeline_and_uninit_mark(timeline_dir: &Path, uninit_mark: &Path) -> a
})
.with_context(|| {
format!(
"Failed to remove unit marked timeline directory {}",
"Failed to remove uninit marked timeline directory {}",
timeline_dir.display()
)
})?;
@@ -3498,7 +3479,8 @@ mod tests {
#[tokio::test]
async fn test_basic() -> anyhow::Result<()> {
let (tenant, ctx) = TenantHarness::create("test_basic")?.load().await;
let tline = tenant.create_test_timeline(TIMELINE_ID, Lsn(0), DEFAULT_PG_VERSION, &ctx)?;
let tline =
tenant.create_test_timeline(TIMELINE_ID, Lsn(0x05), DEFAULT_PG_VERSION, &ctx)?;
let writer = tline.writer();
writer.put(*TEST_KEY, Lsn(0x10), &Value::Image(TEST_IMG("foo at 0x10")))?;
@@ -3531,9 +3513,9 @@ mod tests {
let (tenant, ctx) = TenantHarness::create("no_duplicate_timelines")?
.load()
.await;
let _ = tenant.create_test_timeline(TIMELINE_ID, Lsn(0), DEFAULT_PG_VERSION, &ctx)?;
let _ = tenant.create_test_timeline(TIMELINE_ID, Lsn(0x10), DEFAULT_PG_VERSION, &ctx)?;
match tenant.create_empty_timeline(TIMELINE_ID, Lsn(0), DEFAULT_PG_VERSION, &ctx) {
match tenant.create_empty_timeline(TIMELINE_ID, Lsn(0x10), DEFAULT_PG_VERSION, &ctx) {
Ok(_) => panic!("duplicate timeline creation should fail"),
Err(e) => assert_eq!(
e.to_string(),
@@ -3562,7 +3544,8 @@ mod tests {
use std::str::from_utf8;
let (tenant, ctx) = TenantHarness::create("test_branch")?.load().await;
let tline = tenant.create_test_timeline(TIMELINE_ID, Lsn(0), DEFAULT_PG_VERSION, &ctx)?;
let tline =
tenant.create_test_timeline(TIMELINE_ID, Lsn(0x10), DEFAULT_PG_VERSION, &ctx)?;
let writer = tline.writer();
#[allow(non_snake_case)]
@@ -3659,7 +3642,8 @@ mod tests {
TenantHarness::create("test_prohibit_branch_creation_on_garbage_collected_data")?
.load()
.await;
let tline = tenant.create_test_timeline(TIMELINE_ID, Lsn(0), DEFAULT_PG_VERSION, &ctx)?;
let tline =
tenant.create_test_timeline(TIMELINE_ID, Lsn(0x10), DEFAULT_PG_VERSION, &ctx)?;
make_some_layers(tline.as_ref(), Lsn(0x20)).await?;
// this removes layers before lsn 40 (50 minus 10), so there are two remaining layers, image and delta for 31-50
@@ -3746,7 +3730,8 @@ mod tests {
TenantHarness::create("test_get_branchpoints_from_an_inactive_timeline")?
.load()
.await;
let tline = tenant.create_test_timeline(TIMELINE_ID, Lsn(0), DEFAULT_PG_VERSION, &ctx)?;
let tline =
tenant.create_test_timeline(TIMELINE_ID, Lsn(0x10), DEFAULT_PG_VERSION, &ctx)?;
make_some_layers(tline.as_ref(), Lsn(0x20)).await?;
tenant
@@ -3794,7 +3779,8 @@ mod tests {
TenantHarness::create("test_retain_data_in_parent_which_is_needed_for_child")?
.load()
.await;
let tline = tenant.create_test_timeline(TIMELINE_ID, Lsn(0), DEFAULT_PG_VERSION, &ctx)?;
let tline =
tenant.create_test_timeline(TIMELINE_ID, Lsn(0x10), DEFAULT_PG_VERSION, &ctx)?;
make_some_layers(tline.as_ref(), Lsn(0x20)).await?;
tenant
@@ -3817,7 +3803,8 @@ mod tests {
TenantHarness::create("test_parent_keeps_data_forever_after_branching")?
.load()
.await;
let tline = tenant.create_test_timeline(TIMELINE_ID, Lsn(0), DEFAULT_PG_VERSION, &ctx)?;
let tline =
tenant.create_test_timeline(TIMELINE_ID, Lsn(0x10), DEFAULT_PG_VERSION, &ctx)?;
make_some_layers(tline.as_ref(), Lsn(0x20)).await?;
tenant
@@ -3850,7 +3837,7 @@ mod tests {
{
let (tenant, ctx) = harness.load().await;
let tline =
tenant.create_test_timeline(TIMELINE_ID, Lsn(0x8000), DEFAULT_PG_VERSION, &ctx)?;
tenant.create_test_timeline(TIMELINE_ID, Lsn(0x7000), DEFAULT_PG_VERSION, &ctx)?;
make_some_layers(tline.as_ref(), Lsn(0x8000)).await?;
}
@@ -3870,7 +3857,7 @@ mod tests {
{
let (tenant, ctx) = harness.load().await;
let tline =
tenant.create_test_timeline(TIMELINE_ID, Lsn(0), DEFAULT_PG_VERSION, &ctx)?;
tenant.create_test_timeline(TIMELINE_ID, Lsn(0x10), DEFAULT_PG_VERSION, &ctx)?;
make_some_layers(tline.as_ref(), Lsn(0x20)).await?;
@@ -3907,7 +3894,8 @@ mod tests {
let harness = TenantHarness::create(TEST_NAME)?;
let (tenant, ctx) = harness.load().await;
let tline = tenant.create_test_timeline(TIMELINE_ID, Lsn(0), DEFAULT_PG_VERSION, &ctx)?;
let tline =
tenant.create_test_timeline(TIMELINE_ID, Lsn(0x10), DEFAULT_PG_VERSION, &ctx)?;
drop(tline);
drop(tenant);
@@ -3945,7 +3933,8 @@ mod tests {
#[tokio::test]
async fn test_images() -> anyhow::Result<()> {
let (tenant, ctx) = TenantHarness::create("test_images")?.load().await;
let tline = tenant.create_test_timeline(TIMELINE_ID, Lsn(0), DEFAULT_PG_VERSION, &ctx)?;
let tline =
tenant.create_test_timeline(TIMELINE_ID, Lsn(0x05), DEFAULT_PG_VERSION, &ctx)?;
let writer = tline.writer();
writer.put(*TEST_KEY, Lsn(0x10), &Value::Image(TEST_IMG("foo at 0x10")))?;
@@ -4010,7 +3999,8 @@ mod tests {
#[tokio::test]
async fn test_bulk_insert() -> anyhow::Result<()> {
let (tenant, ctx) = TenantHarness::create("test_bulk_insert")?.load().await;
let tline = tenant.create_test_timeline(TIMELINE_ID, Lsn(0), DEFAULT_PG_VERSION, &ctx)?;
let tline =
tenant.create_test_timeline(TIMELINE_ID, Lsn(0x05), DEFAULT_PG_VERSION, &ctx)?;
let mut lsn = Lsn(0x10);
@@ -4052,7 +4042,8 @@ mod tests {
#[tokio::test]
async fn test_random_updates() -> anyhow::Result<()> {
let (tenant, ctx) = TenantHarness::create("test_random_updates")?.load().await;
let tline = tenant.create_test_timeline(TIMELINE_ID, Lsn(0), DEFAULT_PG_VERSION, &ctx)?;
let tline =
tenant.create_test_timeline(TIMELINE_ID, Lsn(0x10), DEFAULT_PG_VERSION, &ctx)?;
const NUM_KEYS: usize = 1000;
@@ -4064,7 +4055,7 @@ mod tests {
// a read sees the latest page version.
let mut updated = [Lsn(0); NUM_KEYS];
let mut lsn = Lsn(0);
let mut lsn = Lsn(0x10);
#[allow(clippy::needless_range_loop)]
for blknum in 0..NUM_KEYS {
lsn = Lsn(lsn.0 + 0x10);
@@ -4126,7 +4117,7 @@ mod tests {
.load()
.await;
let mut tline =
tenant.create_test_timeline(TIMELINE_ID, Lsn(0), DEFAULT_PG_VERSION, &ctx)?;
tenant.create_test_timeline(TIMELINE_ID, Lsn(0x10), DEFAULT_PG_VERSION, &ctx)?;
const NUM_KEYS: usize = 1000;
@@ -4138,7 +4129,7 @@ mod tests {
// a read sees the latest page version.
let mut updated = [Lsn(0); NUM_KEYS];
let mut lsn = Lsn(0);
let mut lsn = Lsn(0x10);
#[allow(clippy::needless_range_loop)]
for blknum in 0..NUM_KEYS {
lsn = Lsn(lsn.0 + 0x10);
@@ -4209,7 +4200,7 @@ mod tests {
.load()
.await;
let mut tline =
tenant.create_test_timeline(TIMELINE_ID, Lsn(0), DEFAULT_PG_VERSION, &ctx)?;
tenant.create_test_timeline(TIMELINE_ID, Lsn(0x10), DEFAULT_PG_VERSION, &ctx)?;
const NUM_KEYS: usize = 100;
const NUM_TLINES: usize = 50;
@@ -4218,7 +4209,7 @@ mod tests {
// Track page mutation lsns across different timelines.
let mut updated = [[Lsn(0); NUM_KEYS]; NUM_TLINES];
let mut lsn = Lsn(0);
let mut lsn = Lsn(0x10);
#[allow(clippy::needless_range_loop)]
for idx in 0..NUM_TLINES {

View File

@@ -1264,7 +1264,8 @@ mod tests {
let harness = TenantHarness::create(test_name)?;
let (tenant, ctx) = runtime.block_on(harness.load());
// create an empty timeline directory
let _ = tenant.create_test_timeline(TIMELINE_ID, Lsn(0), DEFAULT_PG_VERSION, &ctx)?;
let _ =
tenant.create_test_timeline(TIMELINE_ID, Lsn(0x10), DEFAULT_PG_VERSION, &ctx)?;
let remote_fs_dir = harness.conf.workdir.join("remote_fs");
std::fs::create_dir_all(remote_fs_dir)?;

View File

@@ -581,6 +581,11 @@ impl Timeline {
self.disk_consistent_lsn.load()
}
#[cfg(test)]
pub(super) fn set_disk_consistent_lsn(&self, lsn: Lsn) {
self.disk_consistent_lsn.store(lsn)
}
pub fn get_remote_consistent_lsn(&self) -> Option<Lsn> {
if let Some(remote_client) = &self.remote_client {
remote_client.last_uploaded_consistent_lsn()
@@ -1559,9 +1564,16 @@ impl Timeline {
));
}
///
/// Initialize with an empty layer map. Used when creating a new timeline.
///
pub(super) fn init_empty_layer_map(&self, start_lsn: Lsn) {
let mut layers = self.layers.write().unwrap();
layers.next_open_layer_at = Some(Lsn(start_lsn.0));
}
///
/// Scan the timeline directory to populate the layer map.
/// Returns all timeline-related files that were found and loaded.
///
pub(super) fn load_layer_map(&self, disk_consistent_lsn: Lsn) -> anyhow::Result<()> {
let mut layers = self.layers.write().unwrap();
@@ -2629,7 +2641,11 @@ impl Timeline {
let layer;
if let Some(open_layer) = &layers.open_layer {
if open_layer.get_lsn_range().start > lsn {
bail!("unexpected open layer in the future");
bail!(
"unexpected open layer in the future: open layers starts at {}, write lsn {}",
open_layer.get_lsn_range().start,
lsn
);
}
layer = Arc::clone(open_layer);

View File

@@ -1324,7 +1324,7 @@ mod tests {
async fn dummy_state(harness: &TenantHarness<'_>) -> ConnectionManagerState {
let (tenant, ctx) = harness.load().await;
let timeline = tenant
.create_test_timeline(TIMELINE_ID, Lsn(0), crate::DEFAULT_PG_VERSION, &ctx)
.create_test_timeline(TIMELINE_ID, Lsn(0x10), crate::DEFAULT_PG_VERSION, &ctx)
.expect("Failed to create an empty timeline for dummy wal connection manager");
ConnectionManagerState {

View File

@@ -1,10 +1,12 @@
import concurrent.futures
import os
import re
from typing import List, Tuple
import pytest
from fixtures.log_helper import log
from fixtures.neon_fixtures import Endpoint, NeonEnv, NeonEnvBuilder
from fixtures.pageserver.utils import wait_until_tenant_state
from fixtures.types import TenantId, TimelineId
@@ -160,6 +162,115 @@ def test_timeline_init_break_before_checkpoint(neon_simple_env: NeonEnv):
), "pageserver should clean its temp timeline files on timeline creation failure"
def test_timeline_init_crash_restart_cleans_up_intermediate_files(neon_simple_env: NeonEnv):
env = neon_simple_env
pageserver_http = env.pageserver.http_client()
tenant_id, _ = env.neon_cli.create_tenant()
timelines_dir = env.repo_dir / "tenants" / str(tenant_id) / "timelines"
old_tenant_timelines = env.neon_cli.list_timelines(tenant_id)
initial_timeline_dirs = [d for d in timelines_dir.iterdir()]
# simulate crash during timeline init (some intermediate files are on disk), before it's checkpointed.
pageserver_http.configure_failpoints(("before-checkpoint-new-timeline", "exit"))
with pytest.raises(Exception):
_ = env.neon_cli.create_timeline("test_timeline_init_break_before_checkpoint", tenant_id)
# it should already be dead, any way to check that?
env.neon_cli.pageserver_stop(immediate=True)
# self-test: ensure that we crashed the pageserver whith intermediate files present
crashed_timeline_dirs = [d for d in timelines_dir.iterdir()]
assert set(crashed_timeline_dirs).issuperset(set(initial_timeline_dirs))
new_dirs = set(crashed_timeline_dirs) - set(initial_timeline_dirs)
expected = {
"initdb dir": r"^basebackup-.*\.___temp$",
"timeline dir": r"^[0-9a-f]+$",
"timeline uninit marker file": r"^[0-9a-f]+\.___uninit$",
}
assert len(new_dirs) == len(expected.keys()), f"unexpected new directories: {new_dirs}"
for new_dir in new_dirs:
for direntry_type, pattern in expected.items():
if re.match(pattern, new_dir.name):
expected.pop(direntry_type)
break
assert len(expected.keys()) == 0, f"unexpected new directories: {expected}"
# start the pageserver and assert that it cleans up
env.neon_cli.pageserver_start()
# Creating the timeline didn't finish. The other timelines on tenant should still be present and work normally.
new_tenant_timelines = env.neon_cli.list_timelines(tenant_id)
assert (
new_tenant_timelines == old_tenant_timelines
), f"Pageserver after restart should ignore non-initialized timelines for tenant {tenant_id}"
timeline_dirs = [d for d in timelines_dir.iterdir()]
assert (
timeline_dirs == initial_timeline_dirs
), "pageserver should clean its temp timeline files on timeline creation failure"
def test_timeline_init_crash_removal_failure_breaks_tenant(neon_simple_env: NeonEnv):
env = neon_simple_env
pageserver_http = env.pageserver.http_client()
tenant_id, _ = env.neon_cli.create_tenant()
timelines_dir = env.repo_dir / "tenants" / str(tenant_id) / "timelines"
initial_timeline_dirs = [d for d in timelines_dir.iterdir()]
# simulate crash during timeline init (some intermediate files are on disk), before it's checkpointed.
pageserver_http.configure_failpoints(("before-checkpoint-new-timeline", "exit"))
with pytest.raises(Exception):
_ = env.neon_cli.create_timeline("test_timeline_init_break_before_checkpoint", tenant_id)
# it should already be dead, any way to check that?
env.neon_cli.pageserver_stop(immediate=True)
# self-test: ensure that we crashed the pageserver whith intermediate files present
crashed_timeline_dirs = [d for d in timelines_dir.iterdir()]
assert set(crashed_timeline_dirs).issuperset(set(initial_timeline_dirs))
new_dirs = set(crashed_timeline_dirs) - set(initial_timeline_dirs)
expected = {
"initdb dir": r"^basebackup-.*\.___temp$",
"timeline dir": r"^[0-9a-f]+$",
"timeline uninit marker file": r"^[0-9a-f]+\.___uninit$",
}
assert len(new_dirs) == len(expected.keys()), f"unexpected new directories: {new_dirs}"
for new_dir in new_dirs:
for direntry_type, pattern in expected.items():
if re.match(pattern, new_dir.name):
expected.pop(direntry_type)
break
assert len(expected.keys()) == 0, f"unexpected new directories: {expected}"
try:
# make the directory unremovable
for new_dir in new_dirs:
new_dir.chmod(0o444)
# start the pageserver, it should attempt the removal
env.neon_cli.pageserver_start()
# and the removal should fail, causing the tenant to be Broken
wait_until_tenant_state(pageserver_http, tenant_id, "Broken", 5)
# inspect the reason
status = pageserver_http.tenant_status(tenant_id)
assert "remove" in status["state"]["data"]["reason"]
log_line = r".*load.*Failed to remove uninit marked timeline directory"
assert env.pageserver.log_contains(log_line)
env.pageserver.allowed_errors.append(log_line)
env.neon_cli.pageserver_stop(immediate=True)
finally:
# make the directory removable again
for new_dir in new_dirs:
new_dir.chmod(0o755)
env.neon_cli.pageserver_start()
wait_until_tenant_state(pageserver_http, tenant_id, "Active", 5)
def test_timeline_create_break_after_uninit_mark(neon_simple_env: NeonEnv):
env = neon_simple_env
pageserver_http = env.pageserver.http_client()