Define a layer's start and end bounds more precisely.

After this, a layer's start bound is always defined to be inclusive, and
end bound exclusive.

For example, if you have a layer in the range 100-200, that layer can be
used for GetPage@LSN requests at LSN 100, 199, or anything in between.
But for LSN 200, you need to look at the next layer (if one exists).

This is one part of a fix for https://github.com/zenithdb/zenith/issues/517.
After this, the page server shouldn't create layers for the same segment
with the same LSN, which avoids the issue. However, the same thing would
still happen, if you managed to create layers with same start LSN again.
That could happen e.g. if you had two page servers running, or in some
weird crash/restart scenario, or due to bugs or features added later. The
next commit makes the layer map more robust, so that it tolerates that
situation without deleting wrong files.
This commit is contained in:
Heikki Linnakangas
2021-09-24 14:10:49 +03:00
parent d4e037f1e7
commit 2319e0ec8f
6 changed files with 37 additions and 28 deletions

View File

@@ -1237,7 +1237,7 @@ impl LayeredTimeline {
// First modification on this timeline
start_lsn = self.ancestor_lsn;
trace!(
"creating file for write for {} at branch point {}/{}",
"creating layer for write for {} at branch point {}/{}",
seg,
self.timelineid,
start_lsn
@@ -1245,7 +1245,7 @@ impl LayeredTimeline {
} else {
start_lsn = prev_layer.get_end_lsn();
trace!(
"creating file for write for {} after previous layer {}/{}",
"creating layer for write for {} after previous layer {}/{}",
seg,
self.timelineid,
start_lsn

View File

@@ -25,11 +25,13 @@ OnDisk layers can be Image or Delta:
Dropped segments are always represented on disk by DeltaLayer.
LSN range defined by start_lsn and end_lsn:
- start_lsn is always inclusive.
- end_lsn depends on layer kind:
- InMemoryLayer is either unbounded (end_lsn = MAX_LSN) or dropped (end_lsn = drop_lsn)
- ImageLayer represents snapshot at one LSN, so end_lsn = lsn.
- DeltaLayer has explicit end_lsn, which represents end of incremental layer.
- start_lsn is inclusive.
- end_lsn is exclusive.
For an open in-memory layer, the end_lsn is MAX_LSN. For a frozen
in-memory layer or a delta layer, it is a valid end bound. An image
layer represents snapshot at one LSN, so end_lsn is always the
snapshot LSN + 1
Layers can be open or historical:
- Open layer is a writeable one. Only InMemory layer can be open.

View File

@@ -152,7 +152,8 @@ impl Layer for ImageLayer {
}
fn get_end_lsn(&self) -> Lsn {
self.lsn
// End-bound is exclusive
self.lsn + 1
}
/// Look up given page in the file

View File

@@ -506,6 +506,8 @@ impl InMemoryLayer {
) -> Result<InMemoryLayer> {
let seg = src.get_seg_tag();
assert!(oldest_pending_lsn >= start_lsn);
trace!(
"initializing new InMemoryLayer for writing {} on timeline {} at {}",
seg,
@@ -549,8 +551,11 @@ impl InMemoryLayer {
/// After completion, self is non-writeable, but not frozen.
pub fn freeze(self: Arc<Self>, cutoff_lsn: Lsn) -> Result<FreezeLayers> {
info!(
"freezing in memory layer for {} on timeline {} at {}",
self.seg, self.timelineid, cutoff_lsn
"freezing in memory layer {} on timeline {} at {} (oldest {})",
self.filename().display(),
self.timelineid,
cutoff_lsn,
self.oldest_pending_lsn
);
self.assert_not_frozen();
@@ -624,8 +629,8 @@ impl InMemoryLayer {
frozen.clone(),
self.timelineid,
self.tenantid,
cutoff_lsn,
cutoff_lsn,
cutoff_lsn + 1,
cutoff_lsn + 1,
)?;
let new_inner = new_open.inner.get_mut().unwrap();

View File

@@ -111,15 +111,14 @@ pub trait Layer: Send + Sync {
/// Identify the relish segment
fn get_seg_tag(&self) -> SegmentTag;
/// Inclusive start bound of the LSN range that this layer hold
/// Inclusive start bound of the LSN range that this layer holds
fn get_start_lsn(&self) -> Lsn;
/// 'end_lsn' meaning depends on the layer kind:
/// - in-memory layer is either unbounded (end_lsn = MAX_LSN) or dropped (end_lsn = drop_lsn)
/// - image layer represents snapshot at one LSN, so end_lsn = lsn
/// - delta layer has end_lsn
/// Exclusive end bound of the LSN range that this layer holds.
///
/// TODO Is end_lsn always exclusive for all layer kinds?
/// - For an open in-memory layer, this is MAX_LSN.
/// - For a frozen in-memory layer or a delta layer, this is a valid end bound.
/// - An image layer represents snapshot at one LSN, so end_lsn is always the snapshot LSN + 1
fn get_end_lsn(&self) -> Lsn;
/// Is the segment represented by this layer dropped by PostgreSQL?

View File

@@ -58,19 +58,21 @@ def test_layerfiles_gc(zenith_cli, pageserver, postgres, pg_bin):
layer_relfiles_remain = row['layer_relfiles_total'] - row['layer_relfiles_removed']
assert layer_relfiles_remain > 0
# Insert a row.
# Insert a row and run GC. Checkpoint should freeze the layer
# so that there is only the most recent image layer left for the rel,
# removing the old image and delta layer.
print("Inserting one row and running GC")
cur.execute("INSERT INTO foo VALUES (1)")
pscur.execute(f"do_gc {pageserver.initial_tenant} {timeline} 0")
row = pscur.fetchone()
print_gc_result(row);
assert row['layer_relfiles_total'] == layer_relfiles_remain + 1
assert row['layer_relfiles_removed'] == 1
assert row['layer_relfiles_total'] == layer_relfiles_remain + 2
assert row['layer_relfiles_removed'] == 2
assert row['layer_relfiles_dropped'] == 0
# Insert two more rows and run GC.
# This should create a new layer file with the new contents, and
# remove the old one.
# This should create new image and delta layer file with the new contents, and
# then remove the old one image and the just-created delta layer.
print("Inserting two more rows and running GC")
cur.execute("INSERT INTO foo VALUES (2)")
cur.execute("INSERT INTO foo VALUES (3)")
@@ -78,11 +80,11 @@ def test_layerfiles_gc(zenith_cli, pageserver, postgres, pg_bin):
pscur.execute(f"do_gc {pageserver.initial_tenant} {timeline} 0")
row = pscur.fetchone()
print_gc_result(row);
assert row['layer_relfiles_total'] == layer_relfiles_remain + 1
assert row['layer_relfiles_removed'] == 1
assert row['layer_relfiles_total'] == layer_relfiles_remain + 2
assert row['layer_relfiles_removed'] == 2
assert row['layer_relfiles_dropped'] == 0
# Do it again. Should again create a new layer file and remove old one.
# Do it again. Should again create two new layer files and remove old ones.
print("Inserting two more rows and running GC")
cur.execute("INSERT INTO foo VALUES (2)")
cur.execute("INSERT INTO foo VALUES (3)")
@@ -90,8 +92,8 @@ def test_layerfiles_gc(zenith_cli, pageserver, postgres, pg_bin):
pscur.execute(f"do_gc {pageserver.initial_tenant} {timeline} 0")
row = pscur.fetchone()
print_gc_result(row);
assert row['layer_relfiles_total'] == layer_relfiles_remain + 1
assert row['layer_relfiles_removed'] == 1
assert row['layer_relfiles_total'] == layer_relfiles_remain + 2
assert row['layer_relfiles_removed'] == 2
assert row['layer_relfiles_dropped'] == 0
# Run GC again, with no changes in the database. Should not remove anything.