mirror of
https://github.com/neondatabase/neon.git
synced 2026-01-10 15:02:56 +00:00
fixes https://github.com/neondatabase/neon/issues/7790 (duplicating most of the issue description here for posterity) # Background From the time before always-authoritative `index_part.json`, we had to handle duplicate layers. See the RFC for an illustration of how duplicate layers could happen:a8e6d259cb/docs/rfcs/027-crash-consistent-layer-map-through-index-part.md (L41-L50)As of #5198 , we should not be exposed to that problem anymore. # Problem 1 We still have 1. [code in Pageserver](82960b2175/pageserver/src/tenant/timeline.rs (L4502-L4521)) than handles duplicate layers 2. [tests in the test suite](d9dcbffac3/test_runner/regress/test_duplicate_layers.py (L15)) that demonstrates the problem using a failpoint However, the test in the test suite doesn't use the failpoint to induce a crash that could legitimately happen in production. What is does instead is to return early with an `Ok()`, so that the code in Pageserver that handles duplicate layers (item 1) actually gets exercised. That "return early" would be a bug in the routine if it happened in production. So, the tests in the test suite are tests for their own sake, but don't serve to actually regress-test any production behavior. # Problem 2 Further, if production code _did_ (it nowawdays doesn't!) create a duplicate layer, the code in Pageserver that handles the condition (item 1 above) is too little and too late: * the code handles it by discarding the newer `struct Layer`; that's good. * however, on disk, we have already overwritten the old with the new layer file * the fact that we do it atomically doesn't matter because ... * if the new layer file is not bit-identical, then we have a cache coherency problem * PS PageCache block cache: caches old bit battern * blob_io offsets stored in variables, based on pre-overwrite bit pattern / offsets * => reading based on these offsets from the new file might yield different data than before # Solution - Remove the test suite code pertaining to Problem 1 - Move & rename test suite code that actually tests RFC-27 crash-consistent layer map. - Remove the Pageserver code that handles duplicate layers too late (Problem 1) - Use `RENAME_NOREPLACE` to prevent over-rename the file during `.finish()`, bail with an error if it happens (Problem 2) - This bailing prevents the caller from even trying to insert into the layer map, as they don't even get a `struct Layer` at hand. - Add `abort`s in the place where we have the layer map lock and check for duplicates (Problem 2) - Note again, we can't reach there because we bail from `.finish()` much earlier in the code. - Share the logic to clean up after failed `.finish()` between image layers and delta layers (drive-by cleanup) - This exposed that test `image_layer_rewrite` was overwriting layer files in place. Fix the test. # Future Work This PR adds a new failure scenario that was previously "papered over" by the overwriting of layers: 1. Start a compaction that will produce 3 layers: A, B, C 2. Layer A is `finish()`ed successfully. 3. Layer B fails mid-way at some `put_value()`. 4. Compaction bails out, sleeps 20s. 5. Some disk space gets freed in the meantime. 6. Compaction wakes from sleep, another iteration starts, it attempts to write Layer A again. But the `.finish()` **fails because A already exists on disk**. The failure in step 5 is new with this PR, and it **causes the compaction to get stuck**. Before, it would silently overwrite the file and "successfully" complete the second iteration. The mitigation for this is to `/reset` the tenant.
148 lines
4.1 KiB
Rust
148 lines
4.1 KiB
Rust
/// Extensions to `std::fs` types.
|
|
use std::{fs, io, path::Path};
|
|
|
|
use anyhow::Context;
|
|
|
|
mod rename_noreplace;
|
|
pub use rename_noreplace::rename_noreplace;
|
|
|
|
pub trait PathExt {
|
|
/// Returns an error if `self` is not a directory.
|
|
fn is_empty_dir(&self) -> io::Result<bool>;
|
|
}
|
|
|
|
impl<P> PathExt for P
|
|
where
|
|
P: AsRef<Path>,
|
|
{
|
|
fn is_empty_dir(&self) -> io::Result<bool> {
|
|
Ok(fs::read_dir(self)?.next().is_none())
|
|
}
|
|
}
|
|
|
|
pub async fn is_directory_empty(path: impl AsRef<Path>) -> anyhow::Result<bool> {
|
|
let mut dir = tokio::fs::read_dir(&path)
|
|
.await
|
|
.context(format!("read_dir({})", path.as_ref().display()))?;
|
|
Ok(dir.next_entry().await?.is_none())
|
|
}
|
|
|
|
pub async fn list_dir(path: impl AsRef<Path>) -> anyhow::Result<Vec<String>> {
|
|
let mut dir = tokio::fs::read_dir(&path)
|
|
.await
|
|
.context(format!("read_dir({})", path.as_ref().display()))?;
|
|
|
|
let mut content = vec![];
|
|
while let Some(next) = dir.next_entry().await? {
|
|
let file_name = next.file_name();
|
|
content.push(file_name.to_string_lossy().to_string());
|
|
}
|
|
|
|
Ok(content)
|
|
}
|
|
|
|
pub fn ignore_not_found(e: io::Error) -> io::Result<()> {
|
|
if e.kind() == io::ErrorKind::NotFound {
|
|
Ok(())
|
|
} else {
|
|
Err(e)
|
|
}
|
|
}
|
|
|
|
pub fn ignore_absent_files<F>(fs_operation: F) -> io::Result<()>
|
|
where
|
|
F: Fn() -> io::Result<()>,
|
|
{
|
|
fs_operation().or_else(ignore_not_found)
|
|
}
|
|
|
|
#[cfg(test)]
|
|
mod test {
|
|
use crate::fs_ext::{is_directory_empty, list_dir};
|
|
|
|
use super::ignore_absent_files;
|
|
|
|
#[test]
|
|
fn is_empty_dir() {
|
|
use super::PathExt;
|
|
|
|
let dir = camino_tempfile::tempdir().unwrap();
|
|
let dir_path = dir.path();
|
|
|
|
// test positive case
|
|
assert!(
|
|
dir_path.is_empty_dir().expect("test failure"),
|
|
"new tempdir should be empty"
|
|
);
|
|
|
|
// invoke on a file to ensure it returns an error
|
|
let file_path = dir_path.join("testfile");
|
|
let f = std::fs::File::create(&file_path).unwrap();
|
|
drop(f);
|
|
assert!(file_path.is_empty_dir().is_err());
|
|
|
|
// do it again on a path, we know to be nonexistent
|
|
std::fs::remove_file(&file_path).unwrap();
|
|
assert!(file_path.is_empty_dir().is_err());
|
|
}
|
|
|
|
#[tokio::test]
|
|
async fn is_empty_dir_async() {
|
|
let dir = camino_tempfile::tempdir().unwrap();
|
|
let dir_path = dir.path();
|
|
|
|
// test positive case
|
|
assert!(
|
|
is_directory_empty(dir_path).await.expect("test failure"),
|
|
"new tempdir should be empty"
|
|
);
|
|
|
|
// invoke on a file to ensure it returns an error
|
|
let file_path = dir_path.join("testfile");
|
|
let f = std::fs::File::create(&file_path).unwrap();
|
|
drop(f);
|
|
assert!(is_directory_empty(&file_path).await.is_err());
|
|
|
|
// do it again on a path, we know to be nonexistent
|
|
std::fs::remove_file(&file_path).unwrap();
|
|
assert!(is_directory_empty(file_path).await.is_err());
|
|
}
|
|
|
|
#[test]
|
|
fn ignore_absent_files_works() {
|
|
let dir = camino_tempfile::tempdir().unwrap();
|
|
|
|
let file_path = dir.path().join("testfile");
|
|
|
|
ignore_absent_files(|| std::fs::remove_file(&file_path)).expect("should execute normally");
|
|
|
|
let f = std::fs::File::create(&file_path).unwrap();
|
|
drop(f);
|
|
|
|
ignore_absent_files(|| std::fs::remove_file(&file_path)).expect("should execute normally");
|
|
|
|
assert!(!file_path.exists());
|
|
}
|
|
|
|
#[tokio::test]
|
|
async fn list_dir_works() {
|
|
let dir = camino_tempfile::tempdir().unwrap();
|
|
let dir_path = dir.path();
|
|
|
|
assert!(list_dir(dir_path).await.unwrap().is_empty());
|
|
|
|
let file_path = dir_path.join("testfile");
|
|
let _ = std::fs::File::create(&file_path).unwrap();
|
|
|
|
assert_eq!(&list_dir(dir_path).await.unwrap(), &["testfile"]);
|
|
|
|
let another_dir_path = dir_path.join("testdir");
|
|
std::fs::create_dir(another_dir_path).unwrap();
|
|
|
|
let expected = &["testdir", "testfile"];
|
|
let mut actual = list_dir(dir_path).await.unwrap();
|
|
actual.sort();
|
|
assert_eq!(actual, expected);
|
|
}
|
|
}
|