neon/pageserver at 487f3202feb740fe71d8e4bf539befa676e5372e - neon

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-07 05:22:56 +00:00


Christian Schwarz 487f3202fe pageserver read path: abort on fatal IO errors from disk / filesystem (#10786)

Before this PR, an IO error returned from the kernel, e.g., due to a bad
disk, would get bubbled up all the way to a user-visible query failure.

This violates the IO error handling policy that we have established
and is hence being rectified in this PR.
[[(internal Policy document
link)]](bef44149f7/src/storage/handling_io_and_logical_errors.md (L33-L35))

The practice on the write path seems to be that we call
`maybe_fatal_err()` or `fatal_err()` fairly high up the stack,
regardless of whether std::fs, tokio::fs, or VirtualFile is
used to perform the IO.
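
As an illustration of the call-site style described above, here is a minimal sketch of an extension trait on `io::Result`. The trait name `FatalIoExt`, the method names `maybe_fatal` and `fatal`, the helper `write_marker`, and the choice of `EIO` as the only fatal error are all assumptions made for this example; they are not the pageserver's actual definitions of `maybe_fatal_err()` and `fatal_err()`.

```rust
use std::io;

// Assumption for this sketch: only EIO (errno 5 on Linux and macOS) is fatal.
const EIO: i32 = 5;

/// Hypothetical extension trait; the real pageserver helpers may differ.
trait FatalIoExt<T> {
    /// Abort the process on a fatal error, pass everything else through.
    fn maybe_fatal(self, context: &str) -> io::Result<T>;
    /// Abort the process on any error.
    fn fatal(self, context: &str) -> T;
}

impl<T> FatalIoExt<T> for io::Result<T> {
    fn maybe_fatal(self, context: &str) -> io::Result<T> {
        if let Err(e) = &self {
            if e.raw_os_error() == Some(EIO) {
                eprintln!("fatal I/O error: {context}: {e}");
                std::process::abort();
            }
        }
        self
    }

    fn fatal(self, context: &str) -> T {
        match self {
            Ok(v) => v,
            Err(e) => {
                eprintln!("fatal I/O error: {context}: {e}");
                std::process::abort()
            }
        }
    }
}

/// Hypothetical write-path caller: the check happens high up the stack,
/// independent of which API performed the IO.
fn write_marker(path: &std::path::Path) -> io::Result<()> {
    std::fs::write(path, b"ok").maybe_fatal("write marker file")
}
```

With this style, every call site has to remember to apply the check, which is the consistency concern the centralized read-path approach is meant to avoid.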

For the read path, I chose a centralized approach in this PR by
checking for errors as close to the kernel interface as possible.
I believe this is better for long-term consistency.

To mitigate the problem of missing context when we abort so far down in
the stack, `on_fatal_io_error` now captures and logs a backtrace.
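
The centralized read-path idea can be sketched roughly as follows. This is not the code from the PR: the function names `is_fatal_io_error_sketch`, `on_fatal_io_error_sketch`, and `read_or_abort`, the use of `std::fs::read` in place of VirtualFile, and the rule that only `EIO` is fatal are assumptions for illustration. The sketch shows the check sitting directly next to the read call, with a backtrace captured before aborting so that the log still records which caller triggered the read.

```rust
use std::backtrace::Backtrace;
use std::io;
use std::path::Path;

// Assumption for this sketch: only EIO (errno 5 on Linux and macOS) is fatal.
const EIO: i32 = 5;

/// Hypothetical classification of errors that warrant aborting.
fn is_fatal_io_error_sketch(e: &io::Error) -> bool {
    e.raw_os_error() == Some(EIO)
}

/// Log the error with a backtrace, then abort the process.
/// The backtrace compensates for aborting deep in the stack,
/// where little caller context is available.
fn on_fatal_io_error_sketch(e: &io::Error, context: &str) -> ! {
    let backtrace = Backtrace::force_capture();
    eprintln!("fatal I/O error: {context}: {e}\n{backtrace}");
    std::process::abort()
}

/// Single choke point for reads: callers above this function
/// only ever see non-fatal errors.
fn read_or_abort(path: &Path) -> io::Result<Vec<u8>> {
    match std::fs::read(path) {
        Ok(buf) => Ok(buf),
        Err(e) if is_fatal_io_error_sketch(&e) => {
            on_fatal_io_error_sketch(&e, &format!("read {}", path.display()))
        }
        Err(e) => Err(e),
    }
}
```

Because the check lives in one place, new read-path callers inherit the policy without having to apply it themselves.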

I grepped the pageserver code base for `fs::read` to convince myself
that all non-VirtualFile reads already handle IO errors according to
policy.

Refs

- fixes https://github.com/neondatabase/neon/issues/10454

2025-02-13 20:53:39 +00:00

benches

pageserver: coalesce index uploads when possible (#10248)

2025-01-14 21:10:17 +00:00

client

Split utils::http to separate crate (#10753)

2025-02-11 22:06:53 +00:00

compaction

Update axum to 0.8.1 (#10332)

2025-01-28 15:32:59 +00:00

ctl

pageserver: add page_trace API for debugging (#10293)