neon/pageserver at dd6990567fadc34f12afab8d617fda9429736a3f - neon

mirror of https://github.com/neondatabase/neon.git synced 2026-07-08 22:50:37 +00:00

Christian Schwarz dd6990567f walredo: apply_batch_postgres: get a backtrace whenever it encounters an error (#5541)

For 2 weeks we've seen rare, spurious, non-reproducible page
reconstruction failures with PG16 in prod.

One of the commits we deployed this week was

    commit fc467941f9
    Author: Joonas Koivunen <joonas@neon.tech>
    Date:   Wed Oct 4 16:19:19 2023 +0300

        walredo: log retryed error (#546)

With the logs from that commit, we learned that one of the read() or
write() system calls that walredo makes fails with `EAGAIN`, aka
`Resource temporarily unavailable (os error 11)`.

But we have no idea where exactly in the code we get back that error.

So, use anyhow instead of fake `std::io::Error`s as an easy way to get
a backtrace at the point where the error happens, and change the logging
to print that backtrace (i.e., use `{:?}` instead of
`utils::error::report_compact_sources(e)`).
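
To illustrate the idea of capturing a backtrace where the error is created and only printing it on `{:?}`, here is a minimal std-only sketch. It is not the code from this commit: it uses `std::backtrace::Backtrace` instead of anyhow so that it stays self-contained, and `TracedIoError` and `failing_read` are hypothetical names introduced only for this example.

```rust
use std::backtrace::{Backtrace, BacktraceStatus};
use std::fmt;
use std::io;

/// Hypothetical wrapper (not a Neon type): an `io::Error` plus the
/// backtrace captured at the point where the wrapper was constructed.
pub struct TracedIoError {
    source: io::Error,
    backtrace: Backtrace,
}

impl TracedIoError {
    pub fn new(source: io::Error) -> Self {
        // `force_capture` records a backtrace regardless of the
        // RUST_BACKTRACE environment variable.
        Self {
            source,
            backtrace: Backtrace::force_capture(),
        }
    }

    pub fn backtrace_status(&self) -> BacktraceStatus {
        self.backtrace.status()
    }
}

/// `{}` prints only the compact error message.
impl fmt::Display for TracedIoError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.source)
    }
}

/// `{:?}` prints the message followed by the captured backtrace.
impl fmt::Debug for TracedIoError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}\n\nStack backtrace:\n{}", self.source, self.backtrace)
    }
}

/// Stand-in for a pipe read that fails with EAGAIN (`WouldBlock`).
pub fn failing_read() -> Result<usize, TracedIoError> {
    let eagain = io::Error::from(io::ErrorKind::WouldBlock);
    Err(TracedIoError::new(eagain))
}
```

Logging the result of `failing_read()` with `{}` yields only the short message, while `{:?}` additionally shows the frames leading to the `TracedIoError::new` call, which is the information a bare `std::io::Error` does not carry.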

The `WalRedoError` type had to go because we add additional `.context()`
further up the call chain before we `{:?}`-print the error. That outer
`.context()` doesn't see that there's already an anyhow::Error inside the
`WalRedoError::ApplyWalRecords` variant, and hence captures another
backtrace and prints that one on `{:?}`-print instead of the original one
inside `WalRedoError::ApplyWalRecords`.

If we ever switch back to `report_compact_sources`, we should make sure
the error message has some other way to uniquely identify the place
where the error was returned.
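
One possible way to do that, sketched here as an assumption rather than something this commit implements, is to tag each error with the source location of the call site that created it, using `#[track_caller]` and `std::panic::Location`. The type `LocatedError`, the functions `write_request` and `read_response`, and the message strings are hypothetical.

```rust
use std::fmt;
use std::panic::Location;

/// Hypothetical error type (not from the Neon codebase): the message is
/// tagged with the file and line of the call site that created it.
#[derive(Debug)]
pub struct LocatedError {
    message: String,
    origin: &'static Location<'static>,
}

impl LocatedError {
    /// `#[track_caller]` makes `Location::caller()` report the location
    /// of the code that called `new`, not of `new` itself.
    #[track_caller]
    pub fn new(message: impl Into<String>) -> Self {
        Self {
            message: message.into(),
            origin: Location::caller(),
        }
    }

    pub fn line(&self) -> u32 {
        self.origin.line()
    }
}

/// The compact `{}` form already includes the originating file and line.
impl fmt::Display for LocatedError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "{} (at {}:{})",
            self.message,
            self.origin.file(),
            self.origin.line()
        )
    }
}

/// Two stand-in failure sites; each error records its own line.
pub fn write_request() -> Result<(), LocatedError> {
    Err(LocatedError::new("write to walredo stdin failed"))
}

pub fn read_response() -> Result<(), LocatedError> {
    Err(LocatedError::new("read from walredo stdout failed"))
}
```

Because the location is part of the `Display` output, two errors with otherwise similar messages remain distinguishable in compact logs without a backtrace.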

2023-10-13 14:08:23 +00:00

| Name | Last commit | Date |
|------|-------------|------|
| benches | walredo: apply_batch_postgres: get a backtrace whenever it encounters an error (#5541) | 2023-10-13 14:08:23 +00:00 |
| ctl | fix: replace all std::PathBufs with camino::Utf8PathBuf (#5352) | 2023-10-04 17:52:23 +03:00 |
| fixtures | walredo: simple tests and bench updates (#3045) | 2023-01-16 18:24:45 +02:00 |
| src | walredo: apply_batch_postgres: get a backtrace whenever it encounters an error (#5541) | 2023-10-13 14:08:23 +00:00 |
| Cargo.toml | fix: replace all std::PathBufs with camino::Utf8PathBuf (#5352) | 2023-10-04 17:52:23 +03:00 |