mirror of
https://github.com/neondatabase/neon.git
synced 2026-05-14 19:50:38 +00:00
For 2 weeks we've seen rare, spurious, not-reproducible page reconstruction
failures with PG16 in prod.
One of the commits we deployed this week was
Commit
commit fc467941f9
Author: Joonas Koivunen <joonas@neon.tech>
Date: Wed Oct 4 16:19:19 2023 +0300
walredo: log retryed error (#546)
With the logs from that commit, we learned that some read() or write()
system call that walredo does fails with `EAGAIN`, aka
`Resource temporarily unavailable (os error 11)`.
But we have no idea where exactly in the code we get back that error.
So, use anyhow instead of fake std::io::Error's as an easy way to get
a backtrace when the error happens, and change the logging to print
that backtrace (i.e., use `{:?}` instead of
`utils::error::report_compact_sources(e)`).
If we ever switch back to `report_compact_sources`, we should make sure
we have some other way to uniquely identify the places where we return
an error in the error message.