neon/utils at 30a7dd630c64ade80aef0aec935e1e51b0f93a50 - neon

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2026-01-06 21:12:55 +00:00

Files

Christian Schwarz 9fb77d6cdd buffered writer: add cancellation sensitivity (#11052)

In
- https://github.com/neondatabase/neon/pull/10993#issuecomment-2690428336

I added infinite retries for buffered writer flush IOs, primarily to
gracefully handle ENOSPC but more generally so that the buffered writer
is not left in a state where reads from the surrounding InMemoryLayer
cause panics.

However, I didn't add cancellation sensitivity, which is concerning
because then there is no way to detach a timeline/tenant that is
encountering the write IO errors.
That's a legitimate scenario if some edge-case bug is causing the errors.
See the #10993 description for details.


This PR
- first makes the flush loop infallible, enabled by infinite retries
- then adds sensitivity to `Timeline::cancel` to the flush loop, thereby
making it fallible in one specific way again
- finally fixes the InMemoryLayer/EphemeralFile/BufferedWriter
amalgam to remain read-available after the flush loop is cancelled.
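
The first two steps can be illustrated with a minimal, synchronous sketch.
It is not the code from this PR: `flush_with_retries` and `FlushCancelled`
are hypothetical names, a plain `AtomicBool` stands in for
`Timeline::cancel`, and a fixed sleep stands in for whatever backoff the
real flush loop uses.

```rust
use std::io;
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;
use std::time::Duration;

/// Hypothetical error type: cancellation is the only way the loop can fail.
#[derive(Debug, PartialEq, Eq)]
pub struct FlushCancelled;

/// Retries `flush_once` until it succeeds or `cancel` is set.
///
/// IO errors (e.g. ENOSPC) never escape; they only cause another attempt.
/// Returns the number of attempts that were made on success.
pub fn flush_with_retries<F>(
    mut flush_once: F,
    cancel: &AtomicBool, // assumption: stand-in for a cancellation token
    backoff: Duration,
) -> Result<u32, FlushCancelled>
where
    F: FnMut() -> io::Result<()>,
{
    let mut attempts: u32 = 0;
    loop {
        // Check for cancellation before every attempt so that a
        // permanently failing flush cannot block shutdown forever.
        if cancel.load(Ordering::Relaxed) {
            return Err(FlushCancelled);
        }
        attempts = attempts.saturating_add(1);
        match flush_once() {
            Ok(()) => return Ok(attempts),
            // Swallow the IO error and retry after a short pause.
            Err(_io_error) => thread::sleep(backoff),
        }
    }
}
```

With this shape, callers only have to handle `FlushCancelled`; every IO
error is absorbed by the retry loop, which matches the "fallible in one
specific way" description above.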

The support for read-availability after cancellation is necessary so
that reads from the InMemoryLayer that are already queued up behind the
RwLock that wraps the BufferedWriter won't panic because of the
`mutable=None` that we leave behind in case the flush loop gets
cancelled.

# Alternatives

One might think that we could ship only the read-availability change
for the case where flush encounters an error, without the complexity of
infinite retries and/or cancellation sensitivity.

The problem with that is that read-availability sounds good but is
really quite useless, because we cannot ingest new WAL without a
writable InMemoryLayer. Thus, very soon after we transition to read-only
mode, reads from compute are going to wait anyway, but on `wait_lsn`
instead of the RwLock, because ingest isn't progressing.

Thus, having the infinite flush retries still makes more sense because
they just look like "slowness" to the user, whereas failing `wait_lsn`
calls are hard errors.

2025-03-18 18:48:43 +00:00

benches

pageserver: better observability for slow wait_lsn (#11176)

2025-03-13 15:03:53 +00:00

scripts

Enable sanitizers for postgres v17 (#10401)

2025-02-06 12:53:43 +00:00

src

buffered writer: add cancellation sensitivity (#11052)

2025-03-18 18:48:43 +00:00

tests

Migrate the last crates to edition 2024 (#10998)

2025-02-27 09:40:40 +00:00

Cargo.toml

utils: explicit OTEL export config and OTEL enablement via common entry point (#11139)

2025-03-12 11:07:49 +00:00