neon/utils at a537b2ffd05cb952a3198ca8b36e0dfdfd26e270 - neon

rust/neon

mirror of https://github.com/neondatabase/neon.git synced 2025-12-28 00:23:00 +00:00

Christian Schwarz 7eb85c56ac tokio-epoll-uring: avoid warn! noise due to ECANCELED during shutdowns (#11819)

# Problem

Before this PR, `test_pageserver_catchup_while_compute_down` would
occasionally fail due to a scary-looking WARN log line:

```
WARN ephemeral_file_buffered_writer{...}:flush_attempt{attempt=1}: \
error flushing buffered writer buffer to disk, retrying after backoff err=Operation canceled (os error 125)
```

After lengthy investigation, the conclusion is that this is likely due
to a kernel bug related to io_uring async workers (io-wq) and
signals.
The main indicator is that the error only ever happens in correlation
with pageserver shutdown, when SIGTERM is received.
There is a fix that is merged in 6.14
kernels (`io-wq: backoff when retrying worker creation`).
However, even when I revert that patch, the issue is not reproducible
on 6.14, so this remains speculation.

It was ruled out that the ECANCELED is due to the executor thread
exiting before the async worker starts processing the operation.

# Solution

The workaround for this issue is to retry the operation once on
ECANCELED.
Retries are safe because the low-level io_engine operations are
idempotent.
(We don't use O_APPEND and I can't think of another flag that would make
the APIs covered by this patch not idempotent.)
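
As an illustration only, the retry-once idea can be sketched with a small
synchronous helper. This is not the code from the patch: the helper name
`retry_once_on_ecanceled` is hypothetical, the real io_engine operations are
async, and the constant `125` is the Linux errno value taken from the log line
quoted above.

```rust
use std::io;

/// Linux errno for ECANCELED, matching "Operation canceled (os error 125)".
/// Assumption: hard-coded here to avoid depending on the `libc` crate.
const ECANCELED: i32 = 125;

/// Hypothetical helper: run `op`; if it fails with ECANCELED, run it exactly
/// one more time. Any other error, or a second ECANCELED, goes to the caller.
/// This is only safe when `op` is idempotent (e.g. a write at a fixed offset).
fn retry_once_on_ecanceled<T>(mut op: impl FnMut() -> io::Result<T>) -> io::Result<T> {
    match op() {
        Err(e) if e.raw_os_error() == Some(ECANCELED) => {
            // A real implementation would log the internal retry here.
            op()
        }
        other => other,
    }
}
```

A caller would wrap each low-level operation in a closure and pass it to the
helper; because the retry happens at most once, a persistent ECANCELED still
surfaces as an error instead of looping.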

# Testing

With this PR, the warn! log no longer happens on [my reproducer
setup](https://github.com/neondatabase/neon/issues/11446#issuecomment-2843015111).
And the new rate-limited `info!`-level log line informing about the
internal retry shows up instead, as expected.

# Refs
- fixes https://github.com/neondatabase/neon/issues/11446

2025-05-08 06:33:29 +00:00

benches

pageserver: better observability for slow wait_lsn (#11176)

2025-03-13 15:03:53 +00:00

scripts

Enable sanitizers for postgres v17 (#10401)

2025-02-06 12:53:43 +00:00

src

tokio-epoll-uring: avoid warn! noise due to ECANCELED during shutdowns (#11819)

2025-05-08 06:33:29 +00:00

tests

Migrate the last crates to edition 2024 (#10998)

2025-02-27 09:40:40 +00:00

Cargo.toml

Teach neon_local to pass the Authorization header to compute_ctl (#11490)

2025-04-15 17:27:49 +00:00