mirror of
https://github.com/neondatabase/neon.git
synced 2025-12-22 21:59:59 +00:00
refs https://github.com/neondatabase/neon/issues/7136

Problem
-------

Before this PR, we were using `tokio_epoll_uring::thread_local_system()`,
which panics on tokio_epoll_uring::System::launch() failure.

As we've learned in [the past](https://github.com/neondatabase/neon/issues/6373#issuecomment-1905814391),
some older Linux kernels account io_uring instances as locked memory.

And while we've raised the limit in prod considerably, we did hit it
once on 2024-03-11 16:30 UTC.
That was after we enabled tokio-epoll-uring fleet-wide, but before
we had shipped release-5090 (c6ed86d3d0) which did away with the
last mass-creation of tokio-epoll-uring instances as per

    commit 3da410c8fe
    Author: Christian Schwarz <christian@neon.tech>
    Date:   Tue Mar 5 10:03:54 2024 +0100

        tokio-epoll-uring: use it on the layer-creating code paths (#6378)

Nonetheless, it highlighted that panicking in this situation is probably
not ideal, as it can leave the pageserver process in a semi-broken state.

Further, due to low sampling rate of Prometheus metrics, we don't know
much about the circumstances of this failure instance.

Solution
--------

This PR implements a custom thread_local_system() that is pageserver-aware
and will do the following on failure:
- dump relevant stats to `tracing!`, hopefully they will be useful to
  understand the circumstances better
- if it's the locked memory failure (or any other ENOMEM): abort() the process
- if it's ENOMEM, retry with exponential back-off, capped at 3s.
- add metric counters so we can create an alert

This makes sense in the production environment where we know that
_usually_, there's ample locked memory allowance available, and we know
the failure rate is rare.
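The retry behaviour listed under "Solution" can be sketched roughly as follows. This is a minimal, std-only illustration and not the pageserver's actual code: `LaunchError`, `LAUNCH_FAILURES`, `backoff_for_attempt`, `launch_with_retry`, and the 100 ms initial delay are all hypothetical names and values chosen for the example. Only the 3 s cap and the "count failures in a metric" idea come from the commit message. The launch and sleep steps are passed in as closures so the sketch does not depend on the real `tokio_epoll_uring` API.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Duration;

/// Hypothetical classification of a failed launch (not the real crate's error type).
#[derive(Debug, PartialEq)]
pub enum LaunchError {
    /// ENOMEM, e.g. the locked-memory allowance is exhausted.
    OutOfMemory,
    /// Anything else; this sketch hands it back to the caller.
    Other(String),
}

/// Stand-in for a Prometheus counter that an alert could be built on.
pub static LAUNCH_FAILURES: AtomicU64 = AtomicU64::new(0);

/// Assumed starting delay; the commit message only specifies the cap.
pub const INITIAL_BACKOFF: Duration = Duration::from_millis(100);
/// Cap taken from the commit message ("capped at 3s").
pub const MAX_BACKOFF: Duration = Duration::from_secs(3);

/// Delay before the retry that follows the given zero-based failed attempt:
/// doubles on every attempt and never exceeds `MAX_BACKOFF`.
pub fn backoff_for_attempt(attempt: u32) -> Duration {
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    INITIAL_BACKOFF
        .checked_mul(factor)
        .unwrap_or(MAX_BACKOFF)
        .min(MAX_BACKOFF)
}

/// Calls `try_launch` until it succeeds, sleeping with capped exponential
/// back-off after every out-of-memory failure. Other errors are returned
/// immediately so the caller can decide how to handle them.
pub fn launch_with_retry<T>(
    mut try_launch: impl FnMut() -> Result<T, LaunchError>,
    mut sleep: impl FnMut(Duration),
) -> Result<T, LaunchError> {
    let mut attempt = 0u32;
    loop {
        match try_launch() {
            Ok(system) => return Ok(system),
            Err(LaunchError::OutOfMemory) => {
                // Record the failure so it shows up in metrics.
                LAUNCH_FAILURES.fetch_add(1, Ordering::Relaxed);
                sleep(backoff_for_attempt(attempt));
                attempt = attempt.saturating_add(1);
            }
            Err(other) => return Err(other),
        }
    }
}
```

In blocking code the `sleep` argument could simply be `std::thread::sleep`; an async caller would need an async variant of the loop instead. Injecting the sleep function also makes the back-off schedule easy to check in a test without waiting.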
15 lines
442 B
TOML
disallowed-methods = [
    "tokio::task::block_in_place",
    # Allow this for now, to deny it later once we stop using Handle::block_on completely
    # "tokio::runtime::Handle::block_on",
    # use tokio_epoll_uring_ext instead
    "tokio_epoll_uring::thread_local_system",
]

disallowed-macros = [
    # use std::pin::pin
    "futures::pin_mut",
    # cannot disallow this, because clippy finds uses from tokio macros
    #"tokio::pin",
]
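The `disallowed-macros` entry for `futures::pin_mut` points at `std::pin::pin` as the replacement. A small sketch of that replacement, using only the standard library, might look like this. The names `NoopWake`, `answer`, and `poll_once_pinned` are made up for the example, and the hand-rolled no-op waker is there only so the snippet can poll a future without an async runtime.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; enough to poll a future that is ready immediately.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Trivial future used for the demonstration.
async fn answer() -> u32 {
    42
}

/// Pins a future on the stack with `std::pin::pin!` and polls it once.
pub fn poll_once_pinned() -> Poll<u32> {
    // With the disallowed macro this would have been:
    //     let fut = answer();
    //     futures::pin_mut!(fut);
    // `pin!` instead returns the pinned reference as a value.
    let mut fut = pin!(answer());

    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    fut.as_mut().poll(&mut cx)
}
```

Because `pin!` is an expression, the pinned future can be bound with an ordinary `let`, which is the main ergonomic difference from `pin_mut!` rebinding an existing variable in place.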