mirror of
https://github.com/neondatabase/neon.git
synced 2025-12-23 06:09:59 +00:00
## Thread management

The pageserver uses Tokio for handling concurrency. Everything runs in
Tokio tasks, although some parts are written in blocking style and use
`spawn_blocking()`.

We currently use std blocking functions for disk I/O, however. The
current model is that we consider disk I/Os to be short enough that we
perform them while running in a Tokio task. Changing all the disk I/O
calls to async is a TODO.

Each Tokio task is tracked by the `task_mgr` module. It maintains a
registry of tasks, and which tenant or timeline they are operating
on.

### Handling shutdown

When a tenant or timeline is deleted, we need to shut down all tasks
operating on it, before deleting the data on disk. There's a function,
`shutdown_tasks`, to request all tasks of a particular tenant or
timeline to shut down. It will also wait for them to finish.

A task registered in the task registry can check if it has been
requested to shut down, by calling `is_shutdown_requested()`. There's
also a `shutdown_watcher()` Future that can be used with `tokio::select!`
or similar, to wake up on shutdown.

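
To make the registry idea concrete, here is a minimal sketch of the
request-then-wait pattern. It is not the real `task_mgr` code: it uses
OS threads and an atomic flag instead of Tokio tasks so that it needs
only the standard library. `TaskRegistry`, `ShutdownFlag`, `TenantId`,
`spawn` and `task_count` are hypothetical names, and the signatures of
`shutdown_tasks` and `is_shutdown_requested` here are assumptions.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for a tenant identifier.
pub type TenantId = u32;

/// Handle passed to each task so it can check for a shutdown request.
#[derive(Clone)]
pub struct ShutdownFlag(Arc<AtomicBool>);

impl ShutdownFlag {
    pub fn is_shutdown_requested(&self) -> bool {
        self.0.load(Ordering::SeqCst)
    }
}

/// One registered task: which tenant it works on, and how to stop it.
struct Entry {
    tenant: TenantId,
    flag: ShutdownFlag,
    handle: JoinHandle<()>,
}

/// Hypothetical registry of running tasks (threads, in this sketch).
#[derive(Default)]
pub struct TaskRegistry {
    tasks: Mutex<Vec<Entry>>,
}

impl TaskRegistry {
    /// Start a task for `tenant` and record it in the registry.
    pub fn spawn<F>(&self, tenant: TenantId, body: F)
    where
        F: FnOnce(ShutdownFlag) + Send + 'static,
    {
        let flag = ShutdownFlag(Arc::new(AtomicBool::new(false)));
        let task_flag = flag.clone();
        let handle = thread::spawn(move || body(task_flag));
        self.tasks.lock().unwrap().push(Entry { tenant, flag, handle });
    }

    /// Request shutdown of all tasks of `tenant`, then wait for them to
    /// finish. Returns the number of tasks that were stopped.
    pub fn shutdown_tasks(&self, tenant: TenantId) -> usize {
        // Remove the matching entries while holding the lock, but join
        // them only after the lock has been released.
        let stopping: Vec<Entry> = {
            let mut tasks = self.tasks.lock().unwrap();
            let (stop, keep): (Vec<Entry>, Vec<Entry>) = std::mem::take(&mut *tasks)
                .into_iter()
                .partition(|e| e.tenant == tenant);
            *tasks = keep;
            stop
        };
        for entry in &stopping {
            entry.flag.0.store(true, Ordering::SeqCst);
        }
        let stopped = stopping.len();
        for entry in stopping {
            entry.handle.join().expect("task panicked");
        }
        stopped
    }

    /// Number of tasks currently registered.
    pub fn task_count(&self) -> usize {
        self.tasks.lock().unwrap().len()
    }
}
```

A task body would loop on `!flag.is_shutdown_requested()` and return
once the flag is set. Only after `shutdown_tasks` has returned is it
safe to delete the tenant's data on disk.
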
### Async cancellation safety

In async Rust, futures can be "cancelled" at any await point, by
dropping the Future. For example, `tokio::select!` returns as soon as
one of the Futures returns, and drops the others. `tokio::time::timeout`
is another example. In the Rust ecosystem, some functions are
cancellation-safe, meaning they can be safely dropped without
side-effects, while others are not. See documentation of
`tokio::select!` for examples.

In the pageserver and safekeeper, async code is *not*
cancellation-safe by default. Unless otherwise marked, any async
function that you call cannot be assumed to be
cancellation-safe, and must be polled to completion.

The downside of code that is not cancellation-safe is that you have to
be very careful when using `tokio::select!`, `tokio::time::timeout`, and
other such functions that can cause a Future to be dropped. They can
only be used with functions that are explicitly documented to be
cancellation-safe, or you need to spawn a separate task to shield the
code from the cancellation.

At the entry points to the code, we also take care to poll futures to
completion, or shield the rest of the code from surprise cancellations
by spawning a separate task. The code that handles incoming HTTP
requests, for example, spawns a separate task for each request,
because Hyper will drop the request-handling Future if the HTTP
connection is lost.

#### How to cancel, then?

If our code is not cancellation-safe, how do you cancel long-running
tasks? Use `CancellationToken`s.

TODO: Add more details on that. There is also an ongoing discussion on
what to do if cancellations might come from multiple sources.

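
Until those details are written, the following sketch illustrates the
general idea of cooperative cancellation: the task checks a token at a
point where nothing is in flight, and the caller keeps polling the
future to completion instead of dropping it. Everything here is an
assumption made for illustration. `CancelFlag` is a tiny hypothetical
stand-in for a real token type such as
`tokio_util::sync::CancellationToken`, `YieldOnce` simulates a slow
`.await`, and the future is polled by hand so that the example needs
only the standard library.

```rust
use std::cell::{Cell, RefCell};
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; enough for polling a future by hand.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls `fut` exactly once.
fn poll_once<F: Future + ?Sized>(fut: Pin<&mut F>) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    fut.poll(&mut cx)
}

/// Pending on the first poll, ready on the second.
#[derive(Default)]
struct YieldOnce {
    yielded: bool,
}
impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.yielded {
            return Poll::Ready(());
        }
        self.yielded = true;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

/// Hypothetical single-threaded cancellation token.
#[derive(Clone, Default)]
struct CancelFlag(Rc<Cell<bool>>);
impl CancelFlag {
    fn cancel(&self) {
        self.0.set(true);
    }
    fn is_cancelled(&self) -> bool {
        self.0.get()
    }
}

#[derive(Debug, PartialEq)]
struct Cancelled;

type Inbox = Rc<RefCell<VecDeque<u32>>>;
type Outbox = Rc<RefCell<Vec<u32>>>;

/// Moves items from `inbox` to `outbox` until cancelled or out of work.
async fn forward(inbox: Inbox, outbox: Outbox, cancel: CancelFlag) -> Result<(), Cancelled> {
    loop {
        // The only exit point for cancellation: no item is in flight here.
        if cancel.is_cancelled() {
            return Err(Cancelled);
        }
        let next = inbox.borrow_mut().pop_front();
        let Some(item) = next else { return Ok(()) };
        YieldOnce::default().await; // simulated slow send
        outbox.borrow_mut().push(item);
    }
}

/// Cancels while item 1 is in flight, then polls to completion.
/// Returns (items left in the inbox, items delivered, task result).
fn cancel_cooperatively() -> (Vec<u32>, Vec<u32>, Result<(), Cancelled>) {
    let inbox: Inbox = Rc::new(RefCell::new(VecDeque::from([1, 2, 3])));
    let outbox: Outbox = Rc::default();
    let cancel = CancelFlag::default();
    let mut fut = Box::pin(forward(inbox.clone(), outbox.clone(), cancel.clone()));

    assert!(poll_once(fut.as_mut()).is_pending()); // item 1 is now in flight
    cancel.cancel();
    // Keep polling instead of dropping the future.
    let result = loop {
        if let Poll::Ready(r) = poll_once(fut.as_mut()) {
            break r;
        }
    };
    let left: Vec<u32> = inbox.borrow().iter().copied().collect();
    let delivered = outbox.borrow().clone();
    (left, delivered, result)
}
```

In this sketch, the item that was in flight when cancellation was
requested is still delivered, and the remaining items stay in the inbox.
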
#### Exceptions

Some library functions are cancellation-safe, and are explicitly marked
as such. For example, `utils::seqwait`.

#### Rationale

The alternative would be to make all async code cancellation-safe,
unless otherwise marked. That way, you could use `tokio::select!` more
liberally. The reasons we didn't choose that are explained in this
section.

Writing code in a cancellation-safe manner is tedious, as you need to
scrutinize every `.await` and ensure that if the `.await` call never
returns, the system is in a safe, consistent state. In some ways, you
need to do that with `?` and early `return`s, too, but `.await`s are
easier to miss. It is also easier to perform cleanup tasks when a
function returns an `Err` than when an `.await` simply never
returns. You can use `scopeguard` and Drop guards to perform cleanup
tasks, but it is more tedious. An `.await` that never returns is more
similar to a panic.

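
As an illustration of the Drop-guard approach, the sketch below compares
two versions of the same function when the future is dropped at its
`.await`. All names (`BusyGuard`, `guarded`, `unguarded`, `YieldOnce`,
`busy_after_cancel`) are hypothetical, and the future is polled by hand
with a no-op waker so that the example needs only the standard library.

```rust
use std::cell::Cell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; enough for polling a future by hand.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls `fut` exactly once.
fn poll_once<F: Future + ?Sized>(fut: Pin<&mut F>) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    fut.poll(&mut cx)
}

/// Pending on the first poll, ready on the second.
#[derive(Default)]
struct YieldOnce {
    yielded: bool,
}
impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.yielded {
            return Poll::Ready(());
        }
        self.yielded = true;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

/// Sets the flag while alive and clears it when dropped, no matter
/// whether the owner returned normally or was dropped mid-`.await`.
struct BusyGuard(Rc<Cell<bool>>);
impl BusyGuard {
    fn new(busy: Rc<Cell<bool>>) -> Self {
        busy.set(true);
        BusyGuard(busy)
    }
}
impl Drop for BusyGuard {
    fn drop(&mut self) {
        self.0.set(false);
    }
}

/// Cleanup is ordinary code after the `.await`: skipped on cancellation.
async fn unguarded(busy: Rc<Cell<bool>>) {
    busy.set(true);
    YieldOnce::default().await;
    busy.set(false);
}

/// Cleanup lives in a Drop guard: runs on cancellation too.
async fn guarded(busy: Rc<Cell<bool>>) {
    let _guard = BusyGuard::new(busy);
    YieldOnce::default().await;
}

/// Polls the future once, drops it at its `.await`, and reports whether
/// the flag was left set.
fn busy_after_cancel<F, Fut>(make: F) -> bool
where
    F: FnOnce(Rc<Cell<bool>>) -> Fut,
    Fut: Future<Output = ()>,
{
    let busy = Rc::new(Cell::new(false));
    let mut fut = Box::pin(make(busy.clone()));
    assert!(poll_once(fut.as_mut()).is_pending());
    drop(fut); // this is the "cancellation"
    busy.get()
}
```

`busy_after_cancel(unguarded)` leaves the flag set, while
`busy_after_cancel(guarded)` clears it. The guard works, but every piece
of state touched before an `.await` needs one, which is the tedium
described above.
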
Note that even if you only use building blocks that themselves are
cancellation-safe, it doesn't mean that the code as a whole is
cancellation-safe. For example, consider the following code:

```
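// Forward every message from `work_inbox` to `results_outbox`.
while let Some(i) = work_inbox.recv().await {
    // If the task is cancelled at this `.await`, `i` has already been
    // taken out of `work_inbox` and is dropped.
    if results_outbox.send(i).await.is_err() {
        println!("receiver dropped");
        return;
    }
}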
```

It reads messages from one channel and sends them to another. If
this code is cancelled at the `results_outbox.send(i).await`, the
message read from the receiver is lost. That may or may not be OK,
depending on the context.

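
The lost message can be reproduced in a small standalone sketch. It
makes several assumptions: the two channels are modelled as plain
in-memory queues, the send is simulated by a hypothetical `YieldOnce`
future that is pending on its first poll, and the future is polled by
hand with a no-op waker so that no async runtime is needed.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; enough for polling a future by hand.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls `fut` exactly once.
fn poll_once<F: Future + ?Sized>(fut: Pin<&mut F>) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    fut.poll(&mut cx)
}

/// Pending on the first poll, ready on the second.
#[derive(Default)]
struct YieldOnce {
    yielded: bool,
}
impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.yielded {
            return Poll::Ready(());
        }
        self.yielded = true;
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

type Inbox = Rc<RefCell<VecDeque<u32>>>;
type Outbox = Rc<RefCell<Vec<u32>>>;

/// Same shape as the loop above: take an item, then "send" it.
async fn forward(inbox: Inbox, outbox: Outbox) {
    loop {
        let next = inbox.borrow_mut().pop_front();
        let Some(item) = next else { return };
        // Stands in for `results_outbox.send(i).await`.
        YieldOnce::default().await;
        outbox.borrow_mut().push(item);
    }
}

/// Drops the future while item 1 is in flight.
/// Returns (items left in the inbox, items delivered).
fn cancel_mid_send() -> (Vec<u32>, Vec<u32>) {
    let inbox: Inbox = Rc::new(RefCell::new(VecDeque::from([1, 2, 3])));
    let outbox: Outbox = Rc::default();
    let mut fut = Box::pin(forward(inbox.clone(), outbox.clone()));

    // The first poll takes item 1 out of the inbox and suspends in the send.
    assert!(poll_once(fut.as_mut()).is_pending());
    // Cancel by dropping, as `tokio::select!` or a timeout would.
    drop(fut);

    let left: Vec<u32> = inbox.borrow().iter().copied().collect();
    let delivered = outbox.borrow().clone();
    (left, delivered)
}
```

After `cancel_mid_send()`, items 2 and 3 are still in the inbox and
nothing has been delivered, so item 1 exists in neither queue.
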
Another reason to not require cancellation-safety is historical: we
already had a lot of async code that was not scrutinized for
cancellation-safety when this issue was raised. Scrutinizing all
existing code is no fun.