Instead of spawning helper threads, we now use Tokio tasks. There are multiple Tokio runtimes for different kinds of tasks: one for serving libpq client connections, another for background operations like GC and compaction, and so on. That's not strictly required, as we could use just one runtime, but with separate runtimes you can still get an overview of what's happening with "top -H".

There's one subtle change in how TenantState is updated. Before this patch, if you deleted all timelines from a tenant, its GC and compaction loops were stopped and the tenant went back to Idle state. We no longer do that; the empty tenant stays Active. The changes to test_tenant_tasks.py are related to that.

There's still plenty of synchronous code and blocking. For example, we still use blocking std::io functions for all file I/O, and the communication with WAL redo processes still uses low-level Unix poll(). We might want to rewrite those later, but this will do for now. The model is that local file I/O is considered fast enough that blocking, and thereby preventing other tasks from running on the same thread, is acceptable.
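As a rough illustration of the per-runtime setup (the runtime's purpose, thread count, and thread name below are made up for the example, not taken from the patch), a dedicated runtime with identifiable worker threads can be built like this:

```rust
use tokio::runtime::Builder;

fn main() {
    // Illustrative only: a dedicated runtime whose worker threads carry
    // a recognizable name, so they stand out in `top -H`.
    let background_runtime = Builder::new_multi_thread()
        .worker_threads(2)
        .thread_name("background worker")
        .enable_all()
        .build()
        .expect("failed to create background runtime");

    background_runtime.block_on(async {
        // GC, compaction, and similar background tasks would be
        // spawned onto this runtime.
    });
}
```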
Thread management
The pageserver uses Tokio for handling concurrency. Everything runs in Tokio tasks, although some parts are written in blocking style and use spawn_blocking().
Each Tokio task is tracked by the task_mgr module. It maintains a
registry of tasks, recording which tenant or timeline each task is
operating on.
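Purely as an illustration of the bookkeeping involved (the real
task_mgr types and fields differ; all names below are invented for
this sketch), such a registry might look like:

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;

// Stand-ins for the real ID types.
type TenantId = u128;
type TimelineId = u128;

// One entry per running task: what it is, and which tenant or timeline
// it operates on. A shutdown_tasks-style function can use this
// association to find every task that must be stopped before data is
// deleted.
struct TaskEntry {
    name: &'static str,
    tenant_id: Option<TenantId>,
    timeline_id: Option<TimelineId>,
}

struct TaskRegistry {
    next_task_id: AtomicU64,
    tasks: Mutex<HashMap<u64, TaskEntry>>,
}

impl TaskRegistry {
    fn register(&self, entry: TaskEntry) -> u64 {
        let id = self.next_task_id.fetch_add(1, Ordering::Relaxed);
        self.tasks.lock().unwrap().insert(id, entry);
        id
    }

    fn deregister(&self, task_id: u64) {
        self.tasks.lock().unwrap().remove(&task_id);
    }
}
```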
Handling shutdown
When a tenant or timeline is deleted, we need to shut down all tasks
operating on it before deleting the data on disk. There's a function,
shutdown_tasks, to request all tasks of a particular tenant or
timeline to shut down. It also waits for them to finish.
A task registered in the task registry can check if it has been
requested to shut down, by calling is_shutdown_requested(). There's
also a shutdown_watcher() Future that can be used with tokio::select!
or similar, to wake up on shutdown.
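Here is a self-contained sketch of that pattern, using
tokio::sync::watch as a stand-in for the real task_mgr plumbing (the
actual implementation differs). The worker wakes up on shutdown via
tokio::select!, and main plays the role of shutdown_tasks by
requesting shutdown and then waiting for the task to finish:

```rust
use std::time::Duration;
use tokio::sync::watch;

async fn worker(mut shutdown: watch::Receiver<bool>) {
    loop {
        tokio::select! {
            // Wakes when a shutdown request is published; the real code
            // would use task_mgr::shutdown_watcher() here instead.
            _ = shutdown.changed() => {
                println!("shutdown requested, exiting");
                break;
            }
            // Stand-in for real work, e.g. waiting on a network socket.
            _ = tokio::time::sleep(Duration::from_millis(100)) => {
                println!("did some work");
            }
        }
    }
}

#[tokio::main]
async fn main() {
    let (tx, rx) = watch::channel(false);
    let handle = tokio::spawn(worker(rx));

    tokio::time::sleep(Duration::from_millis(350)).await;
    tx.send(true).unwrap(); // request shutdown, like shutdown_tasks does
    handle.await.unwrap();  // ...and wait for the task to finish
}
```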
Sync vs async
We use async to wait for incoming data on network connections, and to perform other long-running operations. For example, each WAL receiver connection is handled by a Tokio task. Once a piece of WAL has been received from the network, the task calls the blocking functions in the Repository to process the WAL.
The core storage code in layered_repository/ is synchronous, with
blocking locks and I/O calls. The current model is that we consider
disk I/Os to be short enough that we perform them while running in a
Tokio task. If that becomes a problem, we should use spawn_blocking
before entering the synchronous parts of the code, or switch to using
tokio I/O functions.
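For example, a blocking call can be pushed off the executor like
this. This is a self-contained sketch of the spawn_blocking pattern,
with std::fs::read standing in for the blocking Repository code:

```rust
use std::{fs, io};

// Run a blocking std::io call on Tokio's dedicated blocking thread
// pool, so it doesn't stall other tasks on the async worker threads.
async fn read_file_off_executor(path: String) -> io::Result<Vec<u8>> {
    tokio::task::spawn_blocking(move || fs::read(path))
        .await
        .expect("blocking task panicked")
}

#[tokio::main]
async fn main() -> io::Result<()> {
    let bytes = read_file_off_executor("Cargo.toml".to_string()).await?;
    println!("read {} bytes", bytes.len());
    Ok(())
}
```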
Be very careful when mixing sync and async code!