refactor(walreceiver): eliminate task_mgr usage (#7260)

We want to move the code base away from `task_mgr`.

This PR refactors the walreceiver code such that it doesn't use
`task_mgr` anymore.

# Background

As a reminder, a Timeline that is ingesting WAL runs three tasks:
`WalReceiverManager`, `WalReceiverConnectionHandler`, and
`WalReceiverConnectionPoller`.
See the documentation in `task_mgr.rs` for how they interact.

Before this PR, cancellation was requested through
`task_mgr::shutdown_token()` and `TaskHandle::shutdown`.

Wait-for-task-finish was implemented using a mixture of
`task_mgr::shutdown_tasks` and `TaskHandle::shutdown`.

This drawing might help:

<img width="300" alt="image"
src="https://github.com/neondatabase/neon/assets/956573/b6be7ad6-ecb3-41d0-b410-ec85cb8d6d20">


# Changes

For cancellation, the entire WalReceiver task tree now uses a
`child_token()` of `Timeline::cancel`. The `TaskHandle` is no longer a
cancellation root.
This means that `Timeline::cancel.cancel()` is propagated to every task
in the tree.
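
To illustrate the propagation described above, here is a minimal, std-only sketch. It is not the `CancellationToken` used in the code base; `CancelToken` and `demo_cancel_propagation` are hypothetical names, and the chain manager → handler → poller is an assumed tree shape chosen only for illustration.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a hierarchical cancellation token.
/// Each token holds its own flag plus the flags of all its ancestors.
#[derive(Clone)]
pub struct CancelToken {
    ancestors: Vec<Arc<AtomicBool>>,
    own: Arc<AtomicBool>,
}

impl CancelToken {
    /// Create a new root token.
    pub fn new() -> Self {
        CancelToken {
            ancestors: Vec::new(),
            own: Arc::new(AtomicBool::new(false)),
        }
    }

    /// Create a token that observes cancellation of `self` but cannot cancel it.
    pub fn child_token(&self) -> Self {
        let mut ancestors = self.ancestors.clone();
        ancestors.push(Arc::clone(&self.own));
        CancelToken {
            ancestors,
            own: Arc::new(AtomicBool::new(false)),
        }
    }

    /// Cancel this token and, implicitly, all of its descendants.
    pub fn cancel(&self) {
        self.own.store(true, Ordering::SeqCst);
    }

    /// True if this token or any ancestor has been cancelled.
    pub fn is_cancelled(&self) -> bool {
        self.own.load(Ordering::SeqCst)
            || self.ancestors.iter().any(|f| f.load(Ordering::SeqCst))
    }
}

/// Cancel the root (standing in for `Timeline::cancel`) and report whether
/// the three assumed task tokens observe it.
pub fn demo_cancel_propagation() -> (bool, bool, bool) {
    let timeline_cancel = CancelToken::new();
    let manager = timeline_cancel.child_token();
    let handler = manager.child_token();
    let poller = handler.child_token();
    timeline_cancel.cancel();
    (
        manager.is_cancelled(),
        handler.is_cancelled(),
        poller.is_cancelled(),
    )
}
```

Cancelling a child in this sketch does not affect its parent, which is the property that lets the task tree hang off `Timeline::cancel` without becoming a cancellation root itself.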

For wait-for-task-finish, all three tasks in the task tree hold the
`Timeline::gate` open until they exit.
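
The wait-for-task-finish pattern can be sketched as follows. This is a simplified, synchronous stand-in built on `std` only, not the `Gate` type from the code base (its API may differ); `GateState` and `demo_gate` are hypothetical, and `Gate`/`GateGuard` here only borrow the names for readability.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

struct GateState {
    open: bool,
    guards: usize,
}

/// Hypothetical blocking gate: `close()` waits until all guards are dropped.
#[derive(Clone)]
pub struct Gate {
    inner: Arc<(Mutex<GateState>, Condvar)>,
}

/// Holding a guard keeps the gate open.
pub struct GateGuard {
    inner: Arc<(Mutex<GateState>, Condvar)>,
}

impl Gate {
    pub fn new() -> Self {
        let state = GateState { open: true, guards: 0 };
        Gate { inner: Arc::new((Mutex::new(state), Condvar::new())) }
    }

    /// Returns `None` if the gate has already been closed.
    pub fn enter(&self) -> Option<GateGuard> {
        let (lock, _) = &*self.inner;
        let mut st = lock.lock().unwrap();
        if !st.open {
            return None;
        }
        st.guards += 1;
        Some(GateGuard { inner: Arc::clone(&self.inner) })
    }

    /// Stop admitting new guards, then block until all existing ones are dropped.
    pub fn close(&self) {
        let (lock, cvar) = &*self.inner;
        let mut st = lock.lock().unwrap();
        st.open = false;
        while st.guards > 0 {
            st = cvar.wait(st).unwrap();
        }
    }
}

impl Drop for GateGuard {
    fn drop(&mut self) {
        let (lock, cvar) = &*self.inner;
        let mut st = lock.lock().unwrap();
        st.guards -= 1;
        cvar.notify_all();
    }
}

/// Spawn three tasks that each hold a guard until they exit, then close the
/// gate. Returns how many tasks had finished by the time `close()` returned.
pub fn demo_gate() -> usize {
    let gate = Gate::new();
    let finished = Arc::new(Mutex::new(0usize));
    let handles: Vec<_> = (0..3)
        .map(|_| {
            let guard = gate.enter().expect("gate is open");
            let finished = Arc::clone(&finished);
            thread::spawn(move || {
                thread::sleep(Duration::from_millis(20));
                *finished.lock().unwrap() += 1;
                drop(guard); // released only after the task's work is done
            })
        })
        .collect();
    gate.close();
    let n = *finished.lock().unwrap();
    for h in handles {
        h.join().unwrap();
    }
    n
}
```

Because `close()` waits for every guard on the gate, a caller cannot wait for only a subset of the guard holders, which is the limitation discussed next.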

The downside of using the `Timeline::gate` is that we can no longer wait
for just the walreceiver to shut down, which is particularly relevant
for `Timeline::flush_and_shutdown`.
Effectively, it means that we might ingest more WAL while the
`freeze_and_flush()` call is ongoing.

Also, as a drive-by fix, correct the assertions around task kinds in
`wait_lsn`. The check for `WalReceiverConnectionHandler` was ineffective
because that was never a `task_mgr` task, but a `TaskHandle` task. Refine
the assertion to check whether we would wait, and only fail in that case.
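
A minimal sketch of the "only fail if we would actually wait" idea follows. `SeqCounter` is a hypothetical, heavily simplified stand-in for `SeqWait` that tracks only the current value, and `check_wait_allowed` with its `is_walreceiver_task` flag is an illustrative assumption, not the actual assertion in `wait_lsn`.

```rust
use std::sync::Mutex;

/// Hypothetical, simplified stand-in for `SeqWait`: tracks only the current value.
pub struct SeqCounter {
    current: Mutex<u64>,
}

impl SeqCounter {
    pub fn new(start: u64) -> Self {
        SeqCounter { current: Mutex::new(start) }
    }

    /// Move the counter forward; it never goes backwards.
    pub fn advance(&self, num: u64) {
        let mut cur = self.current.lock().unwrap();
        if num > *cur {
            *cur = num;
        }
    }

    /// `Ok(())` if a wait for `num` would return immediately,
    /// otherwise `Err` with the current value.
    pub fn would_wait_for(&self, num: u64) -> Result<(), u64> {
        let cnt = *self.current.lock().unwrap();
        if cnt >= num {
            Ok(())
        } else {
            Err(cnt)
        }
    }
}

/// Hypothetical caller-side check: a walreceiver task may ask for an LSN,
/// but it is an error only if that request would actually block.
pub fn check_wait_allowed(
    seq: &SeqCounter,
    lsn: u64,
    is_walreceiver_task: bool,
) -> Result<(), String> {
    match seq.would_wait_for(lsn) {
        Ok(()) => Ok(()),
        Err(current) if is_walreceiver_task => Err(format!(
            "walreceiver task would wait for LSN {lsn} (current: {current})"
        )),
        Err(_) => Ok(()),
    }
}
```

The check is a snapshot: the counter may advance right after `would_wait_for` returns, so in this sketch it is suitable for assertions rather than for synchronization.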

# Alternatives

I contemplated (ab-)using the `Gate` by having a separate `Gate` for
`struct WalReceiver`.
All the child tasks would use _that_ gate instead of `Timeline::gate`.
And `struct WalReceiver` itself would hold an `Option<GateGuard>` of the
`Timeline::gate`.
Then we could have a `WalReceiver::stop` function that closes the
WalReceiver's gate, then drops the `WalReceiver::Option<GateGuard>`.

However, such a design would mean sharing the WalReceiver's `Gate` in an
`Arc`, which seems awkward.
A proper abstraction would be to make gates hierarchical, analogous to
`CancellationToken`.

In the end, @jcsp and I talked it over and we determined that it's not
worth the effort at this time.

# Refs

part of #7062

Christian Schwarz, 2024-04-03 12:28:04 +02:00, committed by GitHub

parent bc05d7eb9c, commit 3de416a016

10 changed files with 174 additions and 98 deletions

									
libs/utils/src/seqwait.rs (12 lines changed)

@@ -182,6 +182,18 @@ where
         }
     }
 
+    /// Check if [`Self::wait_for`] or [`Self::wait_for_timeout`] would wait if called with `num`.
+    pub fn would_wait_for(&self, num: V) -> Result<(), V> {
+        let internal = self.internal.lock().unwrap();
+        let cnt = internal.current.cnt_value();
+        drop(internal);
+        if cnt >= num {
+            Ok(())
+        } else {
+            Err(cnt)
+        }
+    }
+
     /// Register and return a channel that will be notified when a number arrives,
     /// or None, if it has already arrived.
     fn queue_for_wait(&self, num: V) -> Result<Option<Receiver<()>>, SeqWaitError> {
