It was only marked as async because it calls relsize_get(), but
relsize_get() will in fact never block when it's called with the max
LSN value, like put_wal_record() does. Refactor to avoid marking
put_wal_record() as 'async'.
After the rocksdb patch (commit 6aa38d3f7d), the CacheEntry struct was
used only momentarily in the communication between the page_cache and
the walredo modules. It was in fact not stored in any cache anymore.
For clarity, refactor the communication.
There is now a WalRedoManager struct, with `request_redo` function,
that can be used to request WAL replay of a particular page. It sends
a request to a queue like before, but the queue has been replaced with
tokio::sync::mpsc. Previously, the resulting page image was stored
directly in the CacheEntry, and the requestor was notified using a
condition variable. Now, the requestor includes a 'oneshot' channel in
the request, and the WAL redo manager sends the response there.
Clippy pointed out that `drop(waiters)` didn't do anything, because
there was a misplaced ";" causing `waiters` to be a unit type `()`.
This change makes it do what was intended: the lock should be dropped
first, then the wakeups should be processed.
The comment was incorrect, claiming that ZTimelineId is a 32-byte value.
It is actually 16 bytes wide. While we're at it, improve the comment,
explaining what a zenith timeline is, and why it's different from
PostgreSQL timelines.
When calling into the page cache, it was possible to wait on a blocking
mutex, which can stall the async executor.
Replace that sleep with a SeqWait::wait_for(lsn).await so that the
executor can go on with other work while we wait.
Change walreceiver_works to an AtomicBool to avoid the awkwardness of
taking the lock, then dropping it while we call wait_for and then
acquiring it again to do real work.
SeqWait adds a way to .await the arrival of some sequence number.
It provides wait_for(num) which is an async fn, and advance(num) which
is synchronous.
This should be useful in solving the page cache deadlocks, and may be
useful in other areas too.
This implementation still uses a Mutex internally, but only for a brief
critical section. If we find this code broadly useful and start to care
more about executor stalls due to unfair thread scheduling, there might
be ways to make it lock-free.
- remove needless return
- remove needless format!
- remove a few more needless clone()
- from_str_radix(_, 10) -> .parse()
- remove needless reference
- remove needless `mut`
Also manually replaced a match statement with map_err() because after
clippy was done with it, there was almost nothing left in the match
expression.
Replay XLOG_XACT_COMMIT and XLOG_XACT_ABORT records in walredo.
Don't wait for lsn catchup before walreceiver connected.
Use 'request_nonrel' branch of vendor/postgres