If no version is specified for a dependency crate, Cargo may choose a
newer version than was used before. That can happen when Cargo.lock is
updated ("cargo update"), but also unexpectedly when adding or changing
other dependencies. Either way, API-breaking changes can be picked up,
breaking the build.
To prevent this, specify versions for all dependencies. Cargo is still
allowed to pick newer versions that are (hopefully) non-breaking,
according to the semver version number.
There are two special cases here:
1. serde_derive::{Serialize, Deserialize} isn't really used any more. It
was only a separate crate in the past because of compiler limitations.
Nowadays, people turn on the "derive" feature of the serde crate and
use serde::{Serialize, Deserialize}.
2. parse_duration is unmaintained and has an open security issue
(GitHub issue 87). That issue probably isn't critical for us, given
where we use the crate, but it's still better to pin the version so we
can't get hit by an API-breaking change at an awkward time.
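Concretely, the dependency lines then end up looking something like
this; the version numbers here are illustrative, not the exact ones
pinned:

    [dependencies]
    # "1.0" allows any semver-compatible 1.x release, but not a
    # hypothetical API-breaking 2.0.
    serde = { version = "1.0", features = ["derive"] }
    # "=" pins an exact version, so no upgrade happens at all.
    parse_duration = "=2.1.1"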
Make the caller of request_redo() responsible for gathering the WAL records
to redo, and for storing the reconstructed page image back in the page
cache. This leaves the WAL redo manager purely responsible for dealing with
the postgres child process, removing its dependency on the PageCache.
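A toy model of the new division of labor; every name below is invented
for illustration and does not match the actual code:

    struct WalRecord;
    struct PageImage;
    struct WalRedoManager;

    impl WalRedoManager {
        // Only talks to the postgres child process; knows nothing
        // about the page cache anymore.
        fn request_redo(&self, base: Option<PageImage>, _records: Vec<WalRecord>) -> PageImage {
            base.unwrap_or(PageImage) // stub standing in for the real redo
        }
    }

    // The caller gathers the WAL records and stores the result itself:
    fn reconstruct_page(mgr: &WalRedoManager) -> PageImage {
        let records = vec![WalRecord, WalRecord]; // gathered by the caller
        let img = mgr.request_redo(None, records);
        // ... store `img` back in the page cache here ...
        img
    }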
Having multiple copies of the same values is a source of confusion.
Commit da9bf5dc63 fixed one race condition caused by that, for example.
See also discussion at
https://github.com/zenithdb/zenith/issues/57#issuecomment-824393470
This changes SeqWait.advance() to return the old number, and not panic if
you try to move the value backwards. The caller should check for that and
act accordingly.
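A simplified model of the changed contract; the real SeqWait also has
to track and wake waiters, which is elided here:

    use std::sync::Mutex;

    struct SeqWait {
        current: Mutex<u64>,
    }

    impl SeqWait {
        /// Advance the counter and return the previous value. Moving
        /// backwards no longer panics: the value simply stays put, and
        /// the caller checks the returned number and reacts.
        fn advance(&self, num: u64) -> u64 {
            let mut cur = self.current.lock().unwrap();
            let old = *cur;
            if num > old {
                *cur = num;
                // ... wake waiters waiting for values <= num ...
            }
            old
        }
    }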
Remove 'async' usage as much as feasible. Async code is harder to debug,
and mixing async and non-async code is a recipe for confusion and bugs.
There are a couple of exceptions:
- The code in walredo.rs, which needs to read and write to the child
process simultaneously, still uses async. It's more convenient there.
The 'async' usage is carefully limited to just the functions that
communicate with the child process.
- Code in walreceiver.rs that uses tokio-postgres to do streaming
replication. We have to use async there, because tokio-postgres is
async. Most rust-postgres functionality has non-async wrappers, but
not the new replication client code. The async usage is very limited
here, too: we just use block_on to call the tokio-postgres functions,
as in the sketch after this list.
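The resulting pattern looks roughly like the following; the connection
string and query are illustrative, not the actual walreceiver code:

    use tokio_postgres::NoTls;

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        let runtime = tokio::runtime::Runtime::new()?;

        // Our own functions stay non-async; each tokio-postgres call
        // is driven to completion with block_on.
        let (client, conn) = runtime.block_on(tokio_postgres::connect(
            "host=localhost user=postgres",
            NoTls,
        ))?;
        // The Connection half drives the wire protocol; spawn it so
        // the client's calls can make progress.
        runtime.spawn(conn);

        let _rows = runtime.block_on(client.simple_query("SHOW server_version"))?;
        Ok(())
    }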
The code in 'page_service.rs' now launches a dedicated thread for each
connection.
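In outline (the address and the handler body are placeholders):

    use std::net::TcpListener;
    use std::thread;

    fn main() -> std::io::Result<()> {
        let listener = TcpListener::bind("127.0.0.1:5430")?;
        for stream in listener.incoming() {
            let stream = stream?;
            // One dedicated thread per connection, instead of an
            // async task.
            thread::spawn(move || {
                // ... serve page requests synchronously here ...
                drop(stream);
            });
        }
        Ok(())
    }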
This replaces tokio::sync::watch::channel with std::sync::mpsc in
'seqwait.rs', to make that non-async. It's not a drop-in replacement,
though: std::sync::mpsc doesn't support multiple consumers, so we cannot
share a channel between multiple waiters. So this removes the code that
checked whether an existing channel could be reused, and creates a new
channel for each waiter. That created another problem: BTreeMap cannot
hold duplicate keys, so I replaced it with a BinaryHeap.
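A sketch of the new bookkeeping, with illustrative names: each waiter
gets a channel of its own, and a BinaryHeap is free to hold several
waiters for the same number:

    use std::cmp::Ordering;
    use std::collections::BinaryHeap;
    use std::sync::mpsc::Sender;

    struct Waiter {
        wake_num: u64,  // wake this waiter when the value reaches this
        tx: Sender<()>, // one channel per waiter, never shared
    }

    // Order by wake_num only, reversed so that the (max-)heap pops the
    // smallest number first. Ties between waiters are fine.
    impl PartialEq for Waiter {
        fn eq(&self, other: &Self) -> bool {
            self.wake_num == other.wake_num
        }
    }
    impl Eq for Waiter {}
    impl Ord for Waiter {
        fn cmp(&self, other: &Self) -> Ordering {
            other.wake_num.cmp(&self.wake_num)
        }
    }
    impl PartialOrd for Waiter {
        fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
            Some(self.cmp(other))
        }
    }

    fn wake_up_to(heap: &mut BinaryHeap<Waiter>, num: u64) {
        while heap.peek().map_or(false, |w| w.wake_num <= num) {
            let waiter = heap.pop().unwrap();
            let _ = waiter.tx.send(()); // waiter may be gone; ignore errors
        }
    }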
Similarly, the tokio::{mpsc, oneshot} channels used between WAL redo
manager and PageCache are replaced with std::sync::mpsc. (There is no
separate 'oneshot' channel in the standard library.)
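The one-shot pattern is emulated by creating a fresh mpsc channel per
request and shipping the Sender along with the request. A minimal
standalone sketch:

    use std::sync::mpsc;
    use std::thread;

    fn main() {
        // One fresh channel per request stands in for a 'oneshot'.
        let (reply_tx, reply_rx) = mpsc::channel::<Vec<u8>>();
        thread::spawn(move || {
            // ... the WAL redo side does its work here ...
            reply_tx.send(vec![0u8; 8192]).unwrap(); // the single reply
        });
        let page_image = reply_rx.recv().unwrap();
        assert_eq!(page_image.len(), 8192);
    }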
Fixes GitHub issue #58, and coincidentally also issue #66.
AtomicLsn is a wrapper around AtomicU64 with load() and store()
methods that are cheap (on x86, anyway) and can be safely used in any
context.
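In outline; the memory orderings below are assumptions, not necessarily
what the real code uses:

    use std::sync::atomic::{AtomicU64, Ordering};

    #[derive(Clone, Copy, PartialEq, Eq, Debug)]
    pub struct Lsn(pub u64);

    pub struct AtomicLsn(AtomicU64);

    impl AtomicLsn {
        pub fn new(lsn: Lsn) -> Self {
            AtomicLsn(AtomicU64::new(lsn.0))
        }
        // On x86, an acquire load and a release store both compile to
        // plain mov instructions, hence "cheap".
        pub fn load(&self) -> Lsn {
            Lsn(self.0.load(Ordering::Acquire))
        }
        pub fn store(&self, lsn: Lsn) {
            self.0.store(lsn.0, Ordering::Release);
        }
    }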
This commit uses AtomicLsn in the page cache, and fixes up some
downstream code that manually implemented LSN formatting.
There's also a bugfix to the logging in wait_lsn, which printed the
wrong LSN value.
put_wal_record() was only marked as 'async' because it calls
relsize_get(), but relsize_get() will in fact never block when it's
called with the max LSN value, as put_wal_record() does. Refactor so
that put_wal_record() no longer needs to be marked 'async'.
After the rocksdb patch (commit 6aa38d3f7d), the CacheEntry struct was
used only momentarily in the communication between the page_cache and
the walredo modules. It was in fact not stored in any cache anymore.
For clarity, refactor the communication.
There is now a WalRedoManager struct with a `request_redo` function
that can be used to request WAL replay of a particular page. It sends
a request to a queue like before, but the queue has been replaced with
tokio::sync::mpsc. Previously, the resulting page image was stored
directly in the CacheEntry, and the requestor was notified using a
condition variable. Now, the requestor includes a 'oneshot' channel in
the request, and the WAL redo manager sends the response there.
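The shape of the exchange, with placeholder types standing in for the
real structs:

    struct BufferTag;  // placeholder: identifies the page
    struct WalRecord;  // placeholder: one WAL record to apply

    struct RedoRequest {
        tag: BufferTag,
        records: Vec<WalRecord>,
        // The requestor keeps the matching Receiver and waits on it
        // until the WAL redo manager sends the reconstructed image.
        response: tokio::sync::oneshot::Sender<Vec<u8>>,
    }

    // Requestor side, roughly (assuming an unbounded tokio mpsc queue):
    //   let (tx, rx) = tokio::sync::oneshot::channel();
    //   request_queue.send(RedoRequest { tag, records, response: tx })?;
    //   let page_image = rx.blocking_recv()?;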