This does the postgres & rust builds, caching the results, and preserves
its outputs in a "workspace" for downstream test jobs (which can run in
parallel).
Pytest jobs are parameterized, so adding new pytest-based tests requires
only adding a new job to the "workflows" section at the end.
This could use some optimization:
- The "apt-get install" step is quite slow.
- The rust build step will always happen, even if only unrelated changes
are present (e.g. modified a python test file)
- Saving/restoring the rust cache (/target) is very slow (it contains
1.3GB of data)
- Saving the workspace is very slow.
- The "install" step is ugly; postgres and rust artifacts could take a
much better form.
Use pytest to manage background services, paths, and environment
variables.
Benefits:
- Tests are a little easier to write.
- Cleanup is more reliable. You can CTRL-C a test and it will still shut
down gracefully. If you manually start a conflicting process, the test
fixtures will detect this and abort at startup.
- Don't need to worry about remembering '--test-threads=1'
- Output of sub-processes can be captured to files.
- Test fixtures configure everything to operate under a single test
output directory, making it easier to capture logs in CI.
- Detects all the necessary paths if run from the git root, but can also
run from arbitrary paths by setting environment variables.
There is also a deliberately broken test (test_broken.py) that can be
used to test whether the test fixtures properly clean up after
themselves. It won't run by default; the comment at the top explains how
to enable it.
Remove the check that enforces running from the git root directory.
Discover the zenith binary path from current_exe().
Look for postgres in $POSTGRES_BIN or $CWD/tmp_install.
Previously, 'zenith init' would initialize a PostgreSQL cluster with
"initdb -D tmp", creating the temp cluster under current directory.
It moves the 'tmp' directory under the correct snapshot directory in
the zenith repository after that, but if something goes wrong in initdb,
or in the steps that follow, it could leave behind the 'tmp' directory
under current dir. Better to create the temporary directory under the
repository directory to begin with, as ".zenith/tmp".
- Move notifications to a separate job, run only on push.
- Build and test will execute on [pull_request, push].
- Use actions-rs/toolchain@v1 to get the rust toolchain.
- Add matrix hook to allow multiple toolchain versions in the future
(now set to [stable]).
- Run all the cargo tests, not just test_pageserver
Make the caller of request_redo() responsible for gathering the WAL records
to redo, and for storing the reconstructed page image back in the page
cache. This leaves the WAL redo manager purely responsible for dealing with
the postgres child process, removing its dependency on the PageCache.
Having multiple copies of the same values is a source of confusion.
Commit da9bf5dc63 fixed one race condition caused by that, for example.
See also discussion at
https://github.com/zenithdb/zenith/issues/57#issuecomment-824393470
This changes SeqWait.advance() to return the old number, and not panic if
you try to move the value backwards. The caller should check for that and
act accordingly.
Remove 'async' usage a much as feasible. Async code is harder to debug,
and mixing async and non-async code is a recipe for confusion and bugs.
There are a couple of exceptions:
- The code in walredo.rs, which needs to read and write to the child
process simultaneously, still uses async. It's more convenient there.
The 'async' usage is carefully limited to just the functions that
communicate with the child process.
- Code in walreceiver.rs that uses tokio-postgres to do streaming
replication. We have to use async there, because tokio-postgres is
async. Most rust-postgres functionality has non-async wrappers, but
not the new replication client code. The async usage is very limited
here, too: we use just block_on to call the tokio-postgres functions.
The code in 'page_service.rs' now launches a dedicated thread for each
connection.
This replaces tokio::sync:⌚:channel with std::sync:mpsc in
'seqwait.rs', to make that non-async. It's not a drop-in replacement,
though: std::sync::mpsc doesn't support multiple consumers, so we cannot
share a channel between multiple waiters. So this removes the code to
check if an existing channel can be reused, and creates a new one for
each waiter. That created another problem: BTreeMap cannot hold
duplicates, so I replaced that with BinaryHeap.
Similarly, the tokio::{mpsc, oneshot} channels used between WAL redo
manager and PageCache are replaced with std::sync::mpsc. (There is no
separate 'oneshot' channel in the standard library.)
Fixes github issue #58, and coincidentally also issue #66.