This introduces new timeline field latest_gc_cutoff. It is updated
before each gc iteration. New check is added to branch_timelines to
prevent branch creation with start point less than latest_gc_cutoff.
Also this adds a check to get_page_at_lsn which asserts that lsn at
which the page is requested was not garbage collected. This check
currently is triggered for readonly nodes which are pinned to specific
lsn and because they are not tracked in pageserver garbage collection
can remove data that still might be referenced. This is a bug and will
be fixed separately.
Instead of having a lot of separate fixtures for setting up the page
server, the compute nodes, the safekeepers etc., have one big ZenithEnv
object that encapsulates the whole environment. Every test either uses
a shared "zenith_simple_env" fixture, which contains the default setup
of a pageserver with no authentication, and no safekeepers. Tests that
want to use safekeepers or authentication set up a custom test-specific
ZenithEnv fixture.
Gathering information about the whole environment into one object makes
some things simpler. For example, when a new compute node is created,
you no longer need to pass the 'wal_acceptors' connection string as
argument to the 'postgres.create_start' function. The 'create_start'
function fetches that information directly from the ZenithEnv object.
* Add yapf run to CircleCI
* Pin yapf version
* Enable `SPLIT_ALL_TOP_LEVEL_COMMA_SEPARATED_VALUES` setting
* Reformat all existing code with slight manual adjustments
* test_runner/README: note that yapf is forced
* Use logging in python tests
* Use f-strings for logs
* Don't log test output while running
* Use only pytest logging handler
* Add more info about pytest logging
After this, a layer's start bound is always defined to be inclusive, and
end bound exclusive.
For example, if you have a layer in the range 100-200, that layer can be
used for GetPage@LSN requests at LSN 100, 199, or anything in between.
But for LSN 200, you need to look at the next layer (if one exists).
This is one part of a fix for https://github.com/zenithdb/zenith/issues/517.
After this, the page server shouldn't create layers for the same segment
with the same LSN, which avoids the issue. However, the same thing would
still happen, if you managed to create layers with same start LSN again.
That could happen e.g. if you had two page servers running, or in some
weird crash/restart scenario, or due to bugs or features added later. The
next commit makes the layer map more robust, so that it tolerates that
situation without deleting wrong files.
Some dropped layers serve as tombstones for earlier layers and thus cannot be garbage collected.
Add new fields to GcResult for layers that are preserved as tombstones
Split each relish into fixed-sized 10 MB segments. Separate layers are
created for each segment. This reduces the write amplification if you
have a large relation and update only parts of it; the downside is
that you have a lot more files. The 10 MB is just a guess, we should
do some modeling and testing in the future to figure out the optimal
size.
Each segment tracks the size of the segment separately. To figure out
the total size of a relish, you need to loop through the segment to
find the highest segment that's in use. That's a bit inefficient, but
will do for now. We might want to add a cache or something later.
This replaces the RocksDB based implementation with an approach using
"snapshot files" on disk, and in-memory btreemaps to hold the recent
changes.
This make the repository implementation a configuration option. You can
choose 'layered' or 'rocksdb' with "zenith init --repository-format=<format>"
The unit tests have been refactored to exercise both implementations.
'layered' is now the default.
Push/pull is not implemented. The 'test_history_inmemory' test has been
commented out accordingly. It's not clear how we will implement that
functionality; probably by copying the snapshot files directly.