Use the GUC zenith.max_cluster_size to set the limit.
If the limit is reached, extend requests throw an out-of-space error.
When the current size gets too close to the limit, emit a warning.
Add new test: test_timeline_size_quota.
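A minimal sketch of the intended behavior, assuming a byte-denominated limit; the function name, the 10% warning threshold and the error type are made up for the sketch, not taken from the patch:

// Minimal sketch of the quota check described above; names and the 10%
// warning threshold are illustrative only.
fn check_cluster_size(current_size: u64, max_cluster_size: u64) -> Result<(), String> {
    if current_size >= max_cluster_size {
        // An extend request past the limit fails with an out-of-space error.
        return Err(format!(
            "out of space: cluster size limit {} reached (current size {})",
            max_cluster_size, current_size
        ));
    }
    // Warn when the current size is within 10% of the limit.
    if current_size >= max_cluster_size - max_cluster_size / 10 {
        eprintln!(
            "WARNING: cluster size {} is close to the limit {}",
            current_size, max_cluster_size
        );
    }
    Ok(())
}

fn main() {
    // 9.5 GiB used with a 10 GiB limit: still allowed, but warns.
    check_cluster_size(9_728 << 20, 10_240 << 20).unwrap();
}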
Pass current_timeline_size to the compute node.
Put the standby_status_update fields into ZenithFeedback and send them as one message.
Pass value sizes together with keys in the ZenithFeedback message.
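For illustration, a serialization shaped like the one described (each value prefixed with its size, so a reader can skip unknown keys) could look as follows; this is a sketch, not the exact on-wire ZenithFeedback layout, and the key names other than current_timeline_size are placeholders:

// Sketch of a key/value message where every value carries its size, so the
// receiver can skip keys it doesn't understand.
fn put_field(buf: &mut Vec<u8>, key: &str, value: &[u8]) {
    buf.extend_from_slice(key.as_bytes());
    buf.push(0); // null terminator after the key
    buf.extend_from_slice(&(value.len() as u32).to_be_bytes()); // value size
    buf.extend_from_slice(value); // value bytes
}

fn main() {
    let mut msg = Vec::new();
    msg.push(2u8); // number of fields
    put_field(&mut msg, "current_timeline_size", &42_000_000u64.to_be_bytes());
    put_field(&mut msg, "ps_writelsn", &0x169C3C8u64.to_be_bytes());
    println!("encoded {} bytes", msg.len());
}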
Currently compute-tools is included with minimal changes and lives outside the main
workspace. Later we may reuse and combine common parts with the zenith
control_plane.
This change is mostly needed to unify the cloud deployment pipeline:
1.1. build the compute-tools image
1.2. build the compute-node image based on the freshly built compute-tools
2. build the zenith image
This way we can roll out a new compute image together with the new storage it
requires to operate properly. It also becomes easier to test the console against
a specific version of compute-node/-tools.
Change the meaning of the LSNs in HOT_STANDBY_FEEDBACK:
flush_lsn = disk_consistent_lsn,
apply_lsn = remote_consistent_lsn
Update the compute node backpressure configuration accordingly.
Update the compute node configuration:
set 'synchronous_commit=remote_write' in the setup without safekeepers.
This way the compute node doesn't have to wait for a data checkpoint on the pageserver.
This doesn't guarantee data durability, but we only use this setup for tests, so it's fine.
This is needed for the implementation of tenant rebalancing. With this
change the safekeeper becomes aware of which pageserver is supposed to be
used for replication from this particular compute.
write_lsn - the last LSN received and processed by the pageserver's walreceiver.
flush_lsn - same as write_lsn. On the pageserver it doesn't guarantee data persistence, but that's fine: we rely on the safekeepers.
apply_lsn - the LSN up to which the pageserver has guaranteed persistence of all received data (disk_consistent_lsn).
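Sketched as a struct with the semantics spelled out; this mirrors the description above, not the exact type in the repository:

// Illustrative only: the three feedback LSNs with the semantics described
// above; the actual struct in the repository may differ.
#[derive(Debug, Default, Clone, Copy)]
struct PageserverFeedback {
    // Last LSN received and processed by the pageserver's walreceiver.
    write_lsn: u64,
    // Reported equal to write_lsn: the pageserver does not promise
    // persistence here, durability is the safekeepers' job.
    flush_lsn: u64,
    // LSN up to which the pageserver has persisted all received data
    // (disk_consistent_lsn).
    apply_lsn: u64,
}

fn main() {
    // Hypothetical snapshot: WAL up to 0x2000_0060 was processed, but only
    // data up to 0x1FFF_F000 is durable on the pageserver itself.
    let fb = PageserverFeedback {
        write_lsn: 0x2000_0060,
        flush_lsn: 0x2000_0060,
        apply_lsn: 0x1FFF_F000,
    };
    println!("{:x?}", fb);
}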
Persist the full history of term switches on safekeepers instead of storing only the
single term of the highest entry (called epoch). This makes it easy to
correctly find the divergence point of two logs and truncate the obsolete part
before overwriting it with entries from the newer proposer(s).
The full history of the proposer is transferred in a separate message before the
proposer starts streaming; it is immediately persisted by the safekeeper, even though
the safekeeper might not yet have entries for some of the older terms in it. That's
because we can't atomically append to the WAL and update the control file anyway, so
locally available WAL must be taken into account when looking at the history.
We should eventually purge term history entries beyond truncate_lsn; this is not
done here.
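To make the idea concrete, here is a sketch of how the divergence point of two term histories might be computed; the types and helper below are illustrative, not the safekeeper's actual code:

// A (term, start LSN) entry: the term becomes active at `lsn`.
#[derive(Clone, Copy, Debug, PartialEq)]
struct TermSwitch {
    term: u64,
    lsn: u64,
}

// Sketch: given two term histories (sorted by term), return the LSN from
// which the logs may differ; WAL before it is identical on both sides.
fn divergence_point(ours: &[TermSwitch], theirs: &[TermSwitch]) -> u64 {
    // Length of the longest common prefix of the two histories.
    let common = ours
        .iter()
        .zip(theirs.iter())
        .take_while(|(a, b)| a == b)
        .count();
    match (ours.get(common), theirs.get(common)) {
        // Histories diverge at the first differing switch: the logs agree
        // only up to the earlier of the two start LSNs.
        (Some(a), Some(b)) => a.lsn.min(b.lsn),
        // One history is a prefix of the other: they agree up to the point
        // where the longer history starts a term the shorter one lacks.
        (Some(a), None) => a.lsn,
        (None, Some(b)) => b.lsn,
        // Identical histories: no divergence below the end of the log.
        (None, None) => u64::MAX,
    }
}

fn main() {
    let ours = [TermSwitch { term: 1, lsn: 0x100 }, TermSwitch { term: 3, lsn: 0x500 }];
    let theirs = [TermSwitch { term: 1, lsn: 0x100 }, TermSwitch { term: 2, lsn: 0x400 }];
    // The logs agree up to 0x400; everything after it on the safekeeper is obsolete.
    assert_eq!(divergence_point(&ours, &theirs), 0x400);
}

In practice the result also has to be capped by the WAL actually available locally, as noted above.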
Per https://github.com/zenithdb/rfcs/pull/12. Closes #296.
Bumps vendor/postgres.
Change the 'zenith.signal' file to a human-readable format, similar to
backup_label. It can contain a "PREV LSN: %X/%X" line, or a special
value indicating either that it's OK to start with an invalid LSN ('none'), or
that it's a read-only node and generating WAL is forbidden ('invalid').
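A sketch of the parsing this format implies; the enum and helper below are illustrative, not the actual code (which lives in vendor/postgres and is written in C):

// Illustrative sketch of what the compute side has to distinguish when
// reading zenith.signal; not the real parser.
#[derive(Debug, PartialEq)]
enum PrevLsn {
    // "PREV LSN: %X/%X" - a concrete previous record pointer.
    Known(u64),
    // 'none' - it's OK to start with an invalid prev LSN.
    None,
    // 'invalid' - read-only node, generating WAL is forbidden.
    Invalid,
}

fn parse_zenith_signal(contents: &str) -> Option<PrevLsn> {
    let s = contents.trim();
    // Accept either a bare value or a "PREV LSN: ..." line.
    let rest = s.strip_prefix("PREV LSN:").unwrap_or(s).trim();
    match rest {
        "none" => Some(PrevLsn::None),
        "invalid" => Some(PrevLsn::Invalid),
        _ => {
            // %X/%X: high and low halves of a 64-bit LSN in hex.
            let (hi, lo) = rest.split_once('/')?;
            Some(PrevLsn::Known(
                (u64::from_str_radix(hi, 16).ok()? << 32) | u64::from_str_radix(lo, 16).ok()?,
            ))
        }
    }
}

fn main() {
    assert_eq!(parse_zenith_signal("PREV LSN: 0/169C3C8\n"), Some(PrevLsn::Known(0x169C3C8)));
    assert_eq!(parse_zenith_signal("PREV LSN: invalid\n"), Some(PrevLsn::Invalid));
}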
The 'zenith pg create' and 'zenith pg start' commands now take a node
name parameter, separate from the branch name. If the node name is not
given, it defaults to the branch name, so this doesn't break existing
scripts.
If you pass "foo@<lsn>" as the branch name, a read-only node anchored
at that LSN is created. The anchoring is performed by setting the
'recovery_target_lsn' option in the postgresql.conf file, and putting
the server into standby mode with 'standby.signal'.
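The anchoring step could look roughly like this; a sketch under the description above, with an illustrative function name and data-directory handling rather than the actual control_plane code:

// Sketch: anchor a read-only node at an LSN by appending recovery_target_lsn
// to postgresql.conf and creating an empty standby.signal.
use std::fs::{self, OpenOptions};
use std::io::Write;

fn anchor_node_at_lsn(pgdata: &str, lsn: &str) -> std::io::Result<()> {
    let mut conf = OpenOptions::new()
        .append(true)
        .create(true)
        .open(format!("{}/postgresql.conf", pgdata))?;
    writeln!(conf, "recovery_target_lsn = '{}'", lsn)?;
    // An empty standby.signal puts the server into standby mode.
    fs::write(format!("{}/standby.signal", pgdata), "")?;
    Ok(())
}

fn main() -> std::io::Result<()> {
    // Hypothetical data directory and LSN, e.g. from "zenith pg create foo@0/169C3C8".
    fs::create_dir_all("./pgdata_ro")?;
    anchor_node_at_lsn("./pgdata_ro", "0/169C3C8")
}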
We no longer store the synthetic checkpoint record in the WAL segment.
The postgres startup code has been changed to use the copy of the
checkpoint record in the pg_control file, when starting in zenith
mode.
All the changes are on the vendor/postgres side. However, because we now
generate fewer Full Page Writes, the 'branch_behind' test needs to be
modified so that it still generates enough WAL to consume a few WAL
segments.
Postgres commit message:
PQgetCopyData can sometimes indicate that the copy is done if the
backend returns an error response. So while we still expect that the
walkeeper never sends CopyDone, we can't expect it to never produce
errors.
- Use different message formats for different kinds of response messages.
- Add an Error message, for passing errors from page server to Postgres.
Previously, we would respond to an 'exists' request with 'false', and
to an 'nblocks' request with 0, if an error happened. Fix those to return
an error message to the client. GetPage requests had a mechanism to
return an error, but it was just a flag with no error message.
- Add a flag to requests, to indicate that we actually want the latest
page version on the timeline, and the LSN is just a hint that we know
that there haven't been any modifications since that LSN. The flag isn't
used for anything yet, but I'm planning to use it to fix
https://github.com/zenithdb/zenith/issues/567
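As a sketch of the message shapes described above (tags and field types here are placeholders, not the exact pageserver protocol structs):

// Illustrative shapes only; the exact message formats differ.
#[allow(dead_code)]
enum PagestreamResponse {
    Exists(bool),
    Nblocks(u32),
    Page(Vec<u8>),
    // New: errors are reported explicitly instead of a bare 'false' or 0.
    Error(String),
}

// Every request now carries a 'latest' flag: if set, the LSN is only a hint
// that nothing changed since then, and the newest page version should be used.
struct PagestreamRequest {
    latest: bool,
    lsn: u64,
}

fn main() {
    let req = PagestreamRequest { latest: true, lsn: 0x169C3C8 };
    let resp = PagestreamResponse::Error("relation not found".to_string());
    if let PagestreamResponse::Error(msg) = resp {
        println!("request at {:X} (latest={}) failed: {}", req.lsn, req.latest, msg);
    }
}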
In passing, fix two minor issues with basebackup:
* check that we can't create branches with pre-initdb LSNs
* normalize branch LSNs that point to a segment boundary
Patch by @knizhnik.
Closes #506.
There are two main reasons for that:
a) The latest unfinished record may disappear after a compute node restart, so let's
try not to leak the volatile part of the WAL into the repository. Always use
last_valid_record instead.
That change requires different getPage@LSN logic in postgres -- we need
to ask for LSNs that point to some complete record instead of GetFlushRecPtr(),
which can point into the middle of a record. That was already done by @knizhnik
to deal with the same problem during the work on `postgres --sync-safekeepers`.
Postgres will use LSNs aligned on an 0x8 boundary in get_page requests, so we
also need to make sure that last_valid_record is aligned (see the sketch after this list).
b) Switch to get_last_record_lsn() in basebackup@no_lsn. When the compute node
is running without safekeepers and streams WAL directly
to the pageserver, it is important that the basebackup LSN matches the LSN at which
replication starts. Before this commit basebackup@no_lsn waited for last_valid_lsn
while the walreceiver started replication at last_record_lsn, which can be less.
So replication was failing since the compute node doesn't have the requested WAL.
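The alignment mentioned in (a) is just rounding up to the next multiple of 8; a tiny sketch, with a made-up helper name:

// Round an LSN up to the next 0x8 boundary, as get_page requests expect.
fn align_lsn_up(lsn: u64) -> u64 {
    (lsn + 7) & !7
}

fn main() {
    assert_eq!(align_lsn_up(0x169C3C5), 0x169C3C8);
    assert_eq!(align_lsn_up(0x169C3C8), 0x169C3C8); // already aligned
}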
Because the t_cid field was missing from the XlHeapDelete struct that
corresponds to the PostgreSQL xl_heap_delete struct, the check for the
XLH_DELETE_ALL_VISIBLE_CLEARED flag did not work correctly.
The decoding of the XlHeapUpdate struct was also missing the t_cid field, but that
didn't cause any immediate problems because in that struct the t_cid
field comes after all the fields that the page server cares about. But fix
that too, as it was an accident waiting to happen.
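To illustrate the failure mode (not the actual decoder, and the toy record layout below is a stand-in, not the exact xl_heap_delete layout): when fields are consumed sequentially, forgetting t_cid shifts every later field, so the flags byte, and with it the XLH_DELETE_ALL_VISIBLE_CLEARED check, is read from the wrong offset.

use std::convert::TryInto;

// Tiny sequential reader over a WAL record body.
struct Reader<'a> { buf: &'a [u8], pos: usize }

impl<'a> Reader<'a> {
    fn u32(&mut self) -> u32 {
        let v = u32::from_le_bytes(self.buf[self.pos..self.pos + 4].try_into().unwrap());
        self.pos += 4;
        v
    }
    fn u8(&mut self) -> u8 {
        let v = self.buf[self.pos];
        self.pos += 1;
        v
    }
}

const XLH_DELETE_ALL_VISIBLE_CLEARED: u8 = 0x01;

fn decode_flags(buf: &[u8], forget_t_cid: bool) -> u8 {
    let mut r = Reader { buf, pos: 0 };
    let _xmax = r.u32();
    if !forget_t_cid {
        let _t_cid = r.u32(); // the field the old decoder did not consume
    }
    // (offnum and infobits would be consumed here in the real record layout)
    r.u8() // flags
}

fn main() {
    // Toy record: xmax, t_cid, flags.
    let rec = [1u8, 0, 0, 0, 4, 0, 0, 0, XLH_DELETE_ALL_VISIBLE_CLEARED];
    // Correct decoding sees the flag...
    assert_eq!(decode_flags(&rec, false) & XLH_DELETE_ALL_VISIBLE_CLEARED, 1);
    // ...while forgetting t_cid reads a byte of t_cid as 'flags' and misses it.
    assert_eq!(decode_flags(&rec, true) & XLH_DELETE_ALL_VISIBLE_CLEARED, 0);
}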
The bug was mostly hidden by the VM page handling in zenith_wallog_page,
where it forcibly generates an FPW record whenever a VM page is evicted:
else if (forknum == VISIBILITYMAP_FORKNUM && !RecoveryInProgress())
{
    /*
     * Always WAL-log vm.
     * We should never miss clearing visibility map bits.
     *
     * TODO Is it too bad for performance?
     * Hopefully we do not evict actively used vm too often.
     */
    XLogRecPtr recptr;
    recptr = log_newpage_copy(&reln->smgr_rnode.node, forknum, blocknum, buffer, false);
    XLogFlush(recptr);
    lsn = recptr;
But that was just hiding the issue: it's still visible if you had a
read-only node relying on the data in the page server, or you killed and
restarted the primary node, or you started a branch. In the included test
case, I used a new branch to expose this.
Fixes https://github.com/zenithdb/zenith/issues/461