A timeline ID is only guaranteed to be unique for a particular tenant,
so you need to use tenant ID + timeline ID as the key, rather than just
timeline ID.
The safekeeper currently makes the same assumption, and we should fix that
too, but this commit just addresses this one case in the page server.
In the passing, reorder some function arguments to be more consistent.
The walkeeper launch two threads for each connection, and uses a guard
object to remove entry from 'replicas' array, when finishes. But only
the background thread held onto the guard object, so if the background
thread finished before the other thread, the array entry would be
removed prematurely, which lead to panic in the check_stop_streaming()
call.
Fixes https://github.com/zenithdb/zenith/issues/1103
Hexalize zids there for better output; since Serde doesn't support several
formats for one struct, on-disk representation is changed as well, make
upgrade.rs cope with it.
to avoid a subtle race condition.
Without safekeeper, walreceiver reconnection can stuck,
because of IO deadlock between walsender auth and regular backend.
* Do not hold timelines lock during GC
refer #1087
* Add gc_cs mutex for preveting creation of new timelines during GC
* Make clippy happy
* Use Mutex<()> instead of Mutex<i32> for GC critical section
Introduce the concept of a "ZenithWalRecord", which can be a Postgres WAL
record that is replayed with the Postgres WAL redo process, or a built-in
type that is handled entirely by pageserver code.
Replace the special code to replay Postgres XACT commit/abort records
with new Zenith WAL records. A separate zenith WAL record is created for
each modified CLOG page. This allows removing the 'main_data_offset'
field from stored PostgreSQL WAL records, which saves some memory and
some disk space in delta layers.
Introduce zenith WAL records for updating bits in the visibility map.
Previously, when e.g. a heap insert cleared the VM bit, we duplicated the
heap insert WAL record for the affected VM page. That was very wasteful.
The heap WAL record could be massive, containing a full page image in
the worst case. This addresses github issue #941.
The first COPY generates about 230 MB of write I/O, but the second
COPY, after deleting most of the rows and vacuuming the rows away,
generates 370 MB of writes. Both COPYs insert the same amount of data,
so they should generate roughly the same amount of I/O. This commit
doesn't try to fix the issue, just adds a test case to demonstrate it.
Add a new 'checkpoint' command to the pageserver API. Previously,
we've used 'do_gc' for that, but many tests, including this new one,
really only want to perform a checkpoint and don't care about GC. For
now, I only used the command in the new test, though, and didn't
convert any existing tests to use it.
It creates busy loop if pageserver <-> safekeeper connection fails after it was
established (e.g. currently due to 'segment checkpoint not found' error on
pageserver).
Also wake up callmemaybe thread regularly once in recall_period regardless of
channel activity.
This patch allows to shutdown wal receiver when there are no messages
and wal receiver is blocked inside tokio-postgres. In this case it
cannot check the shutdown flag.
This patch switches to use async interface of tokio-postgres directly
without sync wrappers. It opens the possibility to use tokio::select!
between the phsycal_stream.next() and a shutdown channel readiness to
interrupt replication process.
Also this allows to shutdown only particular wal receiver without
using global shutdown_requested flag.
Currently it's included with minimal changes and lives aside of the main
workspace. Later we may re-use and combine common parts with zenith
control_plane.
This change is mostly needed to unify cloud deployment pipeline:
1.1. build compute-tools image
1.2. build compute-node image based on the freshly built compute-tools
2. build zenith image
So we can roll new compute image and new storage required by it to
operate properly. Also it becomes easier to test console against some
specific version of compute-node/-tools.
If safekeepers sync fast enough, callmemaybe thread may never make a call before receiving Unsubscribe request. This leads to the situation, when pageserver lacks data that exists on safekeepers.
Do it separately with SafekeeperPostgresCommand enum as a result. Since query is
always C string, switch postgres_backend process_query argument from Bytes to
&str.
Make passing ztli/ztenant id in safekeeper connection string optional; this is
needed for upcoming intra-safekeeper heartbeat cmd which is not bound to any
timeline.
Change meaning of lsns in HOT_STANDBY_FEEDBACK:
flush_lsn = disk_consistent_lsn,
apply_lsn = remote_consistent_lsn
Update compute node backpressure configuration respectively.
Update compute node configuration:
set 'synchronous_commit=remote_write' in setup without safekeepers.
This way compute node doesn't have to wait for data checkpoint on pageserver.
This doesn't guarantee data durability, but we only use this setup for tests, so it's fine.