We need checksums to verify data integrity when we read data from an
untrusted place (e.g. local disk) or over an untrusted communication channel
(e.g. network). At the same time, we trust the pageserver <-> redo process
communication channel, as it is just a pipe.
Here we enable calculation of data checksums in the WAL redo process and
when we extract FPIs (full-page images) during WAL ingestion. The compute
node (Postgres) will verify the checksum of every page after receiving it
back from the pageserver, so this is very similar to how vanilla Postgres
checks them.
There are two other places where we should verify checksums to
detect data corruption earlier:
- when we receive WAL records from safekeepers (already implemented,
see: WalStreamDecoder::poll_decode)
- when we write layer files to disk and read them back into memory from
local disk or S3
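
As an illustration of the verify-on-receive idea in the list above, here is a minimal sketch, not the code in `WalStreamDecoder::poll_decode`. It assumes only that the checksum is a CRC-32C, which is what Postgres uses for WAL records. The names `crc32c`, `verify_payload` and `ChecksumMismatch` are hypothetical, and the real check has to follow the Postgres record layout, which this sketch does not reproduce.

```rust
/// Reflected form of the CRC-32C (Castagnoli) polynomial.
const CRC32C_POLY: u32 = 0x82F6_3B78;

/// Bitwise CRC-32C. Slow but dependency-free; a real implementation
/// would likely use a table-driven or hardware-accelerated version.
pub fn crc32c(data: &[u8]) -> u32 {
    let mut crc = !0u32;
    for &byte in data {
        crc ^= u32::from(byte);
        for _ in 0..8 {
            crc = if crc & 1 != 0 {
                (crc >> 1) ^ CRC32C_POLY
            } else {
                crc >> 1
            };
        }
    }
    !crc
}

/// Hypothetical error type, introduced only for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub struct ChecksumMismatch {
    pub expected: u32,
    pub actual: u32,
}

/// Hypothetical helper: recompute the checksum of bytes received from an
/// untrusted source and compare it with the checksum that came with them.
pub fn verify_payload(payload: &[u8], expected: u32) -> Result<(), ChecksumMismatch> {
    let actual = crc32c(payload);
    if actual == expected {
        Ok(())
    } else {
        Err(ChecksumMismatch { expected, actual })
    }
}
```

A caller would run `verify_payload` on every buffer that crosses an untrusted boundary and refuse to use the data on `Err`, so that corruption is reported at the point where it is first observable rather than later on the compute node.
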
* Actual generation logic is in a separate crate `postgres_ffi/wal_generate`
* The crate also provides a binary for debugging purposes, akin to `initdb`
* Two tests currently fail and are ignored
* There is no easy way to test this directly in Safekeeper, as it starts restoring from commit_lsn.
Testing would therefore require disconnecting Safekeeper just after it has received the WAL,
but before the WAL is committed.