Heikki Linnakangas 62b1e07b0f Consume fewer XIDs when restarting primary
The pageserver tracks the latest XID seen in the WAL, in the nextXid
field in the "checkpoint" key-value pair. To reduce the churn on that
single storage key, it's not tracked exactly. Rather, when we advance
it, we always advance it to the next multiple of 1024 XIDs. That way,
we only need to insert a new checkpoint value to the storage every
1024 transactions.
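The rounding described above can be sketched as follows. This is a minimal illustration, not the pageserver's actual code. The function name `advance_next_xid` is hypothetical, and `XID_CHECKPOINT_INTERVAL` is set to the pre-change value of 1024. XIDs are held in `u64`, so 32-bit XID wraparound is ignored.

```rust
/// Pre-change value from the commit message (this commit lowers it to 128).
const XID_CHECKPOINT_INTERVAL: u64 = 1024;

/// Hypothetical helper: given the stored nextXid and an XID just seen in the
/// WAL, returns the new nextXid to store, or None if the stored value already
/// covers `seen_xid`. 32-bit XID wraparound is ignored in this sketch.
fn advance_next_xid(stored_next_xid: u64, seen_xid: u64) -> Option<u64> {
    // nextXid must be greater than every XID seen in the WAL.
    let needed = seen_xid + 1;
    if needed <= stored_next_xid {
        // Still covered: no new version of the checkpoint key is written.
        return None;
    }
    // Round up to the next multiple of the interval, so the key is only
    // rewritten about once per XID_CHECKPOINT_INTERVAL transactions.
    let interval = XID_CHECKPOINT_INTERVAL;
    Some((needed + interval - 1) / interval * interval)
}
```

For example, with nothing stored yet, seeing XID 5 stores nextXid 1024. No further update is needed until an XID of 1024 or higher appears, which then stores 2048.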

However, read-only replicas now scan the WAL at startup to find any
XIDs that haven't been explicitly aborted or committed, and treat
them as still in-progress (PR #7288). When we bump up the nextXid
counter by 1024, all those skipped XIDs look like in-progress XIDs to a
read replica. There's a limited amount of space for tracking
in-progress XIDs, so skipping XIDs now has a higher cost. We had a
case in production where a read replica did not start up, because the
primary had gone through many restart cycles without writing any
running-xacts or checkpoint WAL records, and each restart added almost
1024 "orphaned" XIDs that had to be tracked as in-progress in the
replica. As soon as the primary writes a running-xacts or checkpoint
record, the orphaned XIDs can be removed from the in-progress XIDs
list and the problem resolves, but if those records are not written,
the orphaned XIDs just accumulate.
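The accumulation can be modeled with a toy replay loop. This is a hypothetical sketch, not the replica's actual known-XID tracking. `WalEvent`, `replay`, and the `capacity` parameter are invented for illustration. Each restart adds its skipped XIDs to the tracked set, and only a running-xacts or checkpoint record clears them.

```rust
/// Hypothetical WAL events relevant to this toy model.
#[derive(Clone, Copy, Debug)]
enum WalEvent {
    /// The primary restarted, skipping `skipped` XIDs that never get a
    /// commit or abort record, so the replica must treat them as in progress.
    Restart { skipped: usize },
    /// A running-xacts or checkpoint record. This model assumes no
    /// transaction is genuinely running, so the record clears everything.
    RunningXacts,
}

/// Replays `events` against a replica that can track at most `capacity`
/// in-progress XIDs. Returns Ok(number still tracked at the end), or
/// Err(index of the event that overflowed the capacity).
fn replay(capacity: usize, events: &[WalEvent]) -> Result<usize, usize> {
    let mut tracked = 0;
    for (i, event) in events.iter().enumerate() {
        match *event {
            WalEvent::Restart { skipped } => {
                tracked += skipped;
                if tracked > capacity {
                    return Err(i);
                }
            }
            WalEvent::RunningXacts => tracked = 0,
        }
    }
    Ok(tracked)
}
```

Suppose the capacity is 2048 and each restart skips 1023 XIDs. Three consecutive restarts overflow on the third one. A single running-xacts record between them keeps the replica within bounds. Sixteen restarts that each skip 127 XIDs still fit.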

We should work harder to make sure that a running-xacts or checkpoint
record is written at primary startup or shutdown. But at the same
time, we can just make XID_CHECKPOINT_INTERVAL smaller, to consume
fewer XIDs in such scenarios. That means that we will generate more
versions of the checkpoint key-value pair in the storage, but we
haven't seen any problems with that so it's probably fine to go from
1024 to 128.
2024-07-05 19:33:42 +03:00
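The trade-off behind going from 1024 to 128 can be checked with a little arithmetic. The sketch below assumes that nextXid is `last_xid + 1` rounded up to a multiple of the interval. Under that model, one restart orphans at most `interval - 1` XIDs, and the checkpoint key gets a new version roughly once per `interval` transactions. The function names are hypothetical.

```rust
/// nextXid stored after `last_xid`: last_xid + 1 rounded up to `interval`.
fn next_xid_after(last_xid: u64, interval: u64) -> u64 {
    (last_xid + 1 + interval - 1) / interval * interval
}

/// Worst-case XIDs orphaned by one restart: the gap between the XID the
/// primary would have used next (last_xid + 1) and the stored nextXid.
/// The gap is periodic in `interval`, so one period is enough to scan.
fn max_orphaned_per_restart(interval: u64) -> u64 {
    (0..interval)
        .map(|last| next_xid_after(last, interval) - (last + 1))
        .max()
        .unwrap_or(0)
}

/// Approximate number of checkpoint key versions written while the
/// primary consumes `xacts` XIDs: one per `interval` XIDs.
fn checkpoint_writes(xacts: u64, interval: u64) -> u64 {
    xacts / interval
}
```

For 1,000,000 transactions, this model gives 976 checkpoint writes at an interval of 1024 and 7812 at 128. That is about eight times as many writes. In exchange, the worst-case number of orphaned XIDs per restart drops from 1023 to 127.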

This module contains utilities for working with PostgreSQL file formats. It's a collection of structs that are auto-generated from the PostgreSQL header files using bindgen, and Rust functions to read and manipulate them.

There are also a bunch of constants in pg_constants.rs that are copied from various PostgreSQL headers, rather than auto-generated. Most of them should be auto-generated too, but that's a TODO.

The PostgreSQL on-disk file format is not portable across different CPU architectures and operating systems. It is also subject to change in each major PostgreSQL version. Currently, this module supports PostgreSQL v14, v15 and v16. The bindings, and the code that depends on them, are version-specific, and are organized into the modules postgres_ffi::v14, postgres_ffi::v15 and postgres_ffi::v16. Version-independent code is explicitly exported into the shared top-level postgres_ffi module.
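One way to arrange per-version code like this is a macro that stamps out the same module body once per supported version. A version-independent function then dispatches on the major version number. This is a hypothetical sketch. The macro, the `bindings` stand-in for bindgen output, and the function names are invented and do not reflect the crate's actual layout.

```rust
/// Hypothetical macro: generates one module per PostgreSQL major version.
/// The same code is compiled once against each version's bindings.
macro_rules! pg_version_module {
    ($name:ident, $ver:expr) => {
        pub mod $name {
            /// Stand-in for bindgen output generated from this version's headers.
            pub mod bindings {
                pub const PG_MAJORVERSION_NUM: u32 = $ver;
            }

            /// Version-specific code that depends on the bindings above.
            pub fn major_version() -> u32 {
                bindings::PG_MAJORVERSION_NUM
            }
        }
    };
}

pg_version_module!(v14, 14);
pg_version_module!(v15, 15);
pg_version_module!(v16, 16);

/// Version-independent entry point that picks the right module at runtime.
/// Returns None for unsupported versions.
pub fn major_version_for(pg_version: u32) -> Option<u32> {
    match pg_version {
        14 => Some(v14::major_version()),
        15 => Some(v15::major_version()),
        16 => Some(v16::major_version()),
        _ => None,
    }
}
```

Generating the modules from one macro body keeps the per-version copies in sync. Only the bindings differ between versions.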

TODO: Currently, some code that deals with WAL records also lives in pageserver/src/waldecoder.rs. It should be moved into this module. The rest of the codebase should not have intimate knowledge of PostgreSQL file formats or WAL layout; that knowledge should be encapsulated in this module.
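The sketch below shows the kind of WAL layout knowledge that belongs here. It decodes the fixed-size `XLogRecord` header defined in PostgreSQL's `xlogrecord.h`, which is 24 bytes long:

- total length
- transaction ID
- previous-record pointer
- info byte
- resource-manager byte
- 2 padding bytes
- CRC

The Rust type and function names are hypothetical. The sketch also assumes a little-endian machine, because PostgreSQL writes WAL in native byte order.

```rust
/// SizeOfXLogRecord in PostgreSQL: the fixed XLogRecord header is 24 bytes.
pub const SIZE_OF_XLOG_RECORD: usize = 24;

/// Hypothetical Rust view of PostgreSQL's XLogRecord header.
#[derive(Debug, PartialEq, Eq)]
pub struct XLogRecordHeader {
    pub xl_tot_len: u32, // total length of the entire record
    pub xl_xid: u32,     // transaction ID
    pub xl_prev: u64,    // pointer (LSN) to the previous record
    pub xl_info: u8,     // flag bits
    pub xl_rmid: u8,     // resource manager ID
    pub xl_crc: u32,     // CRC-32C of the record
}

/// Reads a little-endian u32 from the first 4 bytes of `b`.
fn le_u32(b: &[u8]) -> u32 {
    let mut a = [0u8; 4];
    a.copy_from_slice(&b[..4]);
    u32::from_le_bytes(a)
}

/// Reads a little-endian u64 from the first 8 bytes of `b`.
fn le_u64(b: &[u8]) -> u64 {
    let mut a = [0u8; 8];
    a.copy_from_slice(&b[..8]);
    u64::from_le_bytes(a)
}

impl XLogRecordHeader {
    /// Decodes the header from the start of `buf`, or returns None if `buf`
    /// is shorter than SIZE_OF_XLOG_RECORD. Assumes little-endian layout.
    pub fn decode(buf: &[u8]) -> Option<Self> {
        if buf.len() < SIZE_OF_XLOG_RECORD {
            return None;
        }
        Some(XLogRecordHeader {
            xl_tot_len: le_u32(&buf[0..4]),
            xl_xid: le_u32(&buf[4..8]),
            xl_prev: le_u64(&buf[8..16]),
            xl_info: buf[16],
            xl_rmid: buf[17],
            // Bytes 18..20 are padding.
            xl_crc: le_u32(&buf[20..24]),
        })
    }
}
```

Keeping offsets like these inside one module means the rest of the codebase works with typed fields rather than byte positions.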