Up to this point, all of our migrations have operated on catalog state
that is shared across the databases in a cluster, so we have run and
tracked them only in the "postgres" database.

With the release of Postgres versions 14.12, 15.7, and 16.3,
CVE-2024-4317 was disclosed. It affects all clusters created prior to
these point releases. For those existing clusters, the fix is a SQL
script that must be run in every database in the cluster, including
template0 and template1.

This is a problem for the way we run migrations. The
neon_migration.migration_id table has a single row that records the
last migration that was run, and that table is stored only in the
"postgres" database.

Running this migration across a cluster isn't transactional. A typical
migration is of the form:

    BEGIN;
    -- Run migration
    COMMIT;

But transactions are not cluster-wide. _A_ solution to this is to run
the fix on every database that isn't the "postgres" database and then,
once all of those transactions have succeeded, "commit" that we've run
the migration in the neon_migration.migration_id table of the
"postgres" database.

In addition, we have to pay attention to the connectability and validity
of the databases when running a per-database migration. We can skip
invalid databases (pg_database.datconnlimit = -2), but for databases
that do not allow connections, such as template0, we need to enable
ALLOW_CONNECTIONS temporarily and then reset it afterward.
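
As a rough sketch of the procedure described above (not the actual
implementation), the per-database flow could look like the following.
The "id" column and the way the last-migration row is updated are
assumptions made for illustration, as is running the fix for "postgres"
in the same transaction as that update. template0 stands in for any
database with datallowconn = false.

```sql
-- Connected to "postgres": list the databases to visit.
-- datconnlimit = -2 marks an invalid database, which is skipped.
SELECT datname, datallowconn
FROM pg_database
WHERE datconnlimit <> -2
  AND datname <> 'postgres';

-- For a database that does not allow connections (template0 by default):
ALTER DATABASE template0 WITH ALLOW_CONNECTIONS true;

-- Connected to that database: run the fix in its own transaction.
BEGIN;
-- Run migration
COMMIT;

-- Back on "postgres": restore the original setting.
ALTER DATABASE template0 WITH ALLOW_CONNECTIONS false;

-- Only after every other database has succeeded, on "postgres":
BEGIN;
-- Run migration
UPDATE neon_migration.migration_id SET id = id + 1;  -- hypothetical column
COMMIT;
```

If any per-database transaction fails, the row in "postgres" is never
updated, so the whole migration is attempted again on the next run. This
assumes the fix is safe to apply more than once to the same database.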

This is preparatory work for the next commit.

Link: https://www.postgresql.org/support/security/CVE-2024-4317/
Signed-off-by: Tristan Partin <tristan@neon.tech>