From 011cb519a2a50bdfbea233bf91ef20d392f00f51 Mon Sep 17 00:00:00 2001
From: Christian Schwarz
Date: Mon, 17 Jun 2024 17:45:21 +0200
Subject: [PATCH] add RFC

---
 docs/rfcs/033-pageserver-postgres-bundle.md | 139 ++++++++++++++++++++
 1 file changed, 139 insertions(+)
 create mode 100644 docs/rfcs/033-pageserver-postgres-bundle.md

diff --git a/docs/rfcs/033-pageserver-postgres-bundle.md b/docs/rfcs/033-pageserver-postgres-bundle.md
new file mode 100644
index 0000000000..8524471be1
--- /dev/null
+++ b/docs/rfcs/033-pageserver-postgres-bundle.md
@@ -0,0 +1,139 @@
+# Postgres Bundle for Pageserver
+
+Created on 2024-06-17
+
+## Summary
+
+This RFC defines the responsibilities of the Compute and Storage teams regarding the
+build & deployment of the Postgres code that Pageserver must run
+(`initdb`, `postgres --wal-redo`).
+
+## Motivation
+
+Pageserver has to run Postgres binaries to do its job, specifically
+
+* `initdb`
+* `postgres --wal-redo` mode
+
+Currently there is **no clear ownership** of
+* how these binaries are built
+  * including, critically, dynamic linkage against other libraries such as `libicu`
+* what build of the binaries ends up running on Pageservers
+* how the binaries and runtime dependencies (e.g., shared libraries) are delivered to Pageservers
+
+Further, these binaries have dependencies (e.g., libicu) which
+1. prevent the Storage team from switching Pageserver distro and/or version,
+2. in some cases impact compatibility between Storage and Compute (e.g., [libicu version impacts collation incompatibility](https://github.com/neondatabase/neon/pull/8074)), and
+3. in some cases can cause database corruption if updated carelessly (locale => libc)
+
+## Why Is This Worth Solving
+
+1. Clearly defined ownership generally boosts execution speed & bug triage.
+   * Example for why execution speed matters: CVE in dependency => who takes care of patching & updating.
+2. Centralize understanding of risks involved with some dependencies.
+   Currently, there is no team clearly responsible for assessing / tracking the risks. As a reminder from the previous section, these are
+   * runtime incompatibilities
+   * database corruption
+
+It also unlocks additional future value; see the "Future Work" section.
+
+## Impacted components (e.g. pageserver, safekeeper, console, etc)
+
+Pageserver (neon.git)
+Compute (neon.git)
+Deployment process (aws.git)
+
+## Design
+
+The basic interface between the Compute and Storage teams is as follows:
+
+* Compute team publishes a "bundle" of the binaries required by Pageserver
+* Storage team uses a pinned bundle in the Pageserver build process
+* Storage team code review is required to update the pinned version
+
+The "bundle" provides an interface agreed upon by Compute and Storage teams to run
+* for each supported Postgres version at Neon (v14, v15, v16, ...)
+  * the `initdb` process
+    * behaving like a vanilla Postgres `initdb`
+  * `postgres --wal-redo` mode process
+    * following the walredo protocol specified elsewhere
+
+The bundle is self-contained, i.e., it behaves the same way on any Linux system.
+The only ambient runtime dependency is the Linux kernel.
+The minimum Linux kernel version is 5.10.
+
+### Variant 1: bundle = fully statically linked binaries
+The "bundle" is a tarball of fully statically linked binaries:
+
+```
+v14/initdb
+v14/postgres
+v15/initdb
+v15/postgres
+v16/initdb
+v16/postgres
+...
+```
+
+The directory structure is part of the interface.
+
+### Variant 2: bundle = chrooted directory
+
+The "bundle" is a tarball that contains all sorts of files, plus a launcher script.
+
+```
+LAUNCHER
+storage
+storage/does
+storage/does/not
+storage/does/not/care
+```
+
+To launch `initdb` or `postgres --wal-redo`, the Pageserver does the following:
+1. fork child process
+2. `chroot` into the extracted directory
+3. inside the chroot, run `/LAUNCHER VERSION PG_BINARY [FLAGS...]`
+4. The `LAUNCHER` script sets up library search paths, etc., and then `exec`s the correct binary
+
+We acknowledge this is half-way reinventing OCI + linux containers.
+However, our needs are much simpler than what OCI & Docker provide.
+Specifically, we do not want Pageserver to be runtime-dependent on e.g. Docker as the launcher.
+
+The `chroot` is to enforce that the "bundle" be self-contained.
+The special path `/inout` in the bundle is reserved, e.g., for `initdb` output.
+
+### Variant 3: ???
+
+Your design here, feedback welcome.
+
+## Security implications
+
+It's an improvement because a single team (Compute) will be responsible for runtime dependencies.
+
+## Implementation & Rollout
+
+Storage and Compute teams agree on a bundle definition.
+
+Compute team changes their build process to produce all of
+1. existing: compute image / vm compute image
+2. existing: pg_install tarball (currently built by `neon.git:Dockerfile`)
+3. new: the bundle
+
+Storage makes `neon.git` Pageserver changes to support using bundle (behind feature flag).
+With feature flag disabled, existing `pg_install` tarball is used instead.
+
+Storage & infra make `aws.git` changes to deploy bundle to pageservers, with feature flag disabled.
+
+Storage team does gradual rollout.
+
+Storage & infra teams remove support for `pg_install`, delete it from the nodes (experimentation in staging to ensure no hidden runtime deps!)
+
+Compute team stops producing `pg_install` tarball.
+
+
+## Future Work
+
+We know that we can easily make pageserver fully statically linked.
+Together with the self-contained "bundle" proposed above, Pageserver can then be deployed to different OSes.
+For example, we have been entertaining the idea of trying Amazon Linux instead of Debian for Pageserver.
+That experiment would be a lot simpler.
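
For discussion, the Variant 2 launch sequence from the Design section (fork, `chroot`, run `/LAUNCHER VERSION PG_BINARY [FLAGS...]`) could look roughly like the sketch below. It is written in Python only for brevity and is not Pageserver code. The function names `launcher_argv` and `launch_in_bundle`, the `v16`-style version string, and the exit-code handling are assumptions made for illustration; the RFC does not specify them.

```python
import os
from typing import List, Sequence

# Binaries the RFC says Pageserver needs from the bundle.
ALLOWED_BINARIES = ("initdb", "postgres")


def launcher_argv(version: str, pg_binary: str, flags: Sequence[str] = ()) -> List[str]:
    """Build the argv for `/LAUNCHER VERSION PG_BINARY [FLAGS...]`.

    The exact format of `version` (e.g. "v16") is an assumption.
    """
    if pg_binary not in ALLOWED_BINARIES:
        raise ValueError(f"unexpected binary: {pg_binary!r}")
    return ["/LAUNCHER", version, pg_binary, *flags]


def launch_in_bundle(bundle_dir: str, argv: List[str]) -> int:
    """Hypothetical helper: fork, chroot into the extracted bundle, exec the launcher.

    Returns the child's exit code (or the negated signal number).
    Note that chroot(2) requires root or CAP_SYS_CHROOT.
    """
    pid = os.fork()                    # step 1: fork child process
    if pid == 0:
        try:
            os.chroot(bundle_dir)      # step 2: chroot into the extracted directory
            os.chdir("/")              # do not keep a cwd outside the new root
            os.execv(argv[0], argv)    # step 3: run /LAUNCHER; does not return on success
        finally:
            os._exit(127)              # reached only if chroot or exec failed
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)
```

A walredo process would then be started with something like `launch_in_bundle("/path/to/extracted/bundle", launcher_argv("v16", "postgres", ["--wal-redo"]))`, where the bundle path is a placeholder. Step 4 (setting up library search paths and `exec`ing the real binary) happens inside the `LAUNCHER` script itself and is therefore not shown.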