mirror of
https://github.com/neondatabase/neon.git
synced 2026-01-15 01:12:56 +00:00
add RFC
139
docs/rfcs/033-pageserver-postgres-bundle.md
Normal file
@@ -0,0 +1,139 @@
# Postgres Bundle for Pageserver

Created on 2024-06-17

## Summary

This RFC defines the responsibilities of the Compute and Storage teams regarding the
build & deployment of the Postgres code that Pageserver must run
(`initdb`, `postgres --wal-redo`).

## Motivation

Pageserver has to run Postgres binaries to do its job, specifically

* `initdb`
* `postgres --wal-redo` mode

Currently there is **no clear ownership** of
* how these binaries are built
  * including, critically, dynamic linkage against other libraries such as `libicu`
* what build of the binaries ends up running on Pageservers
* how the binaries and runtime dependencies (e.g., shared libraries) are delivered to Pageservers

Further, these binaries have dependencies (e.g., `libicu`) with the following consequences:
1. They prevent the Storage team from switching Pageserver distro and/or version.
2. Some dependencies impact compatibility between Storage and Compute (e.g., [libicu version impacts collation incompatibility](https://github.com/neondatabase/neon/pull/8074)).
3. Some dependencies can cause database corruption if updated carelessly (locale => libc).

## Why Is This Worth Solving

1. Clearly defined ownership generally boosts execution speed & bug triage.
   * Example for why execution speed matters: CVE in dependency => who takes care of patching & updating.
2. Centralize understanding of risks involved with some dependencies.
   Currently, there is no team clearly responsible for assessing / tracking the risks. As a reminder from the previous section, these are
   * runtime incompatibilities
   * database corruption

Also, this unlocks additional future value; see the "Future Work" section.

## Impacted components (e.g. pageserver, safekeeper, console, etc)

Pageserver (neon.git)
Compute (neon.git)
Deployment process (aws.git)

## Design

The basic interface between the Compute and Storage teams is as follows:

* Compute team publishes a "bundle" of the binaries required by Pageserver
* Storage team uses a pinned bundle in the Pageserver build process
* Storage team code review is required to update the pinned version

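
One way the pinning could work is sketched below. This RFC does not define the pin format, so the sketch assumes the pin is the SHA-256 digest of the bundle tarball, checked into the repository so that changing it goes through Storage code review. The function names `sha256_hex` and `verify_pinned_bundle` are hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_pinned_bundle(tarball: bytes, pinned_sha256: str) -> None:
    """Raise ValueError unless the tarball matches the pinned digest.

    Assumption: the pin is a SHA-256 hex digest; the RFC leaves this open.
    """
    actual = sha256_hex(tarball)
    if actual != pinned_sha256:
        raise ValueError(
            f"bundle digest mismatch: expected {pinned_sha256}, got {actual}"
        )
```

With a scheme like this, the Pageserver build would fail loudly if the published bundle changed without a matching, reviewed pin update.
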
The "bundle" provides an interface agreed upon by Compute and Storage teams to run
* for each supported Postgres version at Neon (v14, v15, v16, ...)
  * the `initdb` process
    * behaving like a vanilla Postgres `initdb`
  * the `postgres --wal-redo` mode process
    * following the walredo protocol specified elsewhere

The bundle is self-contained, i.e., it behaves the same way on any Linux system.
The only ambient runtime dependency is the Linux kernel.
The minimum Linux kernel version is 5.10.

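
Since the kernel is the only ambient dependency, a host could check the 5.10 minimum before using the bundle. The following is a minimal sketch, not part of the proposal; the helper names are hypothetical, and it assumes the release string starts with `MAJOR.MINOR`, as Linux release strings such as `5.10.0-28-amd64` do.

```python
import platform
import re

MIN_KERNEL = (5, 10)  # minimum Linux kernel version stated in this RFC


def parse_kernel_release(release: str) -> tuple[int, int]:
    """Extract (major, minor) from a release string such as '5.10.0-28-amd64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_is_supported(release: str) -> bool:
    """Compare (major, minor) against the minimum as a tuple, not as a string."""
    return parse_kernel_release(release) >= MIN_KERNEL


def running_kernel_is_supported() -> bool:
    """Check the running kernel; platform.release() returns its release string."""
    return kernel_is_supported(platform.release())
```

Comparing tuples matters here: a string comparison would wrongly rank `5.4` above `5.10`.
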
### Variant 1: bundle = fully statically linked binaries
The "bundle" is a tarball of fully statically linked binaries:

```
v14/initdb
v14/postgres
v15/initdb
v15/postgres
v16/initdb
v16/postgres
...
```

The directory structure is part of the interface.

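
Because the directory structure is part of the interface, Pageserver can derive binary paths from the bundle root alone. A minimal sketch, assuming the tarball has been extracted to some directory; the function names and the example root `/opt/pg-bundle` are hypothetical.

```python
from pathlib import PurePosixPath

# Binaries named by this RFC; the layout is <bundle_root>/v<version>/<binary>.
BUNDLE_BINARIES = ("initdb", "postgres")


def bundle_binary_path(
    bundle_root: PurePosixPath, pg_version: int, binary: str
) -> PurePosixPath:
    """Resolve a binary inside the extracted Variant 1 bundle."""
    if binary not in BUNDLE_BINARIES:
        raise ValueError(f"not part of the bundle interface: {binary!r}")
    return bundle_root / f"v{pg_version}" / binary


def wal_redo_argv(bundle_root: PurePosixPath, pg_version: int) -> list[str]:
    """Command line for starting Postgres in WAL-redo mode."""
    return [str(bundle_binary_path(bundle_root, pg_version, "postgres")), "--wal-redo"]
```

For example, `wal_redo_argv(PurePosixPath("/opt/pg-bundle"), 16)` yields `["/opt/pg-bundle/v16/postgres", "--wal-redo"]`.
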
### Variant 2: bundle = chrooted directory

The "bundle" is a tarball containing a launcher script plus arbitrary other files whose layout is opaque to Storage.

```
LAUNCHER
storage
storage/does
storage/does/not
storage/does/not/care
```

To launch `initdb` or `postgres --wal-redo`, the Pageserver does the following:
1. fork a child process
2. `chroot` into the extracted directory
3. inside the chroot, run `/LAUNCHER VERSION PG_BINARY [FLAGS...]`
4. the `LAUNCHER` script sets up library search paths, etc., and then `exec`s the correct binary

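
The launch steps above can be sketched as follows. This is an illustration under stated assumptions, not the Pageserver implementation: the function names are hypothetical, the exact format of `VERSION` (here `v16`) is not specified by this RFC, and calling `chroot` requires root or `CAP_SYS_CHROOT`.

```python
import os


def launcher_argv(pg_version: str, pg_binary: str, flags: list[str]) -> list[str]:
    """Build the `/LAUNCHER VERSION PG_BINARY [FLAGS...]` command line (step 3)."""
    return ["/LAUNCHER", pg_version, pg_binary, *flags]


def launch_in_bundle(
    bundle_dir: str, pg_version: str, pg_binary: str, flags: list[str]
) -> int:
    """Fork, chroot into the extracted bundle, and exec the launcher.

    Returns the child's PID to the parent.
    """
    argv = launcher_argv(pg_version, pg_binary, flags)
    pid = os.fork()  # step 1: fork a child process
    if pid == 0:
        try:
            os.chroot(bundle_dir)  # step 2: chroot into the extracted directory
            os.chdir("/")  # do not keep a working directory outside the new root
            os.execv(argv[0], argv)  # step 3; LAUNCHER then execs the binary (step 4)
        finally:
            os._exit(127)  # reached only if chroot, chdir, or exec failed
    return pid
```

The parent would then communicate with the child and eventually reap it with `os.waitpid(pid, 0)`.
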
We acknowledge this is half-way reinventing OCI + Linux containers.
However, our needs are much simpler than what OCI & Docker provide.
Specifically, we do not want Pageserver to be runtime-dependent on e.g. Docker as the launcher.

The `chroot` is to enforce that the "bundle" be self-contained.
The special path `/inout` in the bundle is reserved, e.g., for `initdb` output.

### Variant 3: ???

Your design here, feedback welcome.

## Security implications

This is an improvement because a single team (Compute) will be responsible for runtime dependencies.

## Implementation & Rollout

Storage and Compute teams agree on a bundle definition.

Compute team changes their build process to produce all of
1. existing: compute image / vm compute image
2. existing: pg_install tarball (currently built by `neon.git:Dockerfile`)
3. new: the bundle

Storage makes `neon.git` Pageserver changes to support using the bundle (behind a feature flag).
With the feature flag disabled, the existing `pg_install` tarball is used instead.

Storage & infra make `aws.git` changes to deploy the bundle to pageservers, with the feature flag disabled.

Storage team does a gradual rollout.

Storage & infra teams remove support for `pg_install` and delete it from the nodes (after experimentation in staging to ensure no hidden runtime deps!).

Compute team stops producing the `pg_install` tarball.

## Future Work

We know that we can easily make Pageserver fully statically linked.
Together with the self-contained "bundle" proposed above, Pageserver can then be deployed to different OSes.
For example, we have been entertaining the idea of trying Amazon Linux instead of Debian for Pageserver.
With both in place, that experiment would be a lot simpler.