This makes it possible for the compiler to validate that a match block
covers all PostgreSQL versions we support.
## Problem
We did not have a complete picture of which places we had to test
against PG versions, and what format these versions were in: the full PG
version ID format (major/minor/bugfix `MMmmbb`) as transferred in
protocol messages, or only the major release version (`MM`). This meant
type confusion was rampant.
With this change, it becomes easier to develop new version-dependent
features, by making type and niche confusion impossible.
## Summary of changes
Every use of `pg_version` is now typed as either `PgVersionId` (a u32,
valued in decimal `MMmmbb`) or `PgMajorVersion` (an enum with a value for
every major version we support, serialized and stored like a u32 with
the value of that major version).
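For illustration, a minimal sketch of the shape of these types (the variants and helper shown are illustrative, not the exact code):

```rust
/// Full PostgreSQL version ID in decimal `MMmmbb` form, as transferred in
/// protocol messages (e.g. 170004 for PostgreSQL 17.4).
pub struct PgVersionId(u32);

/// Major PostgreSQL release, one variant per supported version; serialized
/// and stored as a u32 holding the major version's value.
pub enum PgMajorVersion {
    V14 = 14,
    V15 = 15,
    V16 = 16,
    V17 = 17,
}

impl PgVersionId {
    /// Extract the major version. Call sites that match on `PgMajorVersion`
    /// are forced by the compiler to cover every supported release.
    pub fn major(self) -> Option<PgMajorVersion> {
        match self.0 / 10_000 {
            14 => Some(PgMajorVersion::V14),
            15 => Some(PgMajorVersion::V15),
            16 => Some(PgMajorVersion::V16),
            17 => Some(PgMajorVersion::V17),
            _ => None,
        }
    }
}
```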
---------
Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
This improves `pagebench getpage-latest-lsn` gRPC support by:
* Using `page_api::Client`.
* Removing `--protocol`, and using the `page-server-connstring` scheme
instead.
* Adding `--compression` to enable zstd compression.
Introduce a separate `postgres_ffi_types` crate which contains a few
types and functions that were used in the API. `postgres_ffi_types` is a
much smaller crate than `postgres_ffi`, and it doesn't depend on bindgen
or the Postgres C headers.
Move the `NeonWalRecord` and `Value` types to the `wal_decoder` crate. They
are only used in the pageserver-safekeeper "ingest" API. The rest of the
ingest API types are defined in `wal_decoder`, so move these there as well.
## Problem
The gRPC page service should support compression.
Requires #12111.
Touches #11728.
Touches https://github.com/neondatabase/cloud/issues/25679.
## Summary of changes
Add support for gzip and zstd compression in the server, and a client
parameter to enable compression.
This will need further benchmarking under realistic network conditions.
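For illustration, enabling compression on a Tonic-generated client typically looks like the sketch below; the client module path and flag plumbing here are assumptions, not the exact `page_api` code.

```rust
use tonic::codec::CompressionEncoding;
use tonic::transport::Channel;

// Hypothetical module path for the generated client.
use page_api::proto::page_service_client::PageServiceClient;

async fn connect(endpoint: String, compression: bool) -> anyhow::Result<PageServiceClient<Channel>> {
    let channel = Channel::from_shared(endpoint)?.connect().await?;
    let mut client = PageServiceClient::new(channel);
    if compression {
        // Compress outgoing requests and advertise support for compressed responses.
        client = client
            .send_compressed(CompressionEncoding::Zstd)
            .accept_compressed(CompressionEncoding::Zstd);
    }
    Ok(client)
}
```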
## Problem
Base64 0.13 is outdated.
## Summary of changes
Update base64 to 0.22. Affects mostly proxy and proxy libs. Also upgrade
serde_with to remove another dep on base64 0.13 from dep tree.
neon_local's timeline import subcommand creates timelines manually, but
doesn't create them on the safekeepers. If a test then tries to open an
endpoint to read from the timeline, it will error in the new world with
`--timelines-onto-safekeepers`.
Therefore, if that flag is enabled, create the timelines on the
safekeepers.
Note that this import functionality is different from the fast import
feature (https://github.com/neondatabase/neon/issues/10188, #11801).
Part of #11670
As well as part of #11712
## Problem
The new gRPC page service protocol supports client-side batches. The
current libpq protocol only does best-effort server-side batching.
To compare these approaches, Pagebench should support submitting
contiguous page batches, similar to how Postgres will submit them (e.g.
with prefetches or vectored reads).
## Summary of changes
Add a `--batch-size` parameter specifying the size of contiguous page
batches. One batch counts as 1 RPS and 1 queue depth.
For the libpq protocol, a batch is submitted as individual requests and
we rely on the server to batch them for us. This will give a realistic
comparison of how these would be processed in the wild (e.g. when
Postgres sends 100 prefetch requests).
This patch also adds some basic validation of responses.
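For example, a hypothetical invocation (using the subcommand name as written elsewhere in these notes; remaining flags elided):
```
pagebench get-page-latest-lsn --batch-size 32 ...
```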
## Problem
The gRPC `page_api` domain types used smallvecs to avoid heap
allocations in the common case where a single page is requested.
However, this is pointless: the Protobuf types use a normal vec, and
converting a smallvec into a vec always causes a heap allocation anyway.
## Summary of changes
Use a normal `Vec` instead of a `SmallVec` in `page_api` domain types.
## Problem
We need gRPC support in Pagebench to benchmark the new gRPC Pageserver
implementation.
Touches #11728.
## Summary of changes
Adds a `Client` trait to make the client transport swappable, and a gRPC
client enabled via a `--protocol grpc` parameter. The connstring must then
also specify the gRPC scheme and port:
```
pagebench get-page-latest-lsn --protocol grpc --page-service-connstring grpc://localhost:51051
```
The client is implemented using the raw Tonic-generated gRPC client, to
minimize client overhead.
## Problem
The page service logic asserts that a tracing span is present with
tenant/timeline/shard IDs. An initial gRPC page service implementation
thus requires a tracing span.
Touches https://github.com/neondatabase/neon/issues/11728.
## Summary of changes
Adds an `ObservabilityLayer` middleware that generates a tracing span
and decorates it with IDs from the gRPC metadata.
This is a minimal implementation to address the tracing span assertion.
It will be extended with additional observability in later PRs.
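A rough sketch of the span construction, assuming hypothetical metadata keys (the real `ObservabilityLayer` wraps the whole request future so the span stays in scope for its duration):

```rust
use tonic::Request;
use tracing::Span;

/// Pull tenant/timeline/shard IDs out of the gRPC metadata and open a span
/// for the request. The metadata key names here are assumptions.
fn make_span<T>(req: &Request<T>) -> Span {
    let get = |key: &str| {
        req.metadata()
            .get(key)
            .and_then(|v| v.to_str().ok())
            .unwrap_or("unknown")
            .to_owned()
    };
    tracing::info_span!(
        "grpc_page_service",
        tenant_id = %get("neon-tenant-id"),
        timeline_id = %get("neon-timeline-id"),
        shard_id = %get("neon-shard-id"),
    )
}
```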
## Problem
part of https://github.com/neondatabase/neon/issues/11813
## Summary of changes
* Integrate feature store with tenant structure.
* gc-compaction picks up the current strategy from the feature store.
* We only log them for now, for testing purposes. They will not be used
until we have more patches to support different strategies defined in
PostHog.
* We don't support property-based evaluation for now; it will be
implemented later.
* The result of evaluating a feature flag is not cached -- it's not
efficient and cannot be used on a hot path right now.
* We don't report the evaluation result back to PostHog right now.
I plan to enable it in staging once we get the patch merged.
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
For the gRPC Pageserver API, we should convert the Protobuf types to
stricter, canonical Rust types.
Touches https://github.com/neondatabase/neon/issues/11728.
## Summary of changes
Adds Rust domain types that mirror the Protobuf types, with conversion
and validation.
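The conversions follow the usual `TryFrom` pattern; a sketch with hypothetical message and field names (the real code uses the pageserver's `Lsn` type and a dedicated error type):

```rust
// Hypothetical Protobuf-generated message (prost output).
pub mod proto {
    #[derive(Clone, Default)]
    pub struct ReadLsn {
        pub request_lsn: u64,
        pub not_modified_since_lsn: u64,
    }
}

/// Stricter domain type: invariants are checked during conversion.
pub struct ReadLsn {
    pub request_lsn: u64,
    pub not_modified_since_lsn: u64,
}

impl TryFrom<proto::ReadLsn> for ReadLsn {
    type Error = String;

    fn try_from(pb: proto::ReadLsn) -> Result<Self, Self::Error> {
        if pb.request_lsn == 0 {
            return Err("request_lsn must be non-zero".into());
        }
        if pb.not_modified_since_lsn > pb.request_lsn {
            return Err("not_modified_since_lsn is ahead of request_lsn".into());
        }
        Ok(Self {
            request_lsn: pb.request_lsn,
            not_modified_since_lsn: pb.not_modified_since_lsn,
        })
    }
}
```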
## Problem
We want to expose the page service over gRPC, for use with the
communicator.
Requires #11995.
Touches #11728.
## Summary of changes
This patch wires up a gRPC server in the Pageserver, using Tonic. It
does not yet implement the actual page service.
* Adds `listen_grpc_addr` and `grpc_auth_type` config options (disabled
by default).
* Enables gRPC by default with `neon_local`.
* Stub implementation of `page_api.PageService`, returning unimplemented
errors.
* gRPC reflection service for use with e.g. `grpcurl`.
Subsequent PRs will implement the actual page service, including
authentication and observability.
Notably, TLS support is not yet implemented. Certificate reloading
requires us to reimplement the entire Tonic gRPC server.
## Problem
We're about to implement a gRPC interface for Pageserver. Let's upgrade
Tonic first, to avoid a more painful migration later. It's currently
only used by storage-broker.
Touches #11728.
## Summary of changes
Upgrade Tonic 0.12.3 → 0.13.1. Also opportunistically upgrade Prost
0.13.3 → 0.13.5. This transitively pulls in Indexmap 2.0.1 → 2.9.0, but
it doesn't appear to be used in any particularly critical code paths.
## Problem
Hitting max_client_conn from pgbouncer would lead to invalidation of the
conn info cache.
Customers would hit the limit on wake_compute.
## Summary of changes
`should_retry_wake_compute` detects this specific error from pgbouncer as
non-retriable, meaning we won't try to wake up the compute again.
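Conceptually, the check looks like the sketch below; the exact pgbouncer error text matched by `should_retry_wake_compute` is an assumption here.

```rust
/// pgbouncer rejects connections over the limit with an error mentioning
/// `max_client_conn`. Waking the compute again won't help in that case,
/// so the error is treated as non-retriable.
fn is_max_client_conn_error(error_message: &str) -> bool {
    error_message.contains("max_client_conn")
}
```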
## Problem
See discussion:
https://neondb.slack.com/archives/C033RQ5SPDH/p1746645666075799
Issue: https://github.com/neondatabase/cloud/issues/28609
The relation size cache is not correctly updated on the PS (pageserver) in
the case of replicas.
## Summary of changes
1. Have two caches for relation size in timeline:
`rel_size_primary_cache` and `rel_size_replica_cache`.
2. `rel_size_primary_cache` is essentially what we have now. The only
difference is that it is not updated in `get_rel_size`, only by WAL
ingestion.
3. `rel_size_replica_cache` has a limited size (an LruCache) and its key is
`(Lsn, RelTag)`. It is updated in `get_rel_size`. Only strict LSN
matches are accepted as cache hits.
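A sketch of the cache shapes (types simplified to stand-ins; the real code uses the pageserver's `Lsn`, `RelTag`, and block-number types):

```rust
use std::collections::HashMap;
use lru::LruCache;

type Lsn = u64;                     // stand-in for the pageserver Lsn type
type RelTag = (u32, u32, u32, u8);  // stand-in for the real RelTag
type BlockNumber = u32;

struct RelSizeCaches {
    /// Authoritative cache, updated only by WAL ingestion (current behavior,
    /// minus the `get_rel_size` updates).
    rel_size_primary_cache: HashMap<RelTag, BlockNumber>,
    /// Bounded cache for replica reads, keyed by exact (Lsn, RelTag);
    /// only a strict LSN match counts as a hit.
    rel_size_replica_cache: LruCache<(Lsn, RelTag), BlockNumber>,
}

impl RelSizeCaches {
    fn lookup_replica(&mut self, lsn: Lsn, rel: RelTag) -> Option<BlockNumber> {
        self.rel_size_replica_cache.get(&(lsn, rel)).copied()
    }
}
```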
---------
Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
## Problem
For the [communicator
project](https://github.com/neondatabase/company_projects/issues/352),
we want to move to gRPC for the page service protocol.
Touches #11728.
## Summary of changes
This patch adds an experimental gRPC Protobuf schema for the page
service. It is equivalent to the current page service, but with several
improvements, e.g.:
* Connection multiplexing.
* Reduced head-of-line blocking.
* Client-side batching.
* Explicit tenant shard routing.
* GetPage request classification (normal vs. prefetch).
* Explicit rate limiting ("slow down" response status).
The API is exposed as a new `pageserver/page_api` package. This is
separate from the `pageserver_api` package to reduce the dependency
footprint for the communicator. The longer-term plan is to also split
out e.g. the WAL ingestion service to a separate gRPC package, e.g.
`pageserver/wal_api`.
Subsequent PRs will: add Rust domain types for the Protobuf types,
expose a gRPC server, and implement the page service.
Preliminary prototype benchmarks show this gRPC API within 10% of
baseline libpq performance. We'll do further benchmarking and
optimization as the implementation lands in `main` and is deployed to
staging.
You still need to provide a max size up-front, but memory is only
allocated for the portion that is in use.
The module is currently unused, but will be used by the new compute
communicator project, in the neon Postgres extension. See
https://github.com/neondatabase/neon/issues/11729
---------
Co-authored-by: Erik Grinaker <erik@neon.tech>
There were some incompatible changes. Most churn was from switching from
the now-deprecated `fcntl::flock()` function to
`fcntl::Flock::lock()`. The new function returns a guard object, whereas
with the old function, the lock was associated directly with the file
descriptor.
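For reference, the guard-based API looks roughly like this (nix ≥ 0.28; error handling simplified):

```rust
use std::fs::File;
use nix::fcntl::{Flock, FlockArg};

fn lock_exclusive(file: File) -> nix::Result<Flock<File>> {
    // Takes ownership of the file and returns a guard; the lock is released
    // when the guard is dropped (or explicitly via `unlock()`).
    Flock::lock(file, FlockArg::LockExclusive).map_err(|(_file, errno)| errno)
}
```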
It's good to stay up-to-date in general, but the impetus to do this now
is that in https://github.com/neondatabase/neon/pull/11929, I want to
use some functions that were added only in the latest version of 'nix',
and it's nice to not have to build multiple versions. (Although,
different versions of 'nix' are still pulled in as indirect dependencies
from other packages)
## Problem
Timeline imports do not have progress checkpointing. Any time the
tenant is shut down, all progress is lost
and the import restarts from the beginning when the tenant is
re-attached.
## Summary of changes
This PR adds progress checkpointing.
### Preliminaries
The **unit of work** is a `ChunkProcessingJob`. Each
`ChunkProcessingJob` deals with the import for a set of key ranges. The
job split is done by using an estimation of how many pages each job will
produce.
The planning stage must be **pure**: given a fixed set of contents in
the import bucket, it will always yield the same plan. This property is
enforced by checking that the hash of the plan is identical when
resuming from a checkpoint.
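A sketch of the resume check (hypothetical names; the real code would use a hash that is stable across releases, unlike `DefaultHasher`):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn plan_hash<T: Hash>(plan: &[T]) -> u64 {
    let mut hasher = DefaultHasher::new();
    plan.hash(&mut hasher);
    hasher.finish()
}

/// On resume: recompute the plan from the bucket contents and verify it
/// matches the plan the checkpoint was taken against before skipping jobs.
fn check_resume<T: Hash>(plan: &[T], checkpointed_hash: u64) -> anyhow::Result<()> {
    anyhow::ensure!(
        plan_hash(plan) == checkpointed_hash,
        "import plan changed since checkpoint; refusing to resume"
    );
    Ok(())
}
```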
The storage controller tracks the progress of each shard's import
in the database in the form of the **latest
job** that has completed.
### Flow
This is the high level flow for the happy path:
1. On its first run, the import task queries storcon
for the progress and sees that none is recorded.
2. Execute the preparatory stage of the import.
3. Import jobs start running concurrently in a `FuturesOrdered`. Every
time the checkpointing threshold of jobs has been reached, notify the
storage controller.
4. The tenant is detached and re-attached.
5. The import task starts up again and gets the latest progress checkpoint
from the storage controller in the form of a job index.
6. The plan is computed again and we check that its hash matches that of
the original plan.
7. Jobs are spawned from where the previous import task left off. Note
that progress is not reported after the completion of every single job, so
some jobs might run twice.
Closes https://github.com/neondatabase/neon/issues/11568
Closes https://github.com/neondatabase/neon/issues/11664
## Problem
part of https://github.com/neondatabase/neon/issues/11813
## Summary of changes
Add a lite PostHog client that only uses the local flag evaluation
functionality. Also add a test case that parses an example feature flag and
gets the evaluation result.
TODO: support boolean flags and remote config; implement all operators in
PostHog.
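Conceptually, local evaluation deterministically buckets a user for a flag and compares the bucket against the rollout percentage; a simplified sketch (not the client's API -- PostHog's real algorithm hashes `flag_key.distinct_id` with SHA-1):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Map (flag, user) to a stable bucket in [0, 1) and compare it against the
/// configured rollout percentage.
fn is_enabled(flag_key: &str, distinct_id: &str, rollout_percentage: f64) -> bool {
    let mut hasher = DefaultHasher::new();
    (flag_key, distinct_id).hash(&mut hasher);
    let bucket = (hasher.finish() % 10_000) as f64 / 10_000.0;
    bucket < rollout_percentage / 100.0
}
```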
---------
Signed-off-by: Alex Chi Z <chi@neon.tech>
## Problem
Read replicas cannot grant permissions for roles for Neon RLS. Usually
the permission is already granted, so we can check optimistically. See
INC-509.
## Summary of changes
Perform a permission lookup prior to actually executing any grants.
Add `/lfc/(prewarm|offload)` routes to `compute_ctl` which interact with
endpoint storage.
Add `prewarm_lfc_on_startup` spec option which, if enabled, downloads
LFC prewarm data on compute startup.
Resolves: https://github.com/neondatabase/cloud/issues/26343
## Problem
Broker supports only HTTP, not HTTPS.
- Closes: https://github.com/neondatabase/cloud/issues/27492
## Summary of changes
- Add `listen_https_addr`, `ssl_key_file`, `ssl_cert_file`, and
`ssl_cert_reload_period` arguments to the storage broker
- Make the `listen_addr` argument optional
- Listen on HTTPS in the storage broker
- Support HTTPS for storage broker requests in neon_local
- Add a `use_https_storage_broker_api` option to NeonEnvBuilder
## Problem
It seems our production-ready cert-manager setup now includes a full
certificate chain. This was not accounted for and the decoder would
error.
## Summary of changes
Change the way we decode certificates to support cert-chains, ignoring
all but the first cert.
This also changes a log line to not use multi-line errors.
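A sketch of chain-tolerant decoding, assuming `rustls-pemfile`-style parsing (the crate actually used may differ):

```rust
use std::io::BufRead;
use rustls_pki_types::CertificateDer;

/// Decode a PEM bundle that may contain a full chain and keep only the
/// first (leaf) certificate, ignoring intermediates and roots.
fn first_cert(pem: &mut dyn BufRead) -> std::io::Result<Option<CertificateDer<'static>>> {
    rustls_pemfile::certs(pem).next().transpose()
}
```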
~~I have tested this code manually against real certificates/keys, I
didn't want to embed those in a test just yet, not until the cert
expires in 24 hours.~~
# Problem
The Pageserver read path exclusively uses direct IO if
`virtual_file_io_mode=direct`.
The write path is half-finished. Here is what the various writing
components use:
|what|buffering|flags on <br/>`v_f_io_mode`<br/>=`buffered`|flags on <br/>`virtual_file_io_mode`<br/>=`direct`|
|-|-|-|-|
|`DeltaLayerWriter`| BlobWriter<BUFFERED=true> | () | () |
|`ImageLayerWriter`| BlobWriter<BUFFERED=false> | () | () |
|`download_layer_file`|BufferedWriter|()|()|
|`InMemoryLayer`|BufferedWriter|()|O_DIRECT|
The vehicle towards direct IO support is `BufferedWriter`, which
- largely takes care of O_DIRECT alignment & size-multiple requirements
- provides double-buffering to mask latency

`DeltaLayerWriter` and `ImageLayerWriter` use `blob_io::BlobWriter`, which
has neither of these.
# Changes
## High-Level
At a high-level this PR makes the following primary changes:
- switch the two layer writer types to use `BufferedWriter` & make them
sensitive to `virtual_file_io_mode` (via open_with_options_**v2**)
- make `download_layer_file` sensitive to `virtual_file_io_mode` (also
via open_with_options_**v2**)
- add `virtual_file_io_mode=direct-rw` as a feature gate
- we're hackish-ly piggybacking on OpenOptions's ask for write access
here
- this means with just `=direct`, InMemoryLayer reads and writes no
longer use O_DIRECT
- this is transitory and we'll remove the `direct-rw` variant once the
rollout is complete
(The `_v2` APIs for opening / creating VirtualFile are those that are
sensitive to `virtual_file_io_mode`)
The result is:
|what|uses <br/>`BufferedWriter`|flags on <br/>`v_f_io_mode`<br/>=`buffered`|flags on <br/>`v_f_io_mode`<br/>=`direct`|flags on <br/>`v_f_io_mode`<br/>=`direct-rw`|
|-|-|-|-|-|
|`DeltaLayerWriter`| ~~Blob~~BufferedWriter | () | () | O_DIRECT |
|`ImageLayerWriter`| ~~Blob~~BufferedWriter | () | () | O_DIRECT |
|`download_layer_file`|BufferedWriter|()|()|O_DIRECT|
|`InMemoryLayer`|BufferedWriter|()|~~O_DIRECT~~()|O_DIRECT|
## Code-Level
The main change is:
- Switch `blob_io::BlobWriter` away from its own buffering method to use
`BufferedWriter`.
Additional prep for upholding `O_DIRECT` requirements:
- Layer writer `finish()` methods switched to use IoBufferMut for
guaranteed buffer address alignment. The size of the buffers is PAGE_SZ
and thereby implicitly assumed to fulfill O_DIRECT requirements.
For the hacky feature-gating via `=direct-rw`:
- Track `OpenOptions::write(true|false)` in a field; bunch of mechanical
churn.
- Consolidate the APIs in which we "open" or "create" VirtualFile for
better overview over which parts of the code use the `_v2` APIs.
Necessary refactorings & infra work:
- Add doc comments explaining how BufferedWriter ensures that writes are
compliant with O_DIRECT alignment & size constraints. This isn't new,
but should be spelled out.
- Add the concept of shutdown modes to `BufferedWriter::shutdown` to
make writer shutdown adhere to these constraints.
- The `PadThenTruncate` mode might not be necessary in practice, because
I believe all layer files ever written are sized in multiples of `PAGE_SZ`,
and since `PAGE_SZ` is larger than the current alignment requirements
(512/4k depending on platform), it won't be necessary to pad (a conceptual
sketch of this mode follows this list).
- Some test (I believe `round_trip_test_compressed`?) required it though
- [ ] TODO: decide if we want to accept that complexity; if we do, then
address the TODO in the code to separate the alignment requirement from
buffer capacity
- Add `set_len` (=`ftruncate`) VirtualFile operation to support the
above.
- Allow `BufferedWriter` to start at a non-zero offset (to make room for
the summary block).
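Conceptually, the `PadThenTruncate` shutdown mode does something like the following; this is a sketch against a hypothetical `FileLike` stand-in for VirtualFile, and it ignores buffer address alignment (handled by IoBufferMut in the real code):

```rust
/// O_DIRECT requires each write to be a multiple of the alignment (512 B or
/// 4 KiB depending on the platform), so a final partial buffer cannot be
/// written as-is: zero-pad it up to an aligned size, write it, then shrink
/// the file back to its logical length with set_len (ftruncate).
trait FileLike {
    fn write_all_at(&self, buf: &[u8], offset: u64) -> std::io::Result<()>;
    fn set_len(&self, len: u64) -> std::io::Result<()>;
}

fn flush_tail(
    file: &impl FileLike,
    tail: &mut Vec<u8>,
    tail_offset: u64, // already a multiple of the alignment
    align: usize,
) -> std::io::Result<()> {
    let logical_len = tail_offset + tail.len() as u64;
    let padded_len = tail.len().div_ceil(align) * align;
    tail.resize(padded_len, 0);             // zero-pad to an aligned size
    file.write_all_at(tail, tail_offset)?;  // aligned write
    file.set_len(logical_len)               // drop the padding again
}
```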
Cleanups unlocked by this change:
- Remove non-positional APIs from VirtualFile (e.g. seek, write_full,
read_full)
Drive-by fixes:
- PR https://github.com/neondatabase/neon/pull/11585 aimed to run unit
tests for all `virtual_file_io_mode` combinations but didn't because of
a missing `_` in the env var.
# Performance
This section assesses this PR's impact on deployments with the current
production setting (`=direct`) and the anticipated impact of switching to
`=direct-rw`.
For `DeltaLayerWriter`, `=direct` throughput should remain unchanged or
slightly improve, because the `BlobWriter`'s buffer had the same
size as the `BufferedWriter`'s buffer but lacked the
double-buffering that `BufferedWriter` has.
`=direct-rw` enables direct IO; throughput should not suffer,
thanks to double-buffering; benchmarks will show whether this holds.
The `ImageLayerWriter` was previously not doing any buffering
(`BUFFERED=false`).
It went straight to issuing the IO operation to the underlying
VirtualFile and the buffering was done by the kernel.
The switch to `BufferedWriter` under `=direct` adds an additional memcpy
into the BufferedWriter's buffer.
We will win back that memcpy when enabling direct IO via `=direct-rw`.
A nice win from the switch to `BufferedWriter` is that ImageLayerWriter
performs >=16x fewer write operations to VirtualFile (the BlobWriter
performs one write per len field and one write per image value).
This should save low tens of microseconds of CPU overhead from doing all
these syscalls/io_uring operations, regardless of `=direct` or
`=direct-rw`.
Aside from problems with alignment, this write frequency without
double-buffering is prohibitive if we actually have to wait for the
disk, which is what will happen when we enable direct IO via
`=direct-rw`.
Throughput should not suffer, thanks to BufferedWriter's
double-buffering; benchmarks will show whether this holds.
`InMemoryLayer` at `=direct` will flip back to using buffered IO but
remain on BufferedWriter.
The buffered IO adds back one memcpy of CPU overhead.
Throughput should not suffer and might even improve on
Pageservers that are not under memory pressure, but let's remember that
we're doing the whole direct IO thing to eliminate global memory pressure
as a source of perf variability.
## bench_ingest
I reran `bench_ingest` on `im4gn.2xlarge` and `Hetzner AX102`.
Use `git diff` with `--word-diff` or similar to see the change.
General guidance on interpretation:
- the immediate production impact of this PR, without a production config
change, can be gauged by comparing old and new results at the same
`io_mode=Direct`
- the end state of production switched over to `io_mode=DirectRw` can be
gauged by comparing old results' `io_mode=Direct` to new results'
`io_mode=DirectRw`
Given the above guidance, on `im4gn.2xlarge`:
- immediate impact is a significant improvement in all cases
- the end state after switching has the same significant improvements in
all cases
- ... except `ingest/io_mode=DirectRw volume_mib=128 key_size_bytes=8192
key_layout=Sequential write_delta=Yes`, which only achieves `238 MiB/s`
instead of `253.43 MiB/s`
  - this is a 6% degradation
  - this workload is typical for image layer creation
# Refs
- epic https://github.com/neondatabase/neon/issues/9868
- stacked atop
- preliminary refactor https://github.com/neondatabase/neon/pull/11549
- bench_ingest overhaul https://github.com/neondatabase/neon/pull/11667
- derived from https://github.com/neondatabase/neon/pull/10063
Co-authored-by: Yuchen Liang <yuchen@neon.tech>
* Replace yanked papaya version
* Remove unused allowed license: OpenSSL
* Remove Zlib license from general allow list since it's listed in the
exceptions section per crate
* Drop clarification for ring since they have separate LICENSE files now
* List the tower-otel repo as allowed source while we sort out the OTel
deps
Switches the tenant snapshot subcommand of the storage scrubber to
`remote_storage`. As this is the last piece of the storage scrubber
still using the S3 SDK, this finishes the project started in #7547.
This allows us to do tenant snapshots on Azure as well.
Builds on #11671. Fixes #8830.
Update the sentry crate to 0.37. This deduplicates the `webpki-roots`
crate in our crate graph, and brings another dependency onto newer
rustls `0.23.18`.
## Problem
Pageservers and safekeepers do not pass CA certificates to the broker
client, so the client does not trust locally issued certificates.
- Part of https://github.com/neondatabase/cloud/issues/27492
## Summary of changes
- Change `ssl_ca_certs` type in PS/SK's config to `Pem` which may be
converted to both `reqwest` and `tonic` certificates.
- Pass CA certificates to storage broker client in PS and SK
Finally figured out the right incantation. I had had this in my original
go, but due to some refactoring and apparently missed testing, I
committed a mistake. The reason this doesn't currently break anything is
that we bypass the authorization middleware when the "testing" cargo
feature is enabled.
Signed-off-by: Tristan Partin <tristan@neon.tech>
## Problem
We need to export some metrics about certs/connections to configure
alerts and make sure that all HTTP requests are gone before turning
https-only mode on.
- Closes: https://github.com/neondatabase/cloud/issues/25526
## Summary of changes
- Add started connection and connection error metrics to http/https
Server.
- Add certificate expiration time and reload metrics to
ReloadingCertificateResolver.
A service for storing and retrieving LFC prewarm data.
It can also be used for proxying S3 access for Postgres extensions like
pg_mooncake.
Requests must include a Bearer JWT token.
The token is validated using a pemfile (which should be passed in infra/).
Note: the app is not tolerant of extra trailing slashes; see the app.rs
`delete_prefix` test for comments.
Resolves: https://github.com/neondatabase/cloud/issues/26342
Unrelated changes: gate a `rename_noreplace` feature and disable it in
`remote_storage` so that `object_storage` can be built with musl.
Based on https://github.com/neondatabase/neon/pull/11139
## Problem
We want to export performance traces from the pageserver in OTEL format.
End goal is to see them in Grafana.
## Summary of changes
https://github.com/neondatabase/neon/pull/11139 introduces the
infrastructure required to run the otel collector alongside the
pageserver.
### Design
Requirements:
1. We'd like to avoid implementing our own performance tracing stack and
use the `tracing` crate if possible.
2. Ideally, we'd like zero overhead at a sampling rate of zero, and to be
able to change the tracing config for a tenant on the fly.
3. We should leave the current span hierarchy intact. This includes
adding perf traces without modifying existing tracing.
To satisfy (3) (and (2) in part) a separate span hierarchy is used.
`RequestContext` gains an optional `perf_span` member
that's only set when the request was chosen by sampling. All perf span
related methods added to `RequestContext` are no-ops for requests that
are not sampled.
This on its own is not enough for (3), so performance spans use a
separate tracing subscriber. The `tracing` crate doesn't have great
support for this, so there's a fair amount of boilerplate to override
the subscriber at all points of the perf span lifecycle.
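The subscriber override boils down to something like the sketch below at each point of the perf span's lifecycle (the actual wrappers live behind `RequestContext`):

```rust
use tracing::{dispatcher, Dispatch, Span};

/// Create a perf span against a dedicated dispatcher instead of the global
/// subscriber, leaving the existing span hierarchy untouched. The same
/// override is needed whenever the span is entered, recorded to, or closed,
/// which is where most of the boilerplate comes from.
fn make_perf_span(perf_dispatch: &Dispatch) -> Span {
    dispatcher::with_default(perf_dispatch, || {
        tracing::info_span!("perf", operation = "get_page")
    })
}
```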
### Perf Impact
[Periodic
pagebench](https://neonprod.grafana.net/d/ddqtbfykfqfi8d/e904990?orgId=1&from=2025-02-08T14:15:59.362Z&to=2025-03-10T14:15:59.362Z&timezone=utc)
shows no statistically significant regression with a sample ratio of 0.
There's an annotation on the dashboard on 2025-03-06.
### Overview of changes:
1. Clean up the `RequestContext` API a bit. Namely, get rid of the
`RequestContext::extend` API and use the builder instead.
2. Add pageserver level configs for tracing: sampling ratio, otel
endpoint, etc.
3. Introduce some perf span tracking utilities and expose them via
`RequestContext`. We add a `tracing::Span` wrapper to be used for perf
spans and a `tracing::Instrumented` equivalent for it. See doc comments
for reason.
4. Set up OTEL tracing infra according to configuration. A separate
runtime is used for the collector.
5. Add perf traces to the read path.
## Refs
- epic https://github.com/neondatabase/neon/issues/9873
---------
Co-authored-by: Christian Schwarz <christian@neon.tech>
In sqlstate, we have a manual `phf` construction, which is not
explicitly guaranteed to be stable - you're intended to use a build.rs
or the macro to make sure it's constructed correctly each time. This was
inherited from tokio-postgres upstream, which has the same issue
(https://github.com/rust-phf/rust-phf/pull/321#issuecomment-2724521193).
We don't need this encoding of sqlstate, so I've switched it to simply
parse 5 bytes
(https://www.postgresql.org/docs/current/errcodes-appendix.html).
While here, I switched out log for tracing.
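A sketch of the simpler representation (the type and method names here are illustrative, not necessarily the crate's):

```rust
/// A SQLSTATE is always exactly five ASCII alphanumeric characters
/// (https://www.postgresql.org/docs/current/errcodes-appendix.html),
/// so it can be stored and compared as a plain byte array instead of
/// going through a perfect-hash table.
#[derive(Clone, Copy, PartialEq, Eq)]
pub struct SqlState([u8; 5]);

impl SqlState {
    pub fn from_code(code: &str) -> Option<Self> {
        let bytes: [u8; 5] = code.as_bytes().try_into().ok()?;
        bytes
            .iter()
            .all(|b| b.is_ascii_alphanumeric())
            .then_some(SqlState(bytes))
    }

    pub fn code(&self) -> &str {
        // Valid by construction: the bytes are ASCII.
        std::str::from_utf8(&self.0).unwrap()
    }
}
```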
## Problem
SSL certs are loaded only during startup. This doesn't allow rotating
short-lived certificates without a server restart.
- Closes: https://github.com/neondatabase/cloud/issues/25525
## Summary of changes
- Implement `ReloadingCertificateResolver` which reloads certificates
from disk periodically.
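The rough shape, assuming a rustls `ResolvesServerCert` implementation (the reload loop and error handling are elided):

```rust
use std::sync::Arc;

use arc_swap::ArcSwap;
use rustls::server::{ClientHello, ResolvesServerCert};
use rustls::sign::CertifiedKey;

/// Holds the current certificate behind an atomically swappable pointer.
/// A background task periodically re-reads the key/cert files and swaps in
/// a freshly built CertifiedKey when the contents change.
struct ReloadingResolver {
    current: ArcSwap<CertifiedKey>,
}

impl ResolvesServerCert for ReloadingResolver {
    fn resolve(&self, _client_hello: ClientHello<'_>) -> Option<Arc<CertifiedKey>> {
        Some(self.current.load_full())
    }
}

impl std::fmt::Debug for ReloadingResolver {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        f.write_str("ReloadingResolver")
    }
}
```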
## Problem
Pageservers use unencrypted HTTP requests for the storage controller API.
- Closes: https://github.com/neondatabase/cloud/issues/25524
## Summary of changes
- Replace hyper0::server::Server with http_utils::server::Server in
storage controller.
- Add HTTPS handler for storage controller API.
- Support `ssl_ca_file` in pageserver.
Both crates seem well maintained. x509-cert is part of the high quality
RustCrypto project that we already make heavy use of, and I think it
makes sense to reduce the dependencies where possible.
Closes: https://github.com/neondatabase/cloud/issues/22998
If control-plane reports that TLS should be used, load the certificates
(and watch for updates), make sure postgres uses them, and detect
updates.
Procedure:
1. Load certificates
2. Reconfigure postgres/pgbouncer
3. Loop on a timer until certificates have loaded
4. Go to 1
Notes:
1. We only run this procedure if requested on startup by the control plane.
2. We needed to compile pgbouncer with openssl enabled.
3. Postgres doesn't allow TLS keys to be globally accessible - they must be
readable only by the postgres user. I couldn't convince the autoscaling team
to let me put this logic into the VM settings, so instead compute_ctl
will copy the keys to be read-only by postgres.
4. To mitigate a race condition, we also verify that the key matches the
cert.