## Problem
With current approach for the base images in `Dockerfiles`, it's hard to
track when image is updated, and as they are base, than update will
invalidate all the layers, as base image changed.
That also becomes more complicated, as we have a number of runners, and
they may have different images with the tag `bookworm-slim`, so that
will lead to invalidate caches, when image build on one runner will be
used on another runners.
To fix that problem, we could pin our base images to the specific sha,
and that not only align images across runners, and also will allow us to
have reproducible build and don't depend on any spontaneous changes in
upstream.
Fix: https://github.com/neondatabase/cloud/issues/24084
## Summary of changes
Beside of the main goal, that PR also included some small changes around
Dockerfiles:
1. Main change: use `SHA` for `bookworm-slim` and `bullseye-slim` debian
images
2. For the layers requiring `curl` we could add `curl` and `unzip` to
the `build-deps` image, and use it as a base image for all the steps,
removing extra dependency on `alpine/curl`
3. added `retry-on-host-error=on` for the `wgetrc` as it happened to me:
fail to resolve hostname
Run "apt install" first, and only then COPY the files from the
intermediary build layers to the final image. This way, if you modify
any of the sources that trigger e.g. rebuilding compute_ctl, the "apt
install" step can still be cached.
I don't see the point of static linking, postgres itself and many of the
extensions are already built dynamically.
One reason for the change is that I'm working on bigger changes to start
using systemd in the compute, and as part of that I wanted to add the
--with-systemd configure option to pgbouncer, and there doesn't seem to
be a static version of libsystemd (at least not on Debian).
Building the compute rust binaries from scratch is pretty slow, it takes
between 4-15 minutes on my laptop, depending on which compiler flags and
other tricks I use. A cache mount allows caching the dependencies and
incremental builds, which speeds up rebuilding significantly when you
only makes a small change in a source file.
The awscli was downloaded at the last stages of the overall compute
image build, which meant that if you modified any part of the build, it
would trigger a re-download of the awscli. That's a bit annoying when
developing locally and rebuilding the compute image repeatedly. Move it
to a separate layer, to cache separately and to avoid the spurious
rebuilds.
The full build of all extensions takes a long time. When working locally
on parts that don't need extensions, you can iterate more quickly by
skipping the unnecessary extensions.
This adds a build argument to the dockerfile to specify extensions to
build. There are three options:
- EXTENSIONS=all (default)
- EXTENSIONS=minimal: Build only a few extensions that are listed in
shared_preload_libraries in the default neon config.
- EXTENSIONS=none: Build no extensions (except for the mandatory 'neon'
extension).
## Problem
Ref: https://github.com/neondatabase/cloud/issues/23461
and follow-up after: https://github.com/neondatabase/neon/pull/10553
we used `echo` to set-up `.wgetrc` and `.curlrc`, and there we used `\n`
to make these multiline configs with one echo command.
The problem is that Debian `/bin/sh`'s built-in echo command behaves
differently from the `/bin/echo` executable and from the `echo` built-in
in `bash`. Namely, it does not support the`-e` option, and while it does
treat `\n` as a newline, passing `-e` here will add that `-e` to the
output.
At the same time, when we use different base images, for example
`alpine/curl`, their `/bin/sh` supports and requires `-e` for treating
escape sequences like `\n`.
But having different `echo` and remembering difference in their
behaviour isn't best experience for the developer and makes bad
experience maintaining Dockerfiles.
Work-arounds:
- Explicitly use `/bin/bash` (like in this PR)
- Use `/bin/echo` instead of the shell's built-in echo function
- Use printf "foo\n" instead of echo -e "foo\n"
## Summary of changes
1. To fix that, we process with the option setting `/bin/bash` as a
SHELL for the debian-baysed layers
2. With no changes for `alpine/curl` based layers.
3. And one more change here: in `extensions` layer split to the 2 steps:
installing dependencies from `CPAN` and installing `lcov` from github,
so upgrading `lcov` could reuse previous layer with installed cpan
modules.
Refactor how extensions are built in compute Dockerfile
1. Rename some of the extension layers, so that names correspond more
precisely to the upstream repository name and the source directory
name. For example, instead of "pg-jsonschema-pg-build", spell it
"pg_jsonschema-build". Some of the layer names had the extra "pg-"
part, and some didn't; harmonize on not having it. And use an
underscore if the upstream project name uses an underscore.
2. Each extension now consists of two dockerfile targets:
[extension]-src and [extension]-build. By convention, the -src
target downloads the sources and applies any neon-specific patches
if necessary. The source tarball is downloaded and extracted under
/ext-src. For example, the 'pgvector' extension creates the
following files and directory:
/ext-src/pgvector.tar.gz # original tarball
/ext-src/pgvector.patch # neon-specific patch, copied from patches/ dir
/ext-src/pgvector-src/ # extracted tarball, with patch applied
This separation avoids re-downloading the sources every time the
extension is recompiled. The 'extension-tests' target also uses the
[extension]-src layers, by copying the /ext-src/ dirs from all
the extensions together into one image.
This refactoring came about when I was experimenting with different
ways of splitting up the Dockerfile so that each extension would be in
a separate file. That's not part of this PR yet, but this is a good
step in modularizing the extensions.
Ref: https://github.com/neondatabase/cloud/issues/23461
## Problem
Just made changes around and see these 2 base layers could be optimised.
and after review comment from @myrrc setting up timeouts and retries in
`alpine/curl` image
## Summary of changes
## Problem
Both these versions are binary compatible, but the way pgvector
structures the SQL files forbids installing 0.7.4 if you have a 0.8.0
distribution. Yet, some users may need a previous version for backward
compatibility, e.g., restoring the dump.
See this thread for discussion
https://neondb.slack.com/archives/C04DGM6SMTM/p1735911490242919?thread_ts=1731343604.259169&cid=C04DGM6SMTM
## Summary of changes
Put `vector--0.7.4.sql` file into compute image to allow installing this
version as well.
Tested on staging and it seems to be working as expected:
```sql
select * from pg_available_extensions where name = 'vector';
name | default_version | installed_version | comment
--------+-----------------+-------------------+------------------------------------------------------
vector | 0.8.0 | (null) | vector data type and ivfflat and hnsw access methods
create extension vector version '0.7.4';
select * from pg_available_extensions where name = 'vector';
name | default_version | installed_version | comment
--------+-----------------+-------------------+------------------------------------------------------
vector | 0.8.0 | 0.7.4 | vector data type and ivfflat and hnsw access methods
alter extension vector update;
select * from pg_available_extensions where name = 'vector';
name | default_version | installed_version | comment
--------+-----------------+-------------------+------------------------------------------------------
vector | 0.8.0 | 0.8.0 | vector data type and ivfflat and hnsw access methods
drop extension vector;
create extension vector;
select * from pg_available_extensions where name = 'vector';
name | default_version | installed_version | comment
--------+-----------------+-------------------+------------------------------------------------------
vector | 0.8.0 | 0.8.0 | vector data type and ivfflat and hnsw access methods
```
If we find out it's a good approach, we can adopt the same for other
extensions with a stable ABI -- support both `current` and `current - 1`
releases.
## Problem
We are removing the `pg_anon` v1 extension from Neon. So we don't need
to test it anymore and can remove the code for simplicity.
## Summary of changes
The code required for testing `pg_anon` is removed.
The extension now supports Postgres 17. The release also seems to be
binary compatible with the previous version.
Signed-off-by: Tristan Partin <tristan@neon.tech>
By setting PATH in the 'pg-build' layer, all the extension build layers
will inherit. No need to pass PG_CONFIG to all the various make
invocations either: once pg_config is in PATH, the Makefiles will pick
it up from there.
Generally ed25519 seems to be much preferred for cryptographic strength
to P256 nowadays, and it is NIST approved finally. We should use it
where we can as it's also faster than p256.
This PR makes the re-signed JWTs between local_proxy and pg_session_jwt
use ed25519.
This does introduce a new dependency on ed25519, but I do recall some
Neon Authorise customers asking for support for ed25519, so I am
justifying this dependency addition in the context that we can then
introduce support for customer ed25519 keys
sources:
* https://csrc.nist.gov/pubs/fips/186-5/final subsection 7 (EdDSA)
* https://datatracker.ietf.org/doc/html/rfc8037#section-3.1
There used to be some pg version dependencies in these extensions, but
now that there isn't, follow the simpler pattern used in other
extensions. No change in the produced images.
## Problem
Building local_proxy and compute_tools features the same dependency
tree, but as they are currently built in separate clean layers all that
progress is wasted. For our arm builds that's an extra 10 minutes.
## Summary of changes
Combines the compute_tools and local_proxy build layers.
## Problem
s5cmd doesn't pick up the pod service account
```
2024/12/16 16:26:01 Ignoring, HTTP credential provider invalid endpoint host, "169.254.170.23", only loopback hosts are allowed. <nil>
ERROR "ls s3://neon-dev-bulk-import-us-east-2/import-pgdata/fast-import/v1/br-wandering-hall-w2xobawv": NoCredentialProviders: no valid providers in chain. Deprecated. For verbose messaging see aws.Config.CredentialsChainVerboseErrors
```
## Summary of changes
Switch to offical CLI.
## Testing
Tested the pre-merge image in staging, using `job_image` override in
project settings.
https://neondb.slack.com/archives/C033RQ5SPDH/p1734554944391949?thread_ts=1734368383.258759&cid=C033RQ5SPDH
## Future Work
Switch back to s5cmd once https://github.com/peak/s5cmd/pull/769 gets
merged.
## Refs
- fixes https://github.com/neondatabase/cloud/issues/21876
---------
Co-authored-by: Gleb Novikov <NanoBjorn@users.noreply.github.com>
Don't build tests in h3 and rdkit: ~15 min speedup.
Use Ninja as cmake generator where possible: ~10 min speedup.
Clean apt cache for smaller images: around 250mb size loss for
intermediate layers
## Problem
The extensions for Postgres v17 are ready but we do not test the
extensions shipped with v17
## Summary of changes
Build the test image based on Postgres v17. Run the tests for v17.
---------
Co-authored-by: Anastasia Lubennikova <anastasia@neon.tech>
## Problem
Current compute images for Postgres 14-16 don't build on Debian 12
because of issues with extensions.
This PR fixes that, but for the current setup, it is mostly a no-op
change.
## Summary of changes
- Use `/bin/bash -euo pipefail` as SHELL to fail earlier
- Fix `plv8` build: backport a trivial patch for v8
- Fix `postgis` build: depend `sfgal` version on Debian version instead
of Postgres version
Tested in: https://github.com/neondatabase/neon/pull/9849
## Problem
We have a couple of CI workflows that still run on Debian Bullseye, and
the default Debian version in images is Bullseye as well (we explicitly
set building on Bookworm)
## Summary of changes
- Run `pgbench-pgvector` on Bookworm (fix a couple of packages)
- Run `trigger_bench_on_ec2_machine_in_eu_central_1` on Bookworm
- Change default `DEBIAN_VERSION` in Dockerfiles to Bookworm
- Make `pinned` docker tag an alias to `pinned-bookworm`
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
Co-authored-by: Stas Kelvic <stas@neon.tech>
# Context
This PR contains PoC-level changes for a product feature that allows
onboarding large databases into Neon without going through the regular
data path.
# Changes
This internal RFC provides all the context
* https://github.com/neondatabase/cloud/pull/19799
In the language of the RFC, this PR covers
* the Importer code (`fast_import`)
* all the Pageserver changes (mgmt API changes, flow implementation,
etc)
* a basic test for the Pageserver changes
# Reviewing
As acknowledged in the RFC, the code added in this PR is not ready for
general availability.
Also, the **architecture is not to be discussed in this PR**, but in the
RFC and associated Slack channel instead.
Reviewers of this PR should take that into consideration.
The quality bar to apply during review depends on what area of the code
is being reviewed:
* Importer code (`fast_import`): practically anything goes
* Core flow (`flow.rs`):
* Malicious input data must be expected and the existing threat models
apply.
* The code must not be safe to execute on *dedicated* Pageserver
instances:
* This means in particular that tenants *on other* Pageserver instances
must not be affected negatively wrt data confidentiality, integrity or
availability.
* Other code: the usual quality bar
* Pay special attention to correct use of gate guards, timeline
cancellation in all places during shutdown & migration, etc.
* Consider the broader system impact; if you find potentially
problematic interactions with Storage features that were not covered in
the RFC, bring that up during the review.
I recommend submitting three separate reviews, for the three high-level
areas with different quality bars.
# References
(Internal-only)
* refs https://github.com/neondatabase/cloud/issues/17507
* refs https://github.com/neondatabase/company_projects/issues/293
* refs https://github.com/neondatabase/company_projects/issues/309
* refs https://github.com/neondatabase/cloud/issues/20646
---------
Co-authored-by: Stas Kelvich <stas.kelvich@gmail.com>
Co-authored-by: Heikki Linnakangas <heikki@neon.tech>
Co-authored-by: John Spray <john@neon.tech>
This exporter logs an ERROR if a file called `postgres_exporter.yml` is
not located in its current working directory. We can silence it by
adding an empty config file and pointing the exporter at it.
Signed-off-by: Tristan Partin <tristan@neon.tech>