Compare commits

..

100 Commits

Author SHA1 Message Date
Conrad Ludgate
b66e545e26 a little more type-safety, a little more verbose... 2024-10-24 12:33:10 +01:00
Conrad Ludgate
c8108a4b84 make ComputeConnectBackend dyn 2024-10-24 11:55:31 +01:00
Conrad Ludgate
2d34fec39b minor changes 2024-10-24 11:48:43 +01:00
Conrad Ludgate
3da4705775 rename to serverless backend 2024-10-24 11:44:15 +01:00
Conrad Ludgate
80c5576816 proxy: continue streamlining auth::Backend 2024-10-24 11:43:46 +01:00
Christian Schwarz
6f34f97573 refactor(pageserver(load_remote_timeline)) remove dead code handling absence of IndexPart (#9408)
The code is dead at runtime since we're nowadays always running with
remote storage and treat it as the source of truth during attach.

Clean it up as a preliminary to
https://github.com/neondatabase/neon/pull/9218.

Related: https://github.com/neondatabase/neon/pull/9366
2024-10-24 09:00:22 +01:00
Tristan Partin
b86432c29e Fix buggy sizeof
A sizeof on a pointer on a 64 bit machine is 8 bytes whereas
Entry::old_name is a 64 byte array of characters. There was most likely
no fallout since the string would start with NUL bytes, but best to fix
nonetheless.
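
For illustration only, the same size arithmetic expressed in Rust (the actual code is C; `Entry` here is a stand-in for the real struct):

```rust
use std::mem::size_of;

// Stand-in for the C struct: old_name is a 64-byte character array.
#[allow(dead_code)]
struct Entry {
    old_name: [u8; 64],
}

fn main() {
    // sizeof(a pointer) on a 64-bit target: 8 bytes.
    assert_eq!(size_of::<*const u8>(), 8);
    // sizeof(the array itself): 64 bytes -- what the copy should use.
    assert_eq!(size_of::<[u8; 64]>(), 64);
    assert_eq!(size_of::<Entry>(), 64);
    // Copying only the first 8 of 64 bytes still carries the leading NUL,
    // which is why there was most likely no visible fallout.
}
```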

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-10-23 21:52:22 -06:00
Vlad Lazar
ac1205c14c pageserver: add metric for number of zeroed pages on rel extend (#9492)
## Problem

Filling in the gap with zeroes is annoying for sharded ingest, and we are
not sure it even happens in practice.

## Summary of Changes

Add one global counter which tracks how many such gap blocks we filled
on relation extends. We can add more metrics once we understand the
scope.
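
A rough sketch of such a counter using the prometheus crate directly; the metric name and wiring are illustrative, not the pageserver's actual metrics code:

```rust
use once_cell::sync::Lazy;
use prometheus::{register_int_counter, IntCounter};

// One process-global counter, bumped whenever gap blocks are zero-filled
// on relation extend (name is made up for this sketch).
static ZEROED_GAP_BLOCKS: Lazy<IntCounter> = Lazy::new(|| {
    register_int_counter!(
        "pageserver_relation_extend_zeroed_gap_blocks_total",
        "Number of gap blocks filled with zeroes on relation extend"
    )
    .expect("failed to register metric")
});

fn fill_gap_with_zeroes(gap_blocks: u64) {
    // ... write `gap_blocks` zero pages here ...
    ZEROED_GAP_BLOCKS.inc_by(gap_blocks);
}
```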
2024-10-23 19:58:28 +01:00
John Spray
e3ff87ce3b tests: avoid using background_process when invoking pg_ctl (#9469)
## Problem

Occasionally, we get failures to start the storage controller's db with
errors like:
```
aborting due to panic at /__w/neon/neon/control_plane/src/background_process.rs:349:67:
claim pid file: lock file

Caused by:
    file is already locked
```
e.g.
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-9428/11380574562/index.html#/testresult/1c68d413ea9ecd4a

This is happening in a stop,start cycle during a test. Presumably the
pidfile from the startup background process is still held at the point
we stop, because we let pg_ctl keep running in the background.

## Summary of changes

- Refactor pg_ctl invocations into a helper
- In the controller's `start` function, use pg_ctl & a wait loop for
pg_isready, instead of using background_process
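
Roughly the shape of the new start path, sketched with std::process and anyhow; the actual helper in control_plane differs in its details:

```rust
use std::process::Command;
use std::time::{Duration, Instant};

fn start_db(pgdata: &str, connstr: &str) -> anyhow::Result<()> {
    // Let pg_ctl start postgres and return; we never hold its pidfile.
    let status = Command::new("pg_ctl").args(["start", "-D", pgdata]).status()?;
    anyhow::ensure!(status.success(), "pg_ctl start failed");

    // Poll pg_isready until the server accepts connections or we time out.
    let deadline = Instant::now() + Duration::from_secs(30);
    while Instant::now() < deadline {
        if Command::new("pg_isready").args(["-d", connstr]).status()?.success() {
            return Ok(());
        }
        std::thread::sleep(Duration::from_millis(200));
    }
    anyhow::bail!("timed out waiting for pg_isready")
}
```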

---------

Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
2024-10-23 16:29:55 +00:00
Tristan Partin
0595320c87 Protect call to pg_current_wal_lsn() in retained_wal query
We can't call pg_current_wal_lsn() if we are a standby instance (read
replica). Any attempt to call this function while in recovery results
in:

ERROR:  recovery is in progress

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-10-23 09:55:00 -06:00
Folke Behrens
92d5e0e87a proxy: clear lib.rs of code items (#9479)
We keep lib.rs for crate configs, lint configs and re-exports for the binaries.
2024-10-23 08:21:28 +02:00
Arpad Müller
3a3bd34a28 Rename IndexPart::{from_s3_bytes,to_s3_bytes} (#9481)
We support multiple storage backends now, so remove the `_s3_` from the
name.

Analogous to the names adopted for tenant manifests added in #9444.
2024-10-23 00:34:24 +02:00
Alex Chi Z.
64949a37a9 fix(pageserver): make delta split layer writer finish atomic (#9048)
similar to https://github.com/neondatabase/neon/pull/8841, we make the
delta layer writer atomic when finishing the layers.

## Summary of changes

* `put_value` not taking discard fn anymore
* `finish` decides what layers to keep

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-10-22 22:06:21 +00:00
Arpad Müller
6f8fcdf9ea Timeline offloading persistence (#9444)
Persist timeline offloaded state to S3.

Right now, as of #8907, at each restart of the pageserver, all offloaded
state is lost, so we load the full timeline again. As it starts with an
empty local directory, we might potentially download some files again,
leading to downloads that are ultimately wasteful.

This patch adds support for persisting the offloaded state, allowing us
to never load offloaded timelines in the first place. The persistence
feature is facilitated via a new file in S3 that is tenant-global, which
contains a list of all offloaded timelines. It is updated each time we
offload or unoffload a timeline, and otherwise never touched.
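
Purely as an illustration, a serde-style sketch of what such a tenant-global manifest might contain; all field and type names here are assumptions, not the actual format introduced by this PR:

```rust
use serde::{Deserialize, Serialize};

/// Hypothetical shape of the tenant-global manifest stored in S3.
#[derive(Serialize, Deserialize)]
struct TenantManifest {
    /// Bumped if the on-disk format ever changes.
    version: u32,
    /// One entry per currently offloaded timeline; rewritten on every
    /// offload/unoffload, otherwise never touched.
    offloaded_timelines: Vec<OffloadedTimelineManifest>,
}

#[derive(Serialize, Deserialize)]
struct OffloadedTimelineManifest {
    timeline_id: String,                  // stand-in for the real TimelineId type
    ancestor_timeline_id: Option<String>, // None for root timelines
    ancestor_retain_lsn: Option<u64>,     // stand-in for Lsn
}
```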

This choice means that tenants where no offloading is happening will not
immediately get a manifest, keeping the change very minimal at the
start.

We leave generation support for future work. It is important to support
generations, as in the worst case, the manifest might be overwritten by
an older generation after a timeline has been unoffloaded (and
unarchived), so the next pageserver process instantiation might wrongly
believe that some timeline is still offloaded even though it should be
active.

Part of #9386, #8088
2024-10-22 20:52:30 +00:00
Tristan Partin
fcb55a2aa2 Fix copy-paste error in checkpoints_timed metric
Importing the wrong metric. Sigh...

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-10-22 14:34:26 -06:00
a-masterov
f36cf3f885 Fix local errors for the tests with the versions mix (#9477)
## Problem
If the environment variables `COMPATIBILITY_NEON_BIN` or
`COMPATIBILITY_POSTGRES_DISTRIB_DIR` are not set (as is usual during a
local run), the tests with the versions mix cannot run.
## Summary of changes
If these variables are not set, turn off the versions mix.

---------

Co-authored-by: Alexander Bayandin <alexander@neon.tech>
2024-10-22 21:58:55 +02:00
John Spray
8dca188974 storage controller: add metrics for tenant shard, node count (#9475)
## Problem

Previously, figuring out how many tenant shards were managed by a
storage controller was typically done by peeking at the database or
calling into the API. A metric makes it easier to monitor, as
unexpectedly increasing shard counts can be indicative of problems
elsewhere in the system.

## Summary of changes

- Add metrics `storage_controller_pageserver_nodes` (updated on node
CRUD operations from Service) and `storage_controller_tenant_shards`
(updated RAII-style from TenantShard)
2024-10-22 19:43:02 +01:00
Tristan Partin
b7fa93f6b7 Use make's builtin RM variable
At least as far as removing individual files goes, this is the preferred
pattern. I can't say the same for removing directories, but I went ahead
and changed those to `$(RM) -r` anyway.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-10-22 09:14:29 -06:00
Arseny Sher
1e8e04bb2c safekeeper: refactor timeline initialization (#9362)
Always do timeline init through an atomic rename of a temp directory. Add
GlobalTimelines::load_temp_timeline which does this, and use it from
both pull_timeline and basic timeline creation. Fixes a collection
of issues:
- previously, timeline creation didn't really flush the cfile to disk
  due to the 'nothing to do if state didn't change' check;
- even if it did, without the tmp dir it was possible to lose the cfile
  but leave the timeline dir in place, making it look corrupted;
- the tenant directory creation fsync was missing in timeline creation;
- pull_timeline is now protected from running concurrently with both
  itself and timeline creation;
- the global timelines map entry now has a special CreationInProgress
  entry type which prevents anyone from getting access to the timeline
  while it is being created (previously one could get access to it,
  but it was locked during creation, which is valid but confusing if
  creation failed).
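
A rough sketch of the atomic-rename initialization pattern described above, using hypothetical paths and plain std::fs (the real load_temp_timeline does considerably more, e.g. locking and validation):

```rust
use std::fs::{self, File};
use std::path::Path;

fn install_timeline_atomically(tmp_dir: &Path, final_dir: &Path) -> std::io::Result<()> {
    // The caller has fully populated tmp_dir (control file etc.). Flush its
    // contents so a crash cannot leave a half-written cfile behind a
    // fully visible timeline directory.
    for entry in fs::read_dir(tmp_dir)? {
        let entry = entry?;
        if entry.file_type()?.is_file() {
            File::open(entry.path())?.sync_all()?;
        }
    }
    File::open(tmp_dir)?.sync_all()?; // fsync the temp dir itself

    // Atomically move it into place: afterwards either the complete
    // timeline directory exists, or nothing does.
    fs::rename(tmp_dir, final_dir)?;

    // fsync the parent (tenant) directory so the rename itself is durable.
    File::open(final_dir.parent().expect("timeline dir has a parent"))?.sync_all()?;
    Ok(())
}
```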

fixes #8927
2024-10-22 07:11:36 +01:00
David Gomes
94369af782 chore(compute): bumps pg_session_jwt to latest version (#9474) 2024-10-21 23:39:30 +00:00
Arpad Müller
34b6bd416a offloaded timeline list API (#9461)
Add a way to list the offloaded timelines.

Before, one had to look at logs to figure out whether a timeline had been
offloaded, or infer it from the timeline's absence in the list of normal
timelines. Now, one can list them directly.
 
Part of #8088
2024-10-21 16:33:05 +01:00
Yuchen Liang
49d5e56c08 pageserver: use direct IO for delta and image layer reads (#9326)
Part of #8130 

## Problem

Pageserver previously goes through the kernel page cache for all the
IOs. The kernel page cache makes light-loaded pageserver have deceptive
fast performance. Using direct IO would offer predictable latencies of
our virtual file IO operations.

In particular for reads, the data pages also have an extremely low
temporal locality because the most frequently accessed pages are cached
on the compute side.

## Summary of changes

This PR enables pageserver to use direct IO for delta layer and image
layer reads. We can ship them separately because these layers are
write-once, read-many, so we will not be mixing buffered IO with direct
IO.

- implement `IoBufferMut`, a buffer type with aligned allocation
(currently set to 512 bytes); see the sketch after this list.
- use `IoBufferMut` at all places we are doing reads on image + delta
layers.
- leverage Rust type system and use `IoBufAlignedMut` marker trait to
guarantee that the input buffers for the IO operations are aligned.
- page cache allocation is also made aligned.

_* in-memory layer reads and the write path will be shipped separately._
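
A minimal sketch of 512-byte-aligned allocation with std::alloc, assuming a much-simplified owned buffer type (the real `IoBufferMut` is more involved):

```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};

/// Simplified owned buffer whose backing allocation is 512-byte aligned,
/// as required for O_DIRECT reads.
struct AlignedBuf {
    ptr: *mut u8,
    layout: Layout,
}

impl AlignedBuf {
    fn new(len: usize) -> Self {
        assert!(len > 0);
        // Round the length up to the alignment too, since O_DIRECT wants
        // aligned lengths and offsets as well as aligned pointers.
        let layout = Layout::from_size_align(len.next_multiple_of(512), 512).unwrap();
        let ptr = unsafe { alloc_zeroed(layout) };
        assert!(!ptr.is_null());
        Self { ptr, layout }
    }

    fn as_mut_slice(&mut self) -> &mut [u8] {
        unsafe { std::slice::from_raw_parts_mut(self.ptr, self.layout.size()) }
    }
}

impl Drop for AlignedBuf {
    fn drop(&mut self) {
        unsafe { dealloc(self.ptr, self.layout) }
    }
}
```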

## Testing

Integration test suite run with O_DIRECT enabled:
https://github.com/neondatabase/neon/pull/9350

## Performance

We evaluated performance based on the `get-page-at-latest-lsn`
benchmark. The results demonstrate a decrease in the number of IOPS, no
significant change in the mean latency, and a slight improvement in the
p99.9 and p99.99 latencies.


[Benchmark](https://www.notion.so/neondatabase/Benchmark-O_DIRECT-for-image-and-delta-layers-2024-10-01-112f189e00478092a195ea5a0137e706?pvs=4)

## Rollout

We will add `virtual_file_io_mode=direct` region by region to enable
direct IO on image + delta layers.

Signed-off-by: Yuchen Liang <yuchen@neon.tech>
2024-10-21 11:01:25 -04:00
Alex Chi Z.
aca81f5fa4 fix(pageserver): make image split layer writer finish atomic (#8841)
Part of https://github.com/neondatabase/neon/issues/8836

## Summary of changes

This pull request makes the image layer split writer atomic when
finishing the layers: all the produced layers are either finished at the
same time or discarded at the same time. Note that this does not
guarantee atomicity across a crash, but in that case the leftovers are
cleaned up on pageserver restart anyway.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
Co-authored-by: Christian Schwarz <christian@neon.tech>
2024-10-21 15:59:48 +01:00
Ivan Efremov
2dcac94194 proxy: Use common error interface for error handling with cplane (#9454)
- Remove obsolete error handles.
- Use one source of truth for cplane errors.
#18468
2024-10-21 17:20:09 +03:00
Ivan Efremov
ababa50cce Use '-f' for make clean in Makefile compute (#9464)
Use '-f' instead of '--force' because the long option is not supported
on macOS, which made it impossible to clean the targets there.
2024-10-21 16:20:39 +03:00
Alexander Bayandin
163beaf9ad CI: use build-tools on Debian 12 whenever we use Neon artifact (#9463)
## Problem

```
+ /tmp/neon/pg_install/v16/bin/psql '***' -c 'SELECT version()'
/tmp/neon/pg_install/v16/bin/psql: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by /tmp/neon/pg_install/v16/bin/psql)
/tmp/neon/pg_install/v16/bin/psql: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by /tmp/neon/pg_install/v16/bin/psql)
/tmp/neon/pg_install/v16/bin/psql: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by /tmp/neon/pg_install/v16/lib/libpq.so.5)
/tmp/neon/pg_install/v16/bin/psql: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by /tmp/neon/pg_install/v16/lib/libpq.so.5)
/tmp/neon/pg_install/v16/bin/psql: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by /tmp/neon/pg_install/v16/lib/libpq.so.5)
```

## Summary of changes
- Use `build-tools:pinned-bookworm` whenever we download Neon artefact
2024-10-21 12:14:19 +01:00
Alexander Bayandin
5b37485c99 Rename dockerfiles from Dockerfile.<something> to <something>.Dockerfile (#9446)
## Problem

Our Dockerfiles, for historical reasons, have unconventional names of the
form `Dockerfile.<something>`, and some tools (like the GitHub UI) fail to
highlight their syntax.

> Some projects may need distinct Dockerfiles for specific purposes. A
common convention is to name these `<something>.Dockerfile`

From: https://docs.docker.com/build/concepts/dockerfile/#filename

## Summary of changes
- Rename `Dockerfile.build-tools` -> `build-tools.Dockerfile`
- Rename `compute/Dockerfile.compute-node` ->
`compute/compute-node.Dockerfile`
2024-10-21 09:51:12 +01:00
Folke Behrens
ed958da38a proxy: Make tests fail fast when test proxy exited early (#9432)
This currently happens when the proxy is not compiled with the
`testing` feature.
Also fix an adjacent function.
2024-10-21 08:29:23 +00:00
Conrad Ludgate
cc25ef7342 bump pg-session-jwt version (#9455)
forgot to bump this before
2024-10-20 14:42:50 +02:00
Arpad Müller
71d09c78d4 Accept basebackup <tenant> <timeline> --gzip requests (#9456)
In #9453, we want to remove the non-gzipped basebackup code in the
computes, and always request gzipped basebackups.

However, right now the pageserver's page service only accepts basebackup
requests in the following formats:

* `basebackup <tenant_id> <timeline_id>`, lsn is determined by the
pageserver as the most recent one (`timeline.get_last_record_rlsn()`)
* `basebackup <tenant_id> <timeline_id> <lsn>`
* `basebackup <tenant_id> <timeline_id> <lsn> --gzip`

We add a fourth case, `basebackup <tenant_id> <timeline_id> --gzip`, to
allow requesting a gzipped basebackup at the latest LSN as well.
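
A rough sketch of how the four accepted shapes could be parsed; this is a hypothetical helper for illustration, not the page service's actual parser:

```rust
/// Accepts: basebackup <tenant_id> <timeline_id> [<lsn>] [--gzip]
fn parse_basebackup_args(args: &[&str]) -> Result<(String, String, Option<String>, bool), String> {
    match args {
        [tenant, timeline] => Ok((tenant.to_string(), timeline.to_string(), None, false)),
        // The newly added fourth case: latest LSN, gzipped.
        [tenant, timeline, "--gzip"] => Ok((tenant.to_string(), timeline.to_string(), None, true)),
        [tenant, timeline, lsn] => {
            Ok((tenant.to_string(), timeline.to_string(), Some(lsn.to_string()), false))
        }
        [tenant, timeline, lsn, "--gzip"] => {
            Ok((tenant.to_string(), timeline.to_string(), Some(lsn.to_string()), true))
        }
        _ => Err("usage: basebackup <tenant_id> <timeline_id> [<lsn>] [--gzip]".to_string()),
    }
}
```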
2024-10-19 00:23:49 +02:00
Tristan Partin
62a334871f Take the collector name as argument when generating sql_exporter configs
In neon_collector_autoscaling.jsonnet, the collector name is hardcoded
to neon_collector_autoscaling. The issue manifested itself as
sql_exporter not finding the collector configuration.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-10-18 09:36:29 -05:00
Vlad Lazar
e162ab8b53 storcon: handle ongoing deletions gracefully (#9449)
## Problem

Pageserver returns 409 (Conflict) if any of the shards are already
deleting the timeline. This resulted in an error being propagated out of
the HTTP handler and to the client. It's an expected scenario so we
should handle it nicely.

This caused failures in `test_storage_controller_smoke`
[here](https://neon-github-public-dev.s3.amazonaws.com/reports/pr-9435/11390431900/index.html#suites/8fc5d1648d2225380766afde7c428d81/86eee4b002d6572d).

## Summary of Changes

Instead of returning an error on 409s, we now bubble the status code up
and let the HTTP handler code retry until it gets a 404 or times out.
2024-10-18 15:33:04 +01:00
Conrad Ludgate
5cbdec9c79 [local_proxy]: install pg_session_jwt extension on demand (#9370)
Follow-up to #9344. We want to install the extension automatically. We
didn't want to couple the extension into compute_ctl, so local_proxy is
the one that issues the extension-specific requests.

depends on #9344 and #9395
2024-10-18 14:41:21 +01:00
Vlad Lazar
ec6d3422a5 pageserver: disconnect when asking client to reconnect (#9390)
## Problem

Consider the following sequence of events:
1. Shard location gets downgraded to secondary while there's a libpq
connection in pagestream mode from the compute
2. There's no active tenant, so we return `QueryError::Reconnect` from
`PageServerHandler::handle_get_page_at_lsn_request`.
3. Error bubbles up to `PostgresBackendIO::process_message`, bailing us
out of pagestream mode.
4. We instruct the client to reconnect, but continue serving the libpq
connection. The client isn't yet aware of the request to reconnect and
believes it is still in pagestream mode. Pageserver fails to deserialize
get page requests wrapped in `CopyData` since it's not in pagestream
mode.

## Summary of Changes

When we wish to instruct the client to reconnect, also disconnect from
the server side after flushing the error.

Closes https://github.com/neondatabase/cloud/issues/17336
2024-10-18 13:38:59 +01:00
Arseny Sher
fecff15f18 walproposer: immediately exit if sync-safekeepers collected 0/0. (#9442)
Otherwise term history starting with 0/0 is streamed to safekeepers.

ref https://github.com/neondatabase/neon/issues/9434
2024-10-18 15:31:50 +03:00
Jere Vaara
3532ae76ef compute_ctl: Add endpoint that allows extensions to be installed (#9344)
Adds endpoint to install extensions:

**POST** `/extensions`
```
{"extension":"pg_sessions_jwt","database":"neondb","version":"1.0.0"}
```

Will be used by `local-proxy`.
For example, for JWT authentication to work, the database needs the
pg_session_jwt extension and also needs JWT enabled in RLS policies.

---------

Co-authored-by: Conrad Ludgate <conradludgate@gmail.com>
2024-10-18 15:07:36 +03:00
Folke Behrens
15fecffe6b Update ruff to much newer version (#9433)
Includes a multidict patch release to fix build with newer cpython.
2024-10-18 12:42:41 +02:00
Arseny Sher
98fee7a97d Increase shared_buffers in test_subscriber_synchronous_commit. (#9427)
Might make the test less flaky.
2024-10-18 13:31:14 +03:00
John Spray
b7173b1ef0 storcon: fix case where we might fail to send compute notifications after two opposite migrations (#9435)
## Problem

If we migrate A->B, then B->A, and the notification of A->B fails, then
we might have retained state that makes us think "A" is the last state
we sent to the compute hook, whereas when we migrate B->A we should
really be sending a fresh notification in case our earlier failed
notification has actually mutated the remote compute config.

Closes: #9417 

## Summary of changes

- Add a reproducer for the bug
(`test_storage_controller_compute_hook_revert`)
- Refactor compute hook code to represent remote state with
`ComputeRemoteState` which stores a boolean for whether the compute has
fully applied the change as well as the request that the compute
accepted.
- The actual bug fix: after sending a compute notification, if we got a
423 response then update our ComputeRemoteState to reflect that we have
mutated the remote state. This way, when we later try and notify for our
historic location, we will properly see that as a change and send the
notification.

Co-authored-by: Vlad Lazar <vlad@neon.tech>
2024-10-18 11:29:23 +01:00
Jere Vaara
24654b8eee compute_ctl: Add endpoint that allows setting role grants (#9395)
This PR introduces a `/grants` endpoint which allows granting specific
`privileges` to a certain `role` for a certain `schema`.

Related to #9344 

Together these endpoints will be used to configure the JWT extension and
grant the correct usage on its schema to the specific roles that need it.

---------

Co-authored-by: Conrad Ludgate <conradludgate@gmail.com>
2024-10-18 11:25:45 +01:00
Conrad Ludgate
b8304f90d6 2024 oct new clippy lints (#9448)
Fixes new lints from `cargo +nightly clippy` (`clippy 0.1.83 (798fb83f
2024-10-16)`)
2024-10-18 10:27:50 +01:00
Conrad Ludgate
d762ad0883 update rustls (#9396)
The forever-ongoing effort of juggling multiple versions of rustls :3

Now with the new crypto library aws-lc.

Because of dependencies, it is currently impossible not to have both
ring and aws-lc in the dep tree, so our only options are not updating
rustls or having both crypto backends enabled.

According to benchmarks run by the rustls maintainer, aws-lc is faster
than ring in some cases too <https://jbp.io/graviola/>, so it's not
without its upsides.
2024-10-17 20:45:37 +01:00
Arpad Müller
928d98b6dc Update Rust to 1.82.0 and mold to 2.34.0 (#9445)
We keep the practice of keeping the compiler up to date, pointing to the
latest release. This is done by many other projects in the Rust
ecosystem as well.

[Release notes](https://github.com/rust-lang/rust/blob/master/RELEASES.md#version-1820-2024-10-17).

Also update mold. [release notes for
2.34.0](https://github.com/rui314/mold/releases/tag/v2.34.0), [release
notes for 2.34.1](https://github.com/rui314/mold/releases/tag/v2.34.1).

Prior update was in #8939.
2024-10-17 21:25:51 +02:00
John Spray
24398bf060 pageserver: detect & warn on loading an old index which is probably the result of a bad generation (#9383)
## Problem

The pageserver generally trusts the storage controller/control plane to
give it valid generations. However, sometimes it should be obvious that
a generation is bad, and for defense in depth we should detect that on
the pageserver.

This PR is part 1 of 2:
1. in this PR we detect and warn on such situations, but do not block
starting up the tenant, until we have confidence that the check is not
firing unexpectedly in the field.
2. part 2 of 2 will introduce a condition that refuses to start a tenant
in this situation, and a test for that (maybe, if we can figure out how
to spoof an ancient mtime)

Related: #6951

## Summary of changes

- When loading an index older than 2 weeks, log an INFO message noting
that we will check for other indices
- When loading an index older than 2 weeks _and_ a newer-generation
index exists, log a warning.
2024-10-17 19:02:24 +01:00
Alex Chi Z.
63b3491c1b refactor(pageserver): remove aux v1 code path (#9424)
Part of the aux v1 retirement
https://github.com/neondatabase/neon/issues/8623

## Summary of changes

Remove the write/read path for aux v1, but keep the config item and the
index part field for now.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-10-17 17:22:44 +01:00
Anastasia Lubennikova
858867c627 Add logging of installed_extensions (#9438)
Simple PR to log installed_extensions statistics in the following
format:
```
2024-10-17T13:53:02.860595Z  INFO [NEON_EXT_STAT] {"extensions":[{"extname":"plpgsql","versions":["1.0"],"n_databases":2},{"extname":"neon","versions":["1.5"],"n_databases":1}]}
```
2024-10-17 16:35:19 +01:00
Erik Grinaker
299cde899b safekeeper: flush WAL on compute disconnect (#9436)
## Problem

In #9259, we found that the `check_safekeepers_synced` fast path could
result in a lower basebackup LSN than the `flush_lsn` reported by
Safekeepers in `VoteResponse`, causing the compute to panic once on
startup.

This would happen if the Safekeeper had unflushed WAL records due to a
compute disconnect. The `TIMELINE_STATUS` query would report a
`flush_lsn` below these unflushed records, while `VoteResponse` would
flush the WAL and report the advanced `flush_lsn`. See
https://github.com/neondatabase/neon/issues/9259#issuecomment-2410849032.

## Summary of changes

Flush the WAL if the compute disconnects during WAL processing.
2024-10-17 17:19:18 +02:00
Erik Grinaker
4c9835f4a3 storage_controller: delete stale shards when deleting tenant (#9333)
## Problem

Tenant deletion only removes the current shards from remote storage. Any
stale parent shards (before splits) will be left behind. These shards
are kept since child shards may reference data from the parent until new
image layers are generated.

## Summary of changes

* Document a special case for pageserver tenant deletion that deletes
all shards in remote storage when given an unsharded tenant ID, as well
as any unsharded tenant data.
* Pass an unsharded tenant ID to delete all remote storage under the
tenant ID prefix.
* Split out `RemoteStorage::delete_prefix()` to delete a bucket prefix,
with additional test coverage.
* Add a `delimiter` argument to `assert_prefix_empty()` to support
partial prefix matches (i.e. all shards starting with a given tenant
ID).
2024-10-17 14:34:51 +00:00
Alex Chi Z.
f3a3eefd26 feat(pageserver): do space check before gc-compaction (#9250)
part of https://github.com/neondatabase/neon/issues/9114

## Summary of changes

gc-compaction may take a lot of disk space, and if it does, the caller
should do a partial gc-compaction. This patch adds a space check for the
compaction job.

---------

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-10-17 10:29:53 -04:00
Ivan Efremov
a7c05686cc test_runner: Update the README.md to build neon with 'testing' (#9437)
Without '--features testing' in the cargo build, the proxy
won't start, causing tests to fail.
2024-10-17 17:20:42 +03:00
Anastasia Lubennikova
8b47938140 Add support of extensions for v17 (part 3) (#9430)
- pgvector 7.4

update support of extensions for v14-v16:
- pgvector 7.2 -> 7.4
2024-10-17 13:37:21 +01:00
Arpad Müller
35e7d91bc9 Add config variable for timeline offloading (#9421)
Adds a configuration variable for timeline offloading support. The added
pageserver-global config option controls whether the pageserver
automatically offloads timelines during compaction.

Therefore, already offloaded timelines are not affected by this, nor is
the manual testing endpoint.

This allows the rollout of timeline offloading to be driven by the
storage team.

Part of #8088
2024-10-17 12:07:58 +00:00
Ivan Efremov
22d8834474 proxy: move the connection pools to separate file (#9398)
First PR for #9284
Start unification of the client and connection pool interfaces:
- Exclude 'global_connections_count' from get_conn_entry()
- Move remote connection pools to the conn_pool_lib as a reference
- Unify clients among all the conn pools
2024-10-17 13:38:24 +03:00
John Spray
db68e82235 storage_scrubber: fixes to garbage commands (#9409)
## Problem

While running `find-garbage` and `purge-garbage`, I encountered two
things that needed updating:
- Console API may omit `user_id` since org accounts were added
- When we cut over to using GenericRemoteStorage, the object listings we
do during purge did not get proper retry handling, so could easily fail
on usual S3 errors, and make the whole process drop out.

...and one bug:
- We had a `.unwrap` which expects that after finding an object in a
tenant path, a listing in that path will always return objects. This is
not true, because a pageserver might be deleting the path at the same
time as we scan it.

## Summary of changes

- When listing objects during purge, use backoff::retry
- Make `user_id` an `Option`
- Handle the case where a tenant's objects go away during find-garbage.
2024-10-17 10:06:02 +01:00
Konstantin Knizhnik
934dbb61f5 Check access_count in lfc_evict (#9407)
## Problem

See
https://neondb.slack.com/archives/C033A2WE6BZ/p1729007738526309?thread_ts=1722942856.987979&cid=C033A2WE6BZ

When a replica receives a WAL record whose target page is not present in
shared buffers, we evict this page from the LFC.
If all pages of an LFC chunk are evicted, the chunk is moved to the
head of the LRU list to force its reuse.
Unfortunately, access_count was not checked, and if the entry is being
accessed at that moment, this operation can corrupt the LRU list.

## Summary of changes

Check `access_count` in `lfc_evict`

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2024-10-17 08:04:57 +03:00
Christian Schwarz
67d5d98b19 readme: fix build instructions for debian 12 (#9371)
We need libprotobuf-dev for some of the
`/usr/include/google/protobuf/...*.proto`
referenced by our protobuf decls.
2024-10-16 21:47:53 +02:00
Tristan Partin
e0fa6bcf1a Fix some sql_exporter metrics for PG 17
Checkpointer related statistics moved from pg_stat_bgwriter to
pg_stat_checkpointer, so we need to adjust our queries accordingly.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-10-16 14:46:33 -05:00
Tristan Partin
409a286eaa Fix typo in sql_exporter generator
Seemingly a bad copy-paste. It manifested as sql_exporter failing to
start and dying in a loop in staging. A future PR
will add E2E testing of sql_exporter.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-10-16 13:08:40 -05:00
Arpad Müller
0551cfb6a7 Fix beta clippy warnings (#9419)
```
warning: first doc comment paragraph is too long
  --> compute_tools/src/installed_extensions.rs:35:1
   |
35 | / /// Connect to every database (see list_dbs above) and get the list of installed extensions.
36 | | /// Same extension can be installed in multiple databases with different versions,
37 | | /// we only keep the highest and lowest version across all databases.
   | |_
   |
   = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#too_long_first_doc_paragraph
   = note: `#[warn(clippy::too_long_first_doc_paragraph)]` on by default
help: add an empty line
   |
35 ~ /// Connect to every database (see list_dbs above) and get the list of installed extensions.
36 + ///
   |
```
2024-10-16 19:04:56 +01:00
Folke Behrens
ed694732e7 proxy: merge AuthError and AuthErrorImpl (#9418)
Since GetAuthInfoError now boxes the ControlPlaneError message, the
variant is no longer big, and AuthError is 32 bytes.
2024-10-16 19:10:49 +02:00
Alex Chi Z.
8a114e3aed refactor(pageserver): upgrade remote_storage to use hyper1 (#9405)
part of https://github.com/neondatabase/neon/issues/9255

## Summary of changes

Upgrade the remote_storage crate to use hyper1. Hyper0 was used when
providing the streaming HTTP body to the S3 SDK; that code is refactored
to use hyper1.


Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-10-16 16:19:45 +01:00
Arpad Müller
55b246085e Activate timelines during unoffload (#9399)
The current code forgets to activate timelines during unoffload, which
makes it impossible to serve the basebackup because the timeline is
still in the Loading state.

```
  stderr:
    command failed: compute startup failed: failed to get basebackup@0/0 from pageserver postgresql://no_user@localhost:15014

    Caused by:
        0: db error: ERROR: Not found: Timeline 508546c79b2b16a84ab609fdf966e0d3/bfc18c24c4b837ecae5dbb5216c80fce is not active, state: Loading
        1: ERROR: Not found: Timeline 508546c79b2b16a84ab609fdf966e0d3/bfc18c24c4b837ecae5dbb5216c80fce is not active, state: Loading
```

Therefore, also activate the timeline during unoffloading.

Part of #8088
2024-10-16 16:47:17 +02:00
Anastasia Lubennikova
9668601f46 Add support of extensions for v17 (part 2) (#9389)
- plv8 3.2.3
- HypoPG 1.4.1
- pgtap 1.3.3
- timescaledb 2.17.0
- pg_hint_plan 17_1_7_0
- rdkit Release_2024_09_1
- pg_uuidv7 1.6.0
- wal2json 2.6
- pg_ivm 1.9
- pg_partman 5.1.0

update support of extensions for v14-v16:
- HypoPG 1.4.0 -> 1.4.1
- pgtap 1.2.0 -> 1.3.3
- plpgsql_check 2.5.3 -> 2.7.11
- pg_uuidv7 1.0.1 -> 1.6.0
- wal2json 2.5 -> 2.6
- pg_ivm 1.7 -> 1.9
- pg_partman 5.0.1 -> 5.1.0
2024-10-16 15:29:23 +01:00
Arpad Müller
3140c14d60 Remove allow(clippy::unknown_lints) (#9416)
The lint stabilized in Rust 1.80.
2024-10-16 16:28:55 +02:00
John Spray
d6281cbe65 tests: stabilize test_timelines_parallel_endpoints (#9413)
## Problem

This test would get failures like `command failed: Found no timeline id
for branch name 'branch_8'`

This is because neon_local was being invoked concurrently for branch
creation, which is unsafe (the invocations step on each other's JSON writes).

Example failure:
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-9410/11363051979/index.html#testresult/5ddc56c640f5422b/retries

## Summary of changes

- Don't do branch creation concurrently with endpoint creation via neon_local
2024-10-16 15:27:46 +01:00
Vlad Lazar
d490ad23e0 storcon: use the same trace fields for reconciler and results (#9410)
## Problem

The reconciler uses `seq`, but processing of results uses `sequence`.
The field order is different too, which makes the logs annoying to read.

## Summary of Changes

Use the same tracing fields in both
2024-10-16 14:04:17 +01:00
Folke Behrens
f14e45f0ce proxy: format imports with nightly rustfmt (#9414)
```shell
cargo +nightly fmt -p proxy -- -l --config imports_granularity=Module,group_imports=StdExternalCrate,reorder_imports=true
```

These rust-analyzer settings for VSCode should help retain this style:
```json
  "rust-analyzer.imports.group.enable": true,
  "rust-analyzer.imports.prefix": "crate",
  "rust-analyzer.imports.merge.glob": false,
  "rust-analyzer.imports.granularity.group": "module",
  "rust-analyzer.imports.granularity.enforce": true,
```
2024-10-16 15:01:56 +02:00
John Spray
89a65a9e5a pageserver: improve handling of archival_config calls during Timeline shutdown (#9415)
## Problem

In test `test_timeline_offloading`, we see failures like:
```
PageserverApiException: queue is in state Stopped
```

Example failure:
https://neon-github-public-dev.s3.amazonaws.com/reports/main/11356917668/index.html#testresult/ff0e348a78a974ee/retries

## Summary of changes

- Amend code paths that handle errors from RemoteTimelineClient to check
for cancellation and emit the Cancelled error variant in these cases
(will give clients a 503 to retry)
- Remove the implicit `#[from]` for the Other error case, to make it
harder to add code that accidentally squashes errors into this
(500-equivalent) error variant.

This would be neater if we made RemoteTimelineClient return a structured
error instead of anyhow::Error, but that's a bigger refactor.

I'm not sure if the test really intends to hit this path, but the error
handling fix makes sense either way.
2024-10-16 13:39:58 +01:00
Cihan Demirci
bc6b8cee01 don't trigger workflows in two repos (#9340)
https://github.com/neondatabase/cloud/issues/16723
2024-10-16 10:43:48 +01:00
Tristan Partin
061ea0de7a Add jsonnetfmt targets
This should make it a little bit easier for people wanting to check if
their files are formatted correctly. Has the added bonus of making the CI
check simpler as well.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-10-15 20:01:13 -05:00
Tristan Partin
be5d6a69dc Fix jsonnet_files wildcard
Just a typo in a path.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-10-15 16:30:31 -05:00
Matthias van de Meent
18f4e5f10c Add newly added metrics from neondatabase/neon#9116 to exports (#9402)
They weren't added in that PR, but should be available immediately on
rollout as the neon extension already defaults to 1.5.
2024-10-15 23:13:31 +02:00
Alex Chi Z.
f1eb703256 fix(pageserver): use a buffer for basebackup; add aux basebackup metrics log (#9401)
Our replication bench project is stuck because generating the basebackup
is too slow, and that caused the compute to disconnect.

https://neondb.slack.com/archives/C03438W3FLZ/p1728330685012419

The compute timeout for waiting for the basebackup is 10m (is it true?).
Generating the basebackup directly on the pageserver takes ~3min.
Therefore, I suspect the cause is too much round-trip time wasted on
writing the 10000+ snapshot aux files. It is also possible that the
basebackup process takes so long retrieving all the aux files that it
does not write anything over the wire protocol, causing a read timeout.

The basebackup is 800KB gzipped for that project and was a 55MB tar
before compression.

## Summary of changes

* Potentially fix the issue by adding a write buffer for basebackup (see
the sketch below).
* Log how many aux files we read and the time spent on it.
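
The general idea behind the buffering fix, sketched with std::io::BufWriter (the pageserver's actual basebackup path is async and differs):

```rust
use std::io::{BufWriter, Write};

fn send_basebackup<W: Write>(dest: W) -> std::io::Result<()> {
    // Wrap the wire connection in a sizable buffer so each of the 10000+
    // small aux-file writes does not turn into its own round trip.
    let mut out = BufWriter::with_capacity(1024 * 1024, dest);
    // ... write tar entries for aux files, SLRUs, etc. into `out` ...
    out.write_all(b"placeholder for the actual tar stream")?;
    out.flush()
}
```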

Signed-off-by: Alex Chi Z <chi@neon.tech>
2024-10-15 16:35:21 -04:00
Tristan Partin
cf7a596a15 Generate sql_exporter config files with Jsonnet
There are quite a few benefits to this approach:

- Reduce config duplication
  - The two sql_exporter configs were super similar with just a few
    differences
- Pull SQL queries into standalone files
  - That means we could run a SQL formatter on the file in the future
  - It also means access to syntax highlighting
- In the future, run different queries for different PG versions
  - This is relevant because right now, we have queries that are failing
    on PG 17 due to catalog updates

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-10-15 11:18:38 -05:00
Konstantin Knizhnik
614c3aef72 Remove redundant code (#9373)
## Problem

There is a double update of the resize cache in `put_rel_truncation`.
Also, `page_server_request` contains a check that the fork is MAIN_FORKNUM,
which
1. is incorrect (because VM/FSM pages are sharded in the same way as
MAIN fork pages), and
2. is redundant because `page_server_request` is never called for `get
page` requests, so the first part of the OR condition is always true.

## Summary of changes

Remove redundant code

---------

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2024-10-15 17:18:52 +03:00
Folke Behrens
fb74c21e8c proxy: Migrate jwt module away from anyhow (#9361) 2024-10-15 15:24:56 +02:00
Conrad Ludgate
d92d36a315 [local_proxy] update api for pg_session_jwt (#9359)
pg_session_jwt now:
1. Sets the JWK in a PGU_BACKEND session guc, no longer in the init()
function.
2. JWK no longer needs the kid.
2024-10-15 12:13:57 +00:00
Arpad Müller
ec4cc30de9 Shut down timelines during offload and add offload tests (#9289)
Add a test for timeline offloading, and subsequent unoffloading.

Also adds a manual endpoint, and issues a proper timeline shutdown
during offloading which prevents a pageserver hang at shutdown.

Part of #8088.
2024-10-15 09:46:51 +00:00
John Spray
73c6626b38 pageserver: stabilize & refine controller scale test (#8971)
## Problem

We were seeing timeouts on migrations in this test.

The test unfortunately tends to saturate local storage, which is shared
between the pageservers and the control plane database, which makes the
test kind of unrealistic. We will also want to increase the scale of
this test, so it's worth fixing that.

## Summary of changes

- Instead of randomly creating timelines at the same time as the other
background operations, explicitly identify a subset of tenants which will
have timelines, and create them at the start. This avoids pageservers
putting a lot of load on the test node during the main body of the test.
- Adjust the tenants created to create some number of 8 shard tenants
and the rest 1 shard tenants, instead of just creating a lot of 2 shard
tenants.
- Use archival_config to exercise tenant-mutating operations, instead of
using timeline creation for this.
- Adjust reconcile_until_idle calls to avoid waiting 5 seconds between
calls, which causes timeouts with tenants that have large shard counts.
- Fix a pageserver bug where calls to archival_config during activation
get 404
2024-10-15 09:31:18 +01:00
Alexander Bayandin
0fc4ada3ca Switch CI, Storage and Proxy to Debian 12 (Bookworm) (#9170)
## Problem

This PR switches CI and Storage to Debian 12 (Bookworm) based images.

## Summary of changes
- Add Debian codename (`bookworm`/`bullseye`) to most of docker tags,
create un-codenamed images to be used by default
- `vm-compute-node-image`: create a separate spec for `bookworm` (we
don't need to build cgroups in the future)
- `neon-image`: Switch to `bookworm`-based `build-tools` image
  - Storage components and Proxy use it
- CI: run lints and tests on `bookworm`-based `build-tools` image
2024-10-14 21:12:43 +01:00
Matthias van de Meent
dab96a6eb1 Add more timing histogram and gauge metrics to the Neon extension (#9116)
We now also track:

- Number of PS IOs in-flight
- Number of pages cached by smgr prefetch implementation
- IO timing histograms for LFC reads and writes, per IO issued

## Problem

There's little insight into the timing metrics of LFC, and what the
prefetch state of each backend is.

This changes that, by measuring (and subsequently exposing) these data
points.

## Summary of changes

- Extract IOHistogram as separate type, rather than a collection of
fields on NeonMetrics
- others, see items above.

Part of https://github.com/neondatabase/neon/issues/8926
2024-10-14 20:30:21 +02:00
Arpad Müller
f54e3e9147 Also consider offloaded timelines for obtaining retain_lsn (#9308)
Also consider offloaded timelines for obtaining `retain_lsn`. This is
required for correctness for all timelines that have not been flattened
yet: otherwise we GC data that might still be required for reading.

This somewhat counteracts the original purpose of timeline offloading of
not having to iterate over offloaded timelines, but sadly it's required.
In the future, we can improve the way the offloaded timelines are
stored.

We also make the `retain_lsn` optional so that in the future, when we
implement flattening, we can make it None. This also applies to full
timeline objects by the way, where it would probably make most sense to
add a bool flag whether the timeline is successfully flattened, and if
it is, one can exclude it from `retain_lsn` as well.

Also, track whether a timeline was offloaded or not in `retain_lsn` so
that the `retain_lsn` can be excluded from visibility and size
calculation.

Part of #8088
2024-10-14 17:54:03 +02:00
Vlad Lazar
f4f7ea247c tests: make size comparisons more lenient (#9388)
The empirically determined threshold doesn't hold for PG 17.
Bump the limit to stabilise CI.
2024-10-14 16:50:12 +01:00
Arpad Müller
d92ff578c4 Add test for fixed storage broker issue (#9311)
Adds a test for the (now fixed) storage broker limit issue, see #9268
for the description and #9299 for the fix.

Also fix a race condition where endpoint creation and start run in
parallel, leading to file-not-found errors.
2024-10-14 14:34:57 +02:00
Alexander Bayandin
31b7703fa8 CI(build-build-tools): fix unexpected cancellations (#9357)
## Problem
When `Dockerfile.build-tools` gets changed, several PRs pick up the
change, and some might have their workflows unexpectedly cancelled because
of GitHub's concurrency model for workflows.
See the comment in the code for more details.

It should be possible to revert it after
https://github.com/orgs/community/discussions/41518 (I don't expect it
anytime soon, but I subscribed)

## Summary of changes
- Do not queue `build-build-tools-image` workflows in the concurrency
group
2024-10-14 11:51:01 +01:00
Konstantin Knizhnik
d056ae9be5 Ignore pg_dynshmem file when comparing directories (#9374)
## Problem

On macOS, a `pg_dynshmem` file is created in PGDATADIR, which causes a
mismatch in the directory comparison.

## Summary of changes

Add this file to the ignore list.

Co-authored-by: Konstantin Knizhnik <knizhnik@neon.tech>
2024-10-14 13:45:20 +03:00
Conrad Ludgate
cb9ab7463c proxy: split out the console-redirect backend flow (#9270)
Removes the ConsoleRedirect backend from the main auth::Backends enum and
copy-pastes the existing crate::proxy::task_main structure to use the
ConsoleRedirectBackend exclusively.

This makes the logic a bit simpler at the cost of some fairly trivial
code duplication.
2024-10-14 12:25:55 +02:00
Conrad Ludgate
ab5bbb445b proxy: refactor auth backends (#9271)
preliminary for #9270 

The auth::Backend didn't need to be in the mega ProxyConfig object, so I
split it off and passed it manually in the few places it was necessary.

I've also refined some of the uses of config I saw while doing this
small refactor.

I've also followed the trend and made the console redirect backend its
own struct, same as LocalBackend and ControlPlaneBackend.
2024-10-11 20:14:52 +01:00
Alexander Bayandin
5ef805e12c CI(run-python-test-set): allow to skip missing compatibility snapshot (#9365)
## Problem
Action `run-python-test-set` fails if it is not used for `regress_tests`
on a release PR, because it expects
`test_compatibility.py::test_create_snapshot` to generate a snapshot,
and that test exists only in the `regress_tests` suite.
For example, in https://github.com/neondatabase/neon/pull/9291
[`test-postgres-client-libs`](https://github.com/neondatabase/neon/actions/runs/11209615321/job/31155111544)
job failed.

## Summary of changes
- Add `skip-if-does-not-exist` input to `.github/actions/upload` action
(the same way we do for `.github/actions/download`)
- Set `skip-if-does-not-exist=true` for "Upload compatibility snapshot"
step in `run-python-test-set` action
2024-10-11 16:58:41 +01:00
a-masterov
091a175a3e Test versions mismatch (#9167)
## Problem
We faced the problem of incompatibility between components of different
versions.
This should be detected automatically to prevent production bugs.
## Summary of changes
A test for this situation was implemented.

Co-authored-by: Alexander Bayandin <alexander@neon.tech>
2024-10-11 15:29:54 +02:00
Fedor Dikarev
326cd80f0d ci: gh-workflow-stats-action v0.1.4: remove debug output and proper pagination (#9356)
## Problem
In the previous version, pagination didn't work, so we collected
information only for the first 30 jobs in a WorkflowRun.
2024-10-11 14:46:45 +02:00
Folke Behrens
6baf1aae33 proxy: Demote some errors to warnings in logs (#9354) 2024-10-11 11:29:08 +02:00
John Spray
184935619e tests: stabilize test_storage_controller_heartbeats (#9347)
## Problem

This could fail with `reconciliation in progress` if running on a slow
test node such that background reconciliation happens at the same time
as we call consistency_check.

Example:
https://neon-github-public-dev.s3.amazonaws.com/reports/main/11258171952/index.html#/testresult/54889c9469afb232

## Summary of changes

- Call reconcile_until_idle before calling consistency check once,
rather than calling consistency check until it passes
2024-10-11 09:41:08 +01:00
Ivan Efremov
b2ecbf3e80 Introduce "quota" ErrorKind (#9300)
## Problem
Fixes #8340
## Summary of changes
Introduced ErrorKind::quota to handle quota-related errors
2024-10-11 10:45:55 +03:00
Tristan Partin
53147b51f9 Use valid type hints for Python 3.9
I have no idea how this made it past the linters.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-10-10 13:00:25 -05:00
Tristan Partin
006d9dfb6b Add compute_config_dir fixture
Allows easy access to various compute config files.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-10-10 12:43:40 -05:00
Tristan Partin
1f7904c917 Enable cargo caching in check-codestyle-rust
This job takes an extraordinary amount of time for what I understand it
to do. The obvious win is caching dependencies.

Rory disabled caching in cd5732d9d8.
I assume this was to get gen3 runners up and running.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-10-10 12:40:30 -05:00
John Spray
07c714343f tests: allow a log warning in test_cli_start_stop_multi (#9320)
## Problem

This test restarts services in an undefined order (whatever neon_local
does), which means we should be tolerant of warnings that come from
restarting the storage controller while a pageserver is running.

We can see failures with warnings from dropped requests, e.g.
https://neon-github-public-dev.s3.amazonaws.com/reports/pr-9307/11229000712/index.html#/testresult/d33d5cb206331e28
```
 WARN request{method=GET path=/v1/location_config request_id=b7dbda15-6efb-4610-8b19-a3772b65455f}: request was dropped before completing\n')
```

## Summary of changes

- allow-list the `request was dropped before completing` message on
pageservers before restarting services
2024-10-10 17:06:42 +01:00
Tristan Partin
264c34dfb7 Move path-related fixtures into their own module (#9304)
neon_fixtures.py has grown into quite a beast.

Signed-off-by: Tristan Partin <tristan@neon.tech>
2024-10-10 10:26:23 -05:00
Erik Grinaker
9dd80b9b4c storage_scrubber: fix faulty assertion when no timelines (#9345)
When there are no timelines in remote storage, the storage scrubber
would incorrectly trip an assertion with "Must be set if results are
present", referring to the last processed tenant ID. When there are no
timelines we don't expect there to be a tenant ID either.

The assertion was introduced in 37aa6fd.

Only apply the assertion when any timelines are present.
2024-10-10 09:09:53 -04:00
329 changed files with 9931 additions and 5864 deletions

View File

@@ -183,7 +183,7 @@ runs:
uses: actions/cache@v4
with:
path: ~/.cache/pypoetry/virtualenvs
key: v2-${{ runner.os }}-${{ runner.arch }}-python-deps-${{ hashFiles('poetry.lock') }}
key: v2-${{ runner.os }}-${{ runner.arch }}-python-deps-bookworm-${{ hashFiles('poetry.lock') }}
- name: Store Allure test stat in the DB (new)
if: ${{ !cancelled() && inputs.store-test-results-into-db == 'true' }}

View File

@@ -88,7 +88,7 @@ runs:
uses: actions/cache@v4
with:
path: ~/.cache/pypoetry/virtualenvs
key: v2-${{ runner.os }}-${{ runner.arch }}-python-deps-${{ hashFiles('poetry.lock') }}
key: v2-${{ runner.os }}-${{ runner.arch }}-python-deps-bookworm-${{ hashFiles('poetry.lock') }}
- name: Install Python deps
shell: bash -euxo pipefail {0}
@@ -218,6 +218,9 @@ runs:
name: compatibility-snapshot-${{ runner.arch }}-${{ inputs.build_type }}-pg${{ inputs.pg_version }}
# Directory is created by test_compatibility.py::test_create_snapshot, keep the path in sync with the test
path: /tmp/test_output/compatibility_snapshot_pg${{ inputs.pg_version }}/
# The lack of compatibility snapshot shouldn't fail the job
# (for example if we didn't run the test for non build-and-test workflow)
skip-if-does-not-exist: true
- name: Upload test results
if: ${{ !cancelled() }}

View File

@@ -7,6 +7,10 @@ inputs:
path:
description: "A directory or file to upload"
required: true
skip-if-does-not-exist:
description: "Allow to skip if path doesn't exist, fail otherwise"
default: false
required: false
prefix:
description: "S3 prefix. Default is '${GITHUB_SHA}/${GITHUB_RUN_ID}/${GITHUB_RUN_ATTEMPT}'"
required: false
@@ -15,10 +19,12 @@ runs:
using: "composite"
steps:
- name: Prepare artifact
id: prepare-artifact
shell: bash -euxo pipefail {0}
env:
SOURCE: ${{ inputs.path }}
ARCHIVE: /tmp/uploads/${{ inputs.name }}.tar.zst
SKIP_IF_DOES_NOT_EXIST: ${{ inputs.skip-if-does-not-exist }}
run: |
mkdir -p $(dirname $ARCHIVE)
@@ -33,14 +39,22 @@ runs:
elif [ -f ${SOURCE} ]; then
time tar -cf ${ARCHIVE} --zstd ${SOURCE}
elif ! ls ${SOURCE} > /dev/null 2>&1; then
echo >&2 "${SOURCE} does not exist"
exit 2
if [ "${SKIP_IF_DOES_NOT_EXIST}" = "true" ]; then
echo 'SKIPPED=true' >> $GITHUB_OUTPUT
exit 0
else
echo >&2 "${SOURCE} does not exist"
exit 2
fi
else
echo >&2 "${SOURCE} is neither a directory nor a file, do not know how to handle it"
exit 3
fi
echo 'SKIPPED=false' >> $GITHUB_OUTPUT
- name: Upload artifact
if: ${{ steps.prepare-artifact.outputs.SKIPPED == 'false' }}
shell: bash -euxo pipefail {0}
env:
SOURCE: ${{ inputs.path }}

View File

@@ -27,7 +27,7 @@ jobs:
runs-on: [ self-hosted, us-east-2, x64 ]
container:
image: neondatabase/build-tools:pinned
image: neondatabase/build-tools:pinned-bookworm
credentials:
username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}
password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}

View File

@@ -124,28 +124,28 @@ jobs:
uses: actions/cache@v4
with:
path: pg_install/v14
key: v1-${{ runner.os }}-${{ runner.arch }}-${{ inputs.build-type }}-pg-${{ steps.pg_v14_rev.outputs.pg_rev }}-${{ hashFiles('Makefile', 'Dockerfile.build-tools') }}
key: v1-${{ runner.os }}-${{ runner.arch }}-${{ inputs.build-type }}-pg-${{ steps.pg_v14_rev.outputs.pg_rev }}-bookworm-${{ hashFiles('Makefile', 'build-tools.Dockerfile') }}
- name: Cache postgres v15 build
id: cache_pg_15
uses: actions/cache@v4
with:
path: pg_install/v15
key: v1-${{ runner.os }}-${{ runner.arch }}-${{ inputs.build-type }}-pg-${{ steps.pg_v15_rev.outputs.pg_rev }}-${{ hashFiles('Makefile', 'Dockerfile.build-tools') }}
key: v1-${{ runner.os }}-${{ runner.arch }}-${{ inputs.build-type }}-pg-${{ steps.pg_v15_rev.outputs.pg_rev }}-bookworm-${{ hashFiles('Makefile', 'build-tools.Dockerfile') }}
- name: Cache postgres v16 build
id: cache_pg_16
uses: actions/cache@v4
with:
path: pg_install/v16
key: v1-${{ runner.os }}-${{ runner.arch }}-${{ inputs.build-type }}-pg-${{ steps.pg_v16_rev.outputs.pg_rev }}-${{ hashFiles('Makefile', 'Dockerfile.build-tools') }}
key: v1-${{ runner.os }}-${{ runner.arch }}-${{ inputs.build-type }}-pg-${{ steps.pg_v16_rev.outputs.pg_rev }}-bookworm-${{ hashFiles('Makefile', 'build-tools.Dockerfile') }}
- name: Cache postgres v17 build
id: cache_pg_17
uses: actions/cache@v4
with:
path: pg_install/v17
key: v1-${{ runner.os }}-${{ runner.arch }}-${{ inputs.build-type }}-pg-${{ steps.pg_v17_rev.outputs.pg_rev }}-${{ hashFiles('Makefile', 'Dockerfile.build-tools') }}
key: v1-${{ runner.os }}-${{ runner.arch }}-${{ inputs.build-type }}-pg-${{ steps.pg_v17_rev.outputs.pg_rev }}-bookworm-${{ hashFiles('Makefile', 'build-tools.Dockerfile') }}
- name: Build postgres v14
if: steps.cache_pg_14.outputs.cache-hit != 'true'

View File

@@ -83,7 +83,7 @@ jobs:
runs-on: ${{ matrix.RUNNER }}
container:
image: neondatabase/build-tools:pinned
image: neondatabase/build-tools:pinned-bookworm
credentials:
username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}
password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}
@@ -178,7 +178,7 @@ jobs:
runs-on: [ self-hosted, us-east-2, x64 ]
container:
image: neondatabase/build-tools:pinned
image: neondatabase/build-tools:pinned-bookworm
credentials:
username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}
password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}
@@ -280,7 +280,7 @@ jobs:
region_id_default=${{ env.DEFAULT_REGION_ID }}
runner_default='["self-hosted", "us-east-2", "x64"]'
runner_azure='["self-hosted", "eastus2", "x64"]'
image_default="neondatabase/build-tools:pinned"
image_default="neondatabase/build-tools:pinned-bookworm"
matrix='{
"pg_version" : [
16
@@ -299,9 +299,9 @@ jobs:
"include": [{ "pg_version": 16, "region_id": "'"$region_id_default"'", "platform": "neonvm-captest-freetier", "db_size": "3gb" ,"runner": '"$runner_default"', "image": "'"$image_default"'" },
{ "pg_version": 16, "region_id": "'"$region_id_default"'", "platform": "neonvm-captest-new", "db_size": "10gb","runner": '"$runner_default"', "image": "'"$image_default"'" },
{ "pg_version": 16, "region_id": "'"$region_id_default"'", "platform": "neonvm-captest-new", "db_size": "50gb","runner": '"$runner_default"', "image": "'"$image_default"'" },
{ "pg_version": 16, "region_id": "azure-eastus2", "platform": "neonvm-azure-captest-freetier", "db_size": "3gb" ,"runner": '"$runner_azure"', "image": "neondatabase/build-tools:pinned" },
{ "pg_version": 16, "region_id": "azure-eastus2", "platform": "neonvm-azure-captest-new", "db_size": "10gb","runner": '"$runner_azure"', "image": "neondatabase/build-tools:pinned" },
{ "pg_version": 16, "region_id": "azure-eastus2", "platform": "neonvm-azure-captest-new", "db_size": "50gb","runner": '"$runner_azure"', "image": "neondatabase/build-tools:pinned" },
{ "pg_version": 16, "region_id": "azure-eastus2", "platform": "neonvm-azure-captest-freetier", "db_size": "3gb" ,"runner": '"$runner_azure"', "image": "neondatabase/build-tools:pinned-bookworm" },
{ "pg_version": 16, "region_id": "azure-eastus2", "platform": "neonvm-azure-captest-new", "db_size": "10gb","runner": '"$runner_azure"', "image": "neondatabase/build-tools:pinned-bookworm" },
{ "pg_version": 16, "region_id": "azure-eastus2", "platform": "neonvm-azure-captest-new", "db_size": "50gb","runner": '"$runner_azure"', "image": "neondatabase/build-tools:pinned-bookworm" },
{ "pg_version": 16, "region_id": "'"$region_id_default"'", "platform": "neonvm-captest-sharding-reuse", "db_size": "50gb","runner": '"$runner_default"', "image": "'"$image_default"'" }]
}'
@@ -665,7 +665,7 @@ jobs:
runs-on: [ self-hosted, us-east-2, x64 ]
container:
image: neondatabase/build-tools:pinned
image: neondatabase/build-tools:pinned-bookworm
credentials:
username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}
password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}
@@ -772,7 +772,7 @@ jobs:
runs-on: [ self-hosted, us-east-2, x64 ]
container:
image: neondatabase/build-tools:pinned
image: neondatabase/build-tools:pinned-bookworm
credentials:
username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}
password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}
@@ -877,7 +877,7 @@ jobs:
runs-on: [ self-hosted, us-east-2, x64 ]
container:
image: neondatabase/build-tools:pinned
image: neondatabase/build-tools:pinned-bookworm
credentials:
username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}
password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}


@@ -19,9 +19,16 @@ defaults:
run:
shell: bash -euo pipefail {0}
concurrency:
group: build-build-tools-image-${{ inputs.image-tag }}
cancel-in-progress: false
# The initial idea was to prevent the waste of resources by not re-building the `build-tools` image
# for the same tag in parallel workflow runs, and queue them to be skipped once we have
# the first image pushed to Docker registry, but GitHub's concurrency mechanism is not working as expected.
# GitHub can't have more than 1 job in a queue and removes the previous one, which causes failures in the dependent jobs.
#
# Ref https://github.com/orgs/community/discussions/41518
#
# concurrency:
# group: build-build-tools-image-${{ inputs.image-tag }}
# cancel-in-progress: false
# No permission for GITHUB_TOKEN by default; the **minimal required** set of permissions should be granted in each job.
permissions: {}
@@ -36,6 +43,7 @@ jobs:
strategy:
matrix:
debian-version: [ bullseye, bookworm ]
arch: [ x64, arm64 ]
runs-on: ${{ fromJson(format('["self-hosted", "{0}"]', matrix.arch == 'arm64' && 'large-arm64' || 'large')) }}
@@ -74,22 +82,22 @@ jobs:
- uses: docker/build-push-action@v6
with:
file: build-tools.Dockerfile
context: .
provenance: false
push: true
pull: true
file: Dockerfile.build-tools
cache-from: type=registry,ref=cache.neon.build/build-tools:cache-${{ matrix.arch }}
cache-to: ${{ github.ref_name == 'main' && format('type=registry,ref=cache.neon.build/build-tools:cache-{0},mode=max', matrix.arch) || '' }}
tags: neondatabase/build-tools:${{ inputs.image-tag }}-${{ matrix.arch }}
build-args: |
DEBIAN_VERSION=${{ matrix.debian-version }}
cache-from: type=registry,ref=cache.neon.build/build-tools:cache-${{ matrix.debian-version }}-${{ matrix.arch }}
cache-to: ${{ github.ref_name == 'main' && format('type=registry,ref=cache.neon.build/build-tools:cache-{0}-{1},mode=max', matrix.debian-version, matrix.arch) || '' }}
tags: |
neondatabase/build-tools:${{ inputs.image-tag }}-${{ matrix.debian-version }}-${{ matrix.arch }}
merge-images:
needs: [ build-image ]
runs-on: ubuntu-22.04
env:
IMAGE_TAG: ${{ inputs.image-tag }}
steps:
- uses: docker/login-action@v3
with:
@@ -97,7 +105,17 @@ jobs:
password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}
- name: Create multi-arch image
env:
DEFAULT_DEBIAN_VERSION: bullseye
IMAGE_TAG: ${{ inputs.image-tag }}
run: |
docker buildx imagetools create -t neondatabase/build-tools:${IMAGE_TAG} \
neondatabase/build-tools:${IMAGE_TAG}-x64 \
neondatabase/build-tools:${IMAGE_TAG}-arm64
for debian_version in bullseye bookworm; do
tags=("-t" "neondatabase/build-tools:${IMAGE_TAG}-${debian_version}")
if [ "${debian_version}" == "${DEFAULT_DEBIAN_VERSION}" ]; then
tags+=("-t" "neondatabase/build-tools:${IMAGE_TAG}")
fi
docker buildx imagetools create "${tags[@]}" \
neondatabase/build-tools:${IMAGE_TAG}-${debian_version}-x64 \
neondatabase/build-tools:${IMAGE_TAG}-${debian_version}-arm64
done
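For reference, with a hypothetical image tag 1234abc and DEFAULT_DEBIAN_VERSION=bullseye, the loop above expands to roughly:
# bullseye is the default Debian version, so it also receives the unsuffixed tag
docker buildx imagetools create \
  -t neondatabase/build-tools:1234abc-bullseye \
  -t neondatabase/build-tools:1234abc \
  neondatabase/build-tools:1234abc-bullseye-x64 \
  neondatabase/build-tools:1234abc-bullseye-arm64
# bookworm only receives the version-suffixed tag
docker buildx imagetools create \
  -t neondatabase/build-tools:1234abc-bookworm \
  neondatabase/build-tools:1234abc-bookworm-x64 \
  neondatabase/build-tools:1234abc-bookworm-arm64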


@@ -92,7 +92,7 @@ jobs:
needs: [ check-permissions, build-build-tools-image ]
runs-on: [ self-hosted, small ]
container:
image: ${{ needs.build-build-tools-image.outputs.image }}
image: ${{ needs.build-build-tools-image.outputs.image }}-bookworm
credentials:
username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}
password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}
@@ -106,7 +106,7 @@ jobs:
uses: actions/cache@v4
with:
path: ~/.cache/pypoetry/virtualenvs
key: v2-${{ runner.os }}-${{ runner.arch }}-python-deps-${{ hashFiles('poetry.lock') }}
key: v2-${{ runner.os }}-${{ runner.arch }}-python-deps-bookworm-${{ hashFiles('poetry.lock') }}
- name: Install Python deps
run: ./scripts/pysync
@@ -120,6 +120,24 @@ jobs:
- name: Run mypy to check types
run: poetry run mypy .
check-codestyle-jsonnet:
needs: [ check-permissions, build-build-tools-image ]
runs-on: [ self-hosted, small ]
container:
image: ${{ needs.build-build-tools-image.outputs.image }}
credentials:
username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}
password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}
options: --init
steps:
- name: Checkout
uses: actions/checkout@v4
- name: Check Jsonnet code formatting
run: |
make -C compute jsonnetfmt-test
# Check that the vendor/postgres-* submodules point to the
# corresponding REL_*_STABLE_neon branches.
check-submodules:
@@ -181,7 +199,7 @@ jobs:
runs-on: ${{ fromJson(format('["self-hosted", "{0}"]', matrix.arch == 'arm64' && 'small-arm64' || 'small')) }}
container:
image: ${{ needs.build-build-tools-image.outputs.image }}
image: ${{ needs.build-build-tools-image.outputs.image }}-bookworm
credentials:
username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}
password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}
@@ -193,16 +211,15 @@ jobs:
with:
submodules: true
# Disabled for now
# - name: Restore cargo deps cache
# id: cache_cargo
# uses: actions/cache@v4
# with:
# path: |
# !~/.cargo/registry/src
# ~/.cargo/git/
# target/
# key: v1-${{ runner.os }}-${{ runner.arch }}-cargo-clippy-${{ hashFiles('rust-toolchain.toml') }}-${{ hashFiles('Cargo.lock') }}
- name: Cache cargo deps
uses: actions/cache@v4
with:
path: |
~/.cargo/registry
!~/.cargo/registry/src
~/.cargo/git
target
key: v1-${{ runner.os }}-${{ runner.arch }}-cargo-${{ hashFiles('./Cargo.lock') }}-${{ hashFiles('./rust-toolchain.toml') }}-rust
# Some of our rust modules use FFI and need those to be checked
- name: Get postgres headers
@@ -262,7 +279,7 @@ jobs:
uses: ./.github/workflows/_build-and-test-locally.yml
with:
arch: ${{ matrix.arch }}
build-tools-image: ${{ needs.build-build-tools-image.outputs.image }}
build-tools-image: ${{ needs.build-build-tools-image.outputs.image }}-bookworm
build-tag: ${{ needs.tag.outputs.build-tag }}
build-type: ${{ matrix.build-type }}
# Run tests on all Postgres versions in release builds and only on the latest version in debug builds
@@ -277,7 +294,7 @@ jobs:
needs: [ check-permissions, build-build-tools-image ]
runs-on: [ self-hosted, small ]
container:
image: ${{ needs.build-build-tools-image.outputs.image }}
image: ${{ needs.build-build-tools-image.outputs.image }}-bookworm
credentials:
username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}
password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}
@@ -290,7 +307,7 @@ jobs:
uses: actions/cache@v4
with:
path: ~/.cache/pypoetry/virtualenvs
key: v1-${{ runner.os }}-${{ runner.arch }}-python-deps-${{ hashFiles('poetry.lock') }}
key: v2-${{ runner.os }}-${{ runner.arch }}-python-deps-bookworm-${{ hashFiles('poetry.lock') }}
- name: Install Python deps
run: ./scripts/pysync
@@ -310,7 +327,7 @@ jobs:
needs: [ check-permissions, build-and-test-locally, build-build-tools-image, get-benchmarks-durations ]
runs-on: [ self-hosted, small ]
container:
image: ${{ needs.build-build-tools-image.outputs.image }}
image: ${{ needs.build-build-tools-image.outputs.image }}-bookworm
credentials:
username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}
password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}
@@ -368,7 +385,7 @@ jobs:
runs-on: [ self-hosted, small ]
container:
image: ${{ needs.build-build-tools-image.outputs.image }}
image: ${{ needs.build-build-tools-image.outputs.image }}-bookworm
credentials:
username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}
password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}
@@ -416,7 +433,7 @@ jobs:
needs: [ check-permissions, build-build-tools-image, build-and-test-locally ]
runs-on: [ self-hosted, small ]
container:
image: ${{ needs.build-build-tools-image.outputs.image }}
image: ${{ needs.build-build-tools-image.outputs.image }}-bookworm
credentials:
username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}
password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}
@@ -560,15 +577,16 @@ jobs:
ADDITIONAL_RUSTFLAGS=${{ matrix.arch == 'arm64' && '-Ctarget-feature=+lse -Ctarget-cpu=neoverse-n1' || '' }}
GIT_VERSION=${{ github.event.pull_request.head.sha || github.sha }}
BUILD_TAG=${{ needs.tag.outputs.build-tag }}
TAG=${{ needs.build-build-tools-image.outputs.image-tag }}
TAG=${{ needs.build-build-tools-image.outputs.image-tag }}-bookworm
DEBIAN_VERSION=bookworm
provenance: false
push: true
pull: true
file: Dockerfile
cache-from: type=registry,ref=cache.neon.build/neon:cache-${{ matrix.arch }}
cache-to: ${{ github.ref_name == 'main' && format('type=registry,ref=cache.neon.build/neon:cache-{0},mode=max', matrix.arch) || '' }}
cache-from: type=registry,ref=cache.neon.build/neon:cache-bookworm-${{ matrix.arch }}
cache-to: ${{ github.ref_name == 'main' && format('type=registry,ref=cache.neon.build/neon:cache-{0}-{1},mode=max', 'bookworm', matrix.arch) || '' }}
tags: |
neondatabase/neon:${{ needs.tag.outputs.build-tag }}-${{ matrix.arch }}
neondatabase/neon:${{ needs.tag.outputs.build-tag }}-bookworm-${{ matrix.arch }}
neon-image:
needs: [ neon-image-arch, tag ]
@@ -583,8 +601,9 @@ jobs:
- name: Create multi-arch image
run: |
docker buildx imagetools create -t neondatabase/neon:${{ needs.tag.outputs.build-tag }} \
neondatabase/neon:${{ needs.tag.outputs.build-tag }}-x64 \
neondatabase/neon:${{ needs.tag.outputs.build-tag }}-arm64
-t neondatabase/neon:${{ needs.tag.outputs.build-tag }}-bookworm \
neondatabase/neon:${{ needs.tag.outputs.build-tag }}-bookworm-x64 \
neondatabase/neon:${{ needs.tag.outputs.build-tag }}-bookworm-arm64
- uses: docker/login-action@v3
with:
@@ -605,17 +624,16 @@ jobs:
version:
# Much data was already generated on old PG versions with bullseye's
# libraries, the locales of which can cause data incompatibilities.
# However, new PG versions should check if they can be built on newer
# images, as that reduces the support burden of old and ancient
# distros.
# However, new PG versions should be built on newer images,
# as that reduces the support burden of old and ancient distros.
- pg: v14
debian: bullseye-slim
debian: bullseye
- pg: v15
debian: bullseye-slim
debian: bullseye
- pg: v16
debian: bullseye-slim
debian: bullseye
- pg: v17
debian: bookworm-slim
debian: bookworm
arch: [ x64, arm64 ]
runs-on: ${{ fromJson(format('["self-hosted", "{0}"]', matrix.arch == 'arm64' && 'large-arm64' || 'large')) }}
@@ -660,16 +678,16 @@ jobs:
GIT_VERSION=${{ github.event.pull_request.head.sha || github.sha }}
PG_VERSION=${{ matrix.version.pg }}
BUILD_TAG=${{ needs.tag.outputs.build-tag }}
TAG=${{ needs.build-build-tools-image.outputs.image-tag }}
DEBIAN_FLAVOR=${{ matrix.version.debian }}
TAG=${{ needs.build-build-tools-image.outputs.image-tag }}-${{ matrix.version.debian }}
DEBIAN_VERSION=${{ matrix.version.debian }}
provenance: false
push: true
pull: true
file: compute/Dockerfile.compute-node
cache-from: type=registry,ref=cache.neon.build/compute-node-${{ matrix.version.pg }}:cache-${{ matrix.arch }}
cache-to: ${{ github.ref_name == 'main' && format('type=registry,ref=cache.neon.build/compute-node-{0}:cache-{1},mode=max', matrix.version.pg, matrix.arch) || '' }}
file: compute/compute-node.Dockerfile
cache-from: type=registry,ref=cache.neon.build/compute-node-${{ matrix.version.pg }}:cache-${{ matrix.version.debian }}-${{ matrix.arch }}
cache-to: ${{ github.ref_name == 'main' && format('type=registry,ref=cache.neon.build/compute-node-{0}:cache-{1}-{2},mode=max', matrix.version.pg, matrix.version.debian, matrix.arch) || '' }}
tags: |
neondatabase/compute-node-${{ matrix.version.pg }}:${{ needs.tag.outputs.build-tag }}-${{ matrix.arch }}
neondatabase/compute-node-${{ matrix.version.pg }}:${{ needs.tag.outputs.build-tag }}-${{ matrix.version.debian }}-${{ matrix.arch }}
- name: Build neon extensions test image
if: matrix.version.pg == 'v16'
@@ -680,17 +698,17 @@ jobs:
GIT_VERSION=${{ github.event.pull_request.head.sha || github.sha }}
PG_VERSION=${{ matrix.version.pg }}
BUILD_TAG=${{ needs.tag.outputs.build-tag }}
TAG=${{ needs.build-build-tools-image.outputs.image-tag }}
DEBIAN_FLAVOR=${{ matrix.version.debian }}
TAG=${{ needs.build-build-tools-image.outputs.image-tag }}-${{ matrix.version.debian }}
DEBIAN_VERSION=${{ matrix.version.debian }}
provenance: false
push: true
pull: true
file: compute/Dockerfile.compute-node
file: compute/compute-node.Dockerfile
target: neon-pg-ext-test
cache-from: type=registry,ref=cache.neon.build/neon-test-extensions-${{ matrix.version.pg }}:cache-${{ matrix.arch }}
cache-to: ${{ github.ref_name == 'main' && format('type=registry,ref=cache.neon.build/neon-test-extensions-{0}:cache-{1},mode=max', matrix.version.pg, matrix.arch) || '' }}
cache-from: type=registry,ref=cache.neon.build/neon-test-extensions-${{ matrix.version.pg }}:cache-${{ matrix.version.debian }}-${{ matrix.arch }}
cache-to: ${{ github.ref_name == 'main' && format('type=registry,ref=cache.neon.build/neon-test-extensions-{0}:cache-{1}-{2},mode=max', matrix.version.pg, matrix.version.debian, matrix.arch) || '' }}
tags: |
neondatabase/neon-test-extensions-${{ matrix.version.pg }}:${{needs.tag.outputs.build-tag}}-${{ matrix.arch }}
neondatabase/neon-test-extensions-${{ matrix.version.pg }}:${{needs.tag.outputs.build-tag}}-${{ matrix.version.debian }}-${{ matrix.arch }}
- name: Build compute-tools image
# compute-tools are Postgres independent, so build it only once
@@ -705,14 +723,16 @@ jobs:
build-args: |
GIT_VERSION=${{ github.event.pull_request.head.sha || github.sha }}
BUILD_TAG=${{ needs.tag.outputs.build-tag }}
TAG=${{ needs.build-build-tools-image.outputs.image-tag }}
DEBIAN_FLAVOR=${{ matrix.version.debian }}
TAG=${{ needs.build-build-tools-image.outputs.image-tag }}-${{ matrix.version.debian }}
DEBIAN_VERSION=${{ matrix.version.debian }}
provenance: false
push: true
pull: true
file: compute/Dockerfile.compute-node
file: compute/compute-node.Dockerfile
cache-from: type=registry,ref=cache.neon.build/neon-test-extensions-${{ matrix.version.pg }}:cache-${{ matrix.version.debian }}-${{ matrix.arch }}
cache-to: ${{ github.ref_name == 'main' && format('type=registry,ref=cache.neon.build/compute-tools-{0}:cache-{1}-{2},mode=max', matrix.version.pg, matrix.version.debian, matrix.arch) || '' }}
tags: |
neondatabase/compute-tools:${{ needs.tag.outputs.build-tag }}-${{ matrix.arch }}
neondatabase/compute-tools:${{ needs.tag.outputs.build-tag }}-${{ matrix.version.debian }}-${{ matrix.arch }}
compute-node-image:
needs: [ compute-node-image-arch, tag ]
@@ -720,7 +740,16 @@ jobs:
strategy:
matrix:
version: [ v14, v15, v16, v17 ]
version:
# see the comment for `compute-node-image-arch` job
- pg: v14
debian: bullseye
- pg: v15
debian: bullseye
- pg: v16
debian: bullseye
- pg: v17
debian: bookworm
steps:
- uses: docker/login-action@v3
@@ -730,23 +759,26 @@ jobs:
- name: Create multi-arch compute-node image
run: |
docker buildx imagetools create -t neondatabase/compute-node-${{ matrix.version }}:${{ needs.tag.outputs.build-tag }} \
neondatabase/compute-node-${{ matrix.version }}:${{ needs.tag.outputs.build-tag }}-x64 \
neondatabase/compute-node-${{ matrix.version }}:${{ needs.tag.outputs.build-tag }}-arm64
docker buildx imagetools create -t neondatabase/compute-node-${{ matrix.version.pg }}:${{ needs.tag.outputs.build-tag }} \
-t neondatabase/compute-node-${{ matrix.version.pg }}:${{ needs.tag.outputs.build-tag }}-${{ matrix.version.debian }} \
neondatabase/compute-node-${{ matrix.version.pg }}:${{ needs.tag.outputs.build-tag }}-${{ matrix.version.debian }}-x64 \
neondatabase/compute-node-${{ matrix.version.pg }}:${{ needs.tag.outputs.build-tag }}-${{ matrix.version.debian }}-arm64
- name: Create multi-arch neon-test-extensions image
if: matrix.version == 'v16'
if: matrix.version.pg == 'v16'
run: |
docker buildx imagetools create -t neondatabase/neon-test-extensions-${{ matrix.version }}:${{ needs.tag.outputs.build-tag }} \
neondatabase/neon-test-extensions-${{ matrix.version }}:${{ needs.tag.outputs.build-tag }}-x64 \
neondatabase/neon-test-extensions-${{ matrix.version }}:${{ needs.tag.outputs.build-tag }}-arm64
docker buildx imagetools create -t neondatabase/neon-test-extensions-${{ matrix.version.pg }}:${{ needs.tag.outputs.build-tag }} \
-t neondatabase/neon-test-extensions-${{ matrix.version.pg }}:${{ needs.tag.outputs.build-tag }}-${{ matrix.version.debian }} \
neondatabase/neon-test-extensions-${{ matrix.version.pg }}:${{ needs.tag.outputs.build-tag }}-${{ matrix.version.debian }}-x64 \
neondatabase/neon-test-extensions-${{ matrix.version.pg }}:${{ needs.tag.outputs.build-tag }}-${{ matrix.version.debian }}-arm64
- name: Create multi-arch compute-tools image
if: matrix.version == 'v17'
if: matrix.version.pg == 'v16'
run: |
docker buildx imagetools create -t neondatabase/compute-tools:${{ needs.tag.outputs.build-tag }} \
neondatabase/compute-tools:${{ needs.tag.outputs.build-tag }}-x64 \
neondatabase/compute-tools:${{ needs.tag.outputs.build-tag }}-arm64
-t neondatabase/compute-tools:${{ needs.tag.outputs.build-tag }}-${{ matrix.version.debian }} \
neondatabase/compute-tools:${{ needs.tag.outputs.build-tag }}-${{ matrix.version.debian }}-x64 \
neondatabase/compute-tools:${{ needs.tag.outputs.build-tag }}-${{ matrix.version.debian }}-arm64
- uses: docker/login-action@v3
with:
@@ -754,13 +786,13 @@ jobs:
username: ${{ secrets.AWS_ACCESS_KEY_DEV }}
password: ${{ secrets.AWS_SECRET_KEY_DEV }}
- name: Push multi-arch compute-node-${{ matrix.version }} image to ECR
- name: Push multi-arch compute-node-${{ matrix.version.pg }} image to ECR
run: |
docker buildx imagetools create -t 369495373322.dkr.ecr.eu-central-1.amazonaws.com/compute-node-${{ matrix.version }}:${{ needs.tag.outputs.build-tag }} \
neondatabase/compute-node-${{ matrix.version }}:${{ needs.tag.outputs.build-tag }}
docker buildx imagetools create -t 369495373322.dkr.ecr.eu-central-1.amazonaws.com/compute-node-${{ matrix.version.pg }}:${{ needs.tag.outputs.build-tag }} \
neondatabase/compute-node-${{ matrix.version.pg }}:${{ needs.tag.outputs.build-tag }}
- name: Push multi-arch compute-tools image to ECR
if: matrix.version == 'v17'
if: matrix.version.pg == 'v16'
run: |
docker buildx imagetools create -t 369495373322.dkr.ecr.eu-central-1.amazonaws.com/compute-tools:${{ needs.tag.outputs.build-tag }} \
neondatabase/compute-tools:${{ needs.tag.outputs.build-tag }}
@@ -771,7 +803,16 @@ jobs:
strategy:
fail-fast: false
matrix:
version: [ v14, v15, v16, v17 ]
version:
# see the comment for `compute-node-image-arch` job
- pg: v14
debian: bullseye
- pg: v15
debian: bullseye
- pg: v16
debian: bullseye
- pg: v17
debian: bookworm
env:
VM_BUILDER_VERSION: v0.35.0
@@ -793,18 +834,18 @@ jobs:
# it won't have the proper authentication (written at v0.6.0)
- name: Pulling compute-node image
run: |
docker pull neondatabase/compute-node-${{ matrix.version }}:${{ needs.tag.outputs.build-tag }}
docker pull neondatabase/compute-node-${{ matrix.version.pg }}:${{ needs.tag.outputs.build-tag }}
- name: Build vm image
run: |
./vm-builder \
-spec=compute/vm-image-spec.yaml \
-src=neondatabase/compute-node-${{ matrix.version }}:${{ needs.tag.outputs.build-tag }} \
-dst=neondatabase/vm-compute-node-${{ matrix.version }}:${{ needs.tag.outputs.build-tag }}
-spec=compute/vm-image-spec-${{ matrix.version.debian }}.yaml \
-src=neondatabase/compute-node-${{ matrix.version.pg }}:${{ needs.tag.outputs.build-tag }} \
-dst=neondatabase/vm-compute-node-${{ matrix.version.pg }}:${{ needs.tag.outputs.build-tag }}
- name: Pushing vm-compute-node image
run: |
docker push neondatabase/vm-compute-node-${{ matrix.version }}:${{ needs.tag.outputs.build-tag }}
docker push neondatabase/vm-compute-node-${{ matrix.version.pg }}:${{ needs.tag.outputs.build-tag }}
test-images:
needs: [ check-permissions, tag, neon-image, compute-node-image ]
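As an illustration of the per-Debian spec selection in the vm-builder step above, the v17 matrix entry (bookworm) roughly expands to the following; the build tag is a placeholder:
./vm-builder \
  -spec=compute/vm-image-spec-bookworm.yaml \
  -src=neondatabase/compute-node-v17:<build-tag> \
  -dst=neondatabase/vm-compute-node-v17:<build-tag>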
@@ -1059,7 +1100,6 @@ jobs:
run: |
if [[ "$GITHUB_REF_NAME" == "main" ]]; then
gh workflow --repo neondatabase/infra run deploy-dev.yml --ref main -f branch=main -f dockerTag=${{needs.tag.outputs.build-tag}} -f deployPreprodRegion=false
gh workflow --repo neondatabase/azure run deploy.yml -f dockerTag=${{needs.tag.outputs.build-tag}}
elif [[ "$GITHUB_REF_NAME" == "release" ]]; then
gh workflow --repo neondatabase/infra run deploy-dev.yml --ref main \
-f deployPgSniRouter=false \


@@ -31,7 +31,7 @@ jobs:
id: get-build-tools-tag
env:
IMAGE_TAG: |
${{ hashFiles('Dockerfile.build-tools',
${{ hashFiles('build-tools.Dockerfile',
'.github/workflows/check-build-tools-image.yml',
'.github/workflows/build-build-tools-image.yml') }}
run: |


@@ -31,7 +31,7 @@ jobs:
runs-on: us-east-2
container:
image: neondatabase/build-tools:pinned
image: neondatabase/build-tools:pinned-bookworm
options: --init
steps:


@@ -155,7 +155,7 @@ jobs:
github.ref_name == 'main'
runs-on: [ self-hosted, large ]
container:
image: ${{ needs.build-build-tools-image.outputs.image }}
image: ${{ needs.build-build-tools-image.outputs.image }}-bookworm
credentials:
username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}
password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}


@@ -55,7 +55,7 @@ jobs:
runs-on: ubuntu-22.04
container:
image: ${{ needs.build-build-tools-image.outputs.image }}
image: ${{ needs.build-build-tools-image.outputs.image }}-bookworm
credentials:
username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}
password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}
@@ -150,7 +150,7 @@ jobs:
runs-on: ubuntu-22.04
container:
image: ${{ needs.build-build-tools-image.outputs.image }}
image: ${{ needs.build-build-tools-image.outputs.image }}-bookworm
credentials:
username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}
password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}


@@ -71,7 +71,6 @@ jobs:
steps:
- uses: docker/login-action@v3
with:
username: ${{ secrets.NEON_DOCKERHUB_USERNAME }}
password: ${{ secrets.NEON_DOCKERHUB_PASSWORD }}
@@ -94,8 +93,22 @@ jobs:
az acr login --name=neoneastus2
- name: Tag build-tools with `${{ env.TO_TAG }}` in Docker Hub, ECR, and ACR
env:
DEFAULT_DEBIAN_VERSION: bullseye
run: |
docker buildx imagetools create -t 369495373322.dkr.ecr.eu-central-1.amazonaws.com/build-tools:${TO_TAG} \
-t neoneastus2.azurecr.io/neondatabase/build-tools:${TO_TAG} \
-t neondatabase/build-tools:${TO_TAG} \
neondatabase/build-tools:${FROM_TAG}
for debian_version in bullseye bookworm; do
tags=()
tags+=("-t" "neondatabase/build-tools:${TO_TAG}-${debian_version}")
tags+=("-t" "369495373322.dkr.ecr.eu-central-1.amazonaws.com/build-tools:${TO_TAG}-${debian_version}")
tags+=("-t" "neoneastus2.azurecr.io/neondatabase/build-tools:${TO_TAG}-${debian_version}")
if [ "${debian_version}" == "${DEFAULT_DEBIAN_VERSION}" ]; then
tags+=("-t" "neondatabase/build-tools:${TO_TAG}")
tags+=("-t" "369495373322.dkr.ecr.eu-central-1.amazonaws.com/build-tools:${TO_TAG}")
tags+=("-t" "neoneastus2.azurecr.io/neondatabase/build-tools:${TO_TAG}")
fi
docker buildx imagetools create "${tags[@]}" \
neondatabase/build-tools:${FROM_TAG}-${debian_version}
done


@@ -33,7 +33,7 @@ jobs:
actions: read
steps:
- name: Export GH Workflow Stats
uses: fedordikarev/gh-workflow-stats-action@v0.1.2
uses: neondatabase/gh-workflow-stats-action@v0.1.4
with:
DB_URI: ${{ secrets.GH_REPORT_STATS_DB_RW_CONNSTR }}
DB_TABLE: "gh_workflow_stats_neon"


@@ -112,7 +112,7 @@ jobs:
# This isn't exhaustive, just the paths that are most directly compute-related.
# For example, compute_ctl also depends on libs/utils, but we don't trigger
# an e2e run on that.
vendor/*|pgxn/*|compute_tools/*|libs/vm_monitor/*|compute/Dockerfile.compute-node)
vendor/*|pgxn/*|compute_tools/*|libs/vm_monitor/*|compute/compute-node.Dockerfile)
platforms=$(echo "${platforms}" | jq --compact-output '. += ["k8s-neonvm"] | unique')
;;
*)

Cargo.lock

@@ -148,9 +148,9 @@ dependencies = [
[[package]]
name = "asn1-rs"
version = "0.5.2"
version = "0.6.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7f6fd5ddaf0351dff5b8da21b2fb4ff8e08ddd02857f0bf69c47639106c0fff0"
checksum = "5493c3bedbacf7fd7382c6346bbd66687d12bbaad3a89a2d2c303ee6cf20b048"
dependencies = [
"asn1-rs-derive",
"asn1-rs-impl",
@@ -164,25 +164,25 @@ dependencies = [
[[package]]
name = "asn1-rs-derive"
version = "0.4.0"
version = "0.5.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "726535892e8eae7e70657b4c8ea93d26b8553afb1ce617caee529ef96d7dee6c"
checksum = "965c2d33e53cb6b267e148a4cb0760bc01f4904c1cd4bb4002a085bb016d1490"
dependencies = [
"proc-macro2",
"quote",
"syn 1.0.109",
"syn 2.0.52",
"synstructure",
]
[[package]]
name = "asn1-rs-impl"
version = "0.1.0"
version = "0.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2777730b2039ac0f95f093556e61b6d26cebed5393ca6f152717777cec3a42ed"
checksum = "7b18050c2cd6fe86c3a76584ef5e0baf286d038cda203eb6223df2cc413565f7"
dependencies = [
"proc-macro2",
"quote",
"syn 1.0.109",
"syn 2.0.52",
]
[[package]]
@@ -310,6 +310,33 @@ dependencies = [
"zeroize",
]
[[package]]
name = "aws-lc-rs"
version = "1.9.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2f95446d919226d587817a7d21379e6eb099b97b45110a7f272a444ca5c54070"
dependencies = [
"aws-lc-sys",
"mirai-annotations",
"paste",
"zeroize",
]
[[package]]
name = "aws-lc-sys"
version = "0.21.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "b3ddc4a5b231dd6958b140ff3151b6412b3f4321fab354f399eec8f14b06df62"
dependencies = [
"bindgen 0.69.5",
"cc",
"cmake",
"dunce",
"fs_extra",
"libc",
"paste",
]
[[package]]
name = "aws-runtime"
version = "1.4.3"
@@ -595,7 +622,7 @@ dependencies = [
"once_cell",
"pin-project-lite",
"pin-utils",
"rustls 0.21.11",
"rustls 0.21.12",
"tokio",
"tracing",
]
@@ -915,6 +942,29 @@ dependencies = [
"serde",
]
[[package]]
name = "bindgen"
version = "0.69.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "271383c67ccabffb7381723dea0672a673f292304fcb45c01cc648c7a8d58088"
dependencies = [
"bitflags 2.4.1",
"cexpr",
"clang-sys",
"itertools 0.10.5",
"lazy_static",
"lazycell",
"log",
"prettyplease",
"proc-macro2",
"quote",
"regex",
"rustc-hash",
"shlex",
"syn 2.0.52",
"which",
]
[[package]]
name = "bindgen"
version = "0.70.1"
@@ -924,7 +974,7 @@ dependencies = [
"bitflags 2.4.1",
"cexpr",
"clang-sys",
"itertools 0.12.1",
"itertools 0.10.5",
"log",
"prettyplease",
"proc-macro2",
@@ -1038,12 +1088,13 @@ checksum = "37b2a672a2cb129a2e41c10b1224bb368f9f37a2b16b612598138befd7b37eb5"
[[package]]
name = "cc"
version = "1.0.83"
version = "1.1.30"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f1174fb0b6ec23863f8b971027804a42614e347eafb0a95bf0b12cdae21fc4d0"
checksum = "b16803a61b81d9eabb7eae2588776c4c1e584b738ede45fdbb4c972cec1e9945"
dependencies = [
"jobserver",
"libc",
"shlex",
]
[[package]]
@@ -1169,6 +1220,15 @@ version = "0.5.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2da6da31387c7e4ef160ffab6d5e7f00c42626fe39aea70a7b0f1773f7dd6c1b"
[[package]]
name = "cmake"
version = "0.1.51"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "fb1e43aa7fd152b1f968787f7dbcdeb306d1867ff373c69955211876c053f91a"
dependencies = [
"cc",
]
[[package]]
name = "colorchoice"
version = "1.0.0"
@@ -1624,9 +1684,9 @@ dependencies = [
[[package]]
name = "der-parser"
version = "8.2.0"
version = "9.0.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "dbd676fbbab537128ef0278adb5576cf363cff6aa22a7b24effe97347cfab61e"
checksum = "5cd0a5c643689626bec213c4d8bd4d96acc8ffdb4ad4bb6bc16abf27d5f4b553"
dependencies = [
"asn1-rs",
"displaydoc",
@@ -1755,6 +1815,12 @@ dependencies = [
"syn 2.0.52",
]
[[package]]
name = "dunce"
version = "1.0.5"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "92773504d58c093f6de2459af4af33faa518c13451eb8f2b5698ed3d36e7c813"
[[package]]
name = "dyn-clone"
version = "1.0.14"
@@ -2059,6 +2125,12 @@ dependencies = [
"tokio-util",
]
[[package]]
name = "fs_extra"
version = "1.3.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "42703706b716c37f96a77aea830392ad231f44c9e9a67872fa5548707e11b11c"
[[package]]
name = "fsevent-sys"
version = "4.1.0"
@@ -2412,6 +2484,15 @@ dependencies = [
"digest",
]
[[package]]
name = "home"
version = "0.5.9"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e3d1354bf6b7235cb4a0576c2619fd4ed18183f689b12b006a0ee7329eeff9a5"
dependencies = [
"windows-sys 0.52.0",
]
[[package]]
name = "hostname"
version = "0.4.0"
@@ -2581,7 +2662,7 @@ dependencies = [
"http 0.2.9",
"hyper 0.14.30",
"log",
"rustls 0.21.11",
"rustls 0.21.12",
"rustls-native-certs 0.6.2",
"tokio",
"tokio-rustls 0.24.0",
@@ -2695,6 +2776,7 @@ checksum = "ad227c3af19d4914570ad36d30409928b75967c298feb9ea1969db3a610bb14e"
dependencies = [
"equivalent",
"hashbrown 0.14.5",
"serde",
]
[[package]]
@@ -2794,15 +2876,15 @@ dependencies = [
[[package]]
name = "itoa"
version = "1.0.6"
version = "1.0.11"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "453ad9f582a441959e5f0d088b02ce04cfe8d51a8eaf077f12ac6d3e94164ca6"
checksum = "49f1f14873335454500d59611f1cf4a4b0f786f9ac11f4312a78e4cf2566695b"
[[package]]
name = "jobserver"
version = "0.1.26"
version = "0.1.32"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "936cfd212a0155903bcbc060e316fb6cc7cbf2e1907329391ebadc1fe0ce77c2"
checksum = "48d1dbcbbeb6a7fec7e059840aa538bd62aaccf972c7346c4d9d2059312853d0"
dependencies = [
"libc",
]
@@ -2906,6 +2988,12 @@ dependencies = [
"spin",
]
[[package]]
name = "lazycell"
version = "1.3.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "830d08ce1d1d941e6b30645f1a0eb5643013d835ce3779a5fc208261dbe10f55"
[[package]]
name = "libc"
version = "0.2.150"
@@ -3136,6 +3224,12 @@ dependencies = [
"windows-sys 0.48.0",
]
[[package]]
name = "mirai-annotations"
version = "1.12.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "c9be0862c1b3f26a88803c4a49de6889c10e608b3ee9344e6ef5b45fb37ad3d1"
[[package]]
name = "multimap"
version = "0.8.3"
@@ -3355,9 +3449,9 @@ dependencies = [
[[package]]
name = "oid-registry"
version = "0.6.1"
version = "0.7.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9bedf36ffb6ba96c2eb7144ef6270557b52e54b20c0a8e1eb2ff99a6c6959bff"
checksum = "a8d8034d9489cdaf79228eb9f6a3b8d7bb32ba00d6645ebd48eef4077ceb5bd9"
dependencies = [
"asn1-rs",
]
@@ -4052,14 +4146,14 @@ dependencies = [
"bytes",
"once_cell",
"pq_proto",
"rustls 0.22.4",
"rustls 0.23.7",
"rustls-pemfile 2.1.1",
"serde",
"thiserror",
"tokio",
"tokio-postgres",
"tokio-postgres-rustls",
"tokio-rustls 0.25.0",
"tokio-rustls 0.26.0",
"tokio-util",
"tracing",
]
@@ -4081,7 +4175,7 @@ name = "postgres_ffi"
version = "0.1.0"
dependencies = [
"anyhow",
"bindgen",
"bindgen 0.70.1",
"bytes",
"crc32c",
"env_logger",
@@ -4218,7 +4312,7 @@ checksum = "0c1318b19085f08681016926435853bbf7858f9c082d0999b80550ff5d9abe15"
dependencies = [
"bytes",
"heck 0.5.0",
"itertools 0.12.1",
"itertools 0.10.5",
"log",
"multimap",
"once_cell",
@@ -4238,7 +4332,7 @@ source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e9552f850d5f0964a4e4d0bf306459ac29323ddfbae05e35a7c0d35cb0803cc5"
dependencies = [
"anyhow",
"itertools 0.12.1",
"itertools 0.10.5",
"proc-macro2",
"quote",
"syn 2.0.52",
@@ -4296,6 +4390,7 @@ dependencies = [
"indexmap 2.0.1",
"ipnet",
"itertools 0.10.5",
"itoa",
"jose-jwa",
"jose-jwk",
"lasso",
@@ -4325,8 +4420,8 @@ dependencies = [
"rsa",
"rstest",
"rustc-hash",
"rustls 0.22.4",
"rustls-native-certs 0.7.0",
"rustls 0.23.7",
"rustls-native-certs 0.8.0",
"rustls-pemfile 2.1.1",
"scopeguard",
"serde",
@@ -4343,7 +4438,7 @@ dependencies = [
"tokio",
"tokio-postgres",
"tokio-postgres-rustls",
"tokio-rustls 0.25.0",
"tokio-rustls 0.26.0",
"tokio-tungstenite",
"tokio-util",
"tracing",
@@ -4507,12 +4602,13 @@ dependencies = [
[[package]]
name = "rcgen"
version = "0.12.1"
version = "0.13.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "48406db8ac1f3cbc7dcdb56ec355343817958a356ff430259bb07baf7607e1e1"
checksum = "54077e1872c46788540de1ea3d7f4ccb1983d12f9aa909b234468676c1a36779"
dependencies = [
"pem",
"ring",
"rustls-pki-types",
"time",
"yasna",
]
@@ -4646,9 +4742,10 @@ dependencies = [
"camino-tempfile",
"futures",
"futures-util",
"http-body-util",
"http-types",
"humantime-serde",
"hyper 0.14.30",
"hyper 1.4.1",
"itertools 0.10.5",
"metrics",
"once_cell",
@@ -4690,7 +4787,7 @@ dependencies = [
"once_cell",
"percent-encoding",
"pin-project-lite",
"rustls 0.21.11",
"rustls 0.21.12",
"rustls-pemfile 1.0.2",
"serde",
"serde_json",
@@ -4988,9 +5085,9 @@ dependencies = [
[[package]]
name = "rustls"
version = "0.21.11"
version = "0.21.12"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "7fecbfb7b1444f477b345853b1fce097a2c6fb637b2bfb87e6bc5db0f043fae4"
checksum = "3f56a14d1f48b391359b22f731fd4bd7e43c97f3c50eee276f3aa09c94784d3e"
dependencies = [
"log",
"ring",
@@ -5018,6 +5115,7 @@ version = "0.23.7"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "ebbbdb961df0ad3f2652da8f3fdc4b36122f568f968f45ad3316f26c025c677b"
dependencies = [
"aws-lc-rs",
"log",
"once_cell",
"ring",
@@ -5086,9 +5184,9 @@ dependencies = [
[[package]]
name = "rustls-pki-types"
version = "1.3.1"
version = "1.10.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "5ede67b28608b4c60685c7d54122d4400d90f62b40caee7700e700380a390fa8"
checksum = "16f1201b3c9a7ee8039bcadc17b7e605e2945b27eee7631788c1bd2b0643674b"
[[package]]
name = "rustls-webpki"
@@ -5106,6 +5204,7 @@ version = "0.102.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "faaa0a62740bedb9b2ef5afa303da42764c012f743917351dc9a237ea1663610"
dependencies = [
"aws-lc-rs",
"ring",
"rustls-pki-types",
"untrusted",
@@ -5309,7 +5408,7 @@ checksum = "00421ed8fa0c995f07cde48ba6c89e80f2b312f74ff637326f392fbfd23abe02"
dependencies = [
"httpdate",
"reqwest 0.12.4",
"rustls 0.21.11",
"rustls 0.21.12",
"sentry-backtrace",
"sentry-contexts",
"sentry-core",
@@ -5804,8 +5903,8 @@ dependencies = [
"postgres_ffi",
"remote_storage",
"reqwest 0.12.4",
"rustls 0.22.4",
"rustls-native-certs 0.7.0",
"rustls 0.23.7",
"rustls-native-certs 0.8.0",
"serde",
"serde_json",
"storage_controller_client",
@@ -5927,14 +6026,13 @@ checksum = "a7065abeca94b6a8a577f9bd45aa0867a2238b74e8eb67cf10d492bc39351394"
[[package]]
name = "synstructure"
version = "0.12.6"
version = "0.13.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "f36bdaa60a83aca3921b5259d5400cbf5e90fc51931376a9bd4a0eb79aa7210f"
checksum = "c8af7666ab7b6390ab78131fb5b0fce11d6b7a6951602017c35fa82800708971"
dependencies = [
"proc-macro2",
"quote",
"syn 1.0.109",
"unicode-xid",
"syn 2.0.52",
]
[[package]]
@@ -6233,16 +6331,15 @@ dependencies = [
[[package]]
name = "tokio-postgres-rustls"
version = "0.11.1"
version = "0.12.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "0ea13f22eda7127c827983bdaf0d7fff9df21c8817bab02815ac277a21143677"
checksum = "04fb792ccd6bbcd4bba408eb8a292f70fc4a3589e5d793626f45190e6454b6ab"
dependencies = [
"futures",
"ring",
"rustls 0.22.4",
"rustls 0.23.7",
"tokio",
"tokio-postgres",
"tokio-rustls 0.25.0",
"tokio-rustls 0.26.0",
"x509-certificate",
]
@@ -6252,7 +6349,7 @@ version = "0.24.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "e0d409377ff5b1e3ca6437aa86c1eb7d40c134bfec254e44c830defa92669db5"
dependencies = [
"rustls 0.21.11",
"rustls 0.21.12",
"tokio",
]
@@ -6675,16 +6772,15 @@ checksum = "8ecb6da28b8a351d773b68d5825ac39017e680750f980f3a1a85cd8dd28a47c1"
[[package]]
name = "ureq"
version = "2.9.7"
version = "2.10.1"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d11a831e3c0b56e438a28308e7c810799e3c118417f342d30ecec080105395cd"
checksum = "b74fc6b57825be3373f7054754755f03ac3a8f5d70015ccad699ba2029956f4a"
dependencies = [
"base64 0.22.1",
"log",
"once_cell",
"rustls 0.22.4",
"rustls 0.23.7",
"rustls-pki-types",
"rustls-webpki 0.102.2",
"url",
"webpki-roots 0.26.1",
]
@@ -6873,7 +6969,7 @@ name = "walproposer"
version = "0.1.0"
dependencies = [
"anyhow",
"bindgen",
"bindgen 0.70.1",
"postgres_ffi",
"utils",
]
@@ -7048,6 +7144,18 @@ dependencies = [
"rustls-pki-types",
]
[[package]]
name = "which"
version = "4.4.2"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "87ba24419a2078cd2b0f2ede2691b6c66d8e47836da3b6db8265ebad47afbfc7"
dependencies = [
"either",
"home",
"once_cell",
"rustix",
]
[[package]]
name = "whoami"
version = "1.5.1"
@@ -7292,7 +7400,6 @@ dependencies = [
"digest",
"either",
"fail",
"futures",
"futures-channel",
"futures-executor",
"futures-io",
@@ -7307,7 +7414,8 @@ dependencies = [
"hyper 1.4.1",
"hyper-util",
"indexmap 1.9.3",
"itertools 0.12.1",
"indexmap 2.0.1",
"itertools 0.10.5",
"lazy_static",
"libc",
"log",
@@ -7328,6 +7436,8 @@ dependencies = [
"regex-automata 0.4.3",
"regex-syntax 0.8.2",
"reqwest 0.12.4",
"rustls 0.23.7",
"rustls-webpki 0.102.2",
"scopeguard",
"serde",
"serde_json",
@@ -7336,7 +7446,6 @@ dependencies = [
"smallvec",
"spki 0.7.3",
"subtle",
"syn 1.0.109",
"syn 2.0.52",
"sync_wrapper 0.1.2",
"tikv-jemalloc-sys",
@@ -7344,6 +7453,7 @@ dependencies = [
"time-macros",
"tokio",
"tokio-postgres",
"tokio-rustls 0.26.0",
"tokio-stream",
"tokio-util",
"toml_edit",
@@ -7379,9 +7489,9 @@ dependencies = [
[[package]]
name = "x509-parser"
version = "0.15.0"
version = "0.16.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "bab0c2f54ae1d92f4fcb99c0b7ccf0b1e3451cbd395e5f115ccbdbcb18d4f634"
checksum = "fcbc162f30700d6f3f82a24bf7cc62ffe7caea42c0b2cba8bf7f3ae50cf51f69"
dependencies = [
"asn1-rs",
"data-encoding",


@@ -107,6 +107,7 @@ indexmap = "2"
indoc = "2"
ipnet = "2.9.0"
itertools = "0.10"
itoa = "1.0.11"
jsonwebtoken = "9"
lasso = "0.7"
libc = "0.2"
@@ -141,7 +142,7 @@ reqwest-retry = "0.5"
routerify = "3"
rpds = "0.13"
rustc-hash = "1.1.0"
rustls = "0.22"
rustls = "0.23"
rustls-pemfile = "2"
scopeguard = "1.1"
sysinfo = "0.29.2"
@@ -171,8 +172,8 @@ tikv-jemalloc-ctl = "0.5"
tokio = { version = "1.17", features = ["macros"] }
tokio-epoll-uring = { git = "https://github.com/neondatabase/tokio-epoll-uring.git" , branch = "main" }
tokio-io-timeout = "1.2.0"
tokio-postgres-rustls = "0.11.0"
tokio-rustls = "0.25"
tokio-postgres-rustls = "0.12.0"
tokio-rustls = "0.26"
tokio-stream = "0.1"
tokio-tar = "0.3"
tokio-util = { version = "0.7.10", features = ["io", "rt"] }
@@ -191,8 +192,8 @@ url = "2.2"
urlencoding = "2.1"
uuid = { version = "1.6.1", features = ["v4", "v7", "serde"] }
walkdir = "2.3.2"
rustls-native-certs = "0.7"
x509-parser = "0.15"
rustls-native-certs = "0.8"
x509-parser = "0.16"
whoami = "1.5.1"
## TODO replace this with tracing
@@ -243,7 +244,7 @@ workspace_hack = { version = "0.1", path = "./workspace_hack/" }
## Build dependencies
criterion = "0.5.1"
rcgen = "0.12"
rcgen = "0.13"
rstest = "0.18"
camino-tempfile = "1.0.2"
tonic-build = "0.12"


@@ -7,6 +7,8 @@ ARG IMAGE=build-tools
ARG TAG=pinned
ARG DEFAULT_PG_VERSION=17
ARG STABLE_PG_VERSION=16
ARG DEBIAN_VERSION=bullseye
ARG DEBIAN_FLAVOR=${DEBIAN_VERSION}-slim
# Build Postgres
FROM $REPOSITORY/$IMAGE:$TAG AS pg-build
@@ -57,7 +59,7 @@ RUN set -e \
# Build final image
#
FROM debian:bullseye-slim
FROM debian:${DEBIAN_FLAVOR}
ARG DEFAULT_PG_VERSION
WORKDIR /data


@@ -291,12 +291,13 @@ postgres-check: \
# This doesn't remove the effects of 'configure'.
.PHONY: clean
clean: postgres-clean neon-pg-clean-ext
$(MAKE) -C compute clean
$(CARGO_CMD_PREFIX) cargo clean
# This removes everything
.PHONY: distclean
distclean:
rm -rf $(POSTGRES_INSTALL_DIR)
$(RM) -r $(POSTGRES_INSTALL_DIR)
$(CARGO_CMD_PREFIX) cargo clean
.PHONY: fmt
@@ -328,7 +329,7 @@ postgres-%-pgindent: postgres-%-pg-bsd-indent postgres-%-typedefs.list
$(ROOT_PROJECT_DIR)/vendor/postgres-$*/src/tools/pgindent/pgindent --typedefs postgres-$*-typedefs-full.list \
$(ROOT_PROJECT_DIR)/vendor/postgres-$*/src/ \
--excludes $(ROOT_PROJECT_DIR)/vendor/postgres-$*/src/tools/pgindent/exclude_file_patterns
rm -f pg*.BAK
$(RM) pg*.BAK
# Indent pxgn/neon.
.PHONY: neon-pgindent


@@ -31,7 +31,7 @@ See developer documentation in [SUMMARY.md](/docs/SUMMARY.md) for more informati
```bash
apt install build-essential libtool libreadline-dev zlib1g-dev flex bison libseccomp-dev \
libssl-dev clang pkg-config libpq-dev cmake postgresql-client protobuf-compiler \
libcurl4-openssl-dev openssl python3-poetry lsof libicu-dev
libprotobuf-dev libcurl4-openssl-dev openssl python3-poetry lsof libicu-dev
```
* On Fedora, these packages are needed:
```bash


@@ -1,12 +1,7 @@
FROM debian:bullseye-slim
ARG DEBIAN_VERSION=bullseye
# Use ARG as a build-time environment variable here to allow.
# It's not supposed to be set outside.
# Alternatively it can be obtained using the following command
# ```
# . /etc/os-release && echo "${VERSION_CODENAME}"
# ```
ARG DEBIAN_VERSION_CODENAME=bullseye
FROM debian:${DEBIAN_VERSION}-slim
ARG DEBIAN_VERSION
# Add nonroot user
RUN useradd -ms /bin/bash nonroot -b /home
@@ -32,6 +27,7 @@ RUN set -e \
gnupg \
gzip \
jq \
jsonnet \
libcurl4-openssl-dev \
libbz2-dev \
libffi-dev \
@@ -42,14 +38,14 @@ RUN set -e \
libseccomp-dev \
libsqlite3-dev \
libssl-dev \
libstdc++-10-dev \
$([[ "${DEBIAN_VERSION}" = "bullseye" ]] && libstdc++-10-dev || libstdc++-11-dev) \
libtool \
libxml2-dev \
libxmlsec1-dev \
libxxhash-dev \
lsof \
make \
netcat \
netcat-openbsd \
net-tools \
openssh-client \
parallel \
@@ -76,9 +72,9 @@ RUN curl -sL "https://github.com/peak/s5cmd/releases/download/v${S5CMD_VERSION}/
&& mv s5cmd /usr/local/bin/s5cmd
# LLVM
ENV LLVM_VERSION=18
ENV LLVM_VERSION=19
RUN curl -fsSL 'https://apt.llvm.org/llvm-snapshot.gpg.key' | apt-key add - \
&& echo "deb http://apt.llvm.org/${DEBIAN_VERSION_CODENAME}/ llvm-toolchain-${DEBIAN_VERSION_CODENAME}-${LLVM_VERSION} main" > /etc/apt/sources.list.d/llvm.stable.list \
&& echo "deb http://apt.llvm.org/${DEBIAN_VERSION}/ llvm-toolchain-${DEBIAN_VERSION}-${LLVM_VERSION} main" > /etc/apt/sources.list.d/llvm.stable.list \
&& apt update \
&& apt install -y clang-${LLVM_VERSION} llvm-${LLVM_VERSION} \
&& bash -c 'for f in /usr/bin/clang*-${LLVM_VERSION} /usr/bin/llvm*-${LLVM_VERSION}; do ln -s "${f}" "${f%-${LLVM_VERSION}}"; done' \
@@ -86,7 +82,7 @@ RUN curl -fsSL 'https://apt.llvm.org/llvm-snapshot.gpg.key' | apt-key add - \
# Install docker
RUN curl -fsSL https://download.docker.com/linux/ubuntu/gpg | gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg \
&& echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/debian ${DEBIAN_VERSION_CODENAME} stable" > /etc/apt/sources.list.d/docker.list \
&& echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/debian ${DEBIAN_VERSION} stable" > /etc/apt/sources.list.d/docker.list \
&& apt update \
&& apt install -y docker-ce docker-ce-cli \
&& rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
@@ -103,7 +99,7 @@ RUN curl "https://awscli.amazonaws.com/awscli-exe-linux-$(uname -m).zip" -o "aws
&& rm awscliv2.zip
# Mold: A Modern Linker
ENV MOLD_VERSION=v2.33.0
ENV MOLD_VERSION=v2.34.1
RUN set -e \
&& git clone https://github.com/rui314/mold.git \
&& mkdir mold/build \
@@ -146,7 +142,7 @@ RUN wget -O /tmp/openssl-${OPENSSL_VERSION}.tar.gz https://www.openssl.org/sourc
# Use the same version of libicu as the compute nodes so that
# clusters created using initdb on pageserver can be used by computes.
#
# TODO: at this time, Dockerfile.compute-node uses the debian bullseye libicu
# TODO: at this time, compute-node.Dockerfile uses the debian bullseye libicu
# package, which is 67.1. We're duplicating that knowledge here, and also, technically,
# Debian has a few patches on top of 67.1 that we're not adding here.
ENV ICU_VERSION=67.1
@@ -196,7 +192,7 @@ WORKDIR /home/nonroot
# Rust
# Please keep the version of llvm (installed above) in sync with rust llvm (`rustc --version --verbose | grep LLVM`)
ENV RUSTC_VERSION=1.81.0
ENV RUSTC_VERSION=1.82.0
ENV RUSTUP_HOME="/home/nonroot/.rustup"
ENV PATH="/home/nonroot/.cargo/bin:${PATH}"
ARG RUSTFILT_VERSION=0.2.1
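A rough local equivalent of what the build-image job does for one matrix entry (bookworm, x64); the tag is illustrative and the workflow itself uses docker/build-push-action rather than a plain docker build:
docker build \
  -f build-tools.Dockerfile \
  --build-arg DEBIAN_VERSION=bookworm \
  -t neondatabase/build-tools:pinned-bookworm-x64 \
  .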

compute/.gitignore

@@ -0,0 +1,5 @@
# sql_exporter config files generated from Jsonnet
etc/neon_collector.yml
etc/neon_collector_autoscaling.yml
etc/sql_exporter.yml
etc/sql_exporter_autoscaling.yml

compute/Makefile

@@ -0,0 +1,49 @@
jsonnet_files = $(wildcard \
etc/*.jsonnet \
etc/sql_exporter/*.libsonnet)
.PHONY: all
all: neon_collector.yml neon_collector_autoscaling.yml sql_exporter.yml sql_exporter_autoscaling.yml
neon_collector.yml: $(jsonnet_files)
JSONNET_PATH=jsonnet:etc jsonnet \
--output-file etc/$@ \
--ext-str pg_version=$(PG_VERSION) \
etc/neon_collector.jsonnet
neon_collector_autoscaling.yml: $(jsonnet_files)
JSONNET_PATH=jsonnet:etc jsonnet \
--output-file etc/$@ \
--ext-str pg_version=$(PG_VERSION) \
etc/neon_collector_autoscaling.jsonnet
sql_exporter.yml: $(jsonnet_files)
JSONNET_PATH=etc jsonnet \
--output-file etc/$@ \
--tla-str collector_name=neon_collector \
--tla-str collector_file=neon_collector.yml \
etc/sql_exporter.jsonnet
sql_exporter_autoscaling.yml: $(jsonnet_files)
JSONNET_PATH=etc jsonnet \
--output-file etc/$@ \
--tla-str collector_name=neon_collector_autoscaling \
--tla-str collector_file=neon_collector_autoscaling.yml \
--tla-str application_name=sql_exporter_autoscaling \
etc/sql_exporter.jsonnet
.PHONY: clean
clean:
$(RM) \
etc/neon_collector.yml \
etc/neon_collector_autoscaling.yml \
etc/sql_exporter.yml \
etc/sql_exporter_autoscaling.yml
.PHONY: jsonnetfmt-test
jsonnetfmt-test:
jsonnetfmt --test $(jsonnet_files)
.PHONY: jsonnetfmt-format
jsonnetfmt-format:
jsonnetfmt --in-place $(jsonnet_files)
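A minimal usage sketch for the new compute Makefile, assuming jsonnet and jsonnetfmt are on PATH (they are installed in the build-tools image); the PG_VERSION value is illustrative:
# generate the sql_exporter configs listed in compute/.gitignore
make -C compute all PG_VERSION=16
# formatting check, as run by the check-codestyle-jsonnet job
make -C compute jsonnetfmt-test
# drop the generated YAML files again
make -C compute clean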


@@ -1,7 +1,7 @@
This directory contains files that are needed to build the compute
images, or included in the compute images.
Dockerfile.compute-node
compute-node.Dockerfile
To build the compute image
vm-image-spec.yaml
@@ -14,8 +14,8 @@ etc/
patches/
Some extensions need to be patched to work with Neon. This
directory contains such patches. They are applied to the extension
sources in Dockerfile.compute-node
sources in compute-node.Dockerfile
In addition to these, postgres itself, the neon postgres extension,
and compute_ctl are built and copied into the compute image by
Dockerfile.compute-node.
compute-node.Dockerfile.


@@ -3,7 +3,8 @@ ARG REPOSITORY=neondatabase
ARG IMAGE=build-tools
ARG TAG=pinned
ARG BUILD_TAG
ARG DEBIAN_FLAVOR=bullseye-slim
ARG DEBIAN_VERSION=bullseye
ARG DEBIAN_FLAVOR=${DEBIAN_VERSION}-slim
#########################################################################################
#
@@ -11,19 +12,23 @@ ARG DEBIAN_FLAVOR=bullseye-slim
#
#########################################################################################
FROM debian:$DEBIAN_FLAVOR AS build-deps
ARG DEBIAN_FLAVOR
ARG DEBIAN_VERSION
RUN case $DEBIAN_FLAVOR in \
RUN case $DEBIAN_VERSION in \
# Version-specific installs for Bullseye (PG14-PG16):
# The h3_pg extension needs a cmake 3.20+, but Debian bullseye has 3.18.
# Install newer version (3.25) from backports.
bullseye*) \
# libstdc++-10-dev is required for plv8
bullseye) \
echo "deb http://deb.debian.org/debian bullseye-backports main" > /etc/apt/sources.list.d/bullseye-backports.list; \
VERSION_INSTALLS="cmake/bullseye-backports cmake-data/bullseye-backports"; \
VERSION_INSTALLS="cmake/bullseye-backports cmake-data/bullseye-backports libstdc++-10-dev"; \
;; \
# Version-specific installs for Bookworm (PG17):
bookworm*) \
VERSION_INSTALLS="cmake"; \
bookworm) \
VERSION_INSTALLS="cmake libstdc++-12-dev"; \
;; \
*) \
echo "Unknown Debian version ${DEBIAN_VERSION}" && exit 1 \
;; \
esac && \
apt update && \
@@ -223,18 +228,33 @@ FROM build-deps AS plv8-build
ARG PG_VERSION
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
RUN case "${PG_VERSION}" in "v17") \
echo "v17 extensions are not supported yet. Quit" && exit 0;; \
esac && \
apt update && \
RUN apt update && \
apt install --no-install-recommends -y ninja-build python3-dev libncurses5 binutils clang
RUN case "${PG_VERSION}" in "v17") \
echo "v17 extensions are not supported yet. Quit" && exit 0;; \
# plv8 3.2.3 supports v17
# last release v3.2.3 - Sep 7, 2024
#
# clone the repo instead of downloading the release tarball because plv8 has submodule dependencies
# and the release tarball doesn't include them
#
# Use new version only for v17
# because since v3.2, plv8 doesn't include plcoffee and plls extensions
ENV PLV8_TAG=v3.2.3
RUN case "${PG_VERSION}" in \
"v17") \
export PLV8_TAG=v3.2.3 \
;; \
"v14" | "v15" | "v16") \
export PLV8_TAG=v3.1.10 \
;; \
*) \
echo "unexpected PostgreSQL version" && exit 1 \
;; \
esac && \
wget https://github.com/plv8/plv8/archive/refs/tags/v3.1.10.tar.gz -O plv8.tar.gz && \
echo "7096c3290928561f0d4901b7a52794295dc47f6303102fae3f8e42dd575ad97d plv8.tar.gz" | sha256sum --check && \
mkdir plv8-src && cd plv8-src && tar xzf ../plv8.tar.gz --strip-components=1 -C . && \
git clone --recurse-submodules --depth 1 --branch ${PLV8_TAG} https://github.com/plv8/plv8.git plv8-src && \
tar -czf plv8.tar.gz --exclude .git plv8-src && \
cd plv8-src && \
# generate and copy upgrade scripts
mkdir -p upgrade && ./generate_upgrade.sh 3.1.10 && \
cp upgrade/* /usr/local/pgsql/share/extension/ && \
@@ -244,8 +264,17 @@ RUN case "${PG_VERSION}" in "v17") \
find /usr/local/pgsql/ -name "plv8-*.so" | xargs strip && \
# don't break computes with installed old version of plv8
cd /usr/local/pgsql/lib/ && \
ln -s plv8-3.1.10.so plv8-3.1.5.so && \
ln -s plv8-3.1.10.so plv8-3.1.8.so && \
case "${PG_VERSION}" in \
"v17") \
ln -s plv8-3.2.3.so plv8-3.1.8.so && \
ln -s plv8-3.2.3.so plv8-3.1.5.so && \
ln -s plv8-3.2.3.so plv8-3.1.10.so \
;; \
"v14" | "v15" | "v16") \
ln -s plv8-3.1.10.so plv8-3.1.5.so && \
ln -s plv8-3.1.10.so plv8-3.1.8.so \
;; \
esac && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/plv8.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/plcoffee.control && \
echo 'trusted = true' >> /usr/local/pgsql/share/extension/plls.control
@@ -323,11 +352,11 @@ COPY compute/patches/pgvector.patch /pgvector.patch
# By default, pgvector Makefile uses `-march=native`. We don't want that,
# because we build the images on different machines than where we run them.
# Pass OPTFLAGS="" to remove it.
RUN case "${PG_VERSION}" in "v17") \
echo "v17 extensions are not supported yet. Quit" && exit 0;; \
esac && \
wget https://github.com/pgvector/pgvector/archive/refs/tags/v0.7.2.tar.gz -O pgvector.tar.gz && \
echo "617fba855c9bcb41a2a9bc78a78567fd2e147c72afd5bf9d37b31b9591632b30 pgvector.tar.gz" | sha256sum --check && \
#
# vector 0.7.4 supports v17
# last release v0.7.4 - Aug 5, 2024
RUN wget https://github.com/pgvector/pgvector/archive/refs/tags/v0.7.4.tar.gz -O pgvector.tar.gz && \
echo "0341edf89b1924ae0d552f617e14fb7f8867c0194ed775bcc44fa40288642583 pgvector.tar.gz" | sha256sum --check && \
mkdir pgvector-src && cd pgvector-src && tar xzf ../pgvector.tar.gz --strip-components=1 -C . && \
patch -p1 < /pgvector.patch && \
make -j $(getconf _NPROCESSORS_ONLN) OPTFLAGS="" PG_CONFIG=/usr/local/pgsql/bin/pg_config && \
@@ -345,7 +374,7 @@ ARG PG_VERSION
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
# not version-specific
# doesn't use releases, last commit f3d82fd - Mar 2, 2023
# doesn't use releases, last commit f3d82fd - Mar 2, 2023
RUN wget https://github.com/michelp/pgjwt/archive/f3d82fd30151e754e19ce5d6a06c71c20689ce3d.tar.gz -O pgjwt.tar.gz && \
echo "dae8ed99eebb7593b43013f6532d772b12dfecd55548d2673f2dfd0163f6d2b9 pgjwt.tar.gz" | sha256sum --check && \
mkdir pgjwt-src && cd pgjwt-src && tar xzf ../pgjwt.tar.gz --strip-components=1 -C . && \
@@ -362,11 +391,10 @@ FROM build-deps AS hypopg-pg-build
ARG PG_VERSION
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
RUN case "${PG_VERSION}" in "v17") \
echo "v17 extensions are not supported yet. Quit" && exit 0;; \
esac && \
wget https://github.com/HypoPG/hypopg/archive/refs/tags/1.4.0.tar.gz -O hypopg.tar.gz && \
echo "0821011743083226fc9b813c1f2ef5897a91901b57b6bea85a78e466187c6819 hypopg.tar.gz" | sha256sum --check && \
# HypoPG 1.4.1 supports v17
# last release 1.4.1 - Apr 28, 2024
RUN wget https://github.com/HypoPG/hypopg/archive/refs/tags/1.4.1.tar.gz -O hypopg.tar.gz && \
echo "9afe6357fd389d8d33fad81703038ce520b09275ec00153c6c89282bcdedd6bc hypopg.tar.gz" | sha256sum --check && \
mkdir hypopg-src && cd hypopg-src && tar xzf ../hypopg.tar.gz --strip-components=1 -C . && \
make -j $(getconf _NPROCESSORS_ONLN) PG_CONFIG=/usr/local/pgsql/bin/pg_config && \
make -j $(getconf _NPROCESSORS_ONLN) install PG_CONFIG=/usr/local/pgsql/bin/pg_config && \
@@ -403,6 +431,9 @@ COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
COPY compute/patches/rum.patch /rum.patch
# maybe version-specific
# support for v17 is unknown
# last release 1.3.13 - Sep 19, 2022
RUN case "${PG_VERSION}" in "v17") \
echo "v17 extensions are not supported yet. Quit" && exit 0;; \
esac && \
@@ -424,11 +455,10 @@ FROM build-deps AS pgtap-pg-build
ARG PG_VERSION
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
RUN case "${PG_VERSION}" in "v17") \
echo "v17 extensions are not supported yet. Quit" && exit 0;; \
esac && \
wget https://github.com/theory/pgtap/archive/refs/tags/v1.2.0.tar.gz -O pgtap.tar.gz && \
echo "9c7c3de67ea41638e14f06da5da57bac6f5bd03fea05c165a0ec862205a5c052 pgtap.tar.gz" | sha256sum --check && \
# pgtap 1.3.3 supports v17
# last release v1.3.3 - Apr 8, 2024
RUN wget https://github.com/theory/pgtap/archive/refs/tags/v1.3.3.tar.gz -O pgtap.tar.gz && \
echo "325ea79d0d2515bce96bce43f6823dcd3effbd6c54cb2a4d6c2384fffa3a14c7 pgtap.tar.gz" | sha256sum --check && \
mkdir pgtap-src && cd pgtap-src && tar xzf ../pgtap.tar.gz --strip-components=1 -C . && \
make -j $(getconf _NPROCESSORS_ONLN) PG_CONFIG=/usr/local/pgsql/bin/pg_config && \
make -j $(getconf _NPROCESSORS_ONLN) install PG_CONFIG=/usr/local/pgsql/bin/pg_config && \
@@ -501,11 +531,10 @@ FROM build-deps AS plpgsql-check-pg-build
ARG PG_VERSION
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
RUN case "${PG_VERSION}" in "v17") \
echo "v17 extensions are not supported yet. Quit" && exit 0;; \
esac && \
wget https://github.com/okbob/plpgsql_check/archive/refs/tags/v2.5.3.tar.gz -O plpgsql_check.tar.gz && \
echo "6631ec3e7fb3769eaaf56e3dfedb829aa761abf163d13dba354b4c218508e1c0 plpgsql_check.tar.gz" | sha256sum --check && \
# plpgsql_check v2.7.11 supports v17
# last release v2.7.11 - Sep 16, 2024
RUN wget https://github.com/okbob/plpgsql_check/archive/refs/tags/v2.7.11.tar.gz -O plpgsql_check.tar.gz && \
echo "208933f8dbe8e0d2628eb3851e9f52e6892b8e280c63700c0f1ce7883625d172 plpgsql_check.tar.gz" | sha256sum --check && \
mkdir plpgsql_check-src && cd plpgsql_check-src && tar xzf ../plpgsql_check.tar.gz --strip-components=1 -C . && \
make -j $(getconf _NPROCESSORS_ONLN) PG_CONFIG=/usr/local/pgsql/bin/pg_config USE_PGXS=1 && \
make -j $(getconf _NPROCESSORS_ONLN) install PG_CONFIG=/usr/local/pgsql/bin/pg_config USE_PGXS=1 && \
@@ -523,18 +552,19 @@ COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
ARG PG_VERSION
ENV PATH="/usr/local/pgsql/bin:$PATH"
RUN case "${PG_VERSION}" in "v17") \
echo "v17 extensions are not supported yet. Quit" && exit 0;; \
esac && \
case "${PG_VERSION}" in \
RUN case "${PG_VERSION}" in \
"v14" | "v15") \
export TIMESCALEDB_VERSION=2.10.1 \
export TIMESCALEDB_CHECKSUM=6fca72a6ed0f6d32d2b3523951ede73dc5f9b0077b38450a029a5f411fdb8c73 \
;; \
*) \
"v16") \
export TIMESCALEDB_VERSION=2.13.0 \
export TIMESCALEDB_CHECKSUM=584a351c7775f0e067eaa0e7277ea88cab9077cc4c455cbbf09a5d9723dce95d \
;; \
"v17") \
export TIMESCALEDB_VERSION=2.17.0 \
export TIMESCALEDB_CHECKSUM=155bf64391d3558c42f31ca0e523cfc6252921974f75298c9039ccad1c89811a \
;; \
esac && \
wget https://github.com/timescale/timescaledb/archive/refs/tags/${TIMESCALEDB_VERSION}.tar.gz -O timescaledb.tar.gz && \
echo "${TIMESCALEDB_CHECKSUM} timescaledb.tar.gz" | sha256sum --check && \
@@ -557,10 +587,8 @@ COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
ARG PG_VERSION
ENV PATH="/usr/local/pgsql/bin:$PATH"
RUN case "${PG_VERSION}" in "v17") \
echo "v17 extensions are not supported yet. Quit" && exit 0;; \
esac && \
case "${PG_VERSION}" in \
# version-specific, has separate releases for each version
RUN case "${PG_VERSION}" in \
"v14") \
export PG_HINT_PLAN_VERSION=14_1_4_1 \
export PG_HINT_PLAN_CHECKSUM=c3501becf70ead27f70626bce80ea401ceac6a77e2083ee5f3ff1f1444ec1ad1 \
@@ -574,7 +602,8 @@ RUN case "${PG_VERSION}" in "v17") \
export PG_HINT_PLAN_CHECKSUM=fc85a9212e7d2819d4ae4ac75817481101833c3cfa9f0fe1f980984e12347d00 \
;; \
"v17") \
echo "TODO: PG17 pg_hint_plan support" && exit 0 \
export PG_HINT_PLAN_VERSION=17_1_7_0 \
export PG_HINT_PLAN_CHECKSUM=06dd306328c67a4248f48403c50444f30959fb61ebe963248dbc2afb396fe600 \
;; \
*) \
echo "Export the valid PG_HINT_PLAN_VERSION variable" && exit 1 \
@@ -598,6 +627,10 @@ FROM build-deps AS pg-cron-pg-build
ARG PG_VERSION
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
# 1.6.4 available, supports v17
# This is an experimental extension that we do not support on prod yet.
# !Do not remove!
# We set it in shared_preload_libraries and computes will fail to start if library is not found.
ENV PATH="/usr/local/pgsql/bin/:$PATH"
RUN case "${PG_VERSION}" in "v17") \
echo "v17 extensions are not supported yet. Quit" && exit 0;; \
@@ -619,23 +652,37 @@ FROM build-deps AS rdkit-pg-build
ARG PG_VERSION
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
RUN case "${PG_VERSION}" in "v17") \
echo "v17 extensions are not supported yet. Quit" && exit 0;; \
esac && \
apt-get update && \
RUN apt-get update && \
apt-get install --no-install-recommends -y \
libboost-iostreams1.74-dev \
libboost-regex1.74-dev \
libboost-serialization1.74-dev \
libboost-system1.74-dev \
libeigen3-dev
libeigen3-dev \
libboost-all-dev
# rdkit Release_2024_09_1 supports v17
# last release Release_2024_09_1 - Sep 27, 2024
#
# Use new version only for v17
# because Release_2024_09_1 has some backward incompatible changes
# https://github.com/rdkit/rdkit/releases/tag/Release_2024_09_1
ENV PATH="/usr/local/pgsql/bin/:/usr/local/pgsql/:$PATH"
RUN case "${PG_VERSION}" in "v17") \
echo "v17 extensions are not supported yet. Quit" && exit 0;; \
RUN case "${PG_VERSION}" in \
"v17") \
export RDKIT_VERSION=Release_2024_09_1 \
export RDKIT_CHECKSUM=034c00d6e9de323506834da03400761ed8c3721095114369d06805409747a60f \
;; \
"v14" | "v15" | "v16") \
export RDKIT_VERSION=Release_2023_03_3 \
export RDKIT_CHECKSUM=bdbf9a2e6988526bfeb8c56ce3cdfe2998d60ac289078e2215374288185e8c8d \
;; \
*) \
echo "unexpected PostgreSQL version" && exit 1 \
;; \
esac && \
wget https://github.com/rdkit/rdkit/archive/refs/tags/Release_2023_03_3.tar.gz -O rdkit.tar.gz && \
echo "bdbf9a2e6988526bfeb8c56ce3cdfe2998d60ac289078e2215374288185e8c8d rdkit.tar.gz" | sha256sum --check && \
wget https://github.com/rdkit/rdkit/archive/refs/tags/${RDKIT_VERSION}.tar.gz -O rdkit.tar.gz && \
echo "${RDKIT_CHECKSUM} rdkit.tar.gz" | sha256sum --check && \
mkdir rdkit-src && cd rdkit-src && tar xzf ../rdkit.tar.gz --strip-components=1 -C . && \
cmake \
-D RDK_BUILD_CAIRO_SUPPORT=OFF \
@@ -674,12 +721,11 @@ FROM build-deps AS pg-uuidv7-pg-build
ARG PG_VERSION
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
# not version-specific
# last release v1.6.0 - Oct 9, 2024
ENV PATH="/usr/local/pgsql/bin/:$PATH"
RUN case "${PG_VERSION}" in "v17") \
echo "v17 extensions are not supported yet. Quit" && exit 0;; \
esac && \
wget https://github.com/fboulnois/pg_uuidv7/archive/refs/tags/v1.0.1.tar.gz -O pg_uuidv7.tar.gz && \
echo "0d0759ab01b7fb23851ecffb0bce27822e1868a4a5819bfd276101c716637a7a pg_uuidv7.tar.gz" | sha256sum --check && \
RUN wget https://github.com/fboulnois/pg_uuidv7/archive/refs/tags/v1.6.0.tar.gz -O pg_uuidv7.tar.gz && \
echo "0fa6c710929d003f6ce276a7de7a864e9d1667b2d78be3dc2c07f2409eb55867 pg_uuidv7.tar.gz" | sha256sum --check && \
mkdir pg_uuidv7-src && cd pg_uuidv7-src && tar xzf ../pg_uuidv7.tar.gz --strip-components=1 -C . && \
make -j $(getconf _NPROCESSORS_ONLN) && \
make -j $(getconf _NPROCESSORS_ONLN) install && \
@@ -750,6 +796,8 @@ RUN case "${PG_VERSION}" in \
FROM build-deps AS pg-embedding-pg-build
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
# This is our extension, support stopped in favor of pgvector
# TODO: deprecate it
ARG PG_VERSION
ENV PATH="/usr/local/pgsql/bin/:$PATH"
RUN case "${PG_VERSION}" in \
@@ -776,6 +824,8 @@ FROM build-deps AS pg-anon-pg-build
ARG PG_VERSION
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
# This is an experimental extension, never got to real production.
# !Do not remove! It can be present in shared_preload_libraries and compute will fail to start if library is not found.
ENV PATH="/usr/local/pgsql/bin/:$PATH"
RUN case "${PG_VERSION}" in "v17") \
echo "postgresql_anonymizer does not yet support PG17" && exit 0;; \
@@ -925,8 +975,8 @@ ARG PG_VERSION
RUN case "${PG_VERSION}" in "v17") \
echo "pg_session_jwt does not yet have a release that supports pg17" && exit 0;; \
esac && \
wget https://github.com/neondatabase/pg_session_jwt/archive/ff0a72440e8ff584dab24b3f9b7c00c56c660b8e.tar.gz -O pg_session_jwt.tar.gz && \
echo "1fbb2b5a339263bcf6daa847fad8bccbc0b451cea6a62e6d3bf232b0087f05cb pg_session_jwt.tar.gz" | sha256sum --check && \
wget https://github.com/neondatabase/pg_session_jwt/archive/e1310b08ba51377a19e0559e4d1194883b9b2ba2.tar.gz -O pg_session_jwt.tar.gz && \
echo "837932a077888d5545fd54b0abcc79e5f8e37017c2769a930afc2f5c94df6f4e pg_session_jwt.tar.gz" | sha256sum --check && \
mkdir pg_session_jwt-src && cd pg_session_jwt-src && tar xzf ../pg_session_jwt.tar.gz --strip-components=1 -C . && \
sed -i 's/pgrx = "=0.11.3"/pgrx = { version = "=0.11.3", features = [ "unsafe-postgres" ] }/g' Cargo.toml && \
cargo pgrx install --release
@@ -942,13 +992,12 @@ FROM build-deps AS wal2json-pg-build
ARG PG_VERSION
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
# wal2json wal2json_2_6 supports v17
# last release wal2json_2_6 - Apr 25, 2024
ENV PATH="/usr/local/pgsql/bin/:$PATH"
RUN case "${PG_VERSION}" in "v17") \
echo "We'll need to update wal2json to 2.6+ for pg17 support" && exit 0;; \
esac && \
wget https://github.com/eulerto/wal2json/archive/refs/tags/wal2json_2_5.tar.gz && \
echo "b516653575541cf221b99cf3f8be9b6821f6dbcfc125675c85f35090f824f00e wal2json_2_5.tar.gz" | sha256sum --check && \
mkdir wal2json-src && cd wal2json-src && tar xzf ../wal2json_2_5.tar.gz --strip-components=1 -C . && \
RUN wget https://github.com/eulerto/wal2json/archive/refs/tags/wal2json_2_6.tar.gz -O wal2json.tar.gz && \
echo "18b4bdec28c74a8fc98a11c72de38378a760327ef8e5e42e975b0029eb96ba0d wal2json.tar.gz" | sha256sum --check && \
mkdir wal2json-src && cd wal2json-src && tar xzf ../wal2json.tar.gz --strip-components=1 -C . && \
make -j $(getconf _NPROCESSORS_ONLN) && \
make -j $(getconf _NPROCESSORS_ONLN) install
@@ -962,12 +1011,11 @@ FROM build-deps AS pg-ivm-build
ARG PG_VERSION
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
# pg_ivm v1.9 supports v17
# last release v1.9 - Jul 31
ENV PATH="/usr/local/pgsql/bin/:$PATH"
RUN case "${PG_VERSION}" in "v17") \
echo "We'll need to update pg_ivm to 1.9+ for pg17 support" && exit 0;; \
esac && \
wget https://github.com/sraoss/pg_ivm/archive/refs/tags/v1.7.tar.gz -O pg_ivm.tar.gz && \
echo "ebfde04f99203c7be4b0e873f91104090e2e83e5429c32ac242d00f334224d5e pg_ivm.tar.gz" | sha256sum --check && \
RUN wget https://github.com/sraoss/pg_ivm/archive/refs/tags/v1.9.tar.gz -O pg_ivm.tar.gz && \
echo "59e15722939f274650abf637f315dd723c87073496ca77236b044cb205270d8b pg_ivm.tar.gz" | sha256sum --check && \
mkdir pg_ivm-src && cd pg_ivm-src && tar xzf ../pg_ivm.tar.gz --strip-components=1 -C . && \
make -j $(getconf _NPROCESSORS_ONLN) && \
make -j $(getconf _NPROCESSORS_ONLN) install && \
@@ -983,12 +1031,11 @@ FROM build-deps AS pg-partman-build
ARG PG_VERSION
COPY --from=pg-build /usr/local/pgsql/ /usr/local/pgsql/
# should support v17 https://github.com/pgpartman/pg_partman/discussions/693
# last release 5.1.0 Apr 2, 2024
ENV PATH="/usr/local/pgsql/bin/:$PATH"
RUN case "${PG_VERSION}" in "v17") \
echo "pg_partman doesn't support PG17 yet" && exit 0;; \
esac && \
wget https://github.com/pgpartman/pg_partman/archive/refs/tags/v5.0.1.tar.gz -O pg_partman.tar.gz && \
echo "75b541733a9659a6c90dbd40fccb904a630a32880a6e3044d0c4c5f4c8a65525 pg_partman.tar.gz" | sha256sum --check && \
RUN wget https://github.com/pgpartman/pg_partman/archive/refs/tags/v5.1.0.tar.gz -O pg_partman.tar.gz && \
echo "3e3a27d7ff827295d5c55ef72f07a49062d6204b3cb0b9a048645d6db9f3cb9f pg_partman.tar.gz" | sha256sum --check && \
mkdir pg_partman-src && cd pg_partman-src && tar xzf ../pg_partman.tar.gz --strip-components=1 -C . && \
make -j $(getconf _NPROCESSORS_ONLN) && \
make -j $(getconf _NPROCESSORS_ONLN) install && \
@@ -1091,7 +1138,6 @@ RUN cd compute_tools && mold -run cargo build --locked --profile release-line-de
#########################################################################################
FROM debian:$DEBIAN_FLAVOR AS compute-tools-image
ARG DEBIAN_FLAVOR
COPY --from=compute-tools /home/nonroot/target/release-line-debug-size-lto/compute_ctl /usr/local/bin/compute_ctl
@@ -1102,7 +1148,6 @@ COPY --from=compute-tools /home/nonroot/target/release-line-debug-size-lto/compu
#########################################################################################
FROM debian:$DEBIAN_FLAVOR AS pgbouncer
ARG DEBIAN_FLAVOR
RUN set -e \
&& apt-get update \
&& apt-get install --no-install-recommends -y \
@@ -1167,6 +1212,19 @@ RUN rm -r /usr/local/pgsql/include
# if they were to be used by other libraries.
RUN rm /usr/local/pgsql/lib/lib*.a
#########################################################################################
#
# Preprocess the sql_exporter configuration files
#
#########################################################################################
FROM $REPOSITORY/$IMAGE:$TAG AS sql_exporter_preprocessor
ARG PG_VERSION
USER nonroot
COPY --chown=nonroot compute compute
RUN make PG_VERSION="${PG_VERSION}" -C compute
#########################################################################################
#
@@ -1257,7 +1315,7 @@ ENV PGDATABASE=postgres
#
#########################################################################################
FROM debian:$DEBIAN_FLAVOR
ARG DEBIAN_FLAVOR
ARG DEBIAN_VERSION
# Add user postgres
RUN mkdir /var/db && useradd -m -d /var/db/postgres postgres && \
echo "postgres:test_console_pass" | chpasswd && \
@@ -1285,10 +1343,10 @@ RUN mkdir -p /etc/local_proxy && chown postgres:postgres /etc/local_proxy
COPY --from=postgres-exporter /bin/postgres_exporter /bin/postgres_exporter
COPY --from=sql-exporter /bin/sql_exporter /bin/sql_exporter
COPY --chmod=0644 compute/etc/sql_exporter.yml /etc/sql_exporter.yml
COPY --chmod=0644 compute/etc/neon_collector.yml /etc/neon_collector.yml
COPY --chmod=0644 compute/etc/sql_exporter_autoscaling.yml /etc/sql_exporter_autoscaling.yml
COPY --chmod=0644 compute/etc/neon_collector_autoscaling.yml /etc/neon_collector_autoscaling.yml
COPY --from=sql_exporter_preprocessor --chmod=0644 /home/nonroot/compute/etc/sql_exporter.yml /etc/sql_exporter.yml
COPY --from=sql_exporter_preprocessor --chmod=0644 /home/nonroot/compute/etc/neon_collector.yml /etc/neon_collector.yml
COPY --from=sql_exporter_preprocessor --chmod=0644 /home/nonroot/compute/etc/sql_exporter_autoscaling.yml /etc/sql_exporter_autoscaling.yml
COPY --from=sql_exporter_preprocessor --chmod=0644 /home/nonroot/compute/etc/neon_collector_autoscaling.yml /etc/neon_collector_autoscaling.yml
# Create remote extension download directory
RUN mkdir /usr/local/download_extensions && chown -R postgres:postgres /usr/local/download_extensions
@@ -1305,19 +1363,22 @@ RUN mkdir /usr/local/download_extensions && chown -R postgres:postgres /usr/loca
RUN apt update && \
case $DEBIAN_FLAVOR in \
case $DEBIAN_VERSION in \
# Version-specific installs for Bullseye (PG14-PG16):
# libicu67, locales for collations (including ICU and plpgsql_check)
# libgdal28, libproj19 for PostGIS
bullseye*) \
bullseye) \
VERSION_INSTALLS="libicu67 libgdal28 libproj19"; \
;; \
# Version-specific installs for Bookworm (PG17):
# libicu72, locales for collations (including ICU and plpgsql_check)
# libgdal32, libproj25 for PostGIS
bookworm*) \
bookworm) \
VERSION_INSTALLS="libicu72 libgdal32 libproj25"; \
;; \
*) \
echo "Unknown Debian version ${DEBIAN_VERSION}" && exit 1 \
;; \
esac && \
apt install --no-install-recommends -y \
gdb \

compute/etc/README.md

@@ -0,0 +1,17 @@
# Compute Configuration
These files are the configuration files for various other pieces of software
that will be running in the compute alongside Postgres.
## `sql_exporter`
### Adding a `sql_exporter` Metric
We use `sql_exporter` to export various metrics from Postgres. In order to add
a metric, you will need to create two files: a `libsonnet` and a `sql` file. You
will then import the `libsonnet` file in one of the collector files, and the
`sql` file will be imported in the `libsonnet` file.
In the event your statistic is an LSN, you may want to cast it to a `float8`
because Prometheus only supports floats. It's probably fine because `float8` can
store integers from `-2^53` to `+2^53` exactly.
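As a minimal sketch of that two-file pattern (the metric name `my_example_metric`, its help text, and its SQL file are hypothetical and invented purely for illustration; compare the real metric files shown further down in this diff, presumably living under `compute/etc/sql_exporter/`):

// Hypothetical compute/etc/sql_exporter/my_example_metric.libsonnet
{
  metric_name: 'my_example_metric',
  type: 'gauge',
  help: 'Example metric; replace with a real description',
  key_labels: null,
  values: [
    'my_example_metric',
  ],
  // The paired SQL file would return a column with the same name, e.g.:
  //   SELECT 42 AS my_example_metric;
  query: importstr 'sql_exporter/my_example_metric.sql',
}

The metric is then wired in by adding `import 'sql_exporter/my_example_metric.libsonnet',` to the `metrics` list of `neon_collector.jsonnet` (or the autoscaling collector), mirroring the entries shown below.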


@@ -0,0 +1,51 @@
{
collector_name: 'neon_collector',
metrics: [
import 'sql_exporter/checkpoints_req.libsonnet',
import 'sql_exporter/checkpoints_timed.libsonnet',
import 'sql_exporter/compute_current_lsn.libsonnet',
import 'sql_exporter/compute_logical_snapshot_files.libsonnet',
import 'sql_exporter/compute_receive_lsn.libsonnet',
import 'sql_exporter/compute_subscriptions_count.libsonnet',
import 'sql_exporter/connection_counts.libsonnet',
import 'sql_exporter/db_total_size.libsonnet',
import 'sql_exporter/file_cache_read_wait_seconds_bucket.libsonnet',
import 'sql_exporter/file_cache_read_wait_seconds_count.libsonnet',
import 'sql_exporter/file_cache_read_wait_seconds_sum.libsonnet',
import 'sql_exporter/file_cache_write_wait_seconds_bucket.libsonnet',
import 'sql_exporter/file_cache_write_wait_seconds_count.libsonnet',
import 'sql_exporter/file_cache_write_wait_seconds_sum.libsonnet',
import 'sql_exporter/getpage_prefetch_discards_total.libsonnet',
import 'sql_exporter/getpage_prefetch_misses_total.libsonnet',
import 'sql_exporter/getpage_prefetch_requests_total.libsonnet',
import 'sql_exporter/getpage_prefetches_buffered.libsonnet',
import 'sql_exporter/getpage_sync_requests_total.libsonnet',
import 'sql_exporter/getpage_wait_seconds_bucket.libsonnet',
import 'sql_exporter/getpage_wait_seconds_count.libsonnet',
import 'sql_exporter/getpage_wait_seconds_sum.libsonnet',
import 'sql_exporter/lfc_approximate_working_set_size.libsonnet',
import 'sql_exporter/lfc_approximate_working_set_size_windows.libsonnet',
import 'sql_exporter/lfc_cache_size_limit.libsonnet',
import 'sql_exporter/lfc_hits.libsonnet',
import 'sql_exporter/lfc_misses.libsonnet',
import 'sql_exporter/lfc_used.libsonnet',
import 'sql_exporter/lfc_writes.libsonnet',
import 'sql_exporter/logical_slot_restart_lsn.libsonnet',
import 'sql_exporter/max_cluster_size.libsonnet',
import 'sql_exporter/pageserver_disconnects_total.libsonnet',
import 'sql_exporter/pageserver_requests_sent_total.libsonnet',
import 'sql_exporter/pageserver_send_flushes_total.libsonnet',
import 'sql_exporter/pageserver_open_requests.libsonnet',
import 'sql_exporter/pg_stats_userdb.libsonnet',
import 'sql_exporter/replication_delay_bytes.libsonnet',
import 'sql_exporter/replication_delay_seconds.libsonnet',
import 'sql_exporter/retained_wal.libsonnet',
import 'sql_exporter/wal_is_lost.libsonnet',
],
queries: [
{
query_name: 'neon_perf_counters',
query: importstr 'sql_exporter/neon_perf_counters.sql',
},
],
}


@@ -1,331 +0,0 @@
collector_name: neon_collector
metrics:
- metric_name: lfc_misses
type: gauge
help: 'lfc_misses'
key_labels:
values: [lfc_misses]
query: |
select lfc_value as lfc_misses from neon.neon_lfc_stats where lfc_key='file_cache_misses';
- metric_name: lfc_used
type: gauge
help: 'LFC chunks used (chunk = 1MB)'
key_labels:
values: [lfc_used]
query: |
select lfc_value as lfc_used from neon.neon_lfc_stats where lfc_key='file_cache_used';
- metric_name: lfc_hits
type: gauge
help: 'lfc_hits'
key_labels:
values: [lfc_hits]
query: |
select lfc_value as lfc_hits from neon.neon_lfc_stats where lfc_key='file_cache_hits';
- metric_name: lfc_writes
type: gauge
help: 'lfc_writes'
key_labels:
values: [lfc_writes]
query: |
select lfc_value as lfc_writes from neon.neon_lfc_stats where lfc_key='file_cache_writes';
- metric_name: lfc_cache_size_limit
type: gauge
help: 'LFC cache size limit in bytes'
key_labels:
values: [lfc_cache_size_limit]
query: |
select pg_size_bytes(current_setting('neon.file_cache_size_limit')) as lfc_cache_size_limit;
- metric_name: connection_counts
type: gauge
help: 'Connection counts'
key_labels:
- datname
- state
values: [count]
query: |
select datname, state, count(*) as count from pg_stat_activity where state <> '' group by datname, state;
- metric_name: pg_stats_userdb
type: gauge
help: 'Stats for several oldest non-system dbs'
key_labels:
- datname
value_label: kind
values:
- db_size
- deadlocks
# Rows
- inserted
- updated
- deleted
# We export stats for 10 non-system database. Without this limit
# it is too easy to abuse the system by creating lots of databases.
query: |
select pg_database_size(datname) as db_size, deadlocks,
tup_inserted as inserted, tup_updated as updated, tup_deleted as deleted,
datname
from pg_stat_database
where datname IN (
select datname
from pg_database
where datname <> 'postgres' and not datistemplate
order by oid
limit 10
);
- metric_name: max_cluster_size
type: gauge
help: 'neon.max_cluster_size setting'
key_labels:
values: [max_cluster_size]
query: |
select setting::int as max_cluster_size from pg_settings where name = 'neon.max_cluster_size';
- metric_name: db_total_size
type: gauge
help: 'Size of all databases'
key_labels:
values: [total]
query: |
select sum(pg_database_size(datname)) as total from pg_database;
- metric_name: getpage_wait_seconds_count
type: counter
help: 'Number of getpage requests'
values: [getpage_wait_seconds_count]
query_ref: neon_perf_counters
- metric_name: getpage_wait_seconds_sum
type: counter
help: 'Time spent in getpage requests'
values: [getpage_wait_seconds_sum]
query_ref: neon_perf_counters
- metric_name: getpage_prefetch_requests_total
type: counter
help: 'Number of getpage issued for prefetching'
values: [getpage_prefetch_requests_total]
query_ref: neon_perf_counters
- metric_name: getpage_sync_requests_total
type: counter
help: 'Number of synchronous getpage issued'
values: [getpage_sync_requests_total]
query_ref: neon_perf_counters
- metric_name: getpage_prefetch_misses_total
type: counter
help: 'Total number of readahead misses; consisting of either prefetches that don''t satisfy the LSN bounds once the prefetch got read by the backend, or cases where somehow no readahead was issued for the read'
values: [getpage_prefetch_misses_total]
query_ref: neon_perf_counters
- metric_name: getpage_prefetch_discards_total
type: counter
help: 'Number of prefetch responses issued but not used'
values: [getpage_prefetch_discards_total]
query_ref: neon_perf_counters
- metric_name: pageserver_requests_sent_total
type: counter
help: 'Number of all requests sent to the pageserver (not just GetPage requests)'
values: [pageserver_requests_sent_total]
query_ref: neon_perf_counters
- metric_name: pageserver_disconnects_total
type: counter
help: 'Number of times that the connection to the pageserver was lost'
values: [pageserver_disconnects_total]
query_ref: neon_perf_counters
- metric_name: pageserver_send_flushes_total
type: counter
help: 'Number of flushes to the pageserver connection'
values: [pageserver_send_flushes_total]
query_ref: neon_perf_counters
- metric_name: getpage_wait_seconds_bucket
type: counter
help: 'Histogram buckets of getpage request latency'
key_labels:
- bucket_le
values: [value]
query_ref: getpage_wait_seconds_buckets
# DEPRECATED
- metric_name: lfc_approximate_working_set_size
type: gauge
help: 'Approximate working set size in pages of 8192 bytes'
key_labels:
values: [approximate_working_set_size]
query: |
select neon.approximate_working_set_size(false) as approximate_working_set_size;
- metric_name: lfc_approximate_working_set_size_windows
type: gauge
help: 'Approximate working set size in pages of 8192 bytes'
key_labels: [duration]
values: [size]
# NOTE: This is the "public" / "human-readable" version. Here, we supply a small selection
# of durations in a pretty-printed form.
query: |
select
x as duration,
neon.approximate_working_set_size_seconds(extract('epoch' from x::interval)::int) as size
from
(values ('5m'),('15m'),('1h')) as t (x);
- metric_name: compute_current_lsn
type: gauge
help: 'Current LSN of the database'
key_labels:
values: [lsn]
query: |
select
case
when pg_catalog.pg_is_in_recovery()
then (pg_last_wal_replay_lsn() - '0/0')::FLOAT8
else (pg_current_wal_lsn() - '0/0')::FLOAT8
end as lsn;
- metric_name: compute_receive_lsn
type: gauge
help: 'Returns the last write-ahead log location that has been received and synced to disk by streaming replication'
key_labels:
values: [lsn]
query: |
SELECT
CASE
WHEN pg_catalog.pg_is_in_recovery()
THEN (pg_last_wal_receive_lsn() - '0/0')::FLOAT8
ELSE 0
END AS lsn;
- metric_name: replication_delay_bytes
type: gauge
help: 'Bytes between received and replayed LSN'
key_labels:
values: [replication_delay_bytes]
# We use a GREATEST call here because this calculation can be negative.
# The calculation is not atomic, meaning after we've gotten the receive
# LSN, the replay LSN may have advanced past the receive LSN we
# are using for the calculation.
query: |
SELECT GREATEST(0, pg_wal_lsn_diff(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn())) AS replication_delay_bytes;
- metric_name: replication_delay_seconds
type: gauge
help: 'Time since last LSN was replayed'
key_labels:
values: [replication_delay_seconds]
query: |
SELECT
CASE
WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn() THEN 0
ELSE GREATEST (0, EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp()))
END AS replication_delay_seconds;
- metric_name: checkpoints_req
type: gauge
help: 'Number of requested checkpoints'
key_labels:
values: [checkpoints_req]
query: |
SELECT checkpoints_req FROM pg_stat_bgwriter;
- metric_name: checkpoints_timed
type: gauge
help: 'Number of scheduled checkpoints'
key_labels:
values: [checkpoints_timed]
query: |
SELECT checkpoints_timed FROM pg_stat_bgwriter;
- metric_name: compute_logical_snapshot_files
type: gauge
help: 'Number of snapshot files in pg_logical/snapshot'
key_labels:
- timeline_id
values: [num_logical_snapshot_files]
query: |
SELECT
(SELECT setting FROM pg_settings WHERE name = 'neon.timeline_id') AS timeline_id,
-- Postgres creates temporary snapshot files of the form %X-%X.snap.%d.tmp. These
-- temporary snapshot files are renamed to the actual snapshot files after they are
-- completely built. We only WAL-log the completely built snapshot files.
(SELECT COUNT(*) FROM pg_ls_dir('pg_logical/snapshots') AS name WHERE name LIKE '%.snap') AS num_logical_snapshot_files;
# In all the below metrics, we cast LSNs to floats because Prometheus only supports floats.
# It's probably fine because float64 can store integers from -2^53 to +2^53 exactly.
# Number of slots is limited by max_replication_slots, so collecting position for all of them shouldn't be bad.
- metric_name: logical_slot_restart_lsn
type: gauge
help: 'restart_lsn of logical slots'
key_labels:
- slot_name
values: [restart_lsn]
query: |
select slot_name, (restart_lsn - '0/0')::FLOAT8 as restart_lsn
from pg_replication_slots
where slot_type = 'logical';
- metric_name: compute_subscriptions_count
type: gauge
help: 'Number of logical replication subscriptions grouped by enabled/disabled'
key_labels:
- enabled
values: [subscriptions_count]
query: |
select subenabled::text as enabled, count(*) as subscriptions_count
from pg_subscription
group by subenabled;
- metric_name: retained_wal
type: gauge
help: 'Retained WAL in inactive replication slots'
key_labels:
- slot_name
values: [retained_wal]
query: |
SELECT slot_name, pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)::FLOAT8 AS retained_wal
FROM pg_replication_slots
WHERE active = false;
- metric_name: wal_is_lost
type: gauge
help: 'Whether or not the replication slot wal_status is lost'
key_labels:
- slot_name
values: [wal_is_lost]
query: |
SELECT slot_name,
CASE WHEN wal_status = 'lost' THEN 1 ELSE 0 END AS wal_is_lost
FROM pg_replication_slots;
queries:
- query_name: neon_perf_counters
query: |
WITH c AS (
SELECT pg_catalog.jsonb_object_agg(metric, value) jb FROM neon.neon_perf_counters
)
SELECT d.*
FROM pg_catalog.jsonb_to_record((select jb from c)) as d(
getpage_wait_seconds_count numeric,
getpage_wait_seconds_sum numeric,
getpage_prefetch_requests_total numeric,
getpage_sync_requests_total numeric,
getpage_prefetch_misses_total numeric,
getpage_prefetch_discards_total numeric,
pageserver_requests_sent_total numeric,
pageserver_disconnects_total numeric,
pageserver_send_flushes_total numeric
);
- query_name: getpage_wait_seconds_buckets
query: |
SELECT bucket_le, value FROM neon.neon_perf_counters WHERE metric = 'getpage_wait_seconds_bucket';


@@ -0,0 +1,11 @@
{
collector_name: 'neon_collector_autoscaling',
metrics: [
import 'sql_exporter/lfc_approximate_working_set_size_windows.autoscaling.libsonnet',
import 'sql_exporter/lfc_cache_size_limit.libsonnet',
import 'sql_exporter/lfc_hits.libsonnet',
import 'sql_exporter/lfc_misses.libsonnet',
import 'sql_exporter/lfc_used.libsonnet',
import 'sql_exporter/lfc_writes.libsonnet',
],
}


@@ -1,55 +0,0 @@
collector_name: neon_collector_autoscaling
metrics:
- metric_name: lfc_misses
type: gauge
help: 'lfc_misses'
key_labels:
values: [lfc_misses]
query: |
select lfc_value as lfc_misses from neon.neon_lfc_stats where lfc_key='file_cache_misses';
- metric_name: lfc_used
type: gauge
help: 'LFC chunks used (chunk = 1MB)'
key_labels:
values: [lfc_used]
query: |
select lfc_value as lfc_used from neon.neon_lfc_stats where lfc_key='file_cache_used';
- metric_name: lfc_hits
type: gauge
help: 'lfc_hits'
key_labels:
values: [lfc_hits]
query: |
select lfc_value as lfc_hits from neon.neon_lfc_stats where lfc_key='file_cache_hits';
- metric_name: lfc_writes
type: gauge
help: 'lfc_writes'
key_labels:
values: [lfc_writes]
query: |
select lfc_value as lfc_writes from neon.neon_lfc_stats where lfc_key='file_cache_writes';
- metric_name: lfc_cache_size_limit
type: gauge
help: 'LFC cache size limit in bytes'
key_labels:
values: [lfc_cache_size_limit]
query: |
select pg_size_bytes(current_setting('neon.file_cache_size_limit')) as lfc_cache_size_limit;
- metric_name: lfc_approximate_working_set_size_windows
type: gauge
help: 'Approximate working set size in pages of 8192 bytes'
key_labels: [duration_seconds]
values: [size]
# NOTE: This is the "internal" / "machine-readable" version. This outputs the working set
# size looking back 1..60 minutes, labeled with the number of minutes.
query: |
select
x::text as duration_seconds,
neon.approximate_working_set_size_seconds(x) as size
from
(select generate_series * 60 as x from generate_series(1, 60)) as t (x);


@@ -0,0 +1,40 @@
function(collector_name, collector_file, application_name='sql_exporter') {
// Configuration for sql_exporter for autoscaling-agent
// Global defaults.
global: {
// If scrape_timeout <= 0, no timeout is set unless Prometheus provides one. The default is 10s.
scrape_timeout: '10s',
// Subtracted from Prometheus' scrape_timeout to give us some headroom and prevent Prometheus from timing out first.
scrape_timeout_offset: '500ms',
// Minimum interval between collector runs: by default (0s) collectors are executed on every scrape.
min_interval: '0s',
// Maximum number of open connections to any one target. Metric queries will run concurrently on multiple connections,
// as will concurrent scrapes.
max_connections: 1,
// Maximum number of idle connections to any one target. Unless you use very long collection intervals, this should
// always be the same as max_connections.
max_idle_connections: 1,
// Maximum amount of time a connection may be reused. Expired connections may be closed lazily before reuse.
// If 0, connections are not closed due to a connection's age.
max_connection_lifetime: '5m',
},
// The target to monitor and the collectors to execute on it.
target: {
// Data source name always has a URI schema that matches the driver name. In some cases (e.g. MySQL)
// the schema gets dropped or replaced to match the driver expected DSN format.
data_source_name: std.format('postgresql://cloud_admin@127.0.0.1:5432/postgres?sslmode=disable&application_name=%s', [application_name]),
// Collectors (referenced by name) to execute on the target.
// Glob patterns are supported (see <https://pkg.go.dev/path/filepath#Match> for syntax).
collectors: [
collector_name,
],
},
// Collector files specifies a list of globs. One collector definition is read from each matching file.
// Glob patterns are supported (see <https://pkg.go.dev/path/filepath#Match> for syntax).
collector_files: [
collector_file,
],
}
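For reference, a call site for this function might look like the sketch below. The top-level file name and the exact rendering step are assumptions (this diff only shows that `make -C compute` preprocesses the configs); the collector names, the `neon_collector.yml` file name, and the `application_name` parameter come from the configs shown elsewhere in this diff.

// Hypothetical top-level compute/etc/sql_exporter.jsonnet
local sql_exporter = import 'sql_exporter.libsonnet';

// Render the regular exporter config; the autoscaling variant would pass
// collector_name='neon_collector_autoscaling' and a different
// application_name so the two exporters can be told apart in pg_stat_activity.
sql_exporter(
  collector_name='neon_collector',
  collector_file='neon_collector.yml',
)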


@@ -1,33 +0,0 @@
# Configuration for sql_exporter
# Global defaults.
global:
# If scrape_timeout <= 0, no timeout is set unless Prometheus provides one. The default is 10s.
scrape_timeout: 10s
# Subtracted from Prometheus' scrape_timeout to give us some headroom and prevent Prometheus from timing out first.
scrape_timeout_offset: 500ms
# Minimum interval between collector runs: by default (0s) collectors are executed on every scrape.
min_interval: 0s
# Maximum number of open connections to any one target. Metric queries will run concurrently on multiple connections,
# as will concurrent scrapes.
max_connections: 1
# Maximum number of idle connections to any one target. Unless you use very long collection intervals, this should
# always be the same as max_connections.
max_idle_connections: 1
# Maximum number of maximum amount of time a connection may be reused. Expired connections may be closed lazily before reuse.
# If 0, connections are not closed due to a connection's age.
max_connection_lifetime: 5m
# The target to monitor and the collectors to execute on it.
target:
# Data source name always has a URI schema that matches the driver name. In some cases (e.g. MySQL)
# the schema gets dropped or replaced to match the driver expected DSN format.
data_source_name: 'postgresql://cloud_admin@127.0.0.1:5432/postgres?sslmode=disable&application_name=sql_exporter'
# Collectors (referenced by name) to execute on the target.
# Glob patterns are supported (see <https://pkg.go.dev/path/filepath#Match> for syntax).
collectors: [neon_collector]
# Collector files specifies a list of globs. One collector definition is read from each matching file.
# Glob patterns are supported (see <https://pkg.go.dev/path/filepath#Match> for syntax).
collector_files:
- "neon_collector.yml"


@@ -0,0 +1 @@
SELECT num_requested AS checkpoints_req FROM pg_stat_checkpointer;


@@ -0,0 +1,15 @@
local neon = import 'neon.libsonnet';
local pg_stat_bgwriter = importstr 'sql_exporter/checkpoints_req.sql';
local pg_stat_checkpointer = importstr 'sql_exporter/checkpoints_req.17.sql';
{
metric_name: 'checkpoints_req',
type: 'gauge',
help: 'Number of requested checkpoints',
key_labels: null,
values: [
'checkpoints_req',
],
query: if neon.PG_MAJORVERSION_NUM < 17 then pg_stat_bgwriter else pg_stat_checkpointer,
}


@@ -0,0 +1 @@
SELECT checkpoints_req FROM pg_stat_bgwriter;


@@ -0,0 +1 @@
SELECT num_timed AS checkpoints_timed FROM pg_stat_checkpointer;


@@ -0,0 +1,15 @@
local neon = import 'neon.libsonnet';
local pg_stat_bgwriter = importstr 'sql_exporter/checkpoints_timed.sql';
local pg_stat_checkpointer = importstr 'sql_exporter/checkpoints_timed.17.sql';
{
metric_name: 'checkpoints_timed',
type: 'gauge',
help: 'Number of scheduled checkpoints',
key_labels: null,
values: [
'checkpoints_timed',
],
query: if neon.PG_MAJORVERSION_NUM < 17 then pg_stat_bgwriter else pg_stat_checkpointer,
}


@@ -0,0 +1 @@
SELECT checkpoints_timed FROM pg_stat_bgwriter;


@@ -0,0 +1,10 @@
{
metric_name: 'compute_current_lsn',
type: 'gauge',
help: 'Current LSN of the database',
key_labels: null,
values: [
'lsn',
],
query: importstr 'sql_exporter/compute_current_lsn.sql',
}


@@ -0,0 +1,4 @@
SELECT CASE
WHEN pg_catalog.pg_is_in_recovery() THEN (pg_last_wal_replay_lsn() - '0/0')::FLOAT8
ELSE (pg_current_wal_lsn() - '0/0')::FLOAT8
END AS lsn;


@@ -0,0 +1,12 @@
{
metric_name: 'compute_logical_snapshot_files',
type: 'gauge',
help: 'Number of snapshot files in pg_logical/snapshot',
key_labels: [
'timeline_id',
],
values: [
'num_logical_snapshot_files',
],
query: importstr 'sql_exporter/compute_logical_snapshot_files.sql',
}


@@ -0,0 +1,7 @@
SELECT
(SELECT setting FROM pg_settings WHERE name = 'neon.timeline_id') AS timeline_id,
-- Postgres creates temporary snapshot files of the form %X-%X.snap.%d.tmp.
-- These temporary snapshot files are renamed to the actual snapshot files
-- after they are completely built. We only WAL-log the completely built
-- snapshot files
(SELECT COUNT(*) FROM pg_ls_dir('pg_logical/snapshots') AS name WHERE name LIKE '%.snap') AS num_logical_snapshot_files;


@@ -0,0 +1,10 @@
{
metric_name: 'compute_receive_lsn',
type: 'gauge',
help: 'Returns the last write-ahead log location that has been received and synced to disk by streaming replication',
key_labels: null,
values: [
'lsn',
],
query: importstr 'sql_exporter/compute_receive_lsn.sql',
}


@@ -0,0 +1,4 @@
SELECT CASE
WHEN pg_catalog.pg_is_in_recovery() THEN (pg_last_wal_receive_lsn() - '0/0')::FLOAT8
ELSE 0
END AS lsn;


@@ -0,0 +1,12 @@
{
metric_name: 'compute_subscriptions_count',
type: 'gauge',
help: 'Number of logical replication subscriptions grouped by enabled/disabled',
key_labels: [
'enabled',
],
values: [
'subscriptions_count',
],
query: importstr 'sql_exporter/compute_subscriptions_count.sql',
}


@@ -0,0 +1 @@
SELECT subenabled::text AS enabled, count(*) AS subscriptions_count FROM pg_subscription GROUP BY subenabled;


@@ -0,0 +1,13 @@
{
metric_name: 'connection_counts',
type: 'gauge',
help: 'Connection counts',
key_labels: [
'datname',
'state',
],
values: [
'count',
],
query: importstr 'sql_exporter/connection_counts.sql',
}


@@ -0,0 +1 @@
SELECT datname, state, count(*) AS count FROM pg_stat_activity WHERE state <> '' GROUP BY datname, state;


@@ -0,0 +1,10 @@
{
metric_name: 'db_total_size',
type: 'gauge',
help: 'Size of all databases',
key_labels: null,
values: [
'total',
],
query: importstr 'sql_exporter/db_total_size.sql',
}


@@ -0,0 +1 @@
SELECT sum(pg_database_size(datname)) AS total FROM pg_database;


@@ -0,0 +1,12 @@
{
metric_name: 'file_cache_read_wait_seconds_bucket',
type: 'counter',
help: 'Histogram buckets of LFC read operation latencies',
key_labels: [
'bucket_le',
],
values: [
'value',
],
query: importstr 'sql_exporter/file_cache_read_wait_seconds_bucket.sql',
}


@@ -0,0 +1 @@
SELECT bucket_le, value FROM neon.neon_perf_counters WHERE metric = 'file_cache_read_wait_seconds_bucket';


@@ -0,0 +1,9 @@
{
metric_name: 'file_cache_read_wait_seconds_count',
type: 'counter',
help: 'Number of read operations in LFC',
values: [
'file_cache_read_wait_seconds_count',
],
query_ref: 'neon_perf_counters',
}


@@ -0,0 +1,9 @@
{
metric_name: 'file_cache_read_wait_seconds_sum',
type: 'counter',
help: 'Time spent in LFC read operations',
values: [
'file_cache_read_wait_seconds_sum',
],
query_ref: 'neon_perf_counters',
}


@@ -0,0 +1,12 @@
{
metric_name: 'file_cache_write_wait_seconds_bucket',
type: 'counter',
help: 'Histogram buckets of LFC write operation latencies',
key_labels: [
'bucket_le',
],
values: [
'value',
],
query: importstr 'sql_exporter/file_cache_write_wait_seconds_bucket.sql',
}


@@ -0,0 +1 @@
SELECT bucket_le, value FROM neon.neon_perf_counters WHERE metric = 'file_cache_write_wait_seconds_bucket';


@@ -0,0 +1,9 @@
{
metric_name: 'file_cache_write_wait_seconds_count',
type: 'counter',
help: 'Number of write operations in LFC',
values: [
'file_cache_write_wait_seconds_count',
],
query_ref: 'neon_perf_counters',
}


@@ -0,0 +1,9 @@
{
metric_name: 'file_cache_write_wait_seconds_sum',
type: 'counter',
help: 'Time spent in LFC write operations',
values: [
'file_cache_write_wait_seconds_sum',
],
query_ref: 'neon_perf_counters',
}


@@ -0,0 +1,9 @@
{
metric_name: 'getpage_prefetch_discards_total',
type: 'counter',
help: 'Number of prefetch responses issued but not used',
values: [
'getpage_prefetch_discards_total',
],
query_ref: 'neon_perf_counters',
}


@@ -0,0 +1,9 @@
{
metric_name: 'getpage_prefetch_misses_total',
type: 'counter',
help: "Total number of readahead misses; consisting of either prefetches that don't satisfy the LSN bounds once the prefetch got read by the backend, or cases where somehow no readahead was issued for the read",
values: [
'getpage_prefetch_misses_total',
],
query_ref: 'neon_perf_counters',
}


@@ -0,0 +1,9 @@
{
metric_name: 'getpage_prefetch_requests_total',
type: 'counter',
help: 'Number of getpage issued for prefetching',
values: [
'getpage_prefetch_requests_total',
],
query_ref: 'neon_perf_counters',
}


@@ -0,0 +1,9 @@
{
metric_name: 'getpage_prefetches_buffered',
type: 'gauge',
help: 'Number of prefetched pages buffered in neon',
values: [
'getpage_prefetches_buffered',
],
query_ref: 'neon_perf_counters',
}


@@ -0,0 +1,9 @@
{
metric_name: 'getpage_sync_requests_total',
type: 'counter',
help: 'Number of synchronous getpage issued',
values: [
'getpage_sync_requests_total',
],
query_ref: 'neon_perf_counters',
}


@@ -0,0 +1,12 @@
{
metric_name: 'getpage_wait_seconds_bucket',
type: 'counter',
help: 'Histogram buckets of getpage request latency',
key_labels: [
'bucket_le',
],
values: [
'value',
],
query: importstr 'sql_exporter/getpage_wait_seconds_bucket.sql',
}


@@ -0,0 +1 @@
SELECT bucket_le, value FROM neon.neon_perf_counters WHERE metric = 'getpage_wait_seconds_bucket';


@@ -0,0 +1,9 @@
{
metric_name: 'getpage_wait_seconds_count',
type: 'counter',
help: 'Number of getpage requests',
values: [
'getpage_wait_seconds_count',
],
query_ref: 'neon_perf_counters',
}


@@ -0,0 +1,9 @@
{
metric_name: 'getpage_wait_seconds_sum',
type: 'counter',
help: 'Time spent in getpage requests',
values: [
'getpage_wait_seconds_sum',
],
query_ref: 'neon_perf_counters',
}


@@ -0,0 +1,12 @@
// DEPRECATED
{
metric_name: 'lfc_approximate_working_set_size',
type: 'gauge',
help: 'Approximate working set size in pages of 8192 bytes',
key_labels: null,
values: [
'approximate_working_set_size',
],
query: importstr 'sql_exporter/lfc_approximate_working_set_size.sql',
}


@@ -0,0 +1 @@
SELECT neon.approximate_working_set_size(false) AS approximate_working_set_size;


@@ -0,0 +1,12 @@
{
metric_name: 'lfc_approximate_working_set_size_windows',
type: 'gauge',
help: 'Approximate working set size in pages of 8192 bytes',
key_labels: [
'duration_seconds',
],
values: [
'size',
],
query: importstr 'sql_exporter/lfc_approximate_working_set_size_windows.autoscaling.sql',
}


@@ -0,0 +1,8 @@
-- NOTE: This is the "internal" / "machine-readable" version. This outputs the
-- working set size looking back 1..60 minutes, labeled with the number of
-- minutes.
SELECT
x::text as duration_seconds,
neon.approximate_working_set_size_seconds(x) AS size
FROM (SELECT generate_series * 60 AS x FROM generate_series(1, 60)) AS t (x);


@@ -0,0 +1,12 @@
{
metric_name: 'lfc_approximate_working_set_size_windows',
type: 'gauge',
help: 'Approximate working set size in pages of 8192 bytes',
key_labels: [
'duration',
],
values: [
'size',
],
query: importstr 'sql_exporter/lfc_approximate_working_set_size_windows.sql',
}


@@ -0,0 +1,8 @@
-- NOTE: This is the "public" / "human-readable" version. Here, we supply a
-- small selection of durations in a pretty-printed form.
SELECT
x AS duration,
neon.approximate_working_set_size_seconds(extract('epoch' FROM x::interval)::int) AS size FROM (
VALUES ('5m'), ('15m'), ('1h')
) AS t (x);


@@ -0,0 +1,10 @@
{
metric_name: 'lfc_cache_size_limit',
type: 'gauge',
help: 'LFC cache size limit in bytes',
key_labels: null,
values: [
'lfc_cache_size_limit',
],
query: importstr 'sql_exporter/lfc_cache_size_limit.sql',
}


@@ -0,0 +1 @@
SELECT pg_size_bytes(current_setting('neon.file_cache_size_limit')) AS lfc_cache_size_limit;


@@ -0,0 +1,10 @@
{
metric_name: 'lfc_hits',
type: 'gauge',
help: 'lfc_hits',
key_labels: null,
values: [
'lfc_hits',
],
query: importstr 'sql_exporter/lfc_hits.sql',
}


@@ -0,0 +1 @@
SELECT lfc_value AS lfc_hits FROM neon.neon_lfc_stats WHERE lfc_key = 'file_cache_hits';


@@ -0,0 +1,10 @@
{
metric_name: 'lfc_misses',
type: 'gauge',
help: 'lfc_misses',
key_labels: null,
values: [
'lfc_misses',
],
query: importstr 'sql_exporter/lfc_misses.sql',
}


@@ -0,0 +1 @@
SELECT lfc_value AS lfc_misses FROM neon.neon_lfc_stats WHERE lfc_key = 'file_cache_misses';


@@ -0,0 +1,10 @@
{
metric_name: 'lfc_used',
type: 'gauge',
help: 'LFC chunks used (chunk = 1MB)',
key_labels: null,
values: [
'lfc_used',
],
query: importstr 'sql_exporter/lfc_used.sql',
}


@@ -0,0 +1 @@
SELECT lfc_value AS lfc_used FROM neon.neon_lfc_stats WHERE lfc_key = 'file_cache_used';


@@ -0,0 +1,10 @@
{
metric_name: 'lfc_writes',
type: 'gauge',
help: 'lfc_writes',
key_labels: null,
values: [
'lfc_writes',
],
query: importstr 'sql_exporter/lfc_writes.sql',
}


@@ -0,0 +1 @@
SELECT lfc_value AS lfc_writes FROM neon.neon_lfc_stats WHERE lfc_key = 'file_cache_writes';


@@ -0,0 +1,15 @@
// Number of slots is limited by max_replication_slots, so collecting position
// for all of them shouldn't be bad.
{
metric_name: 'logical_slot_restart_lsn',
type: 'gauge',
help: 'restart_lsn of logical slots',
key_labels: [
'slot_name',
],
values: [
'restart_lsn',
],
query: importstr 'sql_exporter/logical_slot_restart_lsn.sql',
}


@@ -0,0 +1,3 @@
SELECT slot_name, (restart_lsn - '0/0')::FLOAT8 as restart_lsn
FROM pg_replication_slots
WHERE slot_type = 'logical';


@@ -0,0 +1,10 @@
{
metric_name: 'max_cluster_size',
type: 'gauge',
help: 'neon.max_cluster_size setting',
key_labels: null,
values: [
'max_cluster_size',
],
query: importstr 'sql_exporter/max_cluster_size.sql',
}


@@ -0,0 +1 @@
SELECT setting::int AS max_cluster_size FROM pg_settings WHERE name = 'neon.max_cluster_size';


@@ -0,0 +1,19 @@
WITH c AS (SELECT pg_catalog.jsonb_object_agg(metric, value) jb FROM neon.neon_perf_counters)
SELECT d.* FROM pg_catalog.jsonb_to_record((SELECT jb FROM c)) AS d(
file_cache_read_wait_seconds_count numeric,
file_cache_read_wait_seconds_sum numeric,
file_cache_write_wait_seconds_count numeric,
file_cache_write_wait_seconds_sum numeric,
getpage_wait_seconds_count numeric,
getpage_wait_seconds_sum numeric,
getpage_prefetch_requests_total numeric,
getpage_sync_requests_total numeric,
getpage_prefetch_misses_total numeric,
getpage_prefetch_discards_total numeric,
getpage_prefetches_buffered numeric,
pageserver_requests_sent_total numeric,
pageserver_disconnects_total numeric,
pageserver_send_flushes_total numeric,
pageserver_open_requests numeric
);


@@ -0,0 +1,9 @@
{
metric_name: 'pageserver_disconnects_total',
type: 'counter',
help: 'Number of times that the connection to the pageserver was lost',
values: [
'pageserver_disconnects_total',
],
query_ref: 'neon_perf_counters',
}


@@ -0,0 +1,9 @@
{
metric_name: 'pageserver_open_requests',
type: 'gauge',
help: 'Number of open requests to PageServer',
values: [
'pageserver_open_requests',
],
query_ref: 'neon_perf_counters',
}


@@ -0,0 +1,9 @@
{
metric_name: 'pageserver_requests_sent_total',
type: 'counter',
help: 'Number of all requests sent to the pageserver (not just GetPage requests)',
values: [
'pageserver_requests_sent_total',
],
query_ref: 'neon_perf_counters',
}


@@ -0,0 +1,9 @@
{
metric_name: 'pageserver_send_flushes_total',
type: 'counter',
help: 'Number of flushes to the pageserver connection',
values: [
'pageserver_send_flushes_total',
],
query_ref: 'neon_perf_counters',
}


@@ -0,0 +1,18 @@
{
metric_name: 'pg_stats_userdb',
type: 'gauge',
help: 'Stats for several oldest non-system dbs',
key_labels: [
'datname',
],
value_label: 'kind',
values: [
'db_size',
'deadlocks',
// Rows
'inserted',
'updated',
'deleted',
],
query: importstr 'sql_exporter/pg_stats_userdb.sql',
}


@@ -0,0 +1,10 @@
-- We export stats for 10 non-system databases. Without this limit it is too
-- easy to abuse the system by creating lots of databases.
SELECT pg_database_size(datname) AS db_size, deadlocks, tup_inserted AS inserted,
tup_updated AS updated, tup_deleted AS deleted, datname
FROM pg_stat_database
WHERE datname IN (
SELECT datname FROM pg_database
WHERE datname <> 'postgres' AND NOT datistemplate ORDER BY oid LIMIT 10
);


@@ -0,0 +1,10 @@
{
metric_name: 'replication_delay_bytes',
type: 'gauge',
help: 'Bytes between received and replayed LSN',
key_labels: null,
values: [
'replication_delay_bytes',
],
query: importstr 'sql_exporter/replication_delay_bytes.sql',
}


@@ -0,0 +1,6 @@
-- We use a GREATEST call here because this calculation can be negative. The
-- calculation is not atomic, meaning after we've gotten the receive LSN, the
-- replay LSN may have advanced past the receive LSN we are using for the
-- calculation.
SELECT GREATEST(0, pg_wal_lsn_diff(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn())) AS replication_delay_bytes;


@@ -0,0 +1,10 @@
{
metric_name: 'replication_delay_seconds',
type: 'gauge',
help: 'Time since last LSN was replayed',
key_labels: null,
values: [
'replication_delay_seconds',
],
query: importstr 'sql_exporter/replication_delay_seconds.sql',
}


@@ -0,0 +1,5 @@
SELECT
CASE
WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn() THEN 0
ELSE GREATEST(0, EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp()))
END AS replication_delay_seconds;


@@ -0,0 +1,12 @@
{
metric_name: 'retained_wal',
type: 'gauge',
help: 'Retained WAL in inactive replication slots',
key_labels: [
'slot_name',
],
values: [
'retained_wal',
],
query: importstr 'sql_exporter/retained_wal.sql',
}


@@ -0,0 +1,10 @@
SELECT
slot_name,
pg_wal_lsn_diff(
CASE
WHEN pg_is_in_recovery() THEN pg_last_wal_replay_lsn()
ELSE pg_current_wal_lsn()
END,
restart_lsn)::FLOAT8 AS retained_wal
FROM pg_replication_slots
WHERE active = false;
